Re: can operators emit on a different from the operator itself thread?

2016-10-13 Thread Amol Kekre
Vlad,
I agree that the check should be ON by default. Ability to turn it off for
entire app is fine, per port is not needed.

Thks
Amol

On Wed, Oct 12, 2016 at 10:34 PM, Tushar Gosavi 
wrote:

> +1 for on by default and ability to turn it off for entire application.
>
> - Tushar.
>
>
> On Thu, Oct 13, 2016 at 11:00 AM, Pradeep A. Dalvi 
> wrote:
> > +1 for ON by default
> > +1 for disabling it for all output ports
> >
> > With the kind of issues we have observed being faced by developers in the
> > past, I strongly believe this check should be ON by default.
> > However at the same time I feel, it shall be one-time check, mostly in
> > Development phase and before going into Production. Having said that, if
> > disabling it at application level i.e. for all operators and their
> > respective output ports would it make implementation simpler, then that
> can
> > be targeted first. Thoughts?
> >
> > --prad
> >
> > On Thu, Oct 13, 2016 at 7:32 AM, Vlad Rozov 
> wrote:
> >
> >> I run jmh test and check takes 1ns on my MacBook Pro and on the lab
> >> machine. This corresponds to 3% degradation at 30 million
> events/second. I
> >> think we can move forward with the check ON by default. Do we need an
> >> ability to turn OFF check for a specific operator and/or port? My
> thought
> >> is that such ability is not necessary and it should be OK to disable
> check
> >> for all output ports in an application.
> >>
> >> Vlad
> >>
> >>
> >> On 10/12/16 11:56, Amol Kekre wrote:
> >>
> >>> In case there turns out to be a penalty, we can introduce a "check for
> >>> thread affinity" mode that triggers this check. My initial thought is
> to
> >>> make this check ON by default. We should wait till benchmarks are
> >>> available
> >>> before discussing adding this check.
> >>>
> >>> Thks
> >>> Amol
> >>>
> >>>
> >>> On Wed, Oct 12, 2016 at 11:07 AM, Sanjay Pujare <
> san...@datatorrent.com>
> >>> wrote:
> >>>
> >>> A JIRA has been created for adding this thread affinity check
>  https://issues.apache.org/jira/browse/APEXCORE-510 . I have made this
>  enhancement in a branch
>  https://github.com/sanjaypujare/apex-core/tree/malhar-510.
>  thread_affinity
>  and I have been benchmarking the performance with this change. I will
> be
>  publishing the results in the above JIRA where we can discuss them and
>  hopefully agree on merging this change.
> 
>  On Thu, Aug 11, 2016 at 1:41 PM, Sanjay Pujare <
> san...@datatorrent.com>
>  wrote:
> 
>  You are right, I was subconsciously thinking about the THREAD_LOCAL
> case
> > with a single container and a simple DAG and in that case Vlad’s
> >
>  assumption
> 
> > might not be valid but may be it is.
> >
> > On 8/11/16, 11:47 AM, "Munagala Ramanath" 
> wrote:
> >
> >  If I understand Vlad correctly, what he is saying is that each
> >
>  operator
> 
> >  saves currentThread in
> >  its own setup() and checks it in its own output methods. The
> > threads
> >
>  in
> 
> >  different operators are
> >  running potentially on different nodes and/or processes and
> there
> >
>  will
> 
> > be
> >  no connection between them.
> >
> >  Ram
> >
> >  On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare <
> > san...@datatorrent.com>
> >  wrote:
> >
> >  > Name check is expensive, agreed, but there isn’t anything else
> > currently.
> >  > Ideally the stram engine (considering that it is an engine
> >
>  providing
> 
> >  > resources like threads etc) should use a ThreadFactory or a
> > ThreadGroup to
> >  > create operator threads so identification and adding
> > functionality
> >
>  is
> 
> >  > easier.
> >  >
> >  > The idea of checking for the same thread between setup() and
> > emit()
> > won’t
> >  > work because the emit() check will have to be in the Sink
> > hierarchy
> > and
> >  > AFAIK a Sink object doesn’t have access to the corresponding
> > operator,
> >  > right? Another more fundamental problem probably is that these
> > threads
> >  > don’t have to match. The emit() for any operator (or rather a
> > Sink
> > related
> >  > to an operator) is ultimately triggered by an emitTuple() on
> the
> > topmost
> >  > input operator in that path which happens in that input
> > operator’s
> > thread
> >  > which doesn’t have to match the thread calling setup() in the
> > downstream
> >  > operators, right?
> >  >
> >  >
> >  > On 8/11/16, 10:59 AM, "Vlad Rozov" 
> >
>  wrote:
> 
> >  >
> >  > Name verification is too 

Re: can operators emit on a different from the operator itself thread?

2016-10-12 Thread Tushar Gosavi
+1 for on by default and ability to turn it off for entire application.

- Tushar.


On Thu, Oct 13, 2016 at 11:00 AM, Pradeep A. Dalvi  wrote:
> +1 for ON by default
> +1 for disabling it for all output ports
>
> With the kind of issues we have observed being faced by developers in the
> past, I strongly believe this check should be ON by default.
> However at the same time I feel, it shall be one-time check, mostly in
> Development phase and before going into Production. Having said that, if
> disabling it at application level i.e. for all operators and their
> respective output ports would it make implementation simpler, then that can
> be targeted first. Thoughts?
>
> --prad
>
> On Thu, Oct 13, 2016 at 7:32 AM, Vlad Rozov  wrote:
>
>> I run jmh test and check takes 1ns on my MacBook Pro and on the lab
>> machine. This corresponds to 3% degradation at 30 million events/second. I
>> think we can move forward with the check ON by default. Do we need an
>> ability to turn OFF check for a specific operator and/or port? My thought
>> is that such ability is not necessary and it should be OK to disable check
>> for all output ports in an application.
>>
>> Vlad
>>
>>
>> On 10/12/16 11:56, Amol Kekre wrote:
>>
>>> In case there turns out to be a penalty, we can introduce a "check for
>>> thread affinity" mode that triggers this check. My initial thought is to
>>> make this check ON by default. We should wait till benchmarks are
>>> available
>>> before discussing adding this check.
>>>
>>> Thks
>>> Amol
>>>
>>>
>>> On Wed, Oct 12, 2016 at 11:07 AM, Sanjay Pujare 
>>> wrote:
>>>
>>> A JIRA has been created for adding this thread affinity check
 https://issues.apache.org/jira/browse/APEXCORE-510 . I have made this
 enhancement in a branch
 https://github.com/sanjaypujare/apex-core/tree/malhar-510.
 thread_affinity
 and I have been benchmarking the performance with this change. I will be
 publishing the results in the above JIRA where we can discuss them and
 hopefully agree on merging this change.

 On Thu, Aug 11, 2016 at 1:41 PM, Sanjay Pujare 
 wrote:

 You are right, I was subconsciously thinking about the THREAD_LOCAL case
> with a single container and a simple DAG and in that case Vlad’s
>
 assumption

> might not be valid but may be it is.
>
> On 8/11/16, 11:47 AM, "Munagala Ramanath"  wrote:
>
>  If I understand Vlad correctly, what he is saying is that each
>
 operator

>  saves currentThread in
>  its own setup() and checks it in its own output methods. The
> threads
>
 in

>  different operators are
>  running potentially on different nodes and/or processes and there
>
 will

> be
>  no connection between them.
>
>  Ram
>
>  On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare <
> san...@datatorrent.com>
>  wrote:
>
>  > Name check is expensive, agreed, but there isn’t anything else
> currently.
>  > Ideally the stram engine (considering that it is an engine
>
 providing

>  > resources like threads etc) should use a ThreadFactory or a
> ThreadGroup to
>  > create operator threads so identification and adding
> functionality
>
 is

>  > easier.
>  >
>  > The idea of checking for the same thread between setup() and
> emit()
> won’t
>  > work because the emit() check will have to be in the Sink
> hierarchy
> and
>  > AFAIK a Sink object doesn’t have access to the corresponding
> operator,
>  > right? Another more fundamental problem probably is that these
> threads
>  > don’t have to match. The emit() for any operator (or rather a
> Sink
> related
>  > to an operator) is ultimately triggered by an emitTuple() on the
> topmost
>  > input operator in that path which happens in that input
> operator’s
> thread
>  > which doesn’t have to match the thread calling setup() in the
> downstream
>  > operators, right?
>  >
>  >
>  > On 8/11/16, 10:59 AM, "Vlad Rozov" 
>
 wrote:

>  >
>  > Name verification is too expensive, it will be sufficient to
> store
>  > currentThread during setup() and verify that it is the same
> during
>  > emit.
>  > Checks should be supported not only for DefaultOutputPort, so
>
 we

> may
>  > have it implemented in various Sinks.
>  >
>  > Vlad
>  >
>  > On 8/11/16 10:21, Sanjay Pujare wrote:
>  > > Thinking more about this – all of the “operator” threads
> are
> 

Re: can operators emit on a different from the operator itself thread?

2016-10-12 Thread Pradeep A. Dalvi
+1 for ON by default
+1 for disabling it for all output ports

With the kind of issues we have observed being faced by developers in the
past, I strongly believe this check should be ON by default.
However at the same time I feel, it shall be one-time check, mostly in
Development phase and before going into Production. Having said that, if
disabling it at application level i.e. for all operators and their
respective output ports would it make implementation simpler, then that can
be targeted first. Thoughts?

--prad

On Thu, Oct 13, 2016 at 7:32 AM, Vlad Rozov  wrote:

> I run jmh test and check takes 1ns on my MacBook Pro and on the lab
> machine. This corresponds to 3% degradation at 30 million events/second. I
> think we can move forward with the check ON by default. Do we need an
> ability to turn OFF check for a specific operator and/or port? My thought
> is that such ability is not necessary and it should be OK to disable check
> for all output ports in an application.
>
> Vlad
>
>
> On 10/12/16 11:56, Amol Kekre wrote:
>
>> In case there turns out to be a penalty, we can introduce a "check for
>> thread affinity" mode that triggers this check. My initial thought is to
>> make this check ON by default. We should wait till benchmarks are
>> available
>> before discussing adding this check.
>>
>> Thks
>> Amol
>>
>>
>> On Wed, Oct 12, 2016 at 11:07 AM, Sanjay Pujare 
>> wrote:
>>
>> A JIRA has been created for adding this thread affinity check
>>> https://issues.apache.org/jira/browse/APEXCORE-510 . I have made this
>>> enhancement in a branch
>>> https://github.com/sanjaypujare/apex-core/tree/malhar-510.
>>> thread_affinity
>>> and I have been benchmarking the performance with this change. I will be
>>> publishing the results in the above JIRA where we can discuss them and
>>> hopefully agree on merging this change.
>>>
>>> On Thu, Aug 11, 2016 at 1:41 PM, Sanjay Pujare 
>>> wrote:
>>>
>>> You are right, I was subconsciously thinking about the THREAD_LOCAL case
 with a single container and a simple DAG and in that case Vlad’s

>>> assumption
>>>
 might not be valid but may be it is.

 On 8/11/16, 11:47 AM, "Munagala Ramanath"  wrote:

  If I understand Vlad correctly, what he is saying is that each

>>> operator
>>>
  saves currentThread in
  its own setup() and checks it in its own output methods. The
 threads

>>> in
>>>
  different operators are
  running potentially on different nodes and/or processes and there

>>> will
>>>
 be
  no connection between them.

  Ram

  On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare <
 san...@datatorrent.com>
  wrote:

  > Name check is expensive, agreed, but there isn’t anything else
 currently.
  > Ideally the stram engine (considering that it is an engine

>>> providing
>>>
  > resources like threads etc) should use a ThreadFactory or a
 ThreadGroup to
  > create operator threads so identification and adding
 functionality

>>> is
>>>
  > easier.
  >
  > The idea of checking for the same thread between setup() and
 emit()
 won’t
  > work because the emit() check will have to be in the Sink
 hierarchy
 and
  > AFAIK a Sink object doesn’t have access to the corresponding
 operator,
  > right? Another more fundamental problem probably is that these
 threads
  > don’t have to match. The emit() for any operator (or rather a
 Sink
 related
  > to an operator) is ultimately triggered by an emitTuple() on the
 topmost
  > input operator in that path which happens in that input
 operator’s
 thread
  > which doesn’t have to match the thread calling setup() in the
 downstream
  > operators, right?
  >
  >
  > On 8/11/16, 10:59 AM, "Vlad Rozov" 

>>> wrote:
>>>
  >
  > Name verification is too expensive, it will be sufficient to
 store
  > currentThread during setup() and verify that it is the same
 during
  > emit.
  > Checks should be supported not only for DefaultOutputPort, so

>>> we
>>>
 may
  > have it implemented in various Sinks.
  >
  > Vlad
  >
  > On 8/11/16 10:21, Sanjay Pujare wrote:
  > > Thinking more about this – all of the “operator” threads
 are
 created
  > by the Stram engine with appropriate names. So we can put checks
 in
 the
  > DefaultOutputPort.emit() or in the various implementations of
 Sink.put()
  > that the current-thread is one created by the Stram engine (by
 verifying
  > the name).
  >   

Re: can operators emit on a different from the operator itself thread?

2016-10-12 Thread Vlad Rozov
I run jmh test and check takes 1ns on my MacBook Pro and on the lab 
machine. This corresponds to 3% degradation at 30 million events/second. 
I think we can move forward with the check ON by default. Do we need an 
ability to turn OFF check for a specific operator and/or port? My 
thought is that such ability is not necessary and it should be OK to 
disable check for all output ports in an application.


Vlad

On 10/12/16 11:56, Amol Kekre wrote:

In case there turns out to be a penalty, we can introduce a "check for
thread affinity" mode that triggers this check. My initial thought is to
make this check ON by default. We should wait till benchmarks are available
before discussing adding this check.

Thks
Amol


On Wed, Oct 12, 2016 at 11:07 AM, Sanjay Pujare 
wrote:


A JIRA has been created for adding this thread affinity check
https://issues.apache.org/jira/browse/APEXCORE-510 . I have made this
enhancement in a branch
https://github.com/sanjaypujare/apex-core/tree/malhar-510.thread_affinity
and I have been benchmarking the performance with this change. I will be
publishing the results in the above JIRA where we can discuss them and
hopefully agree on merging this change.

On Thu, Aug 11, 2016 at 1:41 PM, Sanjay Pujare 
wrote:


You are right, I was subconsciously thinking about the THREAD_LOCAL case
with a single container and a simple DAG and in that case Vlad’s

assumption

might not be valid but may be it is.

On 8/11/16, 11:47 AM, "Munagala Ramanath"  wrote:

 If I understand Vlad correctly, what he is saying is that each

operator

 saves currentThread in
 its own setup() and checks it in its own output methods. The threads

in

 different operators are
 running potentially on different nodes and/or processes and there

will

be
 no connection between them.

 Ram

 On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare <
san...@datatorrent.com>
 wrote:

 > Name check is expensive, agreed, but there isn’t anything else
currently.
 > Ideally the stram engine (considering that it is an engine

providing

 > resources like threads etc) should use a ThreadFactory or a
ThreadGroup to
 > create operator threads so identification and adding functionality

is

 > easier.
 >
 > The idea of checking for the same thread between setup() and emit()
won’t
 > work because the emit() check will have to be in the Sink hierarchy
and
 > AFAIK a Sink object doesn’t have access to the corresponding
operator,
 > right? Another more fundamental problem probably is that these
threads
 > don’t have to match. The emit() for any operator (or rather a Sink
related
 > to an operator) is ultimately triggered by an emitTuple() on the
topmost
 > input operator in that path which happens in that input operator’s
thread
 > which doesn’t have to match the thread calling setup() in the
downstream
 > operators, right?
 >
 >
 > On 8/11/16, 10:59 AM, "Vlad Rozov" 

wrote:

 >
 > Name verification is too expensive, it will be sufficient to
store
 > currentThread during setup() and verify that it is the same
during
 > emit.
 > Checks should be supported not only for DefaultOutputPort, so

we

may
 > have it implemented in various Sinks.
 >
 > Vlad
 >
 > On 8/11/16 10:21, Sanjay Pujare wrote:
 > > Thinking more about this – all of the “operator” threads are
created
 > by the Stram engine with appropriate names. So we can put checks in
the
 > DefaultOutputPort.emit() or in the various implementations of
Sink.put()
 > that the current-thread is one created by the Stram engine (by
verifying
 > the name).
 > >
 > > We can even use a special Thread object for operator threads
so the
 > above detection is easier.
 > >
 > >
 > >
 > > On 8/10/16, 6:11 PM, "Amol Kekre" 
wrote:
 > >
 > >  +1 on debug proposal. Even if tuples lands up within the
 > window, it breaks
 > >  all guarantees. A rerun (after restart from a

checkpoint)

can
 > have tuples
 > >  in different windows from this thread. A separate thread
simply
 > exposes
 > >  users to unwarranted risks.
 > >
 > >  Thks
 > >  Amol
 > >
 > >
 > >  On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov <
 > v.ro...@datatorrent.com> wrote:
 > >
 > >  > Tuples emitted between end and begin windows is only
one of
 > possible
 > >  > behaviors that emitting tuples on a separate from the
 > operator thread may
 > >  > introduce. It will be good to have both checks in

place

at
 > run-time and if
 > >  > checking for the operator thread for every emitted
tuple 

Re: can operators emit on a different from the operator itself thread?

2016-10-12 Thread Amol Kekre
In case there turns out to be a penalty, we can introduce a "check for
thread affinity" mode that triggers this check. My initial thought is to
make this check ON by default. We should wait till benchmarks are available
before discussing adding this check.

Thks
Amol


On Wed, Oct 12, 2016 at 11:07 AM, Sanjay Pujare 
wrote:

> A JIRA has been created for adding this thread affinity check
> https://issues.apache.org/jira/browse/APEXCORE-510 . I have made this
> enhancement in a branch
> https://github.com/sanjaypujare/apex-core/tree/malhar-510.thread_affinity
> and I have been benchmarking the performance with this change. I will be
> publishing the results in the above JIRA where we can discuss them and
> hopefully agree on merging this change.
>
> On Thu, Aug 11, 2016 at 1:41 PM, Sanjay Pujare 
> wrote:
>
> > You are right, I was subconsciously thinking about the THREAD_LOCAL case
> > with a single container and a simple DAG and in that case Vlad’s
> assumption
> > might not be valid but may be it is.
> >
> > On 8/11/16, 11:47 AM, "Munagala Ramanath"  wrote:
> >
> > If I understand Vlad correctly, what he is saying is that each
> operator
> > saves currentThread in
> > its own setup() and checks it in its own output methods. The threads
> in
> > different operators are
> > running potentially on different nodes and/or processes and there
> will
> > be
> > no connection between them.
> >
> > Ram
> >
> > On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare <
> > san...@datatorrent.com>
> > wrote:
> >
> > > Name check is expensive, agreed, but there isn’t anything else
> > currently.
> > > Ideally the stram engine (considering that it is an engine
> providing
> > > resources like threads etc) should use a ThreadFactory or a
> > ThreadGroup to
> > > create operator threads so identification and adding functionality
> is
> > > easier.
> > >
> > > The idea of checking for the same thread between setup() and emit()
> > won’t
> > > work because the emit() check will have to be in the Sink hierarchy
> > and
> > > AFAIK a Sink object doesn’t have access to the corresponding
> > operator,
> > > right? Another more fundamental problem probably is that these
> > threads
> > > don’t have to match. The emit() for any operator (or rather a Sink
> > related
> > > to an operator) is ultimately triggered by an emitTuple() on the
> > topmost
> > > input operator in that path which happens in that input operator’s
> > thread
> > > which doesn’t have to match the thread calling setup() in the
> > downstream
> > > operators, right?
> > >
> > >
> > > On 8/11/16, 10:59 AM, "Vlad Rozov" 
> wrote:
> > >
> > > Name verification is too expensive, it will be sufficient to
> > store
> > > currentThread during setup() and verify that it is the same
> > during
> > > emit.
> > > Checks should be supported not only for DefaultOutputPort, so
> we
> > may
> > > have it implemented in various Sinks.
> > >
> > > Vlad
> > >
> > > On 8/11/16 10:21, Sanjay Pujare wrote:
> > > > Thinking more about this – all of the “operator” threads are
> > created
> > > by the Stram engine with appropriate names. So we can put checks in
> > the
> > > DefaultOutputPort.emit() or in the various implementations of
> > Sink.put()
> > > that the current-thread is one created by the Stram engine (by
> > verifying
> > > the name).
> > > >
> > > > We can even use a special Thread object for operator threads
> > so the
> > > above detection is easier.
> > > >
> > > >
> > > >
> > > > On 8/10/16, 6:11 PM, "Amol Kekre" 
> > wrote:
> > > >
> > > >  +1 on debug proposal. Even if tuples lands up within the
> > > window, it breaks
> > > >  all guarantees. A rerun (after restart from a
> checkpoint)
> > can
> > > have tuples
> > > >  in different windows from this thread. A separate thread
> > simply
> > > exposes
> > > >  users to unwarranted risks.
> > > >
> > > >  Thks
> > > >  Amol
> > > >
> > > >
> > > >  On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov <
> > > v.ro...@datatorrent.com> wrote:
> > > >
> > > >  > Tuples emitted between end and begin windows is only
> > one of
> > > possible
> > > >  > behaviors that emitting tuples on a separate from the
> > > operator thread may
> > > >  > introduce. It will be good to have both checks in
> place
> > at
> > > run-time and if
> > > >  > checking for the operator thread for every emitted
> > tuple is
> > > too expensive,
> > > >  > we may have it enabled only in 

Re: can operators emit on a different from the operator itself thread?

2016-10-12 Thread Sanjay Pujare
A JIRA has been created for adding this thread affinity check
https://issues.apache.org/jira/browse/APEXCORE-510 . I have made this
enhancement in a branch
https://github.com/sanjaypujare/apex-core/tree/malhar-510.thread_affinity
and I have been benchmarking the performance with this change. I will be
publishing the results in the above JIRA where we can discuss them and
hopefully agree on merging this change.

On Thu, Aug 11, 2016 at 1:41 PM, Sanjay Pujare 
wrote:

> You are right, I was subconsciously thinking about the THREAD_LOCAL case
> with a single container and a simple DAG and in that case Vlad’s assumption
> might not be valid but may be it is.
>
> On 8/11/16, 11:47 AM, "Munagala Ramanath"  wrote:
>
> If I understand Vlad correctly, what he is saying is that each operator
> saves currentThread in
> its own setup() and checks it in its own output methods. The threads in
> different operators are
> running potentially on different nodes and/or processes and there will
> be
> no connection between them.
>
> Ram
>
> On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare <
> san...@datatorrent.com>
> wrote:
>
> > Name check is expensive, agreed, but there isn’t anything else
> currently.
> > Ideally the stram engine (considering that it is an engine providing
> > resources like threads etc) should use a ThreadFactory or a
> ThreadGroup to
> > create operator threads so identification and adding functionality is
> > easier.
> >
> > The idea of checking for the same thread between setup() and emit()
> won’t
> > work because the emit() check will have to be in the Sink hierarchy
> and
> > AFAIK a Sink object doesn’t have access to the corresponding
> operator,
> > right? Another more fundamental problem probably is that these
> threads
> > don’t have to match. The emit() for any operator (or rather a Sink
> related
> > to an operator) is ultimately triggered by an emitTuple() on the
> topmost
> > input operator in that path which happens in that input operator’s
> thread
> > which doesn’t have to match the thread calling setup() in the
> downstream
> > operators, right?
> >
> >
> > On 8/11/16, 10:59 AM, "Vlad Rozov"  wrote:
> >
> > Name verification is too expensive, it will be sufficient to
> store
> > currentThread during setup() and verify that it is the same
> during
> > emit.
> > Checks should be supported not only for DefaultOutputPort, so we
> may
> > have it implemented in various Sinks.
> >
> > Vlad
> >
> > On 8/11/16 10:21, Sanjay Pujare wrote:
> > > Thinking more about this – all of the “operator” threads are
> created
> > by the Stram engine with appropriate names. So we can put checks in
> the
> > DefaultOutputPort.emit() or in the various implementations of
> Sink.put()
> > that the current-thread is one created by the Stram engine (by
> verifying
> > the name).
> > >
> > > We can even use a special Thread object for operator threads
> so the
> > above detection is easier.
> > >
> > >
> > >
> > > On 8/10/16, 6:11 PM, "Amol Kekre" 
> wrote:
> > >
> > >  +1 on debug proposal. Even if tuples lands up within the
> > window, it breaks
> > >  all guarantees. A rerun (after restart from a checkpoint)
> can
> > have tuples
> > >  in different windows from this thread. A separate thread
> simply
> > exposes
> > >  users to unwarranted risks.
> > >
> > >  Thks
> > >  Amol
> > >
> > >
> > >  On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov <
> > v.ro...@datatorrent.com> wrote:
> > >
> > >  > Tuples emitted between end and begin windows is only
> one of
> > possible
> > >  > behaviors that emitting tuples on a separate from the
> > operator thread may
> > >  > introduce. It will be good to have both checks in place
> at
> > run-time and if
> > >  > checking for the operator thread for every emitted
> tuple is
> > too expensive,
> > >  > we may have it enabled only in DEBUG or mode with more
> checks
> > in place.
> > >  >
> > >  > Vlad
> > >  >
> > >  >
> > >  > Sanjay just reminded me of my typo -> I meant between
> > end_window and
> > >  >> start_window :)
> > >  >>
> > >  >> Thks
> > >  >> Amol
> > >  >>
> > >  >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare <
> > san...@datatorrent.com>
> > >  >> wrote:
> > >  >>
> > >  >> If the goal is to do this validation through static
> 

Re: can operators emit on a different from the operator itself thread?

2016-08-12 Thread Sanjay Pujare
You are right, I was subconsciously thinking about the THREAD_LOCAL case with a 
single container and a simple DAG and in that case Vlad’s assumption might not 
be valid but may be it is.

On 8/11/16, 11:47 AM, "Munagala Ramanath"  wrote:

If I understand Vlad correctly, what he is saying is that each operator
saves currentThread in
its own setup() and checks it in its own output methods. The threads in
different operators are
running potentially on different nodes and/or processes and there will be
no connection between them.

Ram

On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare 
wrote:

> Name check is expensive, agreed, but there isn’t anything else currently.
> Ideally the stram engine (considering that it is an engine providing
> resources like threads etc) should use a ThreadFactory or a ThreadGroup to
> create operator threads so identification and adding functionality is
> easier.
>
> The idea of checking for the same thread between setup() and emit() won’t
> work because the emit() check will have to be in the Sink hierarchy and
> AFAIK a Sink object doesn’t have access to the corresponding operator,
> right? Another more fundamental problem probably is that these threads
> don’t have to match. The emit() for any operator (or rather a Sink related
> to an operator) is ultimately triggered by an emitTuple() on the topmost
> input operator in that path which happens in that input operator’s thread
> which doesn’t have to match the thread calling setup() in the downstream
> operators, right?
>
>
> On 8/11/16, 10:59 AM, "Vlad Rozov"  wrote:
>
> Name verification is too expensive, it will be sufficient to store
> currentThread during setup() and verify that it is the same during
> emit.
> Checks should be supported not only for DefaultOutputPort, so we may
> have it implemented in various Sinks.
>
> Vlad
>
> On 8/11/16 10:21, Sanjay Pujare wrote:
> > Thinking more about this – all of the “operator” threads are created
> by the Stram engine with appropriate names. So we can put checks in the
> DefaultOutputPort.emit() or in the various implementations of Sink.put()
> that the current-thread is one created by the Stram engine (by verifying
> the name).
> >
> > We can even use a special Thread object for operator threads so the
> above detection is easier.
> >
> >
> >
> > On 8/10/16, 6:11 PM, "Amol Kekre"  wrote:
> >
> >  +1 on debug proposal. Even if tuples lands up within the
> window, it breaks
> >  all guarantees. A rerun (after restart from a checkpoint) can
> have tuples
> >  in different windows from this thread. A separate thread simply
> exposes
> >  users to unwarranted risks.
> >
> >  Thks
> >  Amol
> >
> >
> >  On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov <
> v.ro...@datatorrent.com> wrote:
> >
> >  > Tuples emitted between end and begin windows is only one of
> possible
> >  > behaviors that emitting tuples on a separate from the
> operator thread may
> >  > introduce. It will be good to have both checks in place at
> run-time and if
> >  > checking for the operator thread for every emitted tuple is
> too expensive,
> >  > we may have it enabled only in DEBUG or mode with more checks
> in place.
> >  >
> >  > Vlad
> >  >
> >  >
> >  > Sanjay just reminded me of my typo -> I meant between
> end_window and
> >  >> start_window :)
> >  >>
> >  >> Thks
> >  >> Amol
> >  >>
> >  >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare <
> san...@datatorrent.com>
> >  >> wrote:
> >  >>
> >  >> If the goal is to do this validation through static analysis
> of operator
> >  >>> code, I guess it is possible but is going to be
> non-trivial. And there
> >  >>> could be false positives and false negatives.
> >  >>>
> >  >>> Also I suppose this discussion applies to processor
> operators (those
> >  >>> having both in and out ports) so Ram’s example of
> JdbcPollInputOperator
> >  >>> may
> >  >>> not be applicable here?
> >  >>>
> >  >>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" <
> ashwinchand...@gmail.com>
> >  >>> wrote:
> >  >>>
> >  >>>  In a separate thread I mean.
> >  >>>
> >  

Re: can operators emit on a different from the operator itself thread?

2016-08-11 Thread Vlad Rozov
Correct, except that it is Sink not an Operator that will need to save 
current thread during setup(). Sink does not need access to an Operator, 
it is sufficient to rely on the platform to call setup() method on the 
Operator thread.


Vlad


On 8/11/16 11:47, Munagala Ramanath wrote:

If I understand Vlad correctly, what he is saying is that each operator
saves currentThread in
its own setup() and checks it in its own output methods. The threads in
different operators are
running potentially on different nodes and/or processes and there will be
no connection between them.

Ram

On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare 
wrote:


Name check is expensive, agreed, but there isn’t anything else currently.
Ideally the stram engine (considering that it is an engine providing
resources like threads etc) should use a ThreadFactory or a ThreadGroup to
create operator threads so identification and adding functionality is
easier.

The idea of checking for the same thread between setup() and emit() won’t
work because the emit() check will have to be in the Sink hierarchy and
AFAIK a Sink object doesn’t have access to the corresponding operator,
right? Another more fundamental problem probably is that these threads
don’t have to match. The emit() for any operator (or rather a Sink related
to an operator) is ultimately triggered by an emitTuple() on the topmost
input operator in that path which happens in that input operator’s thread
which doesn’t have to match the thread calling setup() in the downstream
operators, right?


On 8/11/16, 10:59 AM, "Vlad Rozov"  wrote:

 Name verification is too expensive, it will be sufficient to store
 currentThread during setup() and verify that it is the same during
emit.
 Checks should be supported not only for DefaultOutputPort, so we may
 have it implemented in various Sinks.

 Vlad

 On 8/11/16 10:21, Sanjay Pujare wrote:
 > Thinking more about this – all of the “operator” threads are created
by the Stram engine with appropriate names. So we can put checks in the
DefaultOutputPort.emit() or in the various implementations of Sink.put()
that the current-thread is one created by the Stram engine (by verifying
the name).
 >
 > We can even use a special Thread object for operator threads so the
above detection is easier.
 >
 >
 >
 > On 8/10/16, 6:11 PM, "Amol Kekre"  wrote:
 >
 >  +1 on debug proposal. Even if tuples lands up within the
window, it breaks
 >  all guarantees. A rerun (after restart from a checkpoint) can
have tuples
 >  in different windows from this thread. A separate thread simply
exposes
 >  users to unwarranted risks.
 >
 >  Thks
 >  Amol
 >
 >
 >  On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov <
v.ro...@datatorrent.com> wrote:
 >
 >  > Tuples emitted between end and begin windows is only one of
possible
 >  > behaviors that emitting tuples on a separate from the
operator thread may
 >  > introduce. It will be good to have both checks in place at
run-time and if
 >  > checking for the operator thread for every emitted tuple is
too expensive,
 >  > we may have it enabled only in DEBUG or mode with more checks
in place.
 >  >
 >  > Vlad
 >  >
 >  >
 >  > Sanjay just reminded me of my typo -> I meant between
end_window and
 >  >> start_window :)
 >  >>
 >  >> Thks
 >  >> Amol
 >  >>
 >  >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare <
san...@datatorrent.com>
 >  >> wrote:
 >  >>
 >  >> If the goal is to do this validation through static analysis
of operator
 >  >>> code, I guess it is possible but is going to be
non-trivial. And there
 >  >>> could be false positives and false negatives.
 >  >>>
 >  >>> Also I suppose this discussion applies to processor
operators (those
 >  >>> having both in and out ports) so Ram’s example of
JdbcPollInputOperator
 >  >>> may
 >  >>> not be applicable here?
 >  >>>
 >  >>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" <
ashwinchand...@gmail.com>
 >  >>> wrote:
 >  >>>
 >  >>>  In a separate thread I mean.
 >  >>>
 >  >>>  Regards,
 >  >>>  Ashwin.
 >  >>>
 >  >>>  On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
 >  >>>  ashwinchand...@gmail.com> wrote:
 >  >>>
 >  >>>  > + dev@apex.apache.org
 >  >>>  > - us...@apex.apache.org
 >  >>>  >
 >  >>>  > This is one of those best practices that we learn by
experience
 >  >>> during
 >  >>>  > operator development. It will save a lot of time
during operator
 >  >>>  > development if we 

Re: can operators emit on a different from the operator itself thread?

2016-08-11 Thread Munagala Ramanath
If I understand Vlad correctly, what he is saying is that each operator
saves currentThread in
its own setup() and checks it in its own output methods. The threads in
different operators are
running potentially on different nodes and/or processes and there will be
no connection between them.

Ram

On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare 
wrote:

> Name check is expensive, agreed, but there isn’t anything else currently.
> Ideally the stram engine (considering that it is an engine providing
> resources like threads etc) should use a ThreadFactory or a ThreadGroup to
> create operator threads so identification and adding functionality is
> easier.
>
> The idea of checking for the same thread between setup() and emit() won’t
> work because the emit() check will have to be in the Sink hierarchy and
> AFAIK a Sink object doesn’t have access to the corresponding operator,
> right? Another more fundamental problem probably is that these threads
> don’t have to match. The emit() for any operator (or rather a Sink related
> to an operator) is ultimately triggered by an emitTuple() on the topmost
> input operator in that path which happens in that input operator’s thread
> which doesn’t have to match the thread calling setup() in the downstream
> operators, right?
>
>
> On 8/11/16, 10:59 AM, "Vlad Rozov"  wrote:
>
> Name verification is too expensive, it will be sufficient to store
> currentThread during setup() and verify that it is the same during
> emit.
> Checks should be supported not only for DefaultOutputPort, so we may
> have it implemented in various Sinks.
>
> Vlad
>
> On 8/11/16 10:21, Sanjay Pujare wrote:
> > Thinking more about this – all of the “operator” threads are created
> by the Stram engine with appropriate names. So we can put checks in the
> DefaultOutputPort.emit() or in the various implementations of Sink.put()
> that the current-thread is one created by the Stram engine (by verifying
> the name).
> >
> > We can even use a special Thread object for operator threads so the
> above detection is easier.
> >
> >
> >
> > On 8/10/16, 6:11 PM, "Amol Kekre"  wrote:
> >
> >  +1 on debug proposal. Even if tuples lands up within the
> window, it breaks
> >  all guarantees. A rerun (after restart from a checkpoint) can
> have tuples
> >  in different windows from this thread. A separate thread simply
> exposes
> >  users to unwarranted risks.
> >
> >  Thks
> >  Amol
> >
> >
> >  On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov <
> v.ro...@datatorrent.com> wrote:
> >
> >  > Tuples emitted between end and begin windows is only one of
> possible
> >  > behaviors that emitting tuples on a separate from the
> operator thread may
> >  > introduce. It will be good to have both checks in place at
> run-time and if
> >  > checking for the operator thread for every emitted tuple is
> too expensive,
> >  > we may have it enabled only in DEBUG or mode with more checks
> in place.
> >  >
> >  > Vlad
> >  >
> >  >
> >  > Sanjay just reminded me of my typo -> I meant between
> end_window and
> >  >> start_window :)
> >  >>
> >  >> Thks
> >  >> Amol
> >  >>
> >  >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare <
> san...@datatorrent.com>
> >  >> wrote:
> >  >>
> >  >> If the goal is to do this validation through static analysis
> of operator
> >  >>> code, I guess it is possible but is going to be
> non-trivial. And there
> >  >>> could be false positives and false negatives.
> >  >>>
> >  >>> Also I suppose this discussion applies to processor
> operators (those
> >  >>> having both in and out ports) so Ram’s example of
> JdbcPollInputOperator
> >  >>> may
> >  >>> not be applicable here?
> >  >>>
> >  >>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" <
> ashwinchand...@gmail.com>
> >  >>> wrote:
> >  >>>
> >  >>>  In a separate thread I mean.
> >  >>>
> >  >>>  Regards,
> >  >>>  Ashwin.
> >  >>>
> >  >>>  On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
> >  >>>  ashwinchand...@gmail.com> wrote:
> >  >>>
> >  >>>  > + dev@apex.apache.org
> >  >>>  > - us...@apex.apache.org
> >  >>>  >
> >  >>>  > This is one of those best practices that we learn by
> experience
> >  >>> during
> >  >>>  > operator development. It will save a lot of time
> during operator
> >  >>>  > development if we can catch and throw validation
> error when
> >  >>> someone
> >  >>> emits
> >  >>>  > tuples in a non 

Re: can operators emit on a different from the operator itself thread?

2016-08-11 Thread Sanjay Pujare
Name check is expensive, agreed, but there isn’t anything else currently. 
Ideally the stram engine (considering that it is an engine providing resources 
like threads etc) should use a ThreadFactory or a ThreadGroup to create 
operator threads so identification and adding functionality is easier.

The idea of checking for the same thread between setup() and emit() won’t work 
because the emit() check will have to be in the Sink hierarchy and AFAIK a Sink 
object doesn’t have access to the corresponding operator, right? Another more 
fundamental problem probably is that these threads don’t have to match. The 
emit() for any operator (or rather a Sink related to an operator) is ultimately 
triggered by an emitTuple() on the topmost input operator in that path which 
happens in that input operator’s thread which doesn’t have to match the thread 
calling setup() in the downstream operators, right?


On 8/11/16, 10:59 AM, "Vlad Rozov"  wrote:

Name verification is too expensive, it will be sufficient to store 
currentThread during setup() and verify that it is the same during emit. 
Checks should be supported not only for DefaultOutputPort, so we may 
have it implemented in various Sinks.

Vlad

On 8/11/16 10:21, Sanjay Pujare wrote:
> Thinking more about this – all of the “operator” threads are created by 
the Stram engine with appropriate names. So we can put checks in the 
DefaultOutputPort.emit() or in the various implementations of Sink.put() that 
the current-thread is one created by the Stram engine (by verifying the name).
>
> We can even use a special Thread object for operator threads so the above 
detection is easier.
>
>
>
> On 8/10/16, 6:11 PM, "Amol Kekre"  wrote:
>
>  +1 on debug proposal. Even if tuples lands up within the window, it 
breaks
>  all guarantees. A rerun (after restart from a checkpoint) can have 
tuples
>  in different windows from this thread. A separate thread simply 
exposes
>  users to unwarranted risks.
>  
>  Thks
>  Amol
>  
>  
>  On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov 
 wrote:
>  
>  > Tuples emitted between end and begin windows is only one of 
possible
>  > behaviors that emitting tuples on a separate from the operator 
thread may
>  > introduce. It will be good to have both checks in place at 
run-time and if
>  > checking for the operator thread for every emitted tuple is too 
expensive,
>  > we may have it enabled only in DEBUG or mode with more checks in 
place.
>  >
>  > Vlad
>  >
>  >
>  > Sanjay just reminded me of my typo -> I meant between end_window 
and
>  >> start_window :)
>  >>
>  >> Thks
>  >> Amol
>  >>
>  >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare 

>  >> wrote:
>  >>
>  >> If the goal is to do this validation through static analysis of 
operator
>  >>> code, I guess it is possible but is going to be non-trivial. And 
there
>  >>> could be false positives and false negatives.
>  >>>
>  >>> Also I suppose this discussion applies to processor operators 
(those
>  >>> having both in and out ports) so Ram’s example of 
JdbcPollInputOperator
>  >>> may
>  >>> not be applicable here?
>  >>>
>  >>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" 

>  >>> wrote:
>  >>>
>  >>>  In a separate thread I mean.
>  >>>
>  >>>  Regards,
>  >>>  Ashwin.
>  >>>
>  >>>  On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
>  >>>  ashwinchand...@gmail.com> wrote:
>  >>>
>  >>>  > + dev@apex.apache.org
>  >>>  > - us...@apex.apache.org
>  >>>  >
>  >>>  > This is one of those best practices that we learn by 
experience
>  >>> during
>  >>>  > operator development. It will save a lot of time during 
operator
>  >>>  > development if we can catch and throw validation error 
when
>  >>> someone
>  >>> emits
>  >>>  > tuples in a non separate thread.
>  >>>  >
>  >>>  > Regards,
>  >>>  > Ashwin
>  >>>  >
>  >>>  > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath <
>  >>> r...@datatorrent.com>
>  >>>  > wrote:
>  >>>  >
>  >>>  >> For cases where use of a different thread is needed, it 
can write
>  >>> tuples
>  >>>  >> to a queue from where the operator thread pulls them --
>  >>>  >> JdbcPollInputOperator in Malhar has an example.

Re: can operators emit on a different from the operator itself thread?

2016-08-11 Thread Vlad Rozov
Name verification is too expensive, it will be sufficient to store 
currentThread during setup() and verify that it is the same during emit. 
Checks should be supported not only for DefaultOutputPort, so we may 
have it implemented in various Sinks.


Vlad

On 8/11/16 10:21, Sanjay Pujare wrote:

Thinking more about this – all of the “operator” threads are created by the 
Stram engine with appropriate names. So we can put checks in the 
DefaultOutputPort.emit() or in the various implementations of Sink.put() that 
the current-thread is one created by the Stram engine (by verifying the name).

We can even use a special Thread object for operator threads so the above 
detection is easier.



On 8/10/16, 6:11 PM, "Amol Kekre"  wrote:

 +1 on debug proposal. Even if tuples lands up within the window, it breaks
 all guarantees. A rerun (after restart from a checkpoint) can have tuples
 in different windows from this thread. A separate thread simply exposes
 users to unwarranted risks.
 
 Thks

 Amol
 
 
 On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov  wrote:
 
 > Tuples emitted between end and begin windows is only one of possible

 > behaviors that emitting tuples on a separate from the operator thread may
 > introduce. It will be good to have both checks in place at run-time and 
if
 > checking for the operator thread for every emitted tuple is too 
expensive,
 > we may have it enabled only in DEBUG or mode with more checks in place.
 >
 > Vlad
 >
 >
 > Sanjay just reminded me of my typo -> I meant between end_window and
 >> start_window :)
 >>
 >> Thks
 >> Amol
 >>
 >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare 
 >> wrote:
 >>
 >> If the goal is to do this validation through static analysis of operator
 >>> code, I guess it is possible but is going to be non-trivial. And there
 >>> could be false positives and false negatives.
 >>>
 >>> Also I suppose this discussion applies to processor operators (those
 >>> having both in and out ports) so Ram’s example of JdbcPollInputOperator
 >>> may
 >>> not be applicable here?
 >>>
 >>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" 
 >>> wrote:
 >>>
 >>>  In a separate thread I mean.
 >>>
 >>>  Regards,
 >>>  Ashwin.
 >>>
 >>>  On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
 >>>  ashwinchand...@gmail.com> wrote:
 >>>
 >>>  > + dev@apex.apache.org
 >>>  > - us...@apex.apache.org
 >>>  >
 >>>  > This is one of those best practices that we learn by experience
 >>> during
 >>>  > operator development. It will save a lot of time during operator
 >>>  > development if we can catch and throw validation error when
 >>> someone
 >>> emits
 >>>  > tuples in a non separate thread.
 >>>  >
 >>>  > Regards,
 >>>  > Ashwin
 >>>  >
 >>>  > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath <
 >>> r...@datatorrent.com>
 >>>  > wrote:
 >>>  >
 >>>  >> For cases where use of a different thread is needed, it can 
write
 >>> tuples
 >>>  >> to a queue from where the operator thread pulls them --
 >>>  >> JdbcPollInputOperator in Malhar has an example.
 >>>  >>
 >>>  >> Ram
 >>>  >>
 >>>  >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com <
 >>> hsy...@gmail.com
 >>>  >> wrote:
 >>>  >>
 >>>  >>> Hey Vlad,
 >>>  >>>
 >>>  >>> Thanks for bringing this up. Is there an easy way to detect
 >>> unexpected
 >>>  >>> use of emit method without hurt the performance. Or at least 
if
 >>> we
 >>> can
 >>>  >>> detect this in debug mode.
 >>>  >>>
 >>>  >>> Regards,
 >>>  >>> Siyuan
 >>>  >>>
 >>>  >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov <
 >>> v.ro...@datatorrent.com>
 >>>  >>> wrote:
 >>>  >>>
 >>>   The short answer is no, creating worker thread to emit tuples
 >>> is
 >>> not
 >>>   supported by Apex and will lead to an undefined behavior.
 >>> Operators in Apex
 >>>   have strong thread affinity and all interaction with the
 >>> platform
 >>> must
 >>>   happen on the operator thread.
 >>>  
 >>>   Vlad
 >>>  
 >>>  >>>
 >>>  >>>
 >>>  >>
 >>>  >
 >>>  >
 >>>  > --
 >>>  >
 >>>  > Regards,
 >>>  > Ashwin.
 >>>  >
 >>>
 >>>
 >>>
 >>>  --
 >>>
 >>>  Regards,
 >>>  Ashwin.
 >>>
 >>>
 >>>
 >>>
 >>>
 >
 







Re: can operators emit on a different from the operator itself thread?

2016-08-11 Thread Sanjay Pujare
Thinking more about this – all of the “operator” threads are created by the 
Stram engine with appropriate names. So we can put checks in the 
DefaultOutputPort.emit() or in the various implementations of Sink.put() that 
the current-thread is one created by the Stram engine (by verifying the name).

We can even use a special Thread object for operator threads so the above 
detection is easier.



On 8/10/16, 6:11 PM, "Amol Kekre"  wrote:

+1 on debug proposal. Even if tuples lands up within the window, it breaks
all guarantees. A rerun (after restart from a checkpoint) can have tuples
in different windows from this thread. A separate thread simply exposes
users to unwarranted risks.

Thks
Amol


On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov  wrote:

> Tuples emitted between end and begin windows is only one of possible
> behaviors that emitting tuples on a separate from the operator thread may
> introduce. It will be good to have both checks in place at run-time and if
> checking for the operator thread for every emitted tuple is too expensive,
> we may have it enabled only in DEBUG or mode with more checks in place.
>
> Vlad
>
>
> Sanjay just reminded me of my typo -> I meant between end_window and
>> start_window :)
>>
>> Thks
>> Amol
>>
>> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare 
>> wrote:
>>
>> If the goal is to do this validation through static analysis of operator
>>> code, I guess it is possible but is going to be non-trivial. And there
>>> could be false positives and false negatives.
>>>
>>> Also I suppose this discussion applies to processor operators (those
>>> having both in and out ports) so Ram’s example of JdbcPollInputOperator
>>> may
>>> not be applicable here?
>>>
>>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" 
>>> wrote:
>>>
>>>  In a separate thread I mean.
>>>
>>>  Regards,
>>>  Ashwin.
>>>
>>>  On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
>>>  ashwinchand...@gmail.com> wrote:
>>>
>>>  > + dev@apex.apache.org
>>>  > - us...@apex.apache.org
>>>  >
>>>  > This is one of those best practices that we learn by experience
>>> during
>>>  > operator development. It will save a lot of time during operator
>>>  > development if we can catch and throw validation error when
>>> someone
>>> emits
>>>  > tuples in a non separate thread.
>>>  >
>>>  > Regards,
>>>  > Ashwin
>>>  >
>>>  > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath <
>>> r...@datatorrent.com>
>>>  > wrote:
>>>  >
>>>  >> For cases where use of a different thread is needed, it can 
write
>>> tuples
>>>  >> to a queue from where the operator thread pulls them --
>>>  >> JdbcPollInputOperator in Malhar has an example.
>>>  >>
>>>  >> Ram
>>>  >>
>>>  >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com <
>>> hsy...@gmail.com
>>>  >> wrote:
>>>  >>
>>>  >>> Hey Vlad,
>>>  >>>
>>>  >>> Thanks for bringing this up. Is there an easy way to detect
>>> unexpected
>>>  >>> use of emit method without hurt the performance. Or at least if
>>> we
>>> can
>>>  >>> detect this in debug mode.
>>>  >>>
>>>  >>> Regards,
>>>  >>> Siyuan
>>>  >>>
>>>  >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov <
>>> v.ro...@datatorrent.com>
>>>  >>> wrote:
>>>  >>>
>>>   The short answer is no, creating worker thread to emit tuples
>>> is
>>> not
>>>   supported by Apex and will lead to an undefined behavior.
>>> Operators in Apex
>>>   have strong thread affinity and all interaction with the
>>> platform
>>> must
>>>   happen on the operator thread.
>>>  
>>>   Vlad
>>>  
>>>  >>>
>>>  >>>
>>>  >>
>>>  >
>>>  >
>>>  > --
>>>  >
>>>  > Regards,
>>>  > Ashwin.
>>>  >
>>>
>>>
>>>
>>>  --
>>>
>>>  Regards,
>>>  Ashwin.
>>>
>>>
>>>
>>>
>>>
>





Re: can operators emit on a different from the operator itself thread?

2016-08-10 Thread Amol Kekre
+1 on debug proposal. Even if tuples lands up within the window, it breaks
all guarantees. A rerun (after restart from a checkpoint) can have tuples
in different windows from this thread. A separate thread simply exposes
users to unwarranted risks.

Thks
Amol


On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov  wrote:

> Tuples emitted between end and begin windows is only one of possible
> behaviors that emitting tuples on a separate from the operator thread may
> introduce. It will be good to have both checks in place at run-time and if
> checking for the operator thread for every emitted tuple is too expensive,
> we may have it enabled only in DEBUG or mode with more checks in place.
>
> Vlad
>
>
> Sanjay just reminded me of my typo -> I meant between end_window and
>> start_window :)
>>
>> Thks
>> Amol
>>
>> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare 
>> wrote:
>>
>> If the goal is to do this validation through static analysis of operator
>>> code, I guess it is possible but is going to be non-trivial. And there
>>> could be false positives and false negatives.
>>>
>>> Also I suppose this discussion applies to processor operators (those
>>> having both in and out ports) so Ram’s example of JdbcPollInputOperator
>>> may
>>> not be applicable here?
>>>
>>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" 
>>> wrote:
>>>
>>>  In a separate thread I mean.
>>>
>>>  Regards,
>>>  Ashwin.
>>>
>>>  On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
>>>  ashwinchand...@gmail.com> wrote:
>>>
>>>  > + dev@apex.apache.org
>>>  > - us...@apex.apache.org
>>>  >
>>>  > This is one of those best practices that we learn by experience
>>> during
>>>  > operator development. It will save a lot of time during operator
>>>  > development if we can catch and throw validation error when
>>> someone
>>> emits
>>>  > tuples in a non separate thread.
>>>  >
>>>  > Regards,
>>>  > Ashwin
>>>  >
>>>  > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath <
>>> r...@datatorrent.com>
>>>  > wrote:
>>>  >
>>>  >> For cases where use of a different thread is needed, it can write
>>> tuples
>>>  >> to a queue from where the operator thread pulls them --
>>>  >> JdbcPollInputOperator in Malhar has an example.
>>>  >>
>>>  >> Ram
>>>  >>
>>>  >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com <
>>> hsy...@gmail.com
>>>  >> wrote:
>>>  >>
>>>  >>> Hey Vlad,
>>>  >>>
>>>  >>> Thanks for bringing this up. Is there an easy way to detect
>>> unexpected
>>>  >>> use of emit method without hurt the performance. Or at least if
>>> we
>>> can
>>>  >>> detect this in debug mode.
>>>  >>>
>>>  >>> Regards,
>>>  >>> Siyuan
>>>  >>>
>>>  >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov <
>>> v.ro...@datatorrent.com>
>>>  >>> wrote:
>>>  >>>
>>>   The short answer is no, creating worker thread to emit tuples
>>> is
>>> not
>>>   supported by Apex and will lead to an undefined behavior.
>>> Operators in Apex
>>>   have strong thread affinity and all interaction with the
>>> platform
>>> must
>>>   happen on the operator thread.
>>>  
>>>   Vlad
>>>  
>>>  >>>
>>>  >>>
>>>  >>
>>>  >
>>>  >
>>>  > --
>>>  >
>>>  > Regards,
>>>  > Ashwin.
>>>  >
>>>
>>>
>>>
>>>  --
>>>
>>>  Regards,
>>>  Ashwin.
>>>
>>>
>>>
>>>
>>>
>


Re: can operators emit on a different from the operator itself thread?

2016-08-10 Thread Sanjay Pujare
If the goal is to do this validation through static analysis of operator code, 
I guess it is possible but is going to be non-trivial. And there could be false 
positives and false negatives.

Also I suppose this discussion applies to processor operators (those having 
both in and out ports) so Ram’s example of JdbcPollInputOperator may not be 
applicable here?

On 8/10/16, 2:04 PM, "Ashwin Chandra Putta"  wrote:

In a separate thread I mean.

Regards,
Ashwin.

On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
ashwinchand...@gmail.com> wrote:

> + dev@apex.apache.org
> - us...@apex.apache.org
>
> This is one of those best practices that we learn by experience during
> operator development. It will save a lot of time during operator
> development if we can catch and throw validation error when someone emits
> tuples in a non separate thread.
>
> Regards,
> Ashwin
>
> On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath 
> wrote:
>
>> For cases where use of a different thread is needed, it can write tuples
>> to a queue from where the operator thread pulls them --
>> JdbcPollInputOperator in Malhar has an example.
>>
>> Ram
>>
>> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com 
>> wrote:
>>
>>> Hey Vlad,
>>>
>>> Thanks for bringing this up. Is there an easy way to detect unexpected
>>> use of emit method without hurt the performance. Or at least if we can
>>> detect this in debug mode.
>>>
>>> Regards,
>>> Siyuan
>>>
>>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov 
>>> wrote:
>>>
 The short answer is no, creating worker thread to emit tuples is not
 supported by Apex and will lead to an undefined behavior. Operators in 
Apex
 have strong thread affinity and all interaction with the platform must
 happen on the operator thread.

 Vlad

>>>
>>>
>>
>
>
> --
>
> Regards,
> Ashwin.
>



-- 

Regards,
Ashwin.





Re: can operators emit on a different from the operator itself thread?

2016-08-10 Thread Amol Kekre
Sanjay just reminded me of my typo -> I meant between end_window and
start_window :)

Thks
Amol

On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare 
wrote:

> If the goal is to do this validation through static analysis of operator
> code, I guess it is possible but is going to be non-trivial. And there
> could be false positives and false negatives.
>
> Also I suppose this discussion applies to processor operators (those
> having both in and out ports) so Ram’s example of JdbcPollInputOperator may
> not be applicable here?
>
> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" 
> wrote:
>
> In a separate thread I mean.
>
> Regards,
> Ashwin.
>
> On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
> ashwinchand...@gmail.com> wrote:
>
> > + dev@apex.apache.org
> > - us...@apex.apache.org
> >
> > This is one of those best practices that we learn by experience
> during
> > operator development. It will save a lot of time during operator
> > development if we can catch and throw validation error when someone
> emits
> > tuples in a non separate thread.
> >
> > Regards,
> > Ashwin
> >
> > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath <
> r...@datatorrent.com>
> > wrote:
> >
> >> For cases where use of a different thread is needed, it can write
> tuples
> >> to a queue from where the operator thread pulls them --
> >> JdbcPollInputOperator in Malhar has an example.
> >>
> >> Ram
> >>
> >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com  >
> >> wrote:
> >>
> >>> Hey Vlad,
> >>>
> >>> Thanks for bringing this up. Is there an easy way to detect
> unexpected
> >>> use of emit method without hurt the performance. Or at least if we
> can
> >>> detect this in debug mode.
> >>>
> >>> Regards,
> >>> Siyuan
> >>>
> >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov <
> v.ro...@datatorrent.com>
> >>> wrote:
> >>>
>  The short answer is no, creating worker thread to emit tuples is
> not
>  supported by Apex and will lead to an undefined behavior.
> Operators in Apex
>  have strong thread affinity and all interaction with the platform
> must
>  happen on the operator thread.
> 
>  Vlad
> 
> >>>
> >>>
> >>
> >
> >
> > --
> >
> > Regards,
> > Ashwin.
> >
>
>
>
> --
>
> Regards,
> Ashwin.
>
>
>
>


Re: can operators emit on a different from the operator itself thread?

2016-08-10 Thread Amol Kekre
Send too soon. A quicker way would be to catch emit happening between
start_window and end_window and flag an error. Catching "another thread"
for every tuple may have a huge performance hit.

Thks
Amol


On Wed, Aug 10, 2016 at 2:31 PM, Amol Kekre  wrote:

>
> Currently user can code it that way. IMHO Apex should catch this and flag
> error.
>
> Thks
> Amol
>
>
> On Wed, Aug 10, 2016 at 2:04 PM, Ashwin Chandra Putta <
> ashwinchand...@gmail.com> wrote:
>
>> In a separate thread I mean.
>>
>> Regards,
>> Ashwin.
>>
>> On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
>> ashwinchand...@gmail.com> wrote:
>>
>> > + dev@apex.apache.org
>> > - us...@apex.apache.org
>> >
>> > This is one of those best practices that we learn by experience during
>> > operator development. It will save a lot of time during operator
>> > development if we can catch and throw validation error when someone
>> emits
>> > tuples in a non separate thread.
>> >
>> > Regards,
>> > Ashwin
>> >
>> > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath > >
>> > wrote:
>> >
>> >> For cases where use of a different thread is needed, it can write
>> tuples
>> >> to a queue from where the operator thread pulls them --
>> >> JdbcPollInputOperator in Malhar has an example.
>> >>
>> >> Ram
>> >>
>> >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com 
>> >> wrote:
>> >>
>> >>> Hey Vlad,
>> >>>
>> >>> Thanks for bringing this up. Is there an easy way to detect unexpected
>> >>> use of emit method without hurt the performance. Or at least if we can
>> >>> detect this in debug mode.
>> >>>
>> >>> Regards,
>> >>> Siyuan
>> >>>
>> >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov > >
>> >>> wrote:
>> >>>
>>  The short answer is no, creating worker thread to emit tuples is not
>>  supported by Apex and will lead to an undefined behavior. Operators
>> in Apex
>>  have strong thread affinity and all interaction with the platform
>> must
>>  happen on the operator thread.
>> 
>>  Vlad
>> 
>> >>>
>> >>>
>> >>
>> >
>> >
>> > --
>> >
>> > Regards,
>> > Ashwin.
>> >
>>
>>
>>
>> --
>>
>> Regards,
>> Ashwin.
>>
>
>


Re: can operators emit on a different from the operator itself thread?

2016-08-10 Thread Amol Kekre
Currently user can code it that way. IMHO Apex should catch this and flag
error.

Thks
Amol


On Wed, Aug 10, 2016 at 2:04 PM, Ashwin Chandra Putta <
ashwinchand...@gmail.com> wrote:

> In a separate thread I mean.
>
> Regards,
> Ashwin.
>
> On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
> ashwinchand...@gmail.com> wrote:
>
> > + dev@apex.apache.org
> > - us...@apex.apache.org
> >
> > This is one of those best practices that we learn by experience during
> > operator development. It will save a lot of time during operator
> > development if we can catch and throw validation error when someone emits
> > tuples in a non separate thread.
> >
> > Regards,
> > Ashwin
> >
> > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath 
> > wrote:
> >
> >> For cases where use of a different thread is needed, it can write tuples
> >> to a queue from where the operator thread pulls them --
> >> JdbcPollInputOperator in Malhar has an example.
> >>
> >> Ram
> >>
> >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com 
> >> wrote:
> >>
> >>> Hey Vlad,
> >>>
> >>> Thanks for bringing this up. Is there an easy way to detect unexpected
> >>> use of emit method without hurt the performance. Or at least if we can
> >>> detect this in debug mode.
> >>>
> >>> Regards,
> >>> Siyuan
> >>>
> >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov 
> >>> wrote:
> >>>
>  The short answer is no, creating worker thread to emit tuples is not
>  supported by Apex and will lead to an undefined behavior. Operators
> in Apex
>  have strong thread affinity and all interaction with the platform must
>  happen on the operator thread.
> 
>  Vlad
> 
> >>>
> >>>
> >>
> >
> >
> > --
> >
> > Regards,
> > Ashwin.
> >
>
>
>
> --
>
> Regards,
> Ashwin.
>


Re: can operators emit on a different from the operator itself thread?

2016-08-10 Thread Ashwin Chandra Putta
In a separate thread I mean.

Regards,
Ashwin.

On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta <
ashwinchand...@gmail.com> wrote:

> + dev@apex.apache.org
> - us...@apex.apache.org
>
> This is one of those best practices that we learn by experience during
> operator development. It will save a lot of time during operator
> development if we can catch and throw validation error when someone emits
> tuples in a non separate thread.
>
> Regards,
> Ashwin
>
> On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath 
> wrote:
>
>> For cases where use of a different thread is needed, it can write tuples
>> to a queue from where the operator thread pulls them --
>> JdbcPollInputOperator in Malhar has an example.
>>
>> Ram
>>
>> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com 
>> wrote:
>>
>>> Hey Vlad,
>>>
>>> Thanks for bringing this up. Is there an easy way to detect unexpected
>>> use of emit method without hurt the performance. Or at least if we can
>>> detect this in debug mode.
>>>
>>> Regards,
>>> Siyuan
>>>
>>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov 
>>> wrote:
>>>
 The short answer is no, creating worker thread to emit tuples is not
 supported by Apex and will lead to an undefined behavior. Operators in Apex
 have strong thread affinity and all interaction with the platform must
 happen on the operator thread.

 Vlad

>>>
>>>
>>
>
>
> --
>
> Regards,
> Ashwin.
>



-- 

Regards,
Ashwin.


Re: can operators emit on a different from the operator itself thread?

2016-08-10 Thread Ashwin Chandra Putta
+ dev@apex.apache.org
- us...@apex.apache.org

This is one of those best practices that we learn by experience during
operator development. It will save a lot of time during operator
development if we can catch and throw validation error when someone emits
tuples in a non separate thread.

Regards,
Ashwin

On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath 
wrote:

> For cases where use of a different thread is needed, it can write tuples
> to a queue from where the operator thread pulls them --
> JdbcPollInputOperator in Malhar has an example.
>
> Ram
>
> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com 
> wrote:
>
>> Hey Vlad,
>>
>> Thanks for bringing this up. Is there an easy way to detect unexpected
>> use of emit method without hurt the performance. Or at least if we can
>> detect this in debug mode.
>>
>> Regards,
>> Siyuan
>>
>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov 
>> wrote:
>>
>>> The short answer is no, creating worker thread to emit tuples is not
>>> supported by Apex and will lead to an undefined behavior. Operators in Apex
>>> have strong thread affinity and all interaction with the platform must
>>> happen on the operator thread.
>>>
>>> Vlad
>>>
>>
>>
>


-- 

Regards,
Ashwin.