Re: can operators emit on a different from the operator itself thread?
Vlad, I agree that the check should be ON by default. Ability to turn it off for entire app is fine, per port is not needed. Thks Amol On Wed, Oct 12, 2016 at 10:34 PM, Tushar Gosaviwrote: > +1 for on by default and ability to turn it off for entire application. > > - Tushar. > > > On Thu, Oct 13, 2016 at 11:00 AM, Pradeep A. Dalvi > wrote: > > +1 for ON by default > > +1 for disabling it for all output ports > > > > With the kind of issues we have observed being faced by developers in the > > past, I strongly believe this check should be ON by default. > > However at the same time I feel, it shall be one-time check, mostly in > > Development phase and before going into Production. Having said that, if > > disabling it at application level i.e. for all operators and their > > respective output ports would it make implementation simpler, then that > can > > be targeted first. Thoughts? > > > > --prad > > > > On Thu, Oct 13, 2016 at 7:32 AM, Vlad Rozov > wrote: > > > >> I run jmh test and check takes 1ns on my MacBook Pro and on the lab > >> machine. This corresponds to 3% degradation at 30 million > events/second. I > >> think we can move forward with the check ON by default. Do we need an > >> ability to turn OFF check for a specific operator and/or port? My > thought > >> is that such ability is not necessary and it should be OK to disable > check > >> for all output ports in an application. > >> > >> Vlad > >> > >> > >> On 10/12/16 11:56, Amol Kekre wrote: > >> > >>> In case there turns out to be a penalty, we can introduce a "check for > >>> thread affinity" mode that triggers this check. My initial thought is > to > >>> make this check ON by default. We should wait till benchmarks are > >>> available > >>> before discussing adding this check. > >>> > >>> Thks > >>> Amol > >>> > >>> > >>> On Wed, Oct 12, 2016 at 11:07 AM, Sanjay Pujare < > san...@datatorrent.com> > >>> wrote: > >>> > >>> A JIRA has been created for adding this thread affinity check > https://issues.apache.org/jira/browse/APEXCORE-510 . I have made this > enhancement in a branch > https://github.com/sanjaypujare/apex-core/tree/malhar-510. > thread_affinity > and I have been benchmarking the performance with this change. I will > be > publishing the results in the above JIRA where we can discuss them and > hopefully agree on merging this change. > > On Thu, Aug 11, 2016 at 1:41 PM, Sanjay Pujare < > san...@datatorrent.com> > wrote: > > You are right, I was subconsciously thinking about the THREAD_LOCAL > case > > with a single container and a simple DAG and in that case Vlad’s > > > assumption > > > might not be valid but may be it is. > > > > On 8/11/16, 11:47 AM, "Munagala Ramanath" > wrote: > > > > If I understand Vlad correctly, what he is saying is that each > > > operator > > > saves currentThread in > > its own setup() and checks it in its own output methods. The > > threads > > > in > > > different operators are > > running potentially on different nodes and/or processes and > there > > > will > > > be > > no connection between them. > > > > Ram > > > > On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare < > > san...@datatorrent.com> > > wrote: > > > > > Name check is expensive, agreed, but there isn’t anything else > > currently. > > > Ideally the stram engine (considering that it is an engine > > > providing > > > > resources like threads etc) should use a ThreadFactory or a > > ThreadGroup to > > > create operator threads so identification and adding > > functionality > > > is > > > > easier. > > > > > > The idea of checking for the same thread between setup() and > > emit() > > won’t > > > work because the emit() check will have to be in the Sink > > hierarchy > > and > > > AFAIK a Sink object doesn’t have access to the corresponding > > operator, > > > right? Another more fundamental problem probably is that these > > threads > > > don’t have to match. The emit() for any operator (or rather a > > Sink > > related > > > to an operator) is ultimately triggered by an emitTuple() on > the > > topmost > > > input operator in that path which happens in that input > > operator’s > > thread > > > which doesn’t have to match the thread calling setup() in the > > downstream > > > operators, right? > > > > > > > > > On 8/11/16, 10:59 AM, "Vlad Rozov" > > > wrote: > > > > > > > Name verification is too
Re: can operators emit on a different from the operator itself thread?
+1 for on by default and ability to turn it off for entire application. - Tushar. On Thu, Oct 13, 2016 at 11:00 AM, Pradeep A. Dalviwrote: > +1 for ON by default > +1 for disabling it for all output ports > > With the kind of issues we have observed being faced by developers in the > past, I strongly believe this check should be ON by default. > However at the same time I feel, it shall be one-time check, mostly in > Development phase and before going into Production. Having said that, if > disabling it at application level i.e. for all operators and their > respective output ports would it make implementation simpler, then that can > be targeted first. Thoughts? > > --prad > > On Thu, Oct 13, 2016 at 7:32 AM, Vlad Rozov wrote: > >> I run jmh test and check takes 1ns on my MacBook Pro and on the lab >> machine. This corresponds to 3% degradation at 30 million events/second. I >> think we can move forward with the check ON by default. Do we need an >> ability to turn OFF check for a specific operator and/or port? My thought >> is that such ability is not necessary and it should be OK to disable check >> for all output ports in an application. >> >> Vlad >> >> >> On 10/12/16 11:56, Amol Kekre wrote: >> >>> In case there turns out to be a penalty, we can introduce a "check for >>> thread affinity" mode that triggers this check. My initial thought is to >>> make this check ON by default. We should wait till benchmarks are >>> available >>> before discussing adding this check. >>> >>> Thks >>> Amol >>> >>> >>> On Wed, Oct 12, 2016 at 11:07 AM, Sanjay Pujare >>> wrote: >>> >>> A JIRA has been created for adding this thread affinity check https://issues.apache.org/jira/browse/APEXCORE-510 . I have made this enhancement in a branch https://github.com/sanjaypujare/apex-core/tree/malhar-510. thread_affinity and I have been benchmarking the performance with this change. I will be publishing the results in the above JIRA where we can discuss them and hopefully agree on merging this change. On Thu, Aug 11, 2016 at 1:41 PM, Sanjay Pujare wrote: You are right, I was subconsciously thinking about the THREAD_LOCAL case > with a single container and a simple DAG and in that case Vlad’s > assumption > might not be valid but may be it is. > > On 8/11/16, 11:47 AM, "Munagala Ramanath" wrote: > > If I understand Vlad correctly, what he is saying is that each > operator > saves currentThread in > its own setup() and checks it in its own output methods. The > threads > in > different operators are > running potentially on different nodes and/or processes and there > will > be > no connection between them. > > Ram > > On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare < > san...@datatorrent.com> > wrote: > > > Name check is expensive, agreed, but there isn’t anything else > currently. > > Ideally the stram engine (considering that it is an engine > providing > > resources like threads etc) should use a ThreadFactory or a > ThreadGroup to > > create operator threads so identification and adding > functionality > is > > easier. > > > > The idea of checking for the same thread between setup() and > emit() > won’t > > work because the emit() check will have to be in the Sink > hierarchy > and > > AFAIK a Sink object doesn’t have access to the corresponding > operator, > > right? Another more fundamental problem probably is that these > threads > > don’t have to match. The emit() for any operator (or rather a > Sink > related > > to an operator) is ultimately triggered by an emitTuple() on the > topmost > > input operator in that path which happens in that input > operator’s > thread > > which doesn’t have to match the thread calling setup() in the > downstream > > operators, right? > > > > > > On 8/11/16, 10:59 AM, "Vlad Rozov" > wrote: > > > > Name verification is too expensive, it will be sufficient to > store > > currentThread during setup() and verify that it is the same > during > > emit. > > Checks should be supported not only for DefaultOutputPort, so > we > may > > have it implemented in various Sinks. > > > > Vlad > > > > On 8/11/16 10:21, Sanjay Pujare wrote: > > > Thinking more about this – all of the “operator” threads > are >
Re: can operators emit on a different from the operator itself thread?
+1 for ON by default +1 for disabling it for all output ports With the kind of issues we have observed being faced by developers in the past, I strongly believe this check should be ON by default. However at the same time I feel, it shall be one-time check, mostly in Development phase and before going into Production. Having said that, if disabling it at application level i.e. for all operators and their respective output ports would it make implementation simpler, then that can be targeted first. Thoughts? --prad On Thu, Oct 13, 2016 at 7:32 AM, Vlad Rozovwrote: > I run jmh test and check takes 1ns on my MacBook Pro and on the lab > machine. This corresponds to 3% degradation at 30 million events/second. I > think we can move forward with the check ON by default. Do we need an > ability to turn OFF check for a specific operator and/or port? My thought > is that such ability is not necessary and it should be OK to disable check > for all output ports in an application. > > Vlad > > > On 10/12/16 11:56, Amol Kekre wrote: > >> In case there turns out to be a penalty, we can introduce a "check for >> thread affinity" mode that triggers this check. My initial thought is to >> make this check ON by default. We should wait till benchmarks are >> available >> before discussing adding this check. >> >> Thks >> Amol >> >> >> On Wed, Oct 12, 2016 at 11:07 AM, Sanjay Pujare >> wrote: >> >> A JIRA has been created for adding this thread affinity check >>> https://issues.apache.org/jira/browse/APEXCORE-510 . I have made this >>> enhancement in a branch >>> https://github.com/sanjaypujare/apex-core/tree/malhar-510. >>> thread_affinity >>> and I have been benchmarking the performance with this change. I will be >>> publishing the results in the above JIRA where we can discuss them and >>> hopefully agree on merging this change. >>> >>> On Thu, Aug 11, 2016 at 1:41 PM, Sanjay Pujare >>> wrote: >>> >>> You are right, I was subconsciously thinking about the THREAD_LOCAL case with a single container and a simple DAG and in that case Vlad’s >>> assumption >>> might not be valid but may be it is. On 8/11/16, 11:47 AM, "Munagala Ramanath" wrote: If I understand Vlad correctly, what he is saying is that each >>> operator >>> saves currentThread in its own setup() and checks it in its own output methods. The threads >>> in >>> different operators are running potentially on different nodes and/or processes and there >>> will >>> be no connection between them. Ram On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare < san...@datatorrent.com> wrote: > Name check is expensive, agreed, but there isn’t anything else currently. > Ideally the stram engine (considering that it is an engine >>> providing >>> > resources like threads etc) should use a ThreadFactory or a ThreadGroup to > create operator threads so identification and adding functionality >>> is >>> > easier. > > The idea of checking for the same thread between setup() and emit() won’t > work because the emit() check will have to be in the Sink hierarchy and > AFAIK a Sink object doesn’t have access to the corresponding operator, > right? Another more fundamental problem probably is that these threads > don’t have to match. The emit() for any operator (or rather a Sink related > to an operator) is ultimately triggered by an emitTuple() on the topmost > input operator in that path which happens in that input operator’s thread > which doesn’t have to match the thread calling setup() in the downstream > operators, right? > > > On 8/11/16, 10:59 AM, "Vlad Rozov" >>> wrote: >>> > > Name verification is too expensive, it will be sufficient to store > currentThread during setup() and verify that it is the same during > emit. > Checks should be supported not only for DefaultOutputPort, so >>> we >>> may > have it implemented in various Sinks. > > Vlad > > On 8/11/16 10:21, Sanjay Pujare wrote: > > Thinking more about this – all of the “operator” threads are created > by the Stram engine with appropriate names. So we can put checks in the > DefaultOutputPort.emit() or in the various implementations of Sink.put() > that the current-thread is one created by the Stram engine (by verifying > the name). >
Re: can operators emit on a different from the operator itself thread?
I run jmh test and check takes 1ns on my MacBook Pro and on the lab machine. This corresponds to 3% degradation at 30 million events/second. I think we can move forward with the check ON by default. Do we need an ability to turn OFF check for a specific operator and/or port? My thought is that such ability is not necessary and it should be OK to disable check for all output ports in an application. Vlad On 10/12/16 11:56, Amol Kekre wrote: In case there turns out to be a penalty, we can introduce a "check for thread affinity" mode that triggers this check. My initial thought is to make this check ON by default. We should wait till benchmarks are available before discussing adding this check. Thks Amol On Wed, Oct 12, 2016 at 11:07 AM, Sanjay Pujarewrote: A JIRA has been created for adding this thread affinity check https://issues.apache.org/jira/browse/APEXCORE-510 . I have made this enhancement in a branch https://github.com/sanjaypujare/apex-core/tree/malhar-510.thread_affinity and I have been benchmarking the performance with this change. I will be publishing the results in the above JIRA where we can discuss them and hopefully agree on merging this change. On Thu, Aug 11, 2016 at 1:41 PM, Sanjay Pujare wrote: You are right, I was subconsciously thinking about the THREAD_LOCAL case with a single container and a simple DAG and in that case Vlad’s assumption might not be valid but may be it is. On 8/11/16, 11:47 AM, "Munagala Ramanath" wrote: If I understand Vlad correctly, what he is saying is that each operator saves currentThread in its own setup() and checks it in its own output methods. The threads in different operators are running potentially on different nodes and/or processes and there will be no connection between them. Ram On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare < san...@datatorrent.com> wrote: > Name check is expensive, agreed, but there isn’t anything else currently. > Ideally the stram engine (considering that it is an engine providing > resources like threads etc) should use a ThreadFactory or a ThreadGroup to > create operator threads so identification and adding functionality is > easier. > > The idea of checking for the same thread between setup() and emit() won’t > work because the emit() check will have to be in the Sink hierarchy and > AFAIK a Sink object doesn’t have access to the corresponding operator, > right? Another more fundamental problem probably is that these threads > don’t have to match. The emit() for any operator (or rather a Sink related > to an operator) is ultimately triggered by an emitTuple() on the topmost > input operator in that path which happens in that input operator’s thread > which doesn’t have to match the thread calling setup() in the downstream > operators, right? > > > On 8/11/16, 10:59 AM, "Vlad Rozov" wrote: > > Name verification is too expensive, it will be sufficient to store > currentThread during setup() and verify that it is the same during > emit. > Checks should be supported not only for DefaultOutputPort, so we may > have it implemented in various Sinks. > > Vlad > > On 8/11/16 10:21, Sanjay Pujare wrote: > > Thinking more about this – all of the “operator” threads are created > by the Stram engine with appropriate names. So we can put checks in the > DefaultOutputPort.emit() or in the various implementations of Sink.put() > that the current-thread is one created by the Stram engine (by verifying > the name). > > > > We can even use a special Thread object for operator threads so the > above detection is easier. > > > > > > > > On 8/10/16, 6:11 PM, "Amol Kekre" wrote: > > > > +1 on debug proposal. Even if tuples lands up within the > window, it breaks > > all guarantees. A rerun (after restart from a checkpoint) can > have tuples > > in different windows from this thread. A separate thread simply > exposes > > users to unwarranted risks. > > > > Thks > > Amol > > > > > > On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov < > v.ro...@datatorrent.com> wrote: > > > > > Tuples emitted between end and begin windows is only one of > possible > > > behaviors that emitting tuples on a separate from the > operator thread may > > > introduce. It will be good to have both checks in place at > run-time and if > > > checking for the operator thread for every emitted tuple
Re: can operators emit on a different from the operator itself thread?
In case there turns out to be a penalty, we can introduce a "check for thread affinity" mode that triggers this check. My initial thought is to make this check ON by default. We should wait till benchmarks are available before discussing adding this check. Thks Amol On Wed, Oct 12, 2016 at 11:07 AM, Sanjay Pujarewrote: > A JIRA has been created for adding this thread affinity check > https://issues.apache.org/jira/browse/APEXCORE-510 . I have made this > enhancement in a branch > https://github.com/sanjaypujare/apex-core/tree/malhar-510.thread_affinity > and I have been benchmarking the performance with this change. I will be > publishing the results in the above JIRA where we can discuss them and > hopefully agree on merging this change. > > On Thu, Aug 11, 2016 at 1:41 PM, Sanjay Pujare > wrote: > > > You are right, I was subconsciously thinking about the THREAD_LOCAL case > > with a single container and a simple DAG and in that case Vlad’s > assumption > > might not be valid but may be it is. > > > > On 8/11/16, 11:47 AM, "Munagala Ramanath" wrote: > > > > If I understand Vlad correctly, what he is saying is that each > operator > > saves currentThread in > > its own setup() and checks it in its own output methods. The threads > in > > different operators are > > running potentially on different nodes and/or processes and there > will > > be > > no connection between them. > > > > Ram > > > > On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare < > > san...@datatorrent.com> > > wrote: > > > > > Name check is expensive, agreed, but there isn’t anything else > > currently. > > > Ideally the stram engine (considering that it is an engine > providing > > > resources like threads etc) should use a ThreadFactory or a > > ThreadGroup to > > > create operator threads so identification and adding functionality > is > > > easier. > > > > > > The idea of checking for the same thread between setup() and emit() > > won’t > > > work because the emit() check will have to be in the Sink hierarchy > > and > > > AFAIK a Sink object doesn’t have access to the corresponding > > operator, > > > right? Another more fundamental problem probably is that these > > threads > > > don’t have to match. The emit() for any operator (or rather a Sink > > related > > > to an operator) is ultimately triggered by an emitTuple() on the > > topmost > > > input operator in that path which happens in that input operator’s > > thread > > > which doesn’t have to match the thread calling setup() in the > > downstream > > > operators, right? > > > > > > > > > On 8/11/16, 10:59 AM, "Vlad Rozov" > wrote: > > > > > > Name verification is too expensive, it will be sufficient to > > store > > > currentThread during setup() and verify that it is the same > > during > > > emit. > > > Checks should be supported not only for DefaultOutputPort, so > we > > may > > > have it implemented in various Sinks. > > > > > > Vlad > > > > > > On 8/11/16 10:21, Sanjay Pujare wrote: > > > > Thinking more about this – all of the “operator” threads are > > created > > > by the Stram engine with appropriate names. So we can put checks in > > the > > > DefaultOutputPort.emit() or in the various implementations of > > Sink.put() > > > that the current-thread is one created by the Stram engine (by > > verifying > > > the name). > > > > > > > > We can even use a special Thread object for operator threads > > so the > > > above detection is easier. > > > > > > > > > > > > > > > > On 8/10/16, 6:11 PM, "Amol Kekre" > > wrote: > > > > > > > > +1 on debug proposal. Even if tuples lands up within the > > > window, it breaks > > > > all guarantees. A rerun (after restart from a > checkpoint) > > can > > > have tuples > > > > in different windows from this thread. A separate thread > > simply > > > exposes > > > > users to unwarranted risks. > > > > > > > > Thks > > > > Amol > > > > > > > > > > > > On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov < > > > v.ro...@datatorrent.com> wrote: > > > > > > > > > Tuples emitted between end and begin windows is only > > one of > > > possible > > > > > behaviors that emitting tuples on a separate from the > > > operator thread may > > > > > introduce. It will be good to have both checks in > place > > at > > > run-time and if > > > > > checking for the operator thread for every emitted > > tuple is > > > too expensive, > > > > > we may have it enabled only in
Re: can operators emit on a different from the operator itself thread?
A JIRA has been created for adding this thread affinity check https://issues.apache.org/jira/browse/APEXCORE-510 . I have made this enhancement in a branch https://github.com/sanjaypujare/apex-core/tree/malhar-510.thread_affinity and I have been benchmarking the performance with this change. I will be publishing the results in the above JIRA where we can discuss them and hopefully agree on merging this change. On Thu, Aug 11, 2016 at 1:41 PM, Sanjay Pujarewrote: > You are right, I was subconsciously thinking about the THREAD_LOCAL case > with a single container and a simple DAG and in that case Vlad’s assumption > might not be valid but may be it is. > > On 8/11/16, 11:47 AM, "Munagala Ramanath" wrote: > > If I understand Vlad correctly, what he is saying is that each operator > saves currentThread in > its own setup() and checks it in its own output methods. The threads in > different operators are > running potentially on different nodes and/or processes and there will > be > no connection between them. > > Ram > > On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare < > san...@datatorrent.com> > wrote: > > > Name check is expensive, agreed, but there isn’t anything else > currently. > > Ideally the stram engine (considering that it is an engine providing > > resources like threads etc) should use a ThreadFactory or a > ThreadGroup to > > create operator threads so identification and adding functionality is > > easier. > > > > The idea of checking for the same thread between setup() and emit() > won’t > > work because the emit() check will have to be in the Sink hierarchy > and > > AFAIK a Sink object doesn’t have access to the corresponding > operator, > > right? Another more fundamental problem probably is that these > threads > > don’t have to match. The emit() for any operator (or rather a Sink > related > > to an operator) is ultimately triggered by an emitTuple() on the > topmost > > input operator in that path which happens in that input operator’s > thread > > which doesn’t have to match the thread calling setup() in the > downstream > > operators, right? > > > > > > On 8/11/16, 10:59 AM, "Vlad Rozov" wrote: > > > > Name verification is too expensive, it will be sufficient to > store > > currentThread during setup() and verify that it is the same > during > > emit. > > Checks should be supported not only for DefaultOutputPort, so we > may > > have it implemented in various Sinks. > > > > Vlad > > > > On 8/11/16 10:21, Sanjay Pujare wrote: > > > Thinking more about this – all of the “operator” threads are > created > > by the Stram engine with appropriate names. So we can put checks in > the > > DefaultOutputPort.emit() or in the various implementations of > Sink.put() > > that the current-thread is one created by the Stram engine (by > verifying > > the name). > > > > > > We can even use a special Thread object for operator threads > so the > > above detection is easier. > > > > > > > > > > > > On 8/10/16, 6:11 PM, "Amol Kekre" > wrote: > > > > > > +1 on debug proposal. Even if tuples lands up within the > > window, it breaks > > > all guarantees. A rerun (after restart from a checkpoint) > can > > have tuples > > > in different windows from this thread. A separate thread > simply > > exposes > > > users to unwarranted risks. > > > > > > Thks > > > Amol > > > > > > > > > On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov < > > v.ro...@datatorrent.com> wrote: > > > > > > > Tuples emitted between end and begin windows is only > one of > > possible > > > > behaviors that emitting tuples on a separate from the > > operator thread may > > > > introduce. It will be good to have both checks in place > at > > run-time and if > > > > checking for the operator thread for every emitted > tuple is > > too expensive, > > > > we may have it enabled only in DEBUG or mode with more > checks > > in place. > > > > > > > > Vlad > > > > > > > > > > > > Sanjay just reminded me of my typo -> I meant between > > end_window and > > > >> start_window :) > > > >> > > > >> Thks > > > >> Amol > > > >> > > > >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare < > > san...@datatorrent.com> > > > >> wrote: > > > >> > > > >> If the goal is to do this validation through static >
Re: can operators emit on a different from the operator itself thread?
You are right, I was subconsciously thinking about the THREAD_LOCAL case with a single container and a simple DAG and in that case Vlad’s assumption might not be valid but may be it is. On 8/11/16, 11:47 AM, "Munagala Ramanath"wrote: If I understand Vlad correctly, what he is saying is that each operator saves currentThread in its own setup() and checks it in its own output methods. The threads in different operators are running potentially on different nodes and/or processes and there will be no connection between them. Ram On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujare wrote: > Name check is expensive, agreed, but there isn’t anything else currently. > Ideally the stram engine (considering that it is an engine providing > resources like threads etc) should use a ThreadFactory or a ThreadGroup to > create operator threads so identification and adding functionality is > easier. > > The idea of checking for the same thread between setup() and emit() won’t > work because the emit() check will have to be in the Sink hierarchy and > AFAIK a Sink object doesn’t have access to the corresponding operator, > right? Another more fundamental problem probably is that these threads > don’t have to match. The emit() for any operator (or rather a Sink related > to an operator) is ultimately triggered by an emitTuple() on the topmost > input operator in that path which happens in that input operator’s thread > which doesn’t have to match the thread calling setup() in the downstream > operators, right? > > > On 8/11/16, 10:59 AM, "Vlad Rozov" wrote: > > Name verification is too expensive, it will be sufficient to store > currentThread during setup() and verify that it is the same during > emit. > Checks should be supported not only for DefaultOutputPort, so we may > have it implemented in various Sinks. > > Vlad > > On 8/11/16 10:21, Sanjay Pujare wrote: > > Thinking more about this – all of the “operator” threads are created > by the Stram engine with appropriate names. So we can put checks in the > DefaultOutputPort.emit() or in the various implementations of Sink.put() > that the current-thread is one created by the Stram engine (by verifying > the name). > > > > We can even use a special Thread object for operator threads so the > above detection is easier. > > > > > > > > On 8/10/16, 6:11 PM, "Amol Kekre" wrote: > > > > +1 on debug proposal. Even if tuples lands up within the > window, it breaks > > all guarantees. A rerun (after restart from a checkpoint) can > have tuples > > in different windows from this thread. A separate thread simply > exposes > > users to unwarranted risks. > > > > Thks > > Amol > > > > > > On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov < > v.ro...@datatorrent.com> wrote: > > > > > Tuples emitted between end and begin windows is only one of > possible > > > behaviors that emitting tuples on a separate from the > operator thread may > > > introduce. It will be good to have both checks in place at > run-time and if > > > checking for the operator thread for every emitted tuple is > too expensive, > > > we may have it enabled only in DEBUG or mode with more checks > in place. > > > > > > Vlad > > > > > > > > > Sanjay just reminded me of my typo -> I meant between > end_window and > > >> start_window :) > > >> > > >> Thks > > >> Amol > > >> > > >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare < > san...@datatorrent.com> > > >> wrote: > > >> > > >> If the goal is to do this validation through static analysis > of operator > > >>> code, I guess it is possible but is going to be > non-trivial. And there > > >>> could be false positives and false negatives. > > >>> > > >>> Also I suppose this discussion applies to processor > operators (those > > >>> having both in and out ports) so Ram’s example of > JdbcPollInputOperator > > >>> may > > >>> not be applicable here? > > >>> > > >>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" < > ashwinchand...@gmail.com> > > >>> wrote: > > >>> > > >>> In a separate thread I mean. > > >>> > >
Re: can operators emit on a different from the operator itself thread?
Correct, except that it is Sink not an Operator that will need to save current thread during setup(). Sink does not need access to an Operator, it is sufficient to rely on the platform to call setup() method on the Operator thread. Vlad On 8/11/16 11:47, Munagala Ramanath wrote: If I understand Vlad correctly, what he is saying is that each operator saves currentThread in its own setup() and checks it in its own output methods. The threads in different operators are running potentially on different nodes and/or processes and there will be no connection between them. Ram On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujarewrote: Name check is expensive, agreed, but there isn’t anything else currently. Ideally the stram engine (considering that it is an engine providing resources like threads etc) should use a ThreadFactory or a ThreadGroup to create operator threads so identification and adding functionality is easier. The idea of checking for the same thread between setup() and emit() won’t work because the emit() check will have to be in the Sink hierarchy and AFAIK a Sink object doesn’t have access to the corresponding operator, right? Another more fundamental problem probably is that these threads don’t have to match. The emit() for any operator (or rather a Sink related to an operator) is ultimately triggered by an emitTuple() on the topmost input operator in that path which happens in that input operator’s thread which doesn’t have to match the thread calling setup() in the downstream operators, right? On 8/11/16, 10:59 AM, "Vlad Rozov" wrote: Name verification is too expensive, it will be sufficient to store currentThread during setup() and verify that it is the same during emit. Checks should be supported not only for DefaultOutputPort, so we may have it implemented in various Sinks. Vlad On 8/11/16 10:21, Sanjay Pujare wrote: > Thinking more about this – all of the “operator” threads are created by the Stram engine with appropriate names. So we can put checks in the DefaultOutputPort.emit() or in the various implementations of Sink.put() that the current-thread is one created by the Stram engine (by verifying the name). > > We can even use a special Thread object for operator threads so the above detection is easier. > > > > On 8/10/16, 6:11 PM, "Amol Kekre" wrote: > > +1 on debug proposal. Even if tuples lands up within the window, it breaks > all guarantees. A rerun (after restart from a checkpoint) can have tuples > in different windows from this thread. A separate thread simply exposes > users to unwarranted risks. > > Thks > Amol > > > On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov < v.ro...@datatorrent.com> wrote: > > > Tuples emitted between end and begin windows is only one of possible > > behaviors that emitting tuples on a separate from the operator thread may > > introduce. It will be good to have both checks in place at run-time and if > > checking for the operator thread for every emitted tuple is too expensive, > > we may have it enabled only in DEBUG or mode with more checks in place. > > > > Vlad > > > > > > Sanjay just reminded me of my typo -> I meant between end_window and > >> start_window :) > >> > >> Thks > >> Amol > >> > >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare < san...@datatorrent.com> > >> wrote: > >> > >> If the goal is to do this validation through static analysis of operator > >>> code, I guess it is possible but is going to be non-trivial. And there > >>> could be false positives and false negatives. > >>> > >>> Also I suppose this discussion applies to processor operators (those > >>> having both in and out ports) so Ram’s example of JdbcPollInputOperator > >>> may > >>> not be applicable here? > >>> > >>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" < ashwinchand...@gmail.com> > >>> wrote: > >>> > >>> In a separate thread I mean. > >>> > >>> Regards, > >>> Ashwin. > >>> > >>> On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta < > >>> ashwinchand...@gmail.com> wrote: > >>> > >>> > + dev@apex.apache.org > >>> > - us...@apex.apache.org > >>> > > >>> > This is one of those best practices that we learn by experience > >>> during > >>> > operator development. It will save a lot of time during operator > >>> > development if we
Re: can operators emit on a different from the operator itself thread?
If I understand Vlad correctly, what he is saying is that each operator saves currentThread in its own setup() and checks it in its own output methods. The threads in different operators are running potentially on different nodes and/or processes and there will be no connection between them. Ram On Thu, Aug 11, 2016 at 11:41 AM, Sanjay Pujarewrote: > Name check is expensive, agreed, but there isn’t anything else currently. > Ideally the stram engine (considering that it is an engine providing > resources like threads etc) should use a ThreadFactory or a ThreadGroup to > create operator threads so identification and adding functionality is > easier. > > The idea of checking for the same thread between setup() and emit() won’t > work because the emit() check will have to be in the Sink hierarchy and > AFAIK a Sink object doesn’t have access to the corresponding operator, > right? Another more fundamental problem probably is that these threads > don’t have to match. The emit() for any operator (or rather a Sink related > to an operator) is ultimately triggered by an emitTuple() on the topmost > input operator in that path which happens in that input operator’s thread > which doesn’t have to match the thread calling setup() in the downstream > operators, right? > > > On 8/11/16, 10:59 AM, "Vlad Rozov" wrote: > > Name verification is too expensive, it will be sufficient to store > currentThread during setup() and verify that it is the same during > emit. > Checks should be supported not only for DefaultOutputPort, so we may > have it implemented in various Sinks. > > Vlad > > On 8/11/16 10:21, Sanjay Pujare wrote: > > Thinking more about this – all of the “operator” threads are created > by the Stram engine with appropriate names. So we can put checks in the > DefaultOutputPort.emit() or in the various implementations of Sink.put() > that the current-thread is one created by the Stram engine (by verifying > the name). > > > > We can even use a special Thread object for operator threads so the > above detection is easier. > > > > > > > > On 8/10/16, 6:11 PM, "Amol Kekre" wrote: > > > > +1 on debug proposal. Even if tuples lands up within the > window, it breaks > > all guarantees. A rerun (after restart from a checkpoint) can > have tuples > > in different windows from this thread. A separate thread simply > exposes > > users to unwarranted risks. > > > > Thks > > Amol > > > > > > On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov < > v.ro...@datatorrent.com> wrote: > > > > > Tuples emitted between end and begin windows is only one of > possible > > > behaviors that emitting tuples on a separate from the > operator thread may > > > introduce. It will be good to have both checks in place at > run-time and if > > > checking for the operator thread for every emitted tuple is > too expensive, > > > we may have it enabled only in DEBUG or mode with more checks > in place. > > > > > > Vlad > > > > > > > > > Sanjay just reminded me of my typo -> I meant between > end_window and > > >> start_window :) > > >> > > >> Thks > > >> Amol > > >> > > >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare < > san...@datatorrent.com> > > >> wrote: > > >> > > >> If the goal is to do this validation through static analysis > of operator > > >>> code, I guess it is possible but is going to be > non-trivial. And there > > >>> could be false positives and false negatives. > > >>> > > >>> Also I suppose this discussion applies to processor > operators (those > > >>> having both in and out ports) so Ram’s example of > JdbcPollInputOperator > > >>> may > > >>> not be applicable here? > > >>> > > >>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" < > ashwinchand...@gmail.com> > > >>> wrote: > > >>> > > >>> In a separate thread I mean. > > >>> > > >>> Regards, > > >>> Ashwin. > > >>> > > >>> On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta < > > >>> ashwinchand...@gmail.com> wrote: > > >>> > > >>> > + dev@apex.apache.org > > >>> > - us...@apex.apache.org > > >>> > > > >>> > This is one of those best practices that we learn by > experience > > >>> during > > >>> > operator development. It will save a lot of time > during operator > > >>> > development if we can catch and throw validation > error when > > >>> someone > > >>> emits > > >>> > tuples in a non
Re: can operators emit on a different from the operator itself thread?
Name check is expensive, agreed, but there isn’t anything else currently. Ideally the stram engine (considering that it is an engine providing resources like threads etc) should use a ThreadFactory or a ThreadGroup to create operator threads so identification and adding functionality is easier. The idea of checking for the same thread between setup() and emit() won’t work because the emit() check will have to be in the Sink hierarchy and AFAIK a Sink object doesn’t have access to the corresponding operator, right? Another more fundamental problem probably is that these threads don’t have to match. The emit() for any operator (or rather a Sink related to an operator) is ultimately triggered by an emitTuple() on the topmost input operator in that path which happens in that input operator’s thread which doesn’t have to match the thread calling setup() in the downstream operators, right? On 8/11/16, 10:59 AM, "Vlad Rozov"wrote: Name verification is too expensive, it will be sufficient to store currentThread during setup() and verify that it is the same during emit. Checks should be supported not only for DefaultOutputPort, so we may have it implemented in various Sinks. Vlad On 8/11/16 10:21, Sanjay Pujare wrote: > Thinking more about this – all of the “operator” threads are created by the Stram engine with appropriate names. So we can put checks in the DefaultOutputPort.emit() or in the various implementations of Sink.put() that the current-thread is one created by the Stram engine (by verifying the name). > > We can even use a special Thread object for operator threads so the above detection is easier. > > > > On 8/10/16, 6:11 PM, "Amol Kekre" wrote: > > +1 on debug proposal. Even if tuples lands up within the window, it breaks > all guarantees. A rerun (after restart from a checkpoint) can have tuples > in different windows from this thread. A separate thread simply exposes > users to unwarranted risks. > > Thks > Amol > > > On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov wrote: > > > Tuples emitted between end and begin windows is only one of possible > > behaviors that emitting tuples on a separate from the operator thread may > > introduce. It will be good to have both checks in place at run-time and if > > checking for the operator thread for every emitted tuple is too expensive, > > we may have it enabled only in DEBUG or mode with more checks in place. > > > > Vlad > > > > > > Sanjay just reminded me of my typo -> I meant between end_window and > >> start_window :) > >> > >> Thks > >> Amol > >> > >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare > >> wrote: > >> > >> If the goal is to do this validation through static analysis of operator > >>> code, I guess it is possible but is going to be non-trivial. And there > >>> could be false positives and false negatives. > >>> > >>> Also I suppose this discussion applies to processor operators (those > >>> having both in and out ports) so Ram’s example of JdbcPollInputOperator > >>> may > >>> not be applicable here? > >>> > >>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" > >>> wrote: > >>> > >>> In a separate thread I mean. > >>> > >>> Regards, > >>> Ashwin. > >>> > >>> On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta < > >>> ashwinchand...@gmail.com> wrote: > >>> > >>> > + dev@apex.apache.org > >>> > - us...@apex.apache.org > >>> > > >>> > This is one of those best practices that we learn by experience > >>> during > >>> > operator development. It will save a lot of time during operator > >>> > development if we can catch and throw validation error when > >>> someone > >>> emits > >>> > tuples in a non separate thread. > >>> > > >>> > Regards, > >>> > Ashwin > >>> > > >>> > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath < > >>> r...@datatorrent.com> > >>> > wrote: > >>> > > >>> >> For cases where use of a different thread is needed, it can write > >>> tuples > >>> >> to a queue from where the operator thread pulls them -- > >>> >> JdbcPollInputOperator in Malhar has an example.
Re: can operators emit on a different from the operator itself thread?
Name verification is too expensive, it will be sufficient to store currentThread during setup() and verify that it is the same during emit. Checks should be supported not only for DefaultOutputPort, so we may have it implemented in various Sinks. Vlad On 8/11/16 10:21, Sanjay Pujare wrote: Thinking more about this – all of the “operator” threads are created by the Stram engine with appropriate names. So we can put checks in the DefaultOutputPort.emit() or in the various implementations of Sink.put() that the current-thread is one created by the Stram engine (by verifying the name). We can even use a special Thread object for operator threads so the above detection is easier. On 8/10/16, 6:11 PM, "Amol Kekre"wrote: +1 on debug proposal. Even if tuples lands up within the window, it breaks all guarantees. A rerun (after restart from a checkpoint) can have tuples in different windows from this thread. A separate thread simply exposes users to unwarranted risks. Thks Amol On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov wrote: > Tuples emitted between end and begin windows is only one of possible > behaviors that emitting tuples on a separate from the operator thread may > introduce. It will be good to have both checks in place at run-time and if > checking for the operator thread for every emitted tuple is too expensive, > we may have it enabled only in DEBUG or mode with more checks in place. > > Vlad > > > Sanjay just reminded me of my typo -> I meant between end_window and >> start_window :) >> >> Thks >> Amol >> >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare >> wrote: >> >> If the goal is to do this validation through static analysis of operator >>> code, I guess it is possible but is going to be non-trivial. And there >>> could be false positives and false negatives. >>> >>> Also I suppose this discussion applies to processor operators (those >>> having both in and out ports) so Ram’s example of JdbcPollInputOperator >>> may >>> not be applicable here? >>> >>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" >>> wrote: >>> >>> In a separate thread I mean. >>> >>> Regards, >>> Ashwin. >>> >>> On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta < >>> ashwinchand...@gmail.com> wrote: >>> >>> > + dev@apex.apache.org >>> > - us...@apex.apache.org >>> > >>> > This is one of those best practices that we learn by experience >>> during >>> > operator development. It will save a lot of time during operator >>> > development if we can catch and throw validation error when >>> someone >>> emits >>> > tuples in a non separate thread. >>> > >>> > Regards, >>> > Ashwin >>> > >>> > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath < >>> r...@datatorrent.com> >>> > wrote: >>> > >>> >> For cases where use of a different thread is needed, it can write >>> tuples >>> >> to a queue from where the operator thread pulls them -- >>> >> JdbcPollInputOperator in Malhar has an example. >>> >> >>> >> Ram >>> >> >>> >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com < >>> hsy...@gmail.com >>> >> wrote: >>> >> >>> >>> Hey Vlad, >>> >>> >>> >>> Thanks for bringing this up. Is there an easy way to detect >>> unexpected >>> >>> use of emit method without hurt the performance. Or at least if >>> we >>> can >>> >>> detect this in debug mode. >>> >>> >>> >>> Regards, >>> >>> Siyuan >>> >>> >>> >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov < >>> v.ro...@datatorrent.com> >>> >>> wrote: >>> >>> >>> The short answer is no, creating worker thread to emit tuples >>> is >>> not >>> supported by Apex and will lead to an undefined behavior. >>> Operators in Apex >>> have strong thread affinity and all interaction with the >>> platform >>> must >>> happen on the operator thread. >>> >>> Vlad >>> >>> >>> >>> >>> >>> >> >>> > >>> > >>> > -- >>> > >>> > Regards, >>> > Ashwin. >>> > >>> >>> >>> >>> -- >>> >>> Regards, >>> Ashwin. >>> >>> >>> >>> >>> >
Re: can operators emit on a different from the operator itself thread?
Thinking more about this – all of the “operator” threads are created by the Stram engine with appropriate names. So we can put checks in the DefaultOutputPort.emit() or in the various implementations of Sink.put() that the current-thread is one created by the Stram engine (by verifying the name). We can even use a special Thread object for operator threads so the above detection is easier. On 8/10/16, 6:11 PM, "Amol Kekre"wrote: +1 on debug proposal. Even if tuples lands up within the window, it breaks all guarantees. A rerun (after restart from a checkpoint) can have tuples in different windows from this thread. A separate thread simply exposes users to unwarranted risks. Thks Amol On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozov wrote: > Tuples emitted between end and begin windows is only one of possible > behaviors that emitting tuples on a separate from the operator thread may > introduce. It will be good to have both checks in place at run-time and if > checking for the operator thread for every emitted tuple is too expensive, > we may have it enabled only in DEBUG or mode with more checks in place. > > Vlad > > > Sanjay just reminded me of my typo -> I meant between end_window and >> start_window :) >> >> Thks >> Amol >> >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare >> wrote: >> >> If the goal is to do this validation through static analysis of operator >>> code, I guess it is possible but is going to be non-trivial. And there >>> could be false positives and false negatives. >>> >>> Also I suppose this discussion applies to processor operators (those >>> having both in and out ports) so Ram’s example of JdbcPollInputOperator >>> may >>> not be applicable here? >>> >>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" >>> wrote: >>> >>> In a separate thread I mean. >>> >>> Regards, >>> Ashwin. >>> >>> On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta < >>> ashwinchand...@gmail.com> wrote: >>> >>> > + dev@apex.apache.org >>> > - us...@apex.apache.org >>> > >>> > This is one of those best practices that we learn by experience >>> during >>> > operator development. It will save a lot of time during operator >>> > development if we can catch and throw validation error when >>> someone >>> emits >>> > tuples in a non separate thread. >>> > >>> > Regards, >>> > Ashwin >>> > >>> > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath < >>> r...@datatorrent.com> >>> > wrote: >>> > >>> >> For cases where use of a different thread is needed, it can write >>> tuples >>> >> to a queue from where the operator thread pulls them -- >>> >> JdbcPollInputOperator in Malhar has an example. >>> >> >>> >> Ram >>> >> >>> >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com < >>> hsy...@gmail.com >>> >> wrote: >>> >> >>> >>> Hey Vlad, >>> >>> >>> >>> Thanks for bringing this up. Is there an easy way to detect >>> unexpected >>> >>> use of emit method without hurt the performance. Or at least if >>> we >>> can >>> >>> detect this in debug mode. >>> >>> >>> >>> Regards, >>> >>> Siyuan >>> >>> >>> >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov < >>> v.ro...@datatorrent.com> >>> >>> wrote: >>> >>> >>> The short answer is no, creating worker thread to emit tuples >>> is >>> not >>> supported by Apex and will lead to an undefined behavior. >>> Operators in Apex >>> have strong thread affinity and all interaction with the >>> platform >>> must >>> happen on the operator thread. >>> >>> Vlad >>> >>> >>> >>> >>> >>> >> >>> > >>> > >>> > -- >>> > >>> > Regards, >>> > Ashwin. >>> > >>> >>> >>> >>> -- >>> >>> Regards, >>> Ashwin. >>> >>> >>> >>> >>> >
Re: can operators emit on a different from the operator itself thread?
+1 on debug proposal. Even if tuples lands up within the window, it breaks all guarantees. A rerun (after restart from a checkpoint) can have tuples in different windows from this thread. A separate thread simply exposes users to unwarranted risks. Thks Amol On Wed, Aug 10, 2016 at 6:05 PM, Vlad Rozovwrote: > Tuples emitted between end and begin windows is only one of possible > behaviors that emitting tuples on a separate from the operator thread may > introduce. It will be good to have both checks in place at run-time and if > checking for the operator thread for every emitted tuple is too expensive, > we may have it enabled only in DEBUG or mode with more checks in place. > > Vlad > > > Sanjay just reminded me of my typo -> I meant between end_window and >> start_window :) >> >> Thks >> Amol >> >> On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujare >> wrote: >> >> If the goal is to do this validation through static analysis of operator >>> code, I guess it is possible but is going to be non-trivial. And there >>> could be false positives and false negatives. >>> >>> Also I suppose this discussion applies to processor operators (those >>> having both in and out ports) so Ram’s example of JdbcPollInputOperator >>> may >>> not be applicable here? >>> >>> On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" >>> wrote: >>> >>> In a separate thread I mean. >>> >>> Regards, >>> Ashwin. >>> >>> On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta < >>> ashwinchand...@gmail.com> wrote: >>> >>> > + dev@apex.apache.org >>> > - us...@apex.apache.org >>> > >>> > This is one of those best practices that we learn by experience >>> during >>> > operator development. It will save a lot of time during operator >>> > development if we can catch and throw validation error when >>> someone >>> emits >>> > tuples in a non separate thread. >>> > >>> > Regards, >>> > Ashwin >>> > >>> > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath < >>> r...@datatorrent.com> >>> > wrote: >>> > >>> >> For cases where use of a different thread is needed, it can write >>> tuples >>> >> to a queue from where the operator thread pulls them -- >>> >> JdbcPollInputOperator in Malhar has an example. >>> >> >>> >> Ram >>> >> >>> >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com < >>> hsy...@gmail.com >>> >> wrote: >>> >> >>> >>> Hey Vlad, >>> >>> >>> >>> Thanks for bringing this up. Is there an easy way to detect >>> unexpected >>> >>> use of emit method without hurt the performance. Or at least if >>> we >>> can >>> >>> detect this in debug mode. >>> >>> >>> >>> Regards, >>> >>> Siyuan >>> >>> >>> >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov < >>> v.ro...@datatorrent.com> >>> >>> wrote: >>> >>> >>> The short answer is no, creating worker thread to emit tuples >>> is >>> not >>> supported by Apex and will lead to an undefined behavior. >>> Operators in Apex >>> have strong thread affinity and all interaction with the >>> platform >>> must >>> happen on the operator thread. >>> >>> Vlad >>> >>> >>> >>> >>> >>> >> >>> > >>> > >>> > -- >>> > >>> > Regards, >>> > Ashwin. >>> > >>> >>> >>> >>> -- >>> >>> Regards, >>> Ashwin. >>> >>> >>> >>> >>> >
Re: can operators emit on a different from the operator itself thread?
If the goal is to do this validation through static analysis of operator code, I guess it is possible but is going to be non-trivial. And there could be false positives and false negatives. Also I suppose this discussion applies to processor operators (those having both in and out ports) so Ram’s example of JdbcPollInputOperator may not be applicable here? On 8/10/16, 2:04 PM, "Ashwin Chandra Putta"wrote: In a separate thread I mean. Regards, Ashwin. On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta < ashwinchand...@gmail.com> wrote: > + dev@apex.apache.org > - us...@apex.apache.org > > This is one of those best practices that we learn by experience during > operator development. It will save a lot of time during operator > development if we can catch and throw validation error when someone emits > tuples in a non separate thread. > > Regards, > Ashwin > > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath > wrote: > >> For cases where use of a different thread is needed, it can write tuples >> to a queue from where the operator thread pulls them -- >> JdbcPollInputOperator in Malhar has an example. >> >> Ram >> >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com >> wrote: >> >>> Hey Vlad, >>> >>> Thanks for bringing this up. Is there an easy way to detect unexpected >>> use of emit method without hurt the performance. Or at least if we can >>> detect this in debug mode. >>> >>> Regards, >>> Siyuan >>> >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov >>> wrote: >>> The short answer is no, creating worker thread to emit tuples is not supported by Apex and will lead to an undefined behavior. Operators in Apex have strong thread affinity and all interaction with the platform must happen on the operator thread. Vlad >>> >>> >> > > > -- > > Regards, > Ashwin. > -- Regards, Ashwin.
Re: can operators emit on a different from the operator itself thread?
Sanjay just reminded me of my typo -> I meant between end_window and start_window :) Thks Amol On Wed, Aug 10, 2016 at 2:36 PM, Sanjay Pujarewrote: > If the goal is to do this validation through static analysis of operator > code, I guess it is possible but is going to be non-trivial. And there > could be false positives and false negatives. > > Also I suppose this discussion applies to processor operators (those > having both in and out ports) so Ram’s example of JdbcPollInputOperator may > not be applicable here? > > On 8/10/16, 2:04 PM, "Ashwin Chandra Putta" > wrote: > > In a separate thread I mean. > > Regards, > Ashwin. > > On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta < > ashwinchand...@gmail.com> wrote: > > > + dev@apex.apache.org > > - us...@apex.apache.org > > > > This is one of those best practices that we learn by experience > during > > operator development. It will save a lot of time during operator > > development if we can catch and throw validation error when someone > emits > > tuples in a non separate thread. > > > > Regards, > > Ashwin > > > > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath < > r...@datatorrent.com> > > wrote: > > > >> For cases where use of a different thread is needed, it can write > tuples > >> to a queue from where the operator thread pulls them -- > >> JdbcPollInputOperator in Malhar has an example. > >> > >> Ram > >> > >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com > > >> wrote: > >> > >>> Hey Vlad, > >>> > >>> Thanks for bringing this up. Is there an easy way to detect > unexpected > >>> use of emit method without hurt the performance. Or at least if we > can > >>> detect this in debug mode. > >>> > >>> Regards, > >>> Siyuan > >>> > >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov < > v.ro...@datatorrent.com> > >>> wrote: > >>> > The short answer is no, creating worker thread to emit tuples is > not > supported by Apex and will lead to an undefined behavior. > Operators in Apex > have strong thread affinity and all interaction with the platform > must > happen on the operator thread. > > Vlad > > >>> > >>> > >> > > > > > > -- > > > > Regards, > > Ashwin. > > > > > > -- > > Regards, > Ashwin. > > > >
Re: can operators emit on a different from the operator itself thread?
Send too soon. A quicker way would be to catch emit happening between start_window and end_window and flag an error. Catching "another thread" for every tuple may have a huge performance hit. Thks Amol On Wed, Aug 10, 2016 at 2:31 PM, Amol Kekrewrote: > > Currently user can code it that way. IMHO Apex should catch this and flag > error. > > Thks > Amol > > > On Wed, Aug 10, 2016 at 2:04 PM, Ashwin Chandra Putta < > ashwinchand...@gmail.com> wrote: > >> In a separate thread I mean. >> >> Regards, >> Ashwin. >> >> On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta < >> ashwinchand...@gmail.com> wrote: >> >> > + dev@apex.apache.org >> > - us...@apex.apache.org >> > >> > This is one of those best practices that we learn by experience during >> > operator development. It will save a lot of time during operator >> > development if we can catch and throw validation error when someone >> emits >> > tuples in a non separate thread. >> > >> > Regards, >> > Ashwin >> > >> > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath > > >> > wrote: >> > >> >> For cases where use of a different thread is needed, it can write >> tuples >> >> to a queue from where the operator thread pulls them -- >> >> JdbcPollInputOperator in Malhar has an example. >> >> >> >> Ram >> >> >> >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com >> >> wrote: >> >> >> >>> Hey Vlad, >> >>> >> >>> Thanks for bringing this up. Is there an easy way to detect unexpected >> >>> use of emit method without hurt the performance. Or at least if we can >> >>> detect this in debug mode. >> >>> >> >>> Regards, >> >>> Siyuan >> >>> >> >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov > > >> >>> wrote: >> >>> >> The short answer is no, creating worker thread to emit tuples is not >> supported by Apex and will lead to an undefined behavior. Operators >> in Apex >> have strong thread affinity and all interaction with the platform >> must >> happen on the operator thread. >> >> Vlad >> >> >>> >> >>> >> >> >> > >> > >> > -- >> > >> > Regards, >> > Ashwin. >> > >> >> >> >> -- >> >> Regards, >> Ashwin. >> > >
Re: can operators emit on a different from the operator itself thread?
Currently user can code it that way. IMHO Apex should catch this and flag error. Thks Amol On Wed, Aug 10, 2016 at 2:04 PM, Ashwin Chandra Putta < ashwinchand...@gmail.com> wrote: > In a separate thread I mean. > > Regards, > Ashwin. > > On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta < > ashwinchand...@gmail.com> wrote: > > > + dev@apex.apache.org > > - us...@apex.apache.org > > > > This is one of those best practices that we learn by experience during > > operator development. It will save a lot of time during operator > > development if we can catch and throw validation error when someone emits > > tuples in a non separate thread. > > > > Regards, > > Ashwin > > > > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath> > wrote: > > > >> For cases where use of a different thread is needed, it can write tuples > >> to a queue from where the operator thread pulls them -- > >> JdbcPollInputOperator in Malhar has an example. > >> > >> Ram > >> > >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com > >> wrote: > >> > >>> Hey Vlad, > >>> > >>> Thanks for bringing this up. Is there an easy way to detect unexpected > >>> use of emit method without hurt the performance. Or at least if we can > >>> detect this in debug mode. > >>> > >>> Regards, > >>> Siyuan > >>> > >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov > >>> wrote: > >>> > The short answer is no, creating worker thread to emit tuples is not > supported by Apex and will lead to an undefined behavior. Operators > in Apex > have strong thread affinity and all interaction with the platform must > happen on the operator thread. > > Vlad > > >>> > >>> > >> > > > > > > -- > > > > Regards, > > Ashwin. > > > > > > -- > > Regards, > Ashwin. >
Re: can operators emit on a different from the operator itself thread?
In a separate thread I mean. Regards, Ashwin. On Wed, Aug 10, 2016 at 2:01 PM, Ashwin Chandra Putta < ashwinchand...@gmail.com> wrote: > + dev@apex.apache.org > - us...@apex.apache.org > > This is one of those best practices that we learn by experience during > operator development. It will save a lot of time during operator > development if we can catch and throw validation error when someone emits > tuples in a non separate thread. > > Regards, > Ashwin > > On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanath> wrote: > >> For cases where use of a different thread is needed, it can write tuples >> to a queue from where the operator thread pulls them -- >> JdbcPollInputOperator in Malhar has an example. >> >> Ram >> >> On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com >> wrote: >> >>> Hey Vlad, >>> >>> Thanks for bringing this up. Is there an easy way to detect unexpected >>> use of emit method without hurt the performance. Or at least if we can >>> detect this in debug mode. >>> >>> Regards, >>> Siyuan >>> >>> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov >>> wrote: >>> The short answer is no, creating worker thread to emit tuples is not supported by Apex and will lead to an undefined behavior. Operators in Apex have strong thread affinity and all interaction with the platform must happen on the operator thread. Vlad >>> >>> >> > > > -- > > Regards, > Ashwin. > -- Regards, Ashwin.
Re: can operators emit on a different from the operator itself thread?
+ dev@apex.apache.org - us...@apex.apache.org This is one of those best practices that we learn by experience during operator development. It will save a lot of time during operator development if we can catch and throw validation error when someone emits tuples in a non separate thread. Regards, Ashwin On Wed, Aug 10, 2016 at 1:57 PM, Munagala Ramanathwrote: > For cases where use of a different thread is needed, it can write tuples > to a queue from where the operator thread pulls them -- > JdbcPollInputOperator in Malhar has an example. > > Ram > > On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com > wrote: > >> Hey Vlad, >> >> Thanks for bringing this up. Is there an easy way to detect unexpected >> use of emit method without hurt the performance. Or at least if we can >> detect this in debug mode. >> >> Regards, >> Siyuan >> >> On Wed, Aug 10, 2016 at 11:27 AM, Vlad Rozov >> wrote: >> >>> The short answer is no, creating worker thread to emit tuples is not >>> supported by Apex and will lead to an undefined behavior. Operators in Apex >>> have strong thread affinity and all interaction with the platform must >>> happen on the operator thread. >>> >>> Vlad >>> >> >> > -- Regards, Ashwin.