Re: [DISCUSSION] Custom Control Tuples

2017-02-14 Thread Pramod Immaneni
There have been some recent developments and discussions on the schema side (link below) that warrant a reconsideration of how control tuples get delivered. http://apache.markmail.org/search/?q=apex+list%3Aorg.apache.apex.dev+schema+discovery+support#query:apex%20list%3Aorg.apache.apex.dev%20sch

Re: [DISCUSSION] Custom Control Tuples

2016-12-02 Thread David Yan
Bhupesh, Sandesh, Tushar: Thanks for volunteering. This task probably needs all three of you to work closely together. The subtasks so far are: https://issues.apache.org/jira/browse/APEXCORE-580 https://issues.apache.org/jira/browse/APEXCORE-581 Please first review the subtasks and see whether

Re: [DISCUSSION] Custom Control Tuples

2016-12-01 Thread Tushar Gosavi
I am also interested in working on this feature. - Tushar. On Thu, Dec 1, 2016 at 10:27 AM, Bhupesh Chawda wrote: > I would like to work on https://issues.apache.org/jira/browse/APEXCORE-580. > > ~ Bhupesh > > On Thu, Dec 1, 2016 at 5:42 AM, Sandesh Hegde > wrote: > >> I am interested in working

Re: [DISCUSSION] Custom Control Tuples

2016-11-30 Thread Bhupesh Chawda
I would like to work on https://issues.apache.org/jira/browse/APEXCORE-580. ~ Bhupesh On Thu, Dec 1, 2016 at 5:42 AM, Sandesh Hegde wrote: > I am interested in working on the following subtask > > https://issues.apache.org/jira/browse/APEXCORE-581 > > Thanks > > > On Wed, Nov 30, 2016 at 2:07 P

Re: [DISCUSSION] Custom Control Tuples

2016-11-30 Thread Sandesh Hegde
I am interested in working on the following subtask https://issues.apache.org/jira/browse/APEXCORE-581 Thanks On Wed, Nov 30, 2016 at 2:07 PM David Yan wrote: > I have created an umbrella ticket for control tuple support: > > https://issues.apache.org/jira/browse/APEXCORE-579 > > Currently it

Re: [DISCUSSION] Custom Control Tuples

2016-11-30 Thread David Yan
I have created an umbrella ticket for control tuple support: https://issues.apache.org/jira/browse/APEXCORE-579 Currently it has two subtasks. Please have a look at them and see whether I'm missing anything or if you have anything to add. You are welcome to add more subtasks or comment on the exi

Re: [DISCUSSION] Custom Control Tuples

2016-11-28 Thread Bhupesh Chawda
+1 for the plan. I would be interested in contributing to this feature. ~ Bhupesh On Nov 29, 2016 03:26, "Sandesh Hegde" wrote: > I am interested in contributing to this feature. > > On Mon, Nov 28, 2016 at 1:54 PM David Yan wrote: > > > I think we should probably go ahead with option 1 since

Re: [DISCUSSION] Custom Control Tuples

2016-11-28 Thread Sandesh Hegde
I am interested in contributing to this feature. On Mon, Nov 28, 2016 at 1:54 PM David Yan wrote: > I think we should probably go ahead with option 1 since this works with > most use cases and prevents developers from shooting themselves in the foot > in terms of idempotency. > > We can have a c

Re: [DISCUSSION] Custom Control Tuples

2016-11-28 Thread David Yan
I think we should probably go ahead with option 1 since this works with most use cases and prevents developers from shooting themselves in the foot in terms of idempotency. We can have a configuration property that enables option 2 later if we have concrete use cases that call for it. Please shar

Re: [DISCUSSION] Custom Control Tuples

2016-11-27 Thread Bhupesh Chawda
It appears that option 1 is more favored due to unavailability of a use case which could use option 2. However, option 2 is problematic in specific cases, like presence of multiple input ports for example. In case of a linear DAG where control tuples are flowing in order with the data tuples, it s

Re: [DISCUSSION] Custom Control Tuples

2016-11-10 Thread David Yan
Good question Tushar. The callback should be called only once. The way to implement this is to keep a list of control tuple hashes for the given streaming window and only do the callback when the operator has not seen it before. Other thoughts? David On Thu, Nov 10, 2016 at 9:32 AM, Tushar Gosav
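
The once-only callback described here can be sketched as follows. This is a minimal illustration, not Apex code: the class `ControlTupleDeduplicator` and its methods are hypothetical names, and it assumes a control tuple's `hashCode()` identifies it well enough within one streaming window. A set is used instead of a list so the "seen before" check stays cheap.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the Apex API: remembers which control
// tuples have already been seen in the current streaming window.
class ControlTupleDeduplicator {
  private final Set<Integer> seenHashes = new HashSet<>();

  // Called at the start of each streaming window: forget the previous window.
  void beginWindow() {
    seenHashes.clear();
  }

  // Returns true only the first time a given control tuple arrives within
  // the current window, so the callback fires once per tuple.
  boolean shouldDeliver(Object controlTuple) {
    return seenHashes.add(controlTuple.hashCode());
  }
}
```

One caveat of keying on hashes alone: two distinct control tuples that happen to share a hash would be treated as duplicates, so a real implementation would likely need a stronger identity.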

Re: [DISCUSSION] Custom Control Tuples

2016-11-10 Thread Tushar Gosavi
Hi David, What would be the behaviour in the case where we have a DAG with the following operators? The number in brackets is the number of partitions, and X is NxM partitioning. A(1) X B(4) X C(2) If A sends a control tuple, it will be sent to all 4 partitions of B, and from each partition of B it goes to C, i.

Re: [DISCUSSION] Custom Control Tuples

2016-11-03 Thread David Yan
Hi Bhupesh, Since each input port has its own incoming control tuple, I would imagine there would be an additional DefaultInputPort.processControl method that operator developers can override. If we go for option 1, my thinking is that the control tuples would always be delivered at the next windo
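
A per-port control callback of the kind imagined here might look like the sketch below. `SketchInputPort` and `RecordingPort` are stand-ins invented for illustration; they are not the real `DefaultInputPort`, and the `processControl` name and signature are assumptions taken from the wording above.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for an input port; NOT the real Apex DefaultInputPort.
abstract class SketchInputPort<T> {
  // Regular data path, analogous to the existing process(T) callback.
  abstract void process(T tuple);

  // Hypothetical control path: a no-op by default, so operators that
  // never override it keep working unchanged.
  void processControl(Object controlTuple) {
  }
}

// Example port that records what it receives on each path.
class RecordingPort extends SketchInputPort<String> {
  final List<String> data = new ArrayList<>();
  final List<Object> control = new ArrayList<>();

  @Override
  void process(String tuple) {
    data.add(tuple);
  }

  @Override
  void processControl(Object controlTuple) {
    control.add(controlTuple);
  }
}
```

Giving the control method a default no-op body is one way to add it without forcing changes on existing port implementations.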

Re: [DISCUSSION] Custom Control Tuples

2016-11-03 Thread Pramod Immaneni
The control tuple could be delivered to the operator only after it is received from all upstream partitions, while still allowing other data from an upstream partition after its control tuple is received; we don't necessarily have to block and do complete synchronization like in end window. You are righ
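
The delivery rule suggested here (hand the control tuple to the operator once every upstream partition has sent it, without holding back data tuples) can be sketched like this. `ControlTupleBarrier` is a hypothetical helper, and identifying upstream partitions by an integer id is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: tracks which upstream partitions have already sent
// a given control tuple. Data tuples never pass through it, so they are
// not blocked while the control tuple is still incomplete.
class ControlTupleBarrier {
  private final int upstreamCount;
  private final Map<Object, Set<Integer>> arrivals = new HashMap<>();

  ControlTupleBarrier(int upstreamCount) {
    this.upstreamCount = upstreamCount;
  }

  // Returns true when the last outstanding upstream partition delivers the
  // control tuple; earlier or repeated arrivals return false.
  boolean onControlTuple(int upstreamId, Object controlTuple) {
    Set<Integer> seen = arrivals.computeIfAbsent(controlTuple, k -> new HashSet<>());
    seen.add(upstreamId);
    if (seen.size() == upstreamCount) {
      arrivals.remove(controlTuple); // ready for delivery, drop bookkeeping
      return true;
    }
    return false;
  }
}
```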

Re: [DISCUSSION] Custom Control Tuples

2016-11-03 Thread Bhupesh Chawda
I have a question regarding the callback for a control tuple. Will it be similar to InputPort::process() method? Something like InputPort::processControlTuple(t) ? Or will it be a method of the operator similar to beginWindow()? When we say that the control tuple will be delivered at window bounda

Re: [DISCUSSION] Custom Control Tuples

2016-11-03 Thread Thomas Weise
I don't see how that would work. Suppose you have a file splitter and multiple partitions of block readers. The "end of file" event cannot be processed downstream until all block readers are done. I also think that this is related to the batch demarcation discussion and there should be a single gen

Re: [DISCUSSION] Custom Control Tuples

2016-11-03 Thread Thomas Weise
There is no guarantee about the ordering of events within a streaming window with multiple upstream partitions. This would require synchronization logic similar to what the streaming window provides, hence I would expect it to be best supported as part of the same window synchronization. On Wed

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread Pramod Immaneni
Suppose I am processing data in a file and I want to do something at the end of a file at the output operator, I would send an end file control tuple and act on it when I receive it at the output. In a single window I may end up processing multiple files and if I don't have multiple ports and logic

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread Pramod Immaneni
With option 2, users can still do idempotent processing by delaying their processing of the control tuples to end window. They have the flexibility with this option. In the usual scenarios, you will have one port and given that control tuples will be sent to all partitions, all the data sent before
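
The pattern described here, where an operator receives control tuples inline under option 2 but postpones acting on them until end window, can be sketched as below. `DeferringOperator` is a hypothetical skeleton whose method names echo the beginWindow/endWindow lifecycle mentioned in this thread; it is not real Apex code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical operator skeleton, not real Apex code.
class DeferringOperator {
  private final List<Object> pendingControl = new ArrayList<>();
  final List<String> log = new ArrayList<>();

  void beginWindow(long windowId) {
    log.add("begin " + windowId);
  }

  void process(String tuple) {
    log.add("data " + tuple);
  }

  // Under option 2 this could be called anywhere inside the window;
  // the operator only remembers the tuple here.
  void processControl(Object controlTuple) {
    pendingControl.add(controlTuple);
  }

  // Act on the buffered control tuples at the window boundary, so they
  // always take effect after all data tuples of the window.
  void endWindow() {
    for (Object c : pendingControl) {
      log.add("control " + c);
    }
    pendingControl.clear();
    log.add("end");
  }
}
```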

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread Thomas Weise
The use cases listed in the original discussion don't call for option 2. It seems to come with additional complexity and implementation cost. Can those in favor of option 2 please also provide the use case for it. Thanks, Thomas On Wed, Nov 2, 2016 at 10:36 PM, Siyuan Hua wrote: > I will vote

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread Siyuan Hua
I will vote for approach 1. First of all, that one sounds easier to do to me. And I think idempotency is important. It may come at the cost of higher latency, but I think that is OK. In addition, if users do need realtime control tuple processing in the future, we can always add the option on

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread David Yan
Pramod, To answer your questions, the control tuples will be delivered to all downstream partitions, and an additional emitControl method (actual name TBD) can be added to DefaultOutputPort without breaking backward compatibility. Also, to clarify, each operator should have the ability to block f

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread Amol Kekre
A feature that incurs risk with processing order, and more so with idempotency, is a big enough reason to worry about option 2. Is there a critical use case that needs this feature? Thks Amol On Wed, Nov 2, 2016 at 1:25 PM, Pramod Immaneni wrote: > I like approach 2 as it gives more fle

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread Pradeep A. Dalvi
As a rule of thumb in any real time operating system, control tuples should always be handled using Priority Queues. We may try to control priorities by defining levels. And they shall not be delivered at window boundaries. In short, control tuples shall never be treated as any other tuples in real ti

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread Pramod Immaneni
I like approach 2 as it gives more flexibility and also allows for low-latency options. I think the following are important as well. 1. Delivering control tuples to all downstream partitions. 2. What mechanism will the operator developer use to send the control tuple? Will it be an additional meho

Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread David Yan
Hi all, I would like to renew the discussion of control tuples. Last time, we were in a debate about whether: 1) the platform should enforce that control tuples are delivered at window boundaries only or: 2) the platform should deliver control tuples just as other tuples and it's the operator

Re: [DISCUSSION] Custom Control Tuples

2016-06-28 Thread Vlad Rozov
It is not clear how an operator will emit a custom control tuple at window boundaries. One way is to cache/accumulate control tuples in the operator output port till the window closes (END_WINDOW is inserted into the output sink) or only allow an operator to emit control tuples inside endWindow(). T
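
The first approach mentioned here, caching control tuples in the output port until the window closes, can be sketched as follows. `BufferingOutputPort` is a hypothetical stand-in rather than the Apex output port, and flushing the cached tuples right after END_WINDOW (instead of just before it) is an assumption chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for an output port; hypothetical, not the Apex DefaultOutputPort.
class BufferingOutputPort {
  private final List<Object> bufferedControl = new ArrayList<>();
  final List<String> sink = new ArrayList<>(); // what downstream would see

  // Data tuples go straight to the sink.
  void emit(Object tuple) {
    sink.add("DATA:" + tuple);
  }

  // Control tuples are held back until the window closes.
  void emitControl(Object controlTuple) {
    bufferedControl.add(controlTuple);
  }

  // Invoked when the window closes: END_WINDOW goes into the sink first,
  // then the cached control tuples (assumed ordering, see lead-in).
  void endWindow() {
    sink.add("END_WINDOW");
    for (Object c : bufferedControl) {
      sink.add("CONTROL:" + c);
    }
    bufferedControl.clear();
  }
}
```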

Re: [DISCUSSION] Custom Control Tuples

2016-06-27 Thread Amol Kekre
I agree with David. Allowing control tuples within a window (along with data tuples) creates a very dangerous situation where guarantees are impacted. It is much safer to enable control tuples (send/receive) at window boundaries (after END_WINDOW of window N, and before BEGIN_WINDOW for window N+1).

Re: [DISCUSSION] Custom Control Tuples

2016-06-27 Thread Thomas Weise
The windowing we discuss here is in general event time based, arrival time is a special case of it. I don't think state changes can be made independent of the streaming window boundary as it would prevent idempotent processing and transitively exactly once. For that to work, tuples need to be pres

Re: [DISCUSSION] Custom Control Tuples

2016-06-27 Thread David Yan
If we allow these flexibilities for control tuples, we are back to square one and the regular data tuple is specifically for that. When we talk about control tuples, I think it's safe to make these two assumptions: 1) The control tuple is always sent and handled at streaming window boundary. 2)

Re: [DISCUSSION] Custom Control Tuples

2016-06-27 Thread David Yan
I think for session tracking, if the session boundaries are allowed to be not aligned with the streaming window boundaries, the user will have a much bigger problem with idempotency. And in most cases, session tracking is event time based, not ingression time or processing time based, so this may n

Re: [DISCUSSION] Custom Control Tuples

2016-06-27 Thread Chinmay Kolhatkar
I hope I'm not commenting too late on this thread. From the above discussion, it looks like the requirement is to have a custom tuple which has the following 2 capabilities to influence the streaming engine on: 1. When to send it (between windows/within window) 2. Where to send it (all partitions, some p

Re: [DISCUSSION] Custom Control Tuples

2016-06-27 Thread Vlad Rozov
The ability to send custom control tuples within a window may be useful, for example, for session tracking, where session boundaries are not aligned with window boundaries and 500 ms latency is not acceptable for an application. Thank you, Vlad On 6/25/16 10:52, Thomas Weise wrote: It should not

Re: [DISCUSSION] Custom Control Tuples

2016-06-25 Thread Thomas Weise
It should not matter from where the control tuple is triggered. It will be good to have a generic mechanism to propagate it and other things can be accomplished outside the engine. For example, the new comprehensive support for windowing will all be in Malhar, nothing that the engine needs to know

Re: [DISCUSSION] Custom Control Tuples

2016-06-25 Thread Amol Kekre
I did not say that. "Notably" does not mean "exclusive". Thks Amol On Sat, Jun 25, 2016 at 9:29 AM, Sandesh Hegde wrote: > Why restrict the control tuples to input operators? > > On Sat, Jun 25, 2016 at 9:07 AM Amol Kekre wrote: > > > David, > > We should avoid control tuple within the window b

Re: [DISCUSSION] Custom Control Tuples

2016-06-25 Thread Sandesh Hegde
Why restrict the control tuples to input operators? On Sat, Jun 25, 2016 at 9:07 AM Amol Kekre wrote: > David, > We should avoid control tuple within the window by simply restricting it > through API. This can be done by calling something like "sendControlTuple" > between windows, notably in inp

Re: [DISCUSSION] Custom Control Tuples

2016-06-25 Thread Amol Kekre
David, We should avoid control tuples within the window by simply restricting them through the API. This can be done by calling something like "sendControlTuple" between windows, notably in input operators. Thks Amol On Sat, Jun 25, 2016 at 7:32 AM, Munagala Ramanath wrote: > What would the API look

Re: [DISCUSSION] Custom Control Tuples

2016-06-25 Thread Munagala Ramanath
What would the API look like for option 1? Another operator callback called controlTuple(), or does the operator code have to check each incoming tuple to see if it was data or control? Ram On Fri, Jun 24, 2016 at 11:42 PM, David Yan wrote: > It looks like option 1 is preferred by the communit

Re: [DISCUSSION] Custom Control Tuples

2016-06-25 Thread Sasha Parfenov
+1 I think this is a great feature, and is needed for all the reasons stated above, as well as supporting integrations with frameworks like Apache Beam. Giving users more control by supporting custom serialized objects, Option 1, would be my preference. Thanks, Sasha On Sat, Jun 25, 2016 at 6:1

Re: [DISCUSSION] Custom Control Tuples

2016-06-25 Thread Pramod Immaneni
For the use cases you mentioned, I think 1) and 2) are more likely to be controlled directly by the application, 3) and 4) are more likely going to be triggered externally and directly handled by the engine and 3) is already being implemented that way (apexcore-163). The control tuples emitted by

Re: [DISCUSSION] Custom Control Tuples

2016-06-24 Thread David Yan
It looks like option 1 is preferred by the community. But let me elaborate why I brought up the option of piggy backing BEGIN and END_WINDOW. Option 2 implicitly enforces that the operations related to the custom control tuple be done at the streaming window boundary. For most operations, it makes

Re: [DISCUSSION] Custom Control Tuples

2016-06-24 Thread Vlad Rozov
+1 for option 1. Thank you, Vlad On 6/24/16 14:35, Bright Chen wrote: +1 It can also help to shut down the application gracefully. Bright On Jun 24, 2016, at 1:35 PM, Siyuan Hua wrote: +1 I think it's good to have custom control tuples and I prefer option 1. Also I think we should thin

Re: [DISCUSSION] Custom Control Tuples

2016-06-24 Thread Bright Chen
+1 It can also help to shut down the application gracefully. Bright > On Jun 24, 2016, at 1:35 PM, Siyuan Hua wrote: > > +1 > > I think it's good to have custom control tuple and I prefer the 1 option. > > Also I think we should think about couple different callbacks, that could > be operator l

Re: [DISCUSSION] Custom Control Tuples

2016-06-24 Thread Siyuan Hua
+1 I think it's good to have custom control tuples and I prefer option 1. Also I think we should think about a couple of different callbacks, which could be operator level (triggered when an operator receives a control tuple) or DAG level (triggered when a control tuple flows over the whole DAG) Regard

Re: [DISCUSSION] Custom Control Tuples

2016-06-24 Thread David Yan
My initial thinking is that the custom control tuples, just like the existing control tuples, will only be generated from the input operators and will be propagated downstream to all operators in the DAG. So the NxM partitioning scenario works just like how other control tuples work, i.e. the callb

Re: [DISCUSSION] Custom Control Tuples

2016-06-24 Thread Tushar Gosavi
+1 for the feature. I am in favor of option 1, but we may need a helper method to avoid a compiler error on a typed port, as calling port.emit(controlTuple) will be an error if the types of the control tuple and the port do not match, or a new method in the outputPort object, emitControlTuple(ControlTuple). Can you
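
The typing concern raised here can be illustrated with a small sketch. `TypedOutputPort`, the `ControlTuple` marker interface and `EndOfFile` are hypothetical types written for this example; only the method name `emitControlTuple(ControlTuple)` comes from the message above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker type for control tuples.
interface ControlTuple {
}

// Example control tuple carrying the name of the file that just ended.
class EndOfFile implements ControlTuple {
  final String fileName;

  EndOfFile(String fileName) {
    this.fileName = fileName;
  }
}

// Stand-in for a typed output port; not the real Apex DefaultOutputPort.
class TypedOutputPort<T> {
  final List<Object> sent = new ArrayList<>();

  // Data path: the argument must match the port's type parameter.
  void emit(T tuple) {
    sent.add(tuple);
  }

  // Separate method, so the control tuple type is independent of T.
  void emitControlTuple(ControlTuple controlTuple) {
    sent.add(controlTuple);
  }
}
```

With a `TypedOutputPort<String>`, a call such as `port.emit(new EndOfFile("a.txt"))` would be rejected by the compiler, while `port.emitControlTuple(new EndOfFile("a.txt"))` is accepted regardless of the port's data type.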

Re: [DISCUSSION] Custom Control Tuples

2016-06-24 Thread Amol Kekre
Overall I am strongly +1 on this ask. During pre-open source days we had discussed these and had kept it on the table for the future. I believe the time has come. I would prefer option 1, which is sent between the previous window's END_WINDOW and the new window's BEGIN_WINDOW control tuple. Piggy backing BEGIN_W