Re: [DISCUSS] Custom Control Tuples Design

2017-02-15 Thread Amol Kekre
This is needed, the batch start-end have similar semantics as start-end window from operational/functional perspective. Thks Amol

Re: [DISCUSS] Custom Control Tuples Design

2017-02-14 Thread Bhupesh Chawda
+1 for having an immediate delivery mechanism as well. I would suggest that the other delivery mechanism stays at end of window, to be consistent, as I think it may be difficult to determine the last arrival of the tuple. ~ Bhupesh On Wed, Feb 15, 2017 at 7:04 AM, Pramod Immaneni

Re: [DISCUSS] Custom Control Tuples Design

2017-02-14 Thread Pramod Immaneni
There have been some recent developments and discussions on the schema side (link below) that warrant a reconsideration of how control tuples get delivered.

Re: [DISCUSS] Custom Control Tuples Design

2017-01-09 Thread Bhupesh Chawda
Hi All, Based on some discussion here is what is planned for the propagation feature for control tuples. The signature of the *processControl()* method in *ControlAwareDefaultInputPort* which is implemented by the operator developer will be as follows: *public abstract boolean

Re: [DISCUSS] Custom Control Tuples Design

2017-01-09 Thread Tushar Gosavi
On Sun, Jan 8, 2017 at 11:49 PM, Vlad Rozov wrote: > +1 to manage propagation at an operator level. An operator is either control > tuple aware and needs to manage how control tuples are routed from input > ports to output ports or it is not. In the latter case it does not

Re: [DISCUSS] Custom Control Tuples Design

2017-01-09 Thread Bhupesh Chawda
Hi All, Whether the operator is control tuple aware can be determined by the type of ports it uses; specifically the Input Port. I think this should be a sufficient condition to infer the control tuple awareness of the operator. Any cases where this assumption may not hold? Are you suggesting

Re: [DISCUSS] Custom Control Tuples Design

2017-01-08 Thread Vlad Rozov
+1 to manage propagation at an operator level. An operator is either control tuple aware and needs to manage how control tuples are routed from input ports to output ports or it is not. In the latter case it does not matter how many input and output ports the operator has and it is the Apex
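The two cases described here (implicit propagation by the platform versus routing by a control-tuple-aware operator) can be illustrated with a toy model. This is only a sketch under stated assumptions: `PropagationSketch`, `ControlAware` and `deliver` are hypothetical names made up for illustration and are not part of the Apex API or engine.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT the Apex engine. All names are hypothetical.
class PropagationSketch {
    /** Stand-in for an operator that manages control tuples itself. */
    interface ControlAware {
        // The operator decides what, if anything, goes downstream.
        void processControl(Object payload, List<Object> downstream);
    }

    /** Stand-in for the platform's per-operator delivery step. */
    static List<Object> deliver(Object operator, Object payload) {
        List<Object> downstream = new ArrayList<>();
        if (operator instanceof ControlAware) {
            // Operator-managed: the operator routes, drops or creates tuples.
            ((ControlAware) operator).processControl(payload, downstream);
        } else {
            // Implicit: the platform forwards the tuple unchanged,
            // regardless of how many ports the operator has.
            downstream.add(payload);
        }
        return downstream;
    }

    public static void main(String[] args) {
        Object plain = new Object();                       // not control aware
        ControlAware blocker = (payload, out) -> { };      // consumes silently
        System.out.println(deliver(plain, "END_BATCH"));   // prints [END_BATCH]
        System.out.println(deliver(blocker, "END_BATCH")); // prints []
    }
}
```

In this model an operator that does nothing special never blocks a control tuple, so a downstream operator that cares about it still receives it.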

Re: [DISCUSS] Custom Control Tuples Design

2017-01-05 Thread Bhupesh Chawda
Agreed Thomas. I was referring to the persona of the operator developer. The user of the operator would not be doing anything related to the propagation of control tuples. Actually, the behavior of the operator wrt. propagation of control tuples would be part of the operator documentation. Also,

Re: [DISCUSS] Custom Control Tuples Design

2017-01-05 Thread Thomas Weise
I think it is important to be clear on the roles with regard to this functionality. The user of the operator should not have to do anything to get it to work. So while I suggested to consider attributes earlier, there should not be any need for the user to set those. The operator needs to work as

Re: [DISCUSS] Custom Control Tuples Design

2017-01-04 Thread Bhupesh Chawda
I think we all agree on the use case for selective propagation. The question is about where to have the control - at the operator level or at the port level. For this ability, we have the following options: 1. Operator disables the propagation on selected output ports. Other output ports

Re: [DISCUSS] Custom Control Tuples Design

2017-01-04 Thread Thomas Weise
Yes, I think that for any of these cases the operator developer will turn off implicit propagation for the operator and then write the code to route or create control tuples as needed. Thomas On Wed, Jan 4, 2017 at 12:59 PM, Amol Kekre wrote: > I agree that by default the

Re: [DISCUSS] Custom Control Tuples Design

2017-01-04 Thread Amol Kekre
I agree that by default the propagation must be implicit, i.e. if the operator does nothing, the control tuple propagates. I do think users should have control on deciding to "not propagate" or "create new" and in these cases they would need to do something explicit (override)? The following

Re: [DISCUSS] Custom Control Tuples Design

2017-01-04 Thread Thomas Weise
I think there is (1) implicit propagation just like other control tuples where the operator code isn't involved and (2) where the operator developer wants to decide how control tuples are created or routed and will receive and emit them on the output ports as desired. I don't see a use case for

Re: [DISCUSS] Custom Control Tuples Design

2017-01-04 Thread Amol Kekre
Yes, there is a chance that two output ports will have different send requirements. Thks Amol On Wed, Jan 4, 2017 at 10:59 AM, Bhupesh Chawda wrote: > Wouldn't having this with output ports give a finer control on the > propagation of control tuples? > We might have

Re: [DISCUSS] Custom Control Tuples Design

2017-01-04 Thread Bhupesh Chawda
Wouldn't having this with output ports give a finer control on the propagation of control tuples? We might have an operator with two output ports each of which creates two different pipelines downstream. We would be able to say that one pipeline gets the control tuples and the other doesn't. ~

Re: [DISCUSS] Custom Control Tuples Design

2017-01-04 Thread Thomas Weise
I'm referring to the operator that needs to make the decision to propagate or not. The tuples come from an input port, so it seems appropriate to say "don't propagate control tuples from this port". No matter how many output ports there are. Output ports are there for an operator to emit new

Re: [DISCUSS] Custom Control Tuples Design

2017-01-04 Thread Thomas Weise
Wouldn't it be more intuitive to control this with an attribute on the input port? On Tue, Jan 3, 2017 at 11:06 PM, Bhupesh Chawda wrote: > Hi Pramod, > > I was thinking of a method setPropagateControlTuples(boolean propagate) on > the output port of the operator. >

Re: [DISCUSS] Custom Control Tuples Design

2017-01-03 Thread Pramod Immaneni
2 sounds good. Have you thought about what the method would look like? On Sat, Dec 31, 2016 at 8:29 PM, Bhupesh Chawda wrote: > Yes, that makes sense. > We have following options: > 1. Make the annotation false by default and force the user to forward the > control

Re: [DISCUSS] Custom Control Tuples Design

2017-01-03 Thread Bhupesh Chawda
Yes David, that is correct. The annotation is true by default. However, it may happen that in the course of processing, we may want to stop the propagation of control tuples to the downstream operators. In this case, we should have an option to block the control tuples propagation to downstream

Re: [DISCUSS] Custom Control Tuples Design

2017-01-03 Thread David Yan
The annotation should be true by default. If an operator does not care about the control tuples, it should propagate them because the downstream might care about it. For example, let's say the original DAG looks like: A->B And A emits control tuples that B cares about, and of course along with

Re: [DISCUSS] Custom Control Tuples Design

2017-01-03 Thread Bhupesh Chawda
Hi All, I have created a review only PR based on the discussion so far. This will also help make the discussion easier and we can continue with the review in parallel. Here is the PR: https://github.com/apache/apex-core/pull/440 Please help review this. I am still working on documentation and

Re: [DISCUSS] Custom Control Tuples Design

2016-12-31 Thread Bhupesh Chawda
Yes, that makes sense. We have the following options: 1. Make the annotation false by default and force the user to forward the control tuples explicitly. 2. Annotation is true by default and static way of blocking stays as it is. We provide another way for blocking programmatically, perhaps by means

Re: [DISCUSS] Custom Control Tuples Design

2016-12-29 Thread Pramod Immaneni
Bhupesh, Annotation seems like a static way to stop propagation. Given these are programmatically generated, I would think the operators should be able to stop (consume without propagating) programmatically as well. Thanks On Thu, Dec 29, 2016 at 8:48 AM, Bhupesh Chawda

Re: [DISCUSS] Custom Control Tuples Design

2016-12-29 Thread Bhupesh Chawda
Thanks Vlad, I am trying out the approach you mentioned regarding having another interface which allows sinks to put a control tuple. Regarding the delivery of control tuples, here is what I am planning to do: All the control tuples which are emitted in a particular window are delivered after all
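Read together with the other replies in this thread (delivery at the end of the streaming window in which the control tuples were generated), the delivery order can be sketched roughly as below. This is a toy model, not engine code: `WindowDeliverySketch` and its methods are hypothetical, and placing the control tuples just before the end-of-window marker is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of end-of-window delivery. All names are hypothetical.
class WindowDeliverySketch {
    private final List<String> delivered = new ArrayList<>();
    private final List<String> pendingControl = new ArrayList<>();

    // Data tuples are delivered as they arrive.
    void data(String tuple) {
        delivered.add("data:" + tuple);
    }

    // Control tuples are held back until the window closes.
    void control(String tuple) {
        pendingControl.add(tuple);
    }

    // Assumption: held control tuples go out right before the marker.
    void endWindow() {
        for (String c : pendingControl) {
            delivered.add("control:" + c);
        }
        pendingControl.clear();
        delivered.add("END_WINDOW");
    }

    List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        WindowDeliverySketch s = new WindowDeliverySketch();
        s.data("a");
        s.control("batch-end");
        s.data("b");
        s.endWindow();
        // prints [data:a, data:b, control:batch-end, END_WINDOW]
        System.out.println(s.delivered());
    }
}
```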

Re: [DISCUSS] Custom Control Tuples Design

2016-12-28 Thread Vlad Rozov
Custom control tuples are control tuples emitted by an operator itself and not by the platform. Prior to the introduction of the custom control tuples, only the Apex engine itself put control tuples into various sinks, so the engine created the necessary Tuple objects with the corresponding type

Re: [DISCUSS] Custom Control Tuples Design

2016-12-23 Thread Bhupesh Chawda
Hi Vlad, Thanks for the pointer on delegating the wrapping of the user tuple to the control port. I was trying this out today. The problem I see is that if we introduce a putControlTuple() method in Sink, then a lot of the existing sinks would change. Also the changes seemed redundant as the existing

Re: [DISCUSS] Custom Control Tuples Design

2016-12-22 Thread Vlad Rozov
Why is it necessary to wrap in the OutputPort? Can't it be delegated to a Sink by introducing new putControlTuple method? Thank you, Vlad On 12/21/16 22:10, Bhupesh Chawda wrote: Hi Vlad, The problem in using the Tuple class as the wrapper is that the Ports belong to the API and we want to

Re: [DISCUSS] Custom Control Tuples Design

2016-12-21 Thread Tushar Gosavi
Hi Bhupesh, We could have a marker interface with just one method getType() in apex-api, our CustomControlTuple and stram Tuple will implement this new interface. interface ControlTuple { MessageType getType(); } Add CUSTOM_TUPLE in MessageType enum and bring it in apex-api. Our
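A minimal sketch of the marker-interface idea follows. The `ControlTuple` interface and the `CUSTOM_TUPLE` constant come from the message above; everything else is an assumption for illustration: the enum is a small stand-in for the real message type enum, and the `getUserObject` accessor, the `isCustom` helper and the enclosing class name are hypothetical.

```java
// Sketch of the proposed marker interface, with stand-in types.
class ControlTupleSketch {
    // Stand-in enum; the proposal only names the CUSTOM_TUPLE addition.
    enum MessageType { BEGIN_WINDOW, END_WINDOW, CUSTOM_TUPLE }

    // The marker interface from the proposal, with its single method.
    interface ControlTuple {
        MessageType getType();
    }

    // Hypothetical wrapper carrying the user's payload object.
    static final class CustomControlTuple implements ControlTuple {
        private final Object userObject;

        CustomControlTuple(Object userObject) {
            this.userObject = userObject;
        }

        @Override
        public MessageType getType() {
            return MessageType.CUSTOM_TUPLE;
        }

        Object getUserObject() {
            return userObject;
        }
    }

    // Hypothetical helper: tells custom control tuples from data tuples.
    static boolean isCustom(Object tuple) {
        return tuple instanceof ControlTuple
            && ((ControlTuple) tuple).getType() == MessageType.CUSTOM_TUPLE;
    }

    public static void main(String[] args) {
        System.out.println(isCustom(new CustomControlTuple("x"))); // prints true
        System.out.println(isCustom("plain data"));                // prints false
    }
}
```

Because the check goes through the interface, it does not depend on whether the tuple was serialized on the way, which is the concern raised elsewhere in this thread.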

Re: [DISCUSS] Custom Control Tuples Design

2016-12-21 Thread Bhupesh Chawda
There is another issue with the current approach: it fails in case tuples skip serialization. To address this, we have the following options: One of the options is that we have a common class in the API which is the parent of both the Tuple class in Stram and the custom control

Re: [DISCUSS] Custom Control Tuples Design

2016-12-21 Thread Bhupesh Chawda
Hi Vlad, The problem in using the Tuple class as the wrapper is that the Ports belong to the API and we want to wrap the payload object of the control tuple into the Tuple class which is not part of the API. The output port will just get the payload of the user control tuple. For example, if the

Re: [DISCUSS] Custom Control Tuples Design

2016-12-21 Thread Vlad Rozov
Hi Bhupesh, it should not be a CustomWrapper. The wrapper object should be CustomControlTuple that extends Tuple. There is already code that checks for Tuple instance. The "unWrap" name is misleading, IMO. It should be something like customControlTuple.getPayload() or

Re: [DISCUSS] Custom Control Tuples Design

2016-12-21 Thread Bhupesh Chawda
Hi Vlad. Yes, the API should not change. We can take an Object instead, and later wrap it into the required class. Our InputPort.put and emitControl method would look something like the following where we handle the wrapping and unwrapping internally. public void put(T tuple) { if (tuple

Re: [DISCUSS] Custom Control Tuples Design

2016-12-20 Thread Vlad Rozov
A wrapper class is required for the control tuples delivery, but the Port/Operator API should use the control tuple payload object only. Implementation of the wrapper class may change from version to version, but the API should not be affected by the change. I guess the assumption is that default input and
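The separation described here, a wrapper used only for delivery while the port API deals in payload objects, might look roughly like the sketch below. All names (`PayloadOnlyApiSketch`, `Wrapper`, `OutputPort`, `InputPort`) are hypothetical stand-ins rather than the real Apex port classes, and a direct method reference stands in for the stream between the two ports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy ports showing a payload-only API. All names are hypothetical.
class PayloadOnlyApiSketch {
    // Internal wrapper; operator code never sees this type.
    static final class Wrapper {
        final Object payload;

        Wrapper(Object payload) {
            this.payload = payload;
        }
    }

    static final class OutputPort {
        private final Consumer<Object> sink;

        OutputPort(Consumer<Object> sink) {
            this.sink = sink;
        }

        void emit(Object tuple) {
            sink.accept(tuple);
        }

        // Takes the bare payload and wraps it internally for delivery.
        void emitControl(Object payload) {
            sink.accept(new Wrapper(payload));
        }
    }

    static final class InputPort {
        final List<Object> data = new ArrayList<>();
        final List<Object> control = new ArrayList<>();

        // Unwraps control tuples so operator code only sees the payload.
        void put(Object tuple) {
            if (tuple instanceof Wrapper) {
                control.add(((Wrapper) tuple).payload);
            } else {
                data.add(tuple);
            }
        }
    }

    public static void main(String[] args) {
        InputPort in = new InputPort();
        OutputPort out = new OutputPort(in::put);
        out.emit("row-1");
        out.emitControl("schema-changed");
        // prints [row-1] [schema-changed]
        System.out.println(in.data + " " + in.control);
    }
}
```

Since `Wrapper` appears in no method signature that operator code calls, its implementation could change between versions without touching the port API.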

Re: [DISCUSS] Custom Control Tuples Design

2016-12-20 Thread Bhupesh Chawda
I investigated this and it seems better to have a wrapper class for the user object. This would serve 2 purposes: 1. Allow us to distinguish a custom control tuple from other payload tuples. 2. For the same control tuple received from different upstream partitions, we would

Re: [DISCUSS] Custom Control Tuples Design

2016-12-18 Thread David Yan
This C type parameter is going to fix the control tuple type at compile time and this is actually not what we want. Note that the operator may receive or emit multiple different control tuple types. David On Dec 17, 2016 3:33 AM, "Tushar Gosavi" wrote: We do not need to

Re: [DISCUSS] Custom Control Tuples Design

2016-12-17 Thread Tushar Gosavi
We do not need to create an interface for data emitted through emitControl or processed through processControl. Internally we could wrap the user object in ControlTuple. You can add a type parameter for the control tuple object on ports. DefaultInputPort<D, C>: D is the data type and C is the control

Re: [DISCUSS] Custom Control Tuples Design

2016-12-16 Thread Bhupesh Chawda
Agreed Vlad and David. I am just suggesting there should be a wrapper for the user object. It can be a marker interface and we can call it something else like "CustomControl". The user object will be wrapped in another class "ControlTuple" which traverses the BufferServer and will perhaps be

Re: [DISCUSS] Custom Control Tuples Design

2016-12-16 Thread Vlad Rozov
I agree with David. Payload of the control tuple is in the userObject and operators/ports don't need to be exposed to the implementation of the ControlTuple class. With the proposed interface operator developers are free to extend ControlTuple further and I don't think that such capability

Re: [DISCUSS] Custom Control Tuples Design

2016-12-16 Thread Bhupesh Chawda
Hi David, Actually, I was thinking of another API class called ControlTuple, different from the actual tuple class in buffer server or stram. This could serve as a way for the Buffer server publisher to understand that it is a control tuple and needs to be wrapped differently. ~ Bhupesh On

Re: [DISCUSS] Custom Control Tuples Design

2016-12-16 Thread David Yan
// DefaultInputPort
public void processControl(ControlTuple tuple) {
  // Default Implementation to avoid need to implement it in all implementations
}
{code}
{code}
// DefaultOutputPort
public void emitControl(ControlTuple tuple) {
}
I think we don't need to expose the

Re: [DISCUSS] Custom Control Tuples Design

2016-12-16 Thread Bhupesh Chawda
Yes, control tuples would be delivered in the same window in which they were generated. Only the tuples which are duplicated due to shuffling across multiple stages would be deduplicated and then sent only once. ~ Bhupesh On Dec 16, 2016 20:30, "Thomas Weise" wrote: Hi
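One way to picture the deduplication mentioned here is a per-window set of already-seen tuples. This is only a sketch: the thread does not say how duplicates are recognised, so the `id` field on the wrapper, the per-window reset, and all class and method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy deduplication of control tuples. All names are hypothetical.
class DedupSketch {
    // Assumption: the wrapper carries an id shared by all copies.
    static final class Wrapped {
        final String id;
        final Object payload;

        Wrapped(String id, Object payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    private final Set<String> seenThisWindow = new HashSet<>();
    private final List<Object> delivered = new ArrayList<>();

    // Only the first copy seen in the current window is delivered.
    void receive(Wrapped tuple) {
        if (seenThisWindow.add(tuple.id)) {
            delivered.add(tuple.payload);
        }
    }

    // Assumption: deduplication state is reset at each window boundary.
    void endWindow() {
        seenThisWindow.clear();
    }

    List<Object> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        DedupSketch d = new DedupSketch();
        d.receive(new Wrapped("t1", "batch-end")); // from partition 1
        d.receive(new Wrapped("t1", "batch-end")); // same tuple via partition 2
        System.out.println(d.delivered());         // prints [batch-end]
    }
}
```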

Re: [DISCUSS] Custom Control Tuples Design

2016-12-16 Thread Thomas Weise
Hi Bhupesh, I don't see anything stateful here as control tuples are delivered at the end of the streaming window in which they were generated? Also, control tuples would need to be broadcast to all partitions, and therefore also joined downstream by the engine? (In addition to the case you

Re: [DISCUSS] Custom Control Tuples Design

2016-12-16 Thread Bhupesh Chawda
As I understand from the discussion on the other thread, we want custom control tuples to behave like existing control tuples similar to begin window and end window. However the fact that we are allowing the user to bundle a user object inside the control tuple differentiates it from the existing

[DISCUSS] Custom Control Tuples Design

2016-12-15 Thread Bhupesh Chawda
Hi All, Here are the initial interfaces:
{code}
// DefaultInputPort
public void processControl(ControlTuple tuple) {
  // Default Implementation to avoid need to implement it in all implementations
}
{code}
{code}
// DefaultOutputPort
public void emitControl(ControlTuple tuple) {