Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-16 Thread Kostas Kloudas
Hi all, Thanks for keeping the discussion running while I was on holidays! I am currently catching up and will post in the voting thread if I have any comments :) Cheers, Kostas On Wed, Sep 16, 2020 at 11:25 AM David Anderson wrote: > Aljoscha, > Thanks for the thorough response. I'm

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-16 Thread David Anderson
Aljoscha, Thanks for the thorough response. I'm still wanting to think about and discuss the Trigger topic some more, but I'm content with where you've left it for now. Everything else seems good. David On Fri, Sep 11, 2020 at 2:08 PM Aljoscha Krettek wrote: > Thanks for the thoughtful

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-11 Thread Aljoscha Krettek
Thanks for the thoughtful comments! I'll try and address them inline below. I'm hoping to start a VOTE thread soon if there are no other comments by the end of today. On 10.09.20 15:40, David Anderson wrote: Having just re-read FLIP-134, I think it mostly makes sense, though I'm not exactly

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-10 Thread David Anderson
Having just re-read FLIP-134, I think it mostly makes sense, though I'm not exactly looking forward to figuring out how to explain it without making it seem overly complicated. A few points: I'm a bit confused by the discussion around custom window Triggers. Yes, I agree that complex, mixed

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-10 Thread Dawid Wysakowicz
Thanks for the explanation! Makes sense now! What do you think about adding this behaviour of WindowAssigner in streaming mode as well? I mean the behaviour of emitting at the end of a Window. I think it would make sense in the STREAM mode as well and keep the two modes more aligned. Best, Dawid

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-10 Thread Aljoscha Krettek
On 10.09.20 11:30, Dawid Wysakowicz wrote: I am not sure about the option for ignoring the Triggers. Do you mean to ignore all the Triggers, including e.g. Flink's own such as CountTrigger, EventTimeTrigger etc.? Won't it effectively disable the WindowOperator altogether? Or even worse make it

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-10 Thread Dawid Wysakowicz
Thanks for the update, Aljoscha! I am not sure about the option for ignoring the Triggers. Do you mean to ignore all the Triggers, including e.g. Flink's own such as CountTrigger, EventTimeTrigger etc.? Won't it effectively disable the WindowOperator altogether? Or even worse make it unusable with ever

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-09 Thread Aljoscha Krettek
I updated the FLIP; you can check out the changes here: https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=158871522&selectedPageVersions=16&selectedPageVersions=15 There is still the open question of what IGNORE means for getProcessingTime(). Plus, I introduced a setting for ignoring Triggers because I think

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-08 Thread Dawid Wysakowicz
Hey Aljoscha, A couple of thoughts for the two remaining TODOs in the doc: # Processing Time Support in BATCH/BOUNDED execution mode I think there are two somewhat orthogonal problems around this topic: 1. Firing processing timers at the end of the job 2. Having processing timers in the

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-01 Thread Aljoscha Krettek
Hmm, it seems I left out the Dev ML in my mail. Looping that back in. On 28.08.20 13:54, Dawid Wysakowicz wrote: @Aljoscha Let me bring back to the ML some of the points we discussed offline. Ad. 1 Yes, I agree it's not just about scheduling. It includes more changes to the runtime. We might

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-24 Thread Kostas Kloudas
Thanks a lot for the discussion! I will open a voting thread shortly! Kostas On Mon, Aug 24, 2020 at 9:46 AM Kostas Kloudas wrote: > Hi Guowei, > Thanks for the insightful comment! > I agree that this can be a limitation of the current runtime, but I think that this FLIP can go on as

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-24 Thread Kostas Kloudas
Hi Guowei, Thanks for the insightful comment! I agree that this can be a limitation of the current runtime, but I think that this FLIP can go on as it discusses mainly the semantics that the DataStream API will expose when applied on bounded data. There will definitely be other FLIPs that will

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-24 Thread Guowei Ma
Hi Klou, Thanks for your proposal. It's a very good idea. Just a small comment about the "Batch vs Streaming Scheduling". In the AUTOMATIC execution mode we might not be able to pick BATCH execution mode even if all sources are bounded. For example some applications would use the

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-20 Thread Kostas Kloudas
Hi all, Thanks for the comments! @Dawid: "execution.mode" can be a nice alternative and from a quick look it is not used currently by any configuration option. I will update the FLIP accordingly. @David: Given that having the option to allow timers to fire at the end of the job is already in

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-18 Thread David Anderson
Being able to optionally fire registered processing time timers at the end of a job would be interesting, and would help in (at least some of) the cases I have in mind. I don't have a better idea. David On Mon, Aug 17, 2020 at 8:24 PM Kostas Kloudas wrote: > Hi Kurt and David, > > Thanks a lot

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-18 Thread Kostas Kloudas
t, > Yun > > --Original Mail-- > Sender: Kostas Kloudas > Send Date: Tue Aug 18 02:24:21 2020 > Recipients: David Anderson > CC: dev, user > Subject: Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input >> >> Hi Ku

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-18 Thread Dawid Wysakowicz
CC: dev, user > Subject: Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input > > Hi Kurt and David, > > Thanks a lot for the insightful feedback! > > @Kurt: For the topic of checkpointing with Batch Scheduling, I total

Re: Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-17 Thread Yun Gao
Hi Kurt and David, Thanks a lot for the insightful feedback! @Kurt: For the topic of checkpointing with Batch Scheduling, I totally agree with you that it requires a lot more work and careful thinking on the semantics. This FLIP was written under

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-17 Thread Kostas Kloudas
Hi Kurt and David, Thanks a lot for the insightful feedback! @Kurt: For the topic of checkpointing with Batch Scheduling, I totally agree with you that it requires a lot more work and careful thinking on the semantics. This FLIP was written under the assumption that if the user wants to have

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-17 Thread David Anderson
Kostas, I'm pleased to see some concrete details in this FLIP. I wonder if the current proposal goes far enough in the direction of recognizing the need some users may have for "batch" and "bounded streaming" to be treated differently. If I've understood it correctly, the section on scheduling

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-16 Thread Kurt Young
Hi Kostas, Thanks for starting this discussion. The first part of this FLIP: "Batch vs Streaming Scheduling" looks reasonable to me. However, there is another dimension I think we should also take into consideration, which is whether checkpointing is enabled. This option is orthogonal (but not

[DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-12 Thread Kostas Kloudas
Hi all, As described in FLIP-131 [1], we are aiming at deprecating the DataSet API in favour of the DataStream API and the Table API. After this work is done, the user will be able to write a program using the DataStream API and this will execute efficiently on both bounded and unbounded data.