Re: Implementing FLIP-2 and FLIP-4
Hi AJ,

sorry for not getting back to you earlier; I was too busy and only read your mail now.

Adding the context is not simply an optimization of the case you described. In your case, you will get 30-second windows, with elements assigned to those windows based on their timestamps. If you instead have one big daily window and do 30-second speculative (early) firing of that window based on processing time, then at each firing the window may contain elements from across the complete day, i.e. you progressively output a more refined result for that one-day window.

Does that make sense to you?

Cheers,
Aljoscha

On Wed, 14 Sep 2016 at 18:21 AJ Heller wrote:

> Thank you, Aljoscha! I look forward to reading the papers you mentioned.
>
> Regarding FLIP-2, are there any new use cases that a Window Function Context enables? If not, my understanding is that adding this context would be an optimization over what is currently possible but maybe inefficient. As an example of how I think this would work: instead of a "firing reason" context to let you differentiate between (e.g.) every-30-second early firings and a daily primary firing, I imagine you could split the stream, where one exclusively emits 30-second aggregates and the other exclusively emits daily, and deal with them separately.
>
> If that is the case, and it amounts to an optimization: have you considered whether the added complexity is worth the potential efficiency gain? Otherwise, if it amounts to more than a small optimization, I'd be very interested to understand what this change would enable; I currently don't see it. I am under time pressure to choose a viable project (the idea was to be solidified yesterday, actually), and I would very much like to work on this now if I can justify it. If not, I would still very much like to work on this, but the timing will have to be different.
>
> Again, thank you Aljoscha, and I apologize for the rushed nature of my situation.
>
> Best,
> -aj heller
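The difference Aljoscha describes between 30-second windows and early firings of a single daily window can be illustrated with a small, Flink-independent simulation. This is only a sketch under simplifying assumptions: the function names are made up for illustration, elements are summed, and each element's arrival (processing) time is taken to equal its timestamp.

```python
from collections import defaultdict

def tumbling_sums(events, size):
    """Sum values per tumbling window; each element lands in exactly one
    window, chosen by its timestamp. `events` is a list of (timestamp, value)."""
    sums = defaultdict(int)
    for timestamp, value in events:
        window_start = timestamp - timestamp % size
        sums[window_start] += value
    return dict(sums)

def early_firings(events, fire_every, until):
    """Simulate one large window that fires every `fire_every` seconds.
    Each firing sees everything that has arrived so far, so the emitted
    result is a progressively refined total for the same window."""
    results = []
    for fire_at in range(fire_every, until + 1, fire_every):
        total = sum(value for arrival, value in events if arrival <= fire_at)
        results.append((fire_at, total))
    return results

# (time in seconds, value); arrival time == timestamp in this sketch
events = [(5, 1), (20, 2), (40, 3), (70, 4)]

per_window = tumbling_sums(events, 30)
refined = early_firings(events, 30, 90)
```

With the sample data, the tumbling windows produce three independent partial sums, while the early firings produce a running total that converges on the result for the whole large window.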
Re: Implementing FLIP-2 and FLIP-4
Thank you, Aljoscha! I look forward to reading the papers you mentioned.

Regarding FLIP-2, are there any new use cases that a Window Function Context enables? If not, my understanding is that adding this context would be an optimization over what is currently possible but maybe inefficient. As an example of how I think this would work: instead of a "firing reason" context to let you differentiate between (e.g.) every-30-second early firings and a daily primary firing, I imagine you could split the stream, where one exclusively emits 30-second aggregates and the other exclusively emits daily, and deal with them separately.

If that is the case, and it amounts to an optimization: have you considered whether the added complexity is worth the potential efficiency gain? Otherwise, if it amounts to more than a small optimization, I'd be very interested to understand what this change would enable; I currently don't see it. I am under time pressure to choose a viable project (the idea was to be solidified yesterday, actually), and I would very much like to work on this now if I can justify it. If not, I would still very much like to work on this, but the timing will have to be different.

Again, thank you Aljoscha, and I apologize for the rushed nature of my situation.

Best,
-aj heller

On Wed, Sep 14, 2016 at 1:19 AM, Aljoscha Krettek wrote:

> Hi AJ,
>
> the idea for evictors initially came from IBM Infosphere Streams, if I'm not mistaken:
> http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.dev.doc/doc/windowhandling.html
> The first version of the windowing system used a combination of triggers/evictors to do the windowing; this is described in Jonas Traub's thesis: http://www.diva-portal.se/smash/get/diva2:861798/FULLTEXT01.pdf.
>
> I'm quite skeptical about having support for Evictors in the first place. They make computation inefficient because you always have to keep a list of all elements and cannot incrementally aggregate using a reduce function. Also, it is quite tricky to figure out how to do eviction based on ProcessingTime with a good interface. If you have some ideas on how this could be improved, I'm open to anything.
>
> For now, I would suggest focusing on FLIP-2, since quite a number of people would be interested in having that. I would also not put any energy into trying to figure out how the context can be shared between evictors and other parts of the system. If we keep evictors, I would like to keep the API and implementation completely separate from anything else that's going on in the system.
>
> On implementation, the context would probably be created by the WindowOperator or by the InternalWindowFunction.
>
> Cheers,
> Aljoscha
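The "firing reason" context AJ mentions could look roughly like the following. This is a purely hypothetical sketch and not the FLIP-2 design: `FiringReason`, `WindowContext`, and `summarize` are names invented here to show how a window function might tell an early firing from the primary one without splitting the stream.

```python
from dataclasses import dataclass
from enum import Enum

class FiringReason(Enum):
    """Hypothetical reasons a window might fire."""
    EARLY = "early"      # speculative firing, e.g. every 30 seconds
    ON_TIME = "on_time"  # primary firing when the window closes

@dataclass(frozen=True)
class WindowContext:
    """Hypothetical context object handed to the window function."""
    window_start: int
    window_end: int
    firing_reason: FiringReason

def summarize(context, elements):
    """Window function that labels its output using the firing reason."""
    label = "partial" if context.firing_reason is FiringReason.EARLY else "final"
    return (label, context.window_start, sum(elements))

day = 24 * 60 * 60
early = summarize(WindowContext(0, day, FiringReason.EARLY), [1, 2])
final = summarize(WindowContext(0, day, FiringReason.ON_TIME), [1, 2, 3, 4])
```

Under these assumptions, one window function serves both kinds of firing, and downstream consumers can distinguish provisional results from the final daily result by the label.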
Re: Implementing FLIP-2 and FLIP-4
Hi AJ,

the idea for evictors initially came from IBM Infosphere Streams, if I'm not mistaken:
http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.dev.doc/doc/windowhandling.html
The first version of the windowing system used a combination of triggers/evictors to do the windowing; this is described in Jonas Traub's thesis: http://www.diva-portal.se/smash/get/diva2:861798/FULLTEXT01.pdf.

I'm quite skeptical about having support for Evictors in the first place. They make computation inefficient because you always have to keep a list of all elements and cannot incrementally aggregate using a reduce function. Also, it is quite tricky to figure out how to do eviction based on ProcessingTime with a good interface. If you have some ideas on how this could be improved, I'm open to anything.

For now, I would suggest focusing on FLIP-2, since quite a number of people would be interested in having that. I would also not put any energy into trying to figure out how the context can be shared between evictors and other parts of the system. If we keep evictors, I would like to keep the API and implementation completely separate from anything else that's going on in the system.

On implementation, the context would probably be created by the WindowOperator or by the InternalWindowFunction.

Cheers,
Aljoscha

On Mon, 12 Sep 2016 at 08:27 AJ Heller wrote:

> Could you point me towards the inspiration for Evictors? Are there any papers, perhaps, that lay the groundwork for mutable windows like this?
>
> After much research this weekend, I found that Evictors are unique to Flink. Conceptually, it looks to me like Dataflow windows are build-only. Looking into other Dataflow implementations: I didn't find anything in either the Apache Beam SDK docs or the Google Cloud Dataflow API docs that mentions allowing you to remove elements from a window. I'm hesitant to tread new ground in mutability.
>
> What do you think about reimplementing Evictors as a kind of cyclic filter operation? Would it be possible? I believe this would fit into the Dataflow model better, but I'm still in the early stages of becoming familiar with Flink, and I haven't read the ABS paper [1] yet to know if there are snapshot implications. I also don't (yet) see why you couldn't optimize such a cyclic operation with mutable operations under the hood.
>
> [1]: http://arxiv.org/abs/1506.08603
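The efficiency concern about evictors can be made concrete with a small sketch. It does not use Flink's API; the two functions below are illustrative stand-ins, assuming a sum aggregation and a count-based eviction policy that keeps only the last `keep_last` elements (with `keep_last` at least 1).

```python
def incremental_sum(values):
    """Reduce-style aggregation: the window state is a single running
    total, updated as each element arrives."""
    total = 0
    for value in values:
        total += value
    return total

def sum_with_count_eviction(values, keep_last):
    """Evictor-style aggregation: every element must be buffered, because
    which elements survive is only known when the window fires.
    Returns (result, number of elements that had to be stored)."""
    buffer = []
    for value in values:
        buffer.append(value)
    retained = buffer[-keep_last:]  # evict all but the last `keep_last`
    return sum(retained), len(buffer)

values = [4, 1, 7, 3, 5]
reduced = incremental_sum(values)
evicted_result, stored = sum_with_count_eviction(values, keep_last=2)
```

In this sketch the reduce path keeps one number of state regardless of window size, whereas the eviction path stores all five elements before it can compute a result over the two that remain.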
Re: Implementing FLIP-2 and FLIP-4
Could you point me towards the inspiration for Evictors? Are there any papers, perhaps, that lay the groundwork for mutable windows like this?

After much research this weekend, I found that Evictors are unique to Flink. Conceptually, it looks to me like Dataflow windows are build-only. Looking into other Dataflow implementations: I didn't find anything in either the Apache Beam SDK docs or the Google Cloud Dataflow API docs that mentions allowing you to remove elements from a window. I'm hesitant to tread new ground in mutability.

What do you think about reimplementing Evictors as a kind of cyclic filter operation? Would it be possible? I believe this would fit into the Dataflow model better, but I'm still in the early stages of becoming familiar with Flink, and I haven't read the ABS paper [1] yet to know if there are snapshot implications. I also don't (yet) see why you couldn't optimize such a cyclic operation with mutable operations under the hood.

[1]: http://arxiv.org/abs/1506.08603

On Fri, Sep 9, 2016 at 11:46 AM, AJ Heller wrote:

> Thank you for offering your support, I'm excited to dig in!
>
> I have some work to do getting up to speed on the windowing internals. And I still need to get my bearings on the Evictor changes; I plan to read through the list archive and documents today. Vishnu, are your changes already publicly viewable?
>
> Regarding the window modifications in FLIP-2, I see, Vishnu, that you've suggested an interface for the EvictorContext object, and Aljoscha, you suggested an abstract Context class. Does it make sense for them to agree? The other big difference I've seen in the signatures is whether the Window is contained in the context or not.
>
> Have you considered modifying the signature of the methods to accept ` extends Context>` or ``? At least in terms of FLIP-2, this would allow each process window function to define and work with its own context (without downcasting, anyway), and similarly in the future, there'd be less work in changing Context subclasses when new abstract methods are added to Context.
>
> But I may be getting ahead of myself. Could you point me towards where contexts are/would be created? I'm not clear on the ownership and lifecycle of these objects yet.