Re: Implementing FLIP-2 and FLIP-4

2016-09-17 Thread Aljoscha Krettek
Hi AJ,
sorry for not getting back to you earlier; I was too busy and only read
your mail now.

Adding the context is not simply an optimization of the case you described.
In your case, you get 30-second windows, and elements are assigned to those
windows based on their timestamps. If you instead have one big daily window
and fire it speculatively (early) every 30 seconds based on processing time,
then at each firing the window may contain elements from across that whole
day, i.e. you progressively output a more refined result for that 1-day
window.
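
To make the difference concrete, here is a minimal plain-Java simulation. It
is only a sketch and uses no Flink classes: the class and method names are
hypothetical, the "day" is shortened to 90 seconds, and it assumes each
element arrives at the moment given by its timestamp, so processing time and
event time coincide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation; all names are hypothetical and nothing here is Flink API.
final class EarlyFiringSketch {

    static final long THIRTY_SECONDS = 30_000L;

    // Case 1: 30-second tumbling windows. Each element is counted only in
    // the window that its timestamp falls into.
    static Map<Long, Integer> tumblingCounts(long[] timestamps) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestamps) {
            long windowStart = ts - (ts % THIRTY_SECONDS);
            counts.merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }

    // Case 2: one long window that fires early every 30 seconds. Each firing
    // reports everything seen in the window so far, so results are refined.
    static List<Integer> earlyFiringCounts(long[] timestamps, long windowEnd) {
        List<Integer> firings = new ArrayList<>();
        for (long fireAt = THIRTY_SECONDS; fireAt <= windowEnd; fireAt += THIRTY_SECONDS) {
            int seen = 0;
            for (long ts : timestamps) {
                if (ts < fireAt) {
                    seen++;
                }
            }
            firings.add(seen);
        }
        return firings;
    }

    public static void main(String[] args) {
        long[] ts = {1_000L, 5_000L, 31_000L, 62_000L, 65_000L, 70_000L};
        System.out.println(tumblingCounts(ts));             // {0=2, 30000=1, 60000=3}
        System.out.println(earlyFiringCounts(ts, 90_000L)); // [2, 3, 6]
    }
}
```

The tumbling case yields three independent partial counts, while the early
firings yield a growing count for the same window.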

Does that make sense to you?

Cheers,
Aljoscha


Re: Implementing FLIP-2 and FLIP-4

2016-09-14 Thread AJ Heller
Thank you, Aljoscha! I look forward to reading the papers you mentioned.

Regarding FLIP-2, are there any new use cases that a Window Function
Context enables? If not, my understanding is that adding this context
would be an optimization over what is currently possible but perhaps
inefficient. As an example of how I think this would work: instead of a
"firing reason" context to let you differentiate between (e.g.)
every-30-second early firings and a daily primary firing, I imagine you
could split the stream, where one branch exclusively emits 30-second
aggregates and the other exclusively emits daily aggregates, and deal with
them separately.

If that is the case, and it amounts to an optimization: have you
considered whether the added complexity is worth the potential efficiency
gain? Otherwise, if it amounts to more than a small optimization, I'd be
very interested to understand what this change would enable; I currently
don't see it. I am under time pressure to choose a viable project (the idea
was to be solidified yesterday, actually), and I would very much like to
work on this now if I can justify it. If not, I would still very much like
to work on this, but the timing will have to be different.

Again, thank you Aljoscha, and I apologize for the rushed nature of my
situation.

Best,
-aj heller


Re: Implementing FLIP-2 and FLIP-4

2016-09-14 Thread Aljoscha Krettek
Hi AJ,
the idea for evictors initially came from IBM Infosphere Streams, if I'm
not mistaken:
http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.dev.doc/doc/windowhandling.html
The first version of the windowing system used a combination of
triggers/evictors to do the windowing; this is described in Jonas Traub's
thesis: http://www.diva-portal.se/smash/get/diva2:861798/FULLTEXT01.pdf.

I'm quite skeptical about having support for Evictors in the first place.
They make computation inefficient because you always have to keep a list of
all elements and cannot incrementally aggregate using a reduce function.
Also, it is quite tricky to figure out how to do eviction based on
ProcessingTime with a good interface. If you have ideas for how this could
be improved, I'm open to anything.
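
As a rough illustration of the efficiency point, the following plain-Java
sketch contrasts the two kinds of window state. The class names are
hypothetical and this is not Flink's implementation; the evicting variant
assumes a simple "keep the last N elements" policy applied when the window
fires.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical classes for illustration; not Flink's window state.
final class EvictionCostSketch {

    // Incremental aggregation: the state is a single running value,
    // updated as each element arrives.
    static final class ReducingWindow {
        private long sum;

        void add(long value) {
            sum += value;
        }

        long fire() {
            return sum;
        }
    }

    // With eviction, every element has to be buffered, because we only know
    // at firing time which elements remain in the window.
    static final class EvictingWindow {
        private final Deque<Long> elements = new ArrayDeque<>();
        private final int keepLast;

        EvictingWindow(int keepLast) {
            this.keepLast = keepLast;
        }

        void add(long value) {
            elements.addLast(value);
        }

        int bufferedElements() {
            return elements.size();
        }

        long fire() {
            // Evict the oldest elements, then aggregate what is left.
            while (elements.size() > keepLast) {
                elements.removeFirst();
            }
            long sum = 0;
            for (long v : elements) {
                sum += v;
            }
            return sum;
        }
    }
}
```

Adding the values 1 to 5 to both windows, the reducing window only ever holds
one number, while the evicting window holds all five elements until it fires
and drops the oldest ones.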

For now, I would suggest focusing on FLIP-2, since quite a number of people
would be interested in having that. I would also not put any energy into
trying to figure out how the context can be shared between evictors and
other parts of the system. If we keep evictors, I would like to keep the API
and implementation completely separate from anything else that's going on
in the system.

On implementation, the context would probably be created by the
WindowOperator or by the InternalWindowFunction.

Cheers,
Aljoscha


Re: Implementing FLIP-2 and FLIP-4

2016-09-11 Thread AJ Heller
Could you point me towards the inspiration for Evictors? Are there any
papers, perhaps, that lay the groundwork for mutable windows like this?

After much research this weekend, I found that Evictors are unique to
Flink. Conceptually, it looks to me like Dataflow windows are build-only.
Looking into other Dataflow implementations: I didn't find anything in
either the Apache Beam SDK docs or the Google Cloud Dataflow API docs that
mentions allowing you to remove elements from a window. I'm hesitant to
tread new ground in mutability.

What do you think about reimplementing Evictors as a kind of cyclic filter
operation? Would it be possible? I believe this would fit into the Dataflow
model better, but I'm still in the early stages of becoming familiar with
Flink, and I haven't read the ABS paper [1] yet to know if there are
snapshot implications. I also don't (yet) see why you couldn't optimize
such a cyclic operation with mutable operations under the hood.

[1]: http://arxiv.org/abs/1506.08603

On Fri, Sep 9, 2016 at 11:46 AM, AJ Heller wrote:

> Thank you for offering your support, I'm excited to dig in!
>
> I have some work to do getting up to speed on the windowing internals. And
> I still need to get my bearings on the Evictor changes; I plan to read
> through the list archive and documents today. Vishnu, are your changes
> already publicly viewable?
>
> Regarding the window modifications in FLIP-2, I see, Vishnu, that you've
> suggested an interface for the EvictorContext object, and Aljoscha, you
> suggested an abstract Context class. Does it make sense for them to agree?
> The other big difference I've seen in the signatures is whether the Window
> is contained in the context or not.
>
> Have you considered modifying the signature of the methods to accept ` extends Context>` or ``? At least in terms of
> FLIP-2, this would allow each process window function to define and work
> with its own context (without downcasting, anyway), and similarly in the
> future, there'd be less work in changing Context subclasses when new
> abstract methods are added to Context.
>
> But I may be getting ahead of myself. Could you point me towards where
> contexts are/would be created? I'm not clear on the ownership and lifecycle
> of these objects yet.
>
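
The signature idea in the quoted message (a context type bounded by
`extends Context`) could look roughly like the sketch below. Everything in it
is an assumption made for illustration: the `Context` members, the
`FiringContext` subclass, and the `WindowFunctionSketch` interface are
hypothetical and are not the FLIP-2 API.

```java
// Hypothetical types for illustration only; this is not the FLIP-2 API.
abstract class Context {
    abstract long windowEnd();
}

// A function-specific context that adds information of its own.
final class FiringContext extends Context {
    private final long windowEnd;
    private final boolean earlyFiring;

    FiringContext(long windowEnd, boolean earlyFiring) {
        this.windowEnd = windowEnd;
        this.earlyFiring = earlyFiring;
    }

    @Override
    long windowEnd() {
        return windowEnd;
    }

    boolean isEarlyFiring() {
        return earlyFiring;
    }
}

// The bounded type parameter C lets each function name the context type it
// works with, so no downcast from Context is needed inside process().
interface WindowFunctionSketch<IN, OUT, C extends Context> {
    OUT process(C context, Iterable<IN> elements);
}

// Example function that labels its output with the reason for firing.
final class LabelledCount implements WindowFunctionSketch<String, String, FiringContext> {
    @Override
    public String process(FiringContext context, Iterable<String> elements) {
        int count = 0;
        for (String ignored : elements) {
            count++;
        }
        String label = context.isEarlyFiring() ? "early" : "final";
        return label + ":" + count + "@" + context.windowEnd();
    }
}
```

With this shape, `new LabelledCount().process(new FiringContext(60000L, true), elements)`
can call `isEarlyFiring()` directly on its context argument.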