Yes that is what the pipeline I came up with looks like. However the next step 
in the pipeline is new/expired logic. I have tried a variety of things but none 
have gotten me close to what I want. Hence my questioning to this mailing list.


Thanks,
Tim
From: Kenneth Knowles <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, January 24, 2018 at 2:21 PM
To: "[email protected]" <[email protected]>
Subject: Re: Windowing question

A first approach would be to just not translate any of the new/expired logic. 
Beam does have the concept of expiring a window, though it is true that only 
particular transformations actually drop expired data. Have you tried something 
along the lines of this?

    pipeline.begin()
        .apply(FooIO.read(...))
        
.apply(Window.into(SlidingWindows.of(Duration.standardMinutes(5)).every(Duration.standardSeconds(30))))
        .apply(ParDo.of(new DoFn<>() { ... get into proper format ... })
        ... the rest of the logic ...

Just a very vague sketch of what, actually, most pipelines look like.

Kenn

On Wed, Jan 24, 2018 at 11:07 AM, Tim Ross 
<[email protected]<mailto:[email protected]>> wrote:
I am trying to convert an existing Apache Storm Bolt into an Apache Beam 
pipeline. The storm bolt used sliding windows with a duration of 5 minute and a 
period of 30 seconds. After doing some initial transforms to get data in the 
proper format it would process all elements which were new or expired.

Since Beam doesn’t have the concept of new and expired data in a window I’m 
trying to figure out how one would accomplish this.


Thanks,
Tim
From: Kenneth Knowles <[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Date: Wednesday, January 24, 2018 at 1:45 PM

To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: Windowing question

Generally, Beam will discard expired data for you (including state). Can you 
describe more? What is your windowing strategy? What are the edge triggers?

On Wed, Jan 24, 2018 at 10:37 AM, Tim Ross 
<[email protected]<mailto:[email protected]>> wrote:
I am just trying to do certain processing on edge triggers, i.e. new or expired 
data, to reduce the overall processing of a very large stream.

How would I go about doing that with state? As I understand it, state is tied 
to key and window.

Thanks,
Tim
From: Kenneth Knowles <[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Date: Wednesday, January 24, 2018 at 1:25 PM
To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: Windowing question

A little clarification: in Beam an element exists in a single window, 
mathematically speaking. So when you use SlidingWindows, for example, to assign 
multiple windows this "copies" the value for each window, and that is how you 
should think of it, from a calculation point of view. Under the hood, a 
compressed representation is often used, but not in all situations.

Kenn

On Wed, Jan 24, 2018 at 9:45 AM, Robert Bradshaw 
<[email protected]<mailto:[email protected]>> wrote:
No, Apache Beam doesn't offer this explicitly. You could accomplish it
using State, but perhaps if you clarified what you were trying to
accomplish by using these mechanisms there'd be another way to do the
same thing.

On Wed, Jan 24, 2018 at 7:03 AM, Tim Ross 
<[email protected]<mailto:[email protected]>> wrote:
> Is there anything comparable to Apache Storm’s Window.getNew and
> Window.getExpired in Apache Beam?  How would I determine if an element is
> new or expired in consecutive windows?
>
>
>
> Thanks,
>
> Tim
>
> This e-mail message and any attachments to it are intended only for the
> named recipients and may contain legally privileged and/or confidential
> information. If you are not one of the intended recipients, do not duplicate
> or forward this e-mail message.


This e-mail message and any attachments to it are intended only for the named 
recipients and may contain legally privileged and/or confidential information. 
If you are not one of the intended recipients, do not duplicate or forward this 
e-mail message.


This e-mail message and any attachments to it are intended only for the named 
recipients and may contain legally privileged and/or confidential information. 
If you are not one of the intended recipients, do not duplicate or forward this 
e-mail message.


This e-mail message and any attachments to it are intended only for the named 
recipients and may contain legally privileged and/or confidential information. 
If you are not one of the intended recipients, do not duplicate or forward this 
e-mail message.

Reply via email to