Can you say more about how you treat the different sorts of inputs?

On Wed, Jan 24, 2018 at 11:26 AM, Tim Ross <[email protected]> wrote:
> Yes, that is what the pipeline I came up with looks like. However, the next
> step in the pipeline is the new/expired logic. I have tried a variety of
> things but none have gotten me close to what I want. Hence my question to
> this mailing list.
>
> Thanks,
> Tim
>
> From: Kenneth Knowles <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, January 24, 2018 at 2:21 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: Windowing question
>
> A first approach would be to just not translate any of the new/expired
> logic. Beam does have the concept of expiring a window, though it is true
> that only particular transformations actually drop expired data. Have you
> tried something along the lines of this?
>
>     pipeline.begin()
>         .apply(FooIO.read(...))
>         .apply(Window.into(SlidingWindows.of(Duration.standardMinutes(5))
>             .every(Duration.standardSeconds(30))))
>         .apply(ParDo.of(new DoFn<>() { ... get into proper format ... }))
>         ... the rest of the logic ...
>
> Just a very vague sketch of what, actually, most pipelines look like.
>
> Kenn
>
> On Wed, Jan 24, 2018 at 11:07 AM, Tim Ross <[email protected]> wrote:
>
> I am trying to convert an existing Apache Storm Bolt into an Apache Beam
> pipeline. The Storm bolt used sliding windows with a duration of 5 minutes
> and a period of 30 seconds. After doing some initial transforms to get data
> in the proper format, it would process all elements which were new or
> expired.
>
> Since Beam doesn’t have the concept of new and expired data in a window,
> I’m trying to figure out how one would accomplish this.
>
> Thanks,
> Tim
>
> From: Kenneth Knowles <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, January 24, 2018 at 1:45 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: Windowing question
>
> Generally, Beam will discard expired data for you (including state). Can
> you describe more? What is your windowing strategy? What are the edge
> triggers?
>
> On Wed, Jan 24, 2018 at 10:37 AM, Tim Ross <[email protected]> wrote:
>
> I am just trying to do certain processing on edge triggers, i.e. new or
> expired data, to reduce the overall processing of a very large stream.
>
> How would I go about doing that with state? As I understand it, state is
> tied to key and window.
>
> Thanks,
> Tim
>
> From: Kenneth Knowles <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, January 24, 2018 at 1:25 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: Windowing question
>
> A little clarification: in Beam an element exists in a single window,
> mathematically speaking. So when you use SlidingWindows, for example, to
> assign multiple windows, this "copies" the value for each window, and that
> is how you should think of it, from a calculation point of view. Under the
> hood, a compressed representation is often used, but not in all situations.
>
> Kenn
>
> On Wed, Jan 24, 2018 at 9:45 AM, Robert Bradshaw <[email protected]> wrote:
>
> No, Apache Beam doesn't offer this explicitly. You could accomplish it
> using State, but perhaps if you clarified what you were trying to
> accomplish by using these mechanisms there'd be another way to do the
> same thing.
>
> On Wed, Jan 24, 2018 at 7:03 AM, Tim Ross <[email protected]> wrote:
> > Is there anything comparable to Apache Storm’s Window.getNew and
> > Window.getExpired in Apache Beam? How would I determine if an element is
> > new or expired in consecutive windows?
> >
> > Thanks,
> >
> > Tim
> >
> > This e-mail message and any attachments to it are intended only for the
> > named recipients and may contain legally privileged and/or confidential
> > information. If you are not one of the intended recipients, do not
> > duplicate or forward this e-mail message.
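
To make the two ideas in this thread concrete, here is a minimal sketch in plain Java with no Beam dependency. It shows (1) how a single timestamp falls into several overlapping sliding windows, which is the "copy per window" view Kenn describes, and (2) one possible reading of "new" and "expired" as set differences between the contents of two consecutive windows. The class name SlidingWindowSketch and the methods windowStartsFor and difference are hypothetical names made up for illustration; they are not Beam or Storm API, and the window alignment (starts at multiples of the period) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Beam or Storm.
class SlidingWindowSketch {

    // Returns the start (in ms) of every window [start, start + sizeMs)
    // that contains timestampMs, assuming window starts are multiples of periodMs.
    static List<Long> windowStartsFor(long timestampMs, long sizeMs, long periodMs) {
        List<Long> starts = new ArrayList<>();
        // Latest aligned window start at or before the timestamp.
        long lastStart = timestampMs - Math.floorMod(timestampMs, periodMs);
        // Walk backwards while the window still covers the timestamp.
        for (long start = lastStart; start > timestampMs - sizeMs; start -= periodMs) {
            starts.add(start);
        }
        return starts;
    }

    // Elements that are in 'a' but not in 'b', in the iteration order of 'a'.
    static <T> Set<T> difference(Set<T> a, Set<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        long size = 5 * 60 * 1000L;   // 5 minute window duration
        long period = 30 * 1000L;     // 30 second period

        // One element is logically copied into size / period = 10 windows.
        List<Long> starts = windowStartsFor(95_000L, size, period);
        System.out.println(starts.size() + " windows, starting at " + starts);

        // Contents of two consecutive windows (made-up sample data).
        Set<String> previous = new LinkedHashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> current = new LinkedHashSet<>(Arrays.asList("b", "c", "d"));

        // Assumed meaning: "new" = in current but not previous,
        // "expired" = in previous but not current.
        System.out.println("new: " + difference(current, previous));
        System.out.println("expired: " + difference(previous, current));
    }
}
```

In a real pipeline the contents of two consecutive windows are not available side by side like this; something in the pipeline would have to bring them together, for example the State-based approach Robert mentions. The sketch only pins down what is being computed.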
