Ok, I see.

Thanks a lot,
Zdenko


On 2018/11/09 08:23:12, Robert Bradshaw <[email protected]> wrote: 
> You should be able to do this with 24-hour windows and a 30-minute
> early processing time trigger. Something like
> 
> Window
>     .into(FixedWindows.of(Duration.standardHours(24)))
>     .triggering(AfterWatermark.pastEndOfWindow().withEarlyFirings(
>         AfterProcessingTime.pastFirstElementInPane()
>             .alignedTo(Duration.standardMinutes(30))))
>     .accumulatingFiredPanes()
> 
> Alternatively, you could window into 30 minute fixed windows, write
> the snapshot, then also window into 24 hours to write the final
> snapshot.
> 
> Both of these will buffer all the updates for the day. If that's too
> many, you could consider using
> https://beam.apache.org/blog/2017/02/13/stateful-processing.html
> (either with timers or with windowing).
> 
> On Fri, Nov 9, 2018 at 8:14 AM Zdenko Hrcek <[email protected]> wrote:
> >
> > Greetings,
> >
> > I'm trying to do the following:
> >
> > I'm streaming messages from PubSub with encoded json which contain DB 
> > fields like "primary key" as well as "operation type", which can have 
> > values: insert, update, delete to reflect db operations.
> >
> > The goal is to produce one file per day (with 30-minute updates) that
> > contains the final state of the rows/messages.
> > Each 30-minute file update/flush should reflect the state of the data at
> > that time (taking the operations into consideration).
> > Let's say that in the first 30 minutes there is an insert operation for
> > some row, so the row appears in the first flush of the output file. If a
> > message with a delete operation for that row arrives in the next 30
> > minutes, the row shouldn't appear in the second flush, and so on.
> > For the next day, a new file should be created.
> >
> > I'm just wondering if something like this can be done in Beam (Java).
> >
> > I had in mind using a fixed 24h window (which I guess is not best practice)
> > that would trigger after 30 minutes, with a group-by-key operation (by
> > primary key), and then handling the operations, but it looks like triggering
> > can't be done that way, or maybe my whole approach/understanding is wrong...
> >
> > Anyway, I would be grateful for any advice on how this can be done
> > (conceptually), or for a code sample of a similar pipeline.
> >
> > Best regards,
> >
> > Zdenko
> > --
> > _______________________
> > http://www.the-swamp.info
> 
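
To make the timing in the suggested 24-hour window with 30-minute early firings more concrete, here is a minimal plain-Java sketch of the boundary arithmetic. It does not use Beam. The class and method names (`WindowMath`, `windowStart`, `alignUp`) are hypothetical, and it assumes windows and firing times are aligned to the Unix epoch with no offset, so a "day" here is a UTC day. How the SDK treats a timestamp that falls exactly on a boundary should be checked against its documentation; this sketch leaves such a timestamp unchanged.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Beam: models epoch-aligned window and firing boundaries.
class WindowMath {
    // Start of the fixed window of the given size that contains ts.
    static Instant windowStart(Instant ts, Duration size) {
        long millis = ts.toEpochMilli();
        return Instant.ofEpochMilli(millis - Math.floorMod(millis, size.toMillis()));
    }

    // Rounds ts up to the next multiple of period; a timestamp already on a boundary is kept.
    static Instant alignUp(Instant ts, Duration period) {
        long millis = ts.toEpochMilli();
        long remainder = Math.floorMod(millis, period.toMillis());
        return remainder == 0 ? ts : Instant.ofEpochMilli(millis - remainder + period.toMillis());
    }

    public static void main(String[] args) {
        Instant firstElement = Instant.parse("2018-11-09T08:14:00Z");
        Instant start = windowStart(firstElement, Duration.ofHours(24));
        System.out.println("window start: " + start);
        System.out.println("window end:   " + start.plus(Duration.ofHours(24)));
        System.out.println("early firing: " + alignUp(firstElement, Duration.ofMinutes(30)));
    }
}
```

Under these assumptions, an element arriving at 08:14 UTC belongs to the window that starts at 00:00 UTC that day, and the first early pane after it is due at 08:30 UTC. If the daily file should follow a local-time day instead, the window would need an offset.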

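The "handle operations" step from the original question, which turns the insert/update/delete messages seen so far into the current set of rows, could look roughly like the plain-Java sketch below. It is independent of Beam. The `Change` class, its fields, and `RowStateFolder.fold` are hypothetical names chosen for illustration, and the sketch assumes the changes are already in the order in which they should be applied (in a real pipeline the grouped values are not guaranteed to be ordered, so they would probably have to be sorted by a timestamp or sequence number first).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: folds a sequence of DB change messages into the surviving rows.
class RowStateFolder {

    // Hypothetical change record; field names are assumptions, not taken from the thread.
    static final class Change {
        final String primaryKey;
        final String operation; // "insert", "update" or "delete"
        final String payload;   // row contents; ignored for deletes

        Change(String primaryKey, String operation, String payload) {
            this.primaryKey = primaryKey;
            this.operation = operation;
            this.payload = payload;
        }
    }

    // Applies the changes in list order and returns the remaining rows keyed by primary key.
    static Map<String, String> fold(List<Change> changes) {
        Map<String, String> state = new LinkedHashMap<>();
        for (Change change : changes) {
            switch (change.operation) {
                case "insert":
                case "update":
                    state.put(change.primaryKey, change.payload);
                    break;
                case "delete":
                    state.remove(change.primaryKey);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown operation: " + change.operation);
            }
        }
        return state;
    }
}
```

With accumulating panes, each 30-minute firing would see every change received so far that day, so re-running a fold like this on each pane yields a full snapshot: a row inserted before the first flush and deleted before the second appears in the first snapshot and is absent from the second.
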