Ok, I see. Thanks a lot, Zdenko
On 2018/11/09 08:23:12, Robert Bradshaw <[email protected]> wrote: > You should be able to do this with 24-hour windows and a 30-minute > early processing time trigger. Something like > > Window > .into(FixedWindows.of(24, TimeUnit.HOURS)) > .triggering(AfterWatermark.pastEndOfWindow().withEarlyFirings( > AfterProcessingTime.pastFirstElementInPane(). alignedTo(30, > TimeUnit.MINUTES))) > . accumulatingFiredPanes() > > Alternatively, you could window into 30 minute fixed windows, write > the snapshot, then also window into 24 hours to write the final > snapshot. > > Both of these will buffer all the updates for the day. If that's too > many, you could consider using > https://beam.apache.org/blog/2017/02/13/stateful-processing.html > (either with timers, or windowing. > On Fri, Nov 9, 2018 at 8:14 AM Zdenko Hrcek <[email protected]> wrote: > > > > Greetings, > > > > I'm trying to do following: > > > > I'm streaming messages from PubSub with encoded json which contain DB > > fields like "primary key" as well as "operation type", which can have > > values: insert, update, delete to reflect db operations. > > > > Goal is to produce one file per day (with 30 minute updates) which will > > contain final state of rows/messages. > > 30 minute file update/flush should reflect date state at that time (taking > > into considerations operations). > > Lets say in first 30 minutes there is insert operation for some row which > > will be in first flush of output file, but in next 30 minutes comes a > > message with delete operation then in second file flush it shouldn't be in > > output file etc. > > For the next day, new file should be created. > > > > I'm just wondering if something like this can be done in Beam (Java). > > > > I had in mind doing fixed 24h window (which I guess is not best practice) > > which would trigger after 30 minutes and having group by key operation (by > > primary key) and then handle operations, but it looks like triggering can't > > be done that way, or maybe my whole approach/understanding is wrong... > > > > Anyway I would be grateful for any advice how this can be done > > (conceptually) or if there is some code sample of similar pipeline. > > > > Best regards, > > > > Zdenko > > -- > > _______________________ > > http://www.the-swamp.info >
