Hi,

Assuming FLINK-6465 lands, will something like

SELECT COUNT(*) FROM (SELECT FIRST_VALUE(names) FROM stream)
GROUP BY HOP(proctime, INTERVAL '1' MINUTE, INTERVAL '1' MINUTE)

work?

~Haohui

On Fri, Sep 29, 2017 at 6:52 PM Ron Crocker <rcroc...@newrelic.com> wrote:

> Hi -
>
> I have a colleague who is trying to write a Flink job that will determine
> deltas from period to period. Let’s say the periods are 1 minute. What he
> would like to do is report in minute 2 those things that are new since
> minute 1, then in minute 3 report those things that are also new since
> minute 1.
>
> For example, consider a stream that looks like
>
> minute | name
> =======|=======
> 1      | abc
> 1      | def
> 2      | abc
> 2      | ghi
> 3      | abc
> 3      | def
> 4      | ghi
> 4      | jkl
>
> What we would like to report is:
>
> minute | count | names
> =======|=======|=========
> 1      | 2     | abc, def
> 2      | 1     | ghi
> 3      | 0     |
> 4      | 1     | jkl
>
> In minute 2, abc was already seen but ghi is new, so it gets reported
> as new. In minute 3, abc and def have already been seen, so there are no
> new names, and again in minute 4 ghi has been seen but jkl is new, so we
> report the 1 new name.
>
> I’m struggling to help and thought someone here might be able to. I
> have thought about merging two streams (the stream of new things and the
> stream of the full set seen so far) but haven’t tried that yet.
>
> I welcome any of your inputs.
>
> Thanks!
>
> Ron
> --
> Ron Crocker
> Principal Engineer & Architect
> ( ( •)) New Relic
> rcroc...@newrelic.com
> M: +1 630 363 8835
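
For reference, the semantics Ron describes can be sketched in plain Python. This is not Flink code and says nothing about how FLINK-6465 would behave; it only pins down the expected output for the example stream. The function name new_names_per_period and the (minute, name) tuple layout are assumptions made for illustration.

```python
def new_names_per_period(events):
    """Report, per minute, the names not seen in any earlier event.

    `events` is an iterable of (minute, name) pairs, assumed to arrive
    ordered by minute. Returns a list of (minute, count, names) tuples.
    """
    seen = set()   # every name observed so far, across all minutes
    report = {}    # minute -> names first seen in that minute
    for minute, name in events:
        fresh = report.setdefault(minute, [])
        if name not in seen:
            seen.add(name)
            fresh.append(name)
    return [(minute, len(names), names) for minute, names in report.items()]


# The example stream from the original message.
events = [
    (1, "abc"), (1, "def"),
    (2, "abc"), (2, "ghi"),
    (3, "abc"), (3, "def"),
    (4, "ghi"), (4, "jkl"),
]

for minute, count, names in new_names_per_period(events):
    print(minute, count, ", ".join(names))
```

The set of seen names grows without bound here, and a minute with no events at all produces no row; a real streaming job would need to decide how to handle both.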