Hi Experts,
        Assume there is a stream whose content looks like this:
        Seq     ID             MONEY
        1.        100           100
        2.        100           200
        3.        101           300

        The record with Seq#2 updates the record with Seq#1, changing MONEY 
from 100 to 200.
        If I register the stream as table T and want to sum the money across 
all IDs, counting only the latest value for each ID, then 
"select sum(MONEY) from T" returns 600, which is incorrect (the expected 
result is 200 + 300 = 500).

        I can write a UDAF, for example latest, that computes the latest value 
for each ID. The SQL then looks like this:
        select sum(MONEY) from
        (
                select ID, latest(MONEY) as MONEY from T group by ID
        )
        But this means I have to keep each ID and its latest value in state, 
and I am worried that the state will grow too large. For now I use this method 
and set the state retention to several days so that old entries are dropped 
before the state grows too large. I wonder if there is a better way to do this.
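
        To make the intended semantics and the state concern concrete, here is 
a small sketch in plain Python. It is not Flink code; the function name 
sum_of_latest and the tuple layout (Seq, ID, MONEY) are made up for 
illustration. It keeps one entry per ID, which is exactly the state that 
grows with the number of distinct IDs:

```python
def sum_of_latest(records):
    """Yield the running sum of the latest MONEY per ID after each record.

    `records` is an iterable of (seq, id, money) tuples in arrival order.
    This only illustrates the semantics; it is not a Flink API.
    """
    latest = {}  # state: ID -> latest MONEY seen so far (one entry per distinct ID)
    total = 0
    for _seq, record_id, money in records:
        # Retract the previous value for this ID (0 if unseen), then add the new one.
        total += money - latest.get(record_id, 0)
        latest[record_id] = money
        yield total


records = [(1, 100, 100), (2, 100, 200), (3, 101, 300)]
print(list(sum_of_latest(records)))  # prints [100, 200, 500]
```

        The final value is 500 rather than 600, because the update in Seq#2 
replaces the 100 from Seq#1 instead of being added on top of it.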

        So what is the best practice in this scenario? Does anyone have a 
suggestion? Thanks a lot.


Best
Henry
        
