Hi Experts,
Assume there is a stream whose content looks like this:
Seq  ID   MONEY
1.   100  100
2.   100  200
3.   101  300
The record with Seq#2 updates the record with Seq#1, changing the MONEY of
ID 100 from 100 to 200.
If I register the stream as table T and want to sum the latest MONEY of
every ID, writing "select sum(MONEY) from T" gives 600, which is incorrect.
The expected result is 500 (200 for ID 100 plus 300 for ID 101).
I can write a UDAF, for example latest, that returns the latest value for
each ID. The SQL then looks like this:
select sum(MONEY) from
(
  select ID, latest(MONEY) as MONEY from T group by ID
)
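
To make the intended semantics concrete, here is a minimal Python sketch (not Flink code) of what the query above is meant to compute, assuming records arrive in Seq order. The names RECORDS and sum_of_latest are only for illustration, and the dict is a hypothetical stand-in for the state the latest UDAF would keep.

```python
from typing import Dict, Iterable, Tuple

# Sample records from the example stream: (seq, id, money).
RECORDS = [(1, 100, 100), (2, 100, 200), (3, 101, 300)]


def sum_of_latest(records: Iterable[Tuple[int, int, int]]) -> Tuple[int, Dict[int, int]]:
    """Return the sum of the latest MONEY per ID, plus the per-ID state."""
    latest: Dict[int, int] = {}  # illustrative state: ID -> latest MONEY
    total = 0
    for _seq, id_, money in records:
        # Subtract the previous value for this ID (0 if unseen), add the new one.
        total += money - latest.get(id_, 0)
        latest[id_] = money
    return total, latest


naive_total = sum(money for _seq, _id, money in RECORDS)
total, state = sum_of_latest(RECORDS)
print(naive_total)  # 600, every record counted, including the outdated one
print(total)        # 500, only the latest value per ID counted
print(len(state))   # 2, one state entry per distinct ID
```

The dict holds one entry per distinct ID, which is the state growth I am concerned about.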
But this means I have to keep each ID and its latest value in state, and I
am worried that the state will grow too large. For now I use this method and
set the state retention to several days, so that old entries are cleaned up
before the state grows too large. I wonder if there is a better way to do this.
So what is the best practice in this scenario? Does anyone have a
suggestion? Thanks a lot.
Best,
Henry