Hi,
In our product, we use Trident to do real-time aggregations at 5-minute
intervals with persistentAggregate, backed by a State implementation for
the RDBMS that does the multi-puts into the database.
On top of that lowest level (5 minutes), which Trident updates, the RDBMS
computes the higher-level rollup aggregations such as 15 minutes, 1 hour,
1 day, etc.
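A minimal sketch of how one record maps to a bucket at each rollup level,
assuming UTC epoch-second timestamps and day boundaries at UTC midnight.
The names GRANULARITIES, bucket_start and bucket_keys are hypothetical and
only illustrate the hierarchy described above.

```python
from typing import Dict

# Hypothetical rollup levels and their bucket widths in seconds:
# 5 min is computed by Trident; 15 min, 1 hour and 1 day by the RDBMS.
GRANULARITIES: Dict[str, int] = {
    "5min": 5 * 60,
    "15min": 15 * 60,
    "1hour": 60 * 60,
    "1day": 24 * 60 * 60,
}


def bucket_start(epoch_seconds: int, width_seconds: int) -> int:
    """Floor a UTC epoch timestamp to the start of its bucket."""
    return epoch_seconds - (epoch_seconds % width_seconds)


def bucket_keys(epoch_seconds: int) -> Dict[str, int]:
    """Return the bucket start a record belongs to at every rollup level."""
    return {
        level: bucket_start(epoch_seconds, width)
        for level, width in GRANULARITIES.items()
    }
```

Because each level's width divides evenly into the next (5 into 15 into 60
into 1440 minutes), every 5-minute bucket lies inside exactly one bucket at
each higher level. That is what allows a late record's contribution to be
added to each level independently.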
When some records arrive late and still have to be aggregated, I can use
the same persistentAggregate approach to update the corresponding 5-minute
aggregate rows in the RDBMS without any issues.
But if the higher-level aggregations covering the late-arriving data have
already been completed (e.g. data that arrives 2 hours late means the RDBMS
has already computed eight 15-minute aggregations and two hourly
aggregations for that period), how can we update them?
One idea I'm thinking of is to use the same persistentAggregate approach
to also update the higher-level aggregates in the RDBMS whenever a
late-arriving record is processed.
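The idea above could look roughly like the following sketch. It assumes
additive aggregates (sums or counts; an average would need its sum and
count stored separately) and uses in-memory dicts as hypothetical stand-ins
for the RDBMS rollup tables. A batch is first pre-aggregated per bucket,
then each level gets one read-modify-write pass, loosely mirroring a
Trident multiGet/multiPut. The names LEVELS, Store and apply_batch are
illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical rollup levels: bucket width in seconds.
LEVELS: Dict[str, int] = {"5min": 300, "15min": 900, "1hour": 3600, "1day": 86400}

# In-memory stand-in for the RDBMS rollup tables:
# level -> {(group_key, bucket_start): running_sum}
Store = Dict[str, Dict[Tuple[str, int], int]]


def apply_batch(store: Store, records: Iterable[Tuple[str, int, int]]) -> None:
    """Fold (group_key, epoch_seconds, value) records into every rollup level.

    Assumes the aggregate is an additive sum, so a late record's value can be
    added to whichever bucket it falls into at each level.
    """
    # 1. Pre-aggregate the batch so each (level, bucket) row is touched once.
    deltas: Dict[str, Dict[Tuple[str, int], int]] = {
        level: defaultdict(int) for level in LEVELS
    }
    for group_key, ts, value in records:
        for level, width in LEVELS.items():
            deltas[level][(group_key, ts - ts % width)] += value

    # 2. One read-modify-write pass per level (the multi-get/multi-put analogue).
    for level, level_deltas in deltas.items():
        table = store.setdefault(level, {})
        for row_key, delta in level_deltas.items():
            table[row_key] = table.get(row_key, 0) + delta
```

If the RDBMS rollup job also computes these buckets on its own schedule,
only late records (those whose higher-level buckets are already closed)
should go through this path, otherwise the same value would be counted
twice. Replay safety, such as Trident's transactional or opaque state
semantics, is not modelled here.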
That way, the RDBMS still does the heavy lifting of the rollup
aggregations across huge volumes of data, while Storm does the
lowest-level aggregation in real time and also updates the rollup
aggregations in the RDBMS for late-arriving data. This makes late-arrival
handling much simpler.
Will this approach perform well? Any suggestions or improvements?
Thanks & Regards
MK
