How is the day boundary identified? By wall-clock time, or by a timestamp on the message? (A temporary spike in latency in the bolt could cause a dip in throughput, so that messages from a given day overflow into the next day.)
You can use tick tuples <http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/#excursus-tick-tuples-in-storm-08> if you can tolerate a small delay in some scenarios. If you have to be exact and the timestamp is in the message (or the day boundary is identified some other way), you have two options:

1. In execute(), check the day attribute of each message and make sure the data for that day is available in your cache (dataMap); if it is not, initialize it from your service.
2. Extend the spout to emit a special message right at the day boundary identified by your logic, and have the bolt refresh the cache when it sees that special message.

On Mon, Aug 3, 2015 at 8:44 PM, 韭菜 <[email protected]> wrote:

> bolt prepare init data, but where to change it,
>
> hi, I have a doubt about how to change init data(clean it once per day )
> in prepare method.
>
> My Bolt prepare code like this:
>
> public void prepare(Map stormConf, TopologyContext context,
> OutputCollector collector) {
>
> int maxCount = 10000;
>
> dataMap = myDao.getData(maxCount);
>
> }
>
> myDao.getData method query data from storage, holds current day click
> count.
>
> when storm topology runs some day, dataMap holds data with this days.
> Where should I remove data expect today's data at dataMap.
>
> Is there someone encounter this and have some solution about it ? thanks.
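To make the execute()-based approach from the reply above more concrete, here is a minimal sketch in plain Java, with no Storm dependency. The class name DailyCache, the method forDay, and the loader function are hypothetical names chosen for illustration; the loader stands in for a call such as myDao.getData(...), whose real signature and return type are not shown in the thread. The sketch assumes each message carries a timestamp from which a LocalDate can be derived.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper: holds one day's data and reloads it whenever
 * the day taken from the incoming message differs from the cached day.
 */
public class DailyCache {
    // Assumption: stands in for something like myDao.getData(...).
    private final Function<LocalDate, Map<String, Long>> loader;

    // Day that the current dataMap belongs to; null until the first load.
    private LocalDate cachedDay;
    private Map<String, Long> dataMap = new HashMap<>();

    public DailyCache(Function<LocalDate, Map<String, Long>> loader) {
        this.loader = loader;
    }

    /** Returns the data for messageDay, discarding and reloading on a day change. */
    public Map<String, Long> forDay(LocalDate messageDay) {
        if (!messageDay.equals(cachedDay)) {
            // Day boundary crossed (or first call): drop the old data, load fresh.
            dataMap = new HashMap<>(loader.apply(messageDay));
            cachedDay = messageDay;
        }
        return dataMap;
    }
}
```

Inside the bolt, execute() would derive the day from the tuple's timestamp and call something like cache.forDay(day) before touching the counts. Note that with this simple equality check, a late message from the previous day arriving after the rollover would trigger another reload; whether to drop such messages or handle them separately depends on how exact the counts need to be.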
