How is the day boundary identified? By wall-clock time, or by a timestamp
carried in each message? (With wall-clock time, a temporary rise in bolt
latency can cause a dip in throughput, so messages from one day may overflow
into the next.)
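The difference shows up right at midnight. Below is a small sketch using only java.time, with no Storm classes. The class and method names are made up for illustration, and UTC is assumed as the zone that defines "a day". A click made just before midnight but processed just after it lands on different days depending on which clock you use:

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

// Hypothetical helper contrasting two ways of deciding which day a tuple belongs to.
final class DayBoundary {
    // Event time: the day comes from the timestamp carried in the message,
    // so a tuple that is processed late still lands on the right day.
    static LocalDate eventDay(long eventEpochMillis, ZoneId zone) {
        return Instant.ofEpochMilli(eventEpochMillis).atZone(zone).toLocalDate();
    }

    // Processing (wall-clock) time: the day is whatever the bolt's clock
    // says when the tuple is handled.
    static LocalDate processingDay(ZoneId zone) {
        return LocalDate.now(zone);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("UTC"); // assumed zone for the day boundary
        // A click made two seconds before midnight...
        long clickedAt = Instant.parse("2015-08-03T23:59:58Z").toEpochMilli();
        // ...but handled five seconds after midnight because the bolt was slow.
        long handledAt = Instant.parse("2015-08-04T00:00:03Z").toEpochMilli();
        System.out.println(eventDay(clickedAt, zone)); // 2015-08-03 (event time)
        System.out.println(eventDay(handledAt, zone)); // 2015-08-04 (wall clock at handling time)
    }
}
```

With event time the late click still counts toward Aug 3. With wall-clock time it would be counted toward Aug 4.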

You can use tick tuples
<http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/#excursus-tick-tuples-in-storm-08>
if you can tolerate a little delay in some scenarios. If you have to be exact
and the timestamp is in the message (or the day boundary is identified some
other way), you have two options. You can do this in your execute() by
checking the day attribute of each message and making sure the data for that
day is in your cache (dataMap), initializing it from your service otherwise.
Or you can extend the Spout to emit a special message right at the day
boundary identified by your logic, and have the bolt refresh the cache when
it sees that message.
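Here is a minimal sketch of the execute()-side check in plain Java. It uses no Storm classes, so it runs on its own. DailyCache, rollTo and increment are hypothetical names. The loader function stands in for a DAO call that, unlike myDao.getData(maxCount) in the question, is assumed to accept the day to load. A bolt would build one in prepare() and, in execute(), call increment() with the day read from the tuple. The same rollTo() could instead be triggered by a tick tuple (passing LocalDate.now(zone)) or by the Spout's special day-boundary message.

```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not a Storm API): holds a single day's click counts.
final class DailyCache {
    // Stands in for a DAO query; assumed to return the stored counts for a day.
    private final Function<LocalDate, Map<String, Long>> loader;
    private LocalDate currentDay;
    private Map<String, Long> dataMap = new HashMap<>();

    DailyCache(Function<LocalDate, Map<String, Long>> loader) {
        this.loader = loader;
    }

    // Moves the cache forward to `day`, discarding the previous day's entries
    // and loading the new day's data. Never moves backwards.
    void rollTo(LocalDate day) {
        if (currentDay == null || day.isAfter(currentDay)) {
            dataMap = new HashMap<>(loader.apply(day));
            currentDay = day;
        }
    }

    // Counts one click for `key` on `day`. Returns false for a late tuple whose
    // day is older than the cached day, so the caller can handle it separately.
    boolean increment(LocalDate day, String key) {
        rollTo(day);
        if (day.isBefore(currentDay)) {
            return false;
        }
        dataMap.merge(key, 1L, Long::sum);
        return true;
    }

    LocalDate currentDay() {
        return currentDay;
    }

    // Read-only view of the current day's counts.
    Map<String, Long> snapshot() {
        return Collections.unmodifiableMap(dataMap);
    }

    public static void main(String[] args) {
        DailyCache cache = new DailyCache(day -> new HashMap<>());
        LocalDate aug3 = LocalDate.of(2015, 8, 3);
        LocalDate aug4 = LocalDate.of(2015, 8, 4);
        cache.increment(aug3, "page-1");
        cache.increment(aug3, "page-1");
        System.out.println(cache.snapshot());                // {page-1=2}
        cache.increment(aug4, "page-2");                     // first Aug 4 tuple: Aug 3 entries dropped
        System.out.println(cache.snapshot());                // {page-2=1}
        System.out.println(cache.increment(aug3, "page-1")); // false: late Aug 3 tuple
    }
}
```

The cache deliberately never moves backwards. If a late tuple from the previous day reloaded the old day instead of being rejected, today's in-memory counts would be thrown away. What to do with such tuples (write them straight to storage, or drop them) is left to the caller.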

On Mon, Aug 3, 2015 at 8:44 PM, 韭菜 wrote:

> Bolt prepare() initializes data, but where should I change it?
>
> Hi, I have a question about how to change the data initialized in the
> prepare method (clearing it once per day).
>
> My Bolt's prepare code looks like this:
>
>   public void prepare(Map stormConf, TopologyContext context,
>                       OutputCollector collector) {
>       int maxCount = 10000;
>       // Loads the current day's click counts from storage, once, when the bolt starts.
>       dataMap = myDao.getData(maxCount);
>   }
>
>
> The myDao.getData method queries storage that holds the current day's click
> counts.
>
> After the Storm topology has been running for some days, dataMap holds data
> from all of those days. Where should I remove everything except today's data
> from dataMap?
>
> Has anyone run into this and found a solution? Thanks.
>
>
