Hi,
  You can use Redis to store each minute key with its count as the value, updating it whenever you receive that minute's key; being an in-memory database, it would be faster than SQL. At the end of each batch, do an update that increments the key's count if it already exists, or creates it for a new entry.

Sent from my iPhone

> On Oct 26, 2014, at 4:33 PM, Ji ZHANG <[email protected]> wrote:
> 
> Hi,
> 
> Suppose I have a stream of logs and I want to count them by minute.
> The result is like:
> 
> 2014-10-26 18:38:00 100
> 2014-10-26 18:39:00 150
> 2014-10-26 18:40:00 200
> 
> One way to do this is to set the batch interval to 1 min, but each
> batch would be quite large.
> 
> Or I can use updateStateByKey where key is like '2014-10-26 18:38:00',
> but I have two questions:
> 
> 1. How to persist the result to MySQL? Do I need to flush them every batch?
> 2. How to delete the old state? For example, now is 18:50 but the
> 18:40's state is still in Spark. One solution is to set the key's
> state to None when there's no data of this key in this batch. But what
> if the log is not so much, and some batches get zero logs? For
> instance
> 
> 18:40:00~18:40:10 has 10 logs -> key 18:40's value is set to 10
> 18:40:10~18:40:20 has no log -> key 18:40 is deleted
> 18:40:20~18:40:30 has 5 logs -> key 18:40's value is set to 5
> 
> You can see the result is wrong. Maybe I can use an 'update' approach
> when flushing, i.e. check MySQL whether there's already an entry of
> 18:40 and add the result to that. But how about a unique count? I
> can't store all unique values in MySQL per se.
> 
> So I'm looking for a better way to store count-by-minute result into
> rdbms (or nosql?). Any idea would be appreciated.
> 
> Thanks.
> 
> -- 
> Jerry
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 
