Thanks Srinath. We are already using reliable message processing to handle bolt failures and the like. My problem is with catastrophic cases: for example, what happens if the entire cluster goes down, or if the topology fails completely? At the moment we are reading from MQ, and although keeping the transactions open would solve our data-loss issue, it isn't quite feasible. Some of our bolts listen and batch for up to 30 seconds so that the batches are big enough to commit to the RDBMS. Keeping the queue transactions open for that long slows things down considerably.
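(The size-plus-timeout batching described above could be sketched roughly as follows. This is not actual Storm API; the class and its names are hypothetical, and the flush callback stands in for the RDBMS commit. In a real topology the periodic check would typically be driven by a tick tuple.)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffer items and flush either when the batch reaches
// a size limit or when a time window (e.g. 30 s) has elapsed.
class TimedBatcher<T> {
    private final int maxSize;
    private final long maxAgeMillis;
    private final Consumer<List<T>> flush;   // stands in for the DB commit
    private List<T> batch = new ArrayList<>();
    private long batchStart = System.currentTimeMillis();

    TimedBatcher(int maxSize, long maxAgeMillis, Consumer<List<T>> flush) {
        this.maxSize = maxSize;
        this.maxAgeMillis = maxAgeMillis;
        this.flush = flush;
    }

    void add(T item) {
        batch.add(item);
        maybeFlush();
    }

    // Call periodically (e.g. from a tick tuple) so an idle batch still flushes.
    void maybeFlush() {
        boolean timedOut = System.currentTimeMillis() - batchStart >= maxAgeMillis;
        if (!batch.isEmpty() && (batch.size() >= maxSize || timedOut)) {
            flush.accept(batch);
            batch = new ArrayList<>();
            batchStart = System.currentTimeMillis();
        }
    }
}
```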
So I guess, to frame the question better, I should ask: is there a way to persist the intermediate data?

Thanks,
Nima

From: Srinath C [mailto:[email protected]]
Sent: Wednesday, June 04, 2014 5:49 PM
To: user
Subject: Re: Cluster Failure and Data Recovery

Hi Nima,
    Use the reliable message processing mechanism (https://github.com/nathanmarz/storm/wiki/Guaranteeing-message-processing) to ensure that there is no data loss. You would need support for transactional semantics from the tuple source, where the spout can commit/abort a read (Kestrel, Kafka, RabbitMQ, etc. can do this). Yes, you would need to keep the queue transactions open until the spout receives an "ack" or "fail" for every tuple.
    IMO, this ensures that each tuple is processed "at least once", not "exactly once", so you need to be prepared to end up with duplicate entries in your DB, or have a way to figure out that a write to the DB duplicates an earlier write. This covers the case where a crash loses intermediate data held in memory.

Regards,
Srinath.

On Thu, Jun 5, 2014 at 5:37 AM, Nima Movafaghrad <[email protected]> wrote:

Hi everyone,

We are in the process of designing a highly available system with zero tolerance for data loss. The plan is for the spouts to read from a queue, process the tuples through several specialized bolts, and then flush to the DB. How can we guarantee no data loss here? Should we keep the queue transactions open until the data is committed to the DB? Should we persist the state of all the bolts? What happens to the intermediate data if the whole cluster fails?

Any suggestions are much appreciated.
Nima
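(The "figure out that a write is a duplicate" approach Srinath mentions is usually done by keying each record on a stable message ID and making the DB write an upsert or insert-if-absent, so a replayed tuple under at-least-once delivery cannot create a second row. A minimal sketch, with a hypothetical class name and an in-memory Map standing in for a table with a unique constraint on the message ID:)

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an idempotent sink for at-least-once delivery:
// the Map stands in for a DB table with a unique key on messageId, and
// putIfAbsent stands in for an insert-if-absent / upsert statement.
class IdempotentWriter {
    private final Map<String, String> table = new HashMap<>();

    // Returns true on a first write, false when the tuple was a replay.
    boolean write(String messageId, String payload) {
        return table.putIfAbsent(messageId, payload) == null;
    }

    int rowCount() {
        return table.size();
    }
}
```

With this in place, replaying an entire batch after a crash is safe: duplicates are silently absorbed rather than written twice.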
