Hi everyone,
We are designing a highly available system with zero tolerance for data loss. The plan is for the spouts to read messages from a queue, process them through several specialized bolts, and then flush the results to a DB. How can we guarantee no data loss here? Should we keep the queue transactions open until the data is committed to the DB? Should we persist the state of all the bolts? What happens to the intermediate data if the whole cluster fails?

Any suggestions are much appreciated.

Nima
