Thank you, Satish and Simon. We're thinking of a design analogous to the ones you propose (keeping state in the spout). One question: how would you deal with the case of a spout crash? You would lose the in memory state, right?
Regards, Javier On Fri, Jun 26, 2015 at 12:48 PM, Simon Cooper < [email protected]> wrote: > We’ve implemented this internally. > > > > Within the spout, we keep track of all the ‘in-flight’ tuples in a Set. > Then, when we need to do a flush (determined by several internal and > external conditions), we stop sending new tuples through and wait for all > the pending tuples to be acked (or failed). Then we send a flush tuple > through (it’s emitted on a separate ‘control’ stream connected to all the > running bolts, independent of main tuple processing), wait for that to be > acked, and then start sending normal tuples through again. > > > > This is implemented using a reasonably simple state machine inside the > spout to ensure nextTuple() always does the correct thing whatever’s going > on in the topology. It also keeps track of whether the topology is ‘idle’, > and doesn’t bother flushing when there’s nothing to flush. > > > > Hope this helps, > > SimonC > > > > *From:* Satish Mittal [mailto:[email protected]] > *Sent:* 26 June 2015 16:17 > *To:* [email protected] > *Subject:* Re: Flushing a topology? > > > > Assuming each flush period is associated with a start and end event, one > approach is to instantiate a HashSet in the spout at flush start event. > Each emitted tuple is added to the set, and upon receiving ack, it is > removed from the set. Upon receiving flush end event, wait for all pending > messages to be acked. > > > > Assuming that processing latency is small, the pending set size should be > relatively bounded in steady state at any point of time. > > > > When the HashSet size becomes 0, returns success. Of course, if flush > periods are back-to-back, then start of the next flush period is same as > the end of previous period. > > > > Thanks, > > Satish > > > > On Wed, Jun 24, 2015 at 4:20 PM, Javier Gonzalez <[email protected]> > wrote: > > Hi Satish, > > Thank you for your response. > > The ack at the spout guarantees that the flush message has traversed the > topology, but makes no guarantees about the tuples emitted before the flush > tuple. It would be possible for tuples to be in flight and be "beaten to > the ack" by the flush tuple, particularly if the flush tuple does not > participate in all the processing done in the bolts to the regular tuples. > > Regards, > Javier > > On Jun 24, 2015 2:55 AM, "Satish Mittal" <[email protected]> wrote: > > From the way Storm's acking mechanism works, if a spout tuple gets > acked, then it implies that tuple has completely traversed the topology. > Isn't that sufficient? > > > > On Tue, Jun 23, 2015 at 10:50 PM, Javier Gonzalez <[email protected]> > wrote: > > Hi, > > Question: how would you implement a "flush" in a topology: sending a > special message to the topology that will in time return with a message > that says everything up to the flush message has finished traversing the > topology? (does that make sense?) > > Regards, > Javier > > > > > > _____________________________________________________________ > > The information contained in this communication is intended solely for the > use of the individual or entity to whom it is addressed and others > authorized to receive it. It may contain confidential or legally privileged > information. If you are not the intended recipient you are hereby notified > that any disclosure, copying, distribution or taking any action in reliance > on the contents of this information is strictly prohibited and may be > unlawful. If you have received this communication in error, please notify > us immediately by responding to this email and then delete it from your > system. The firm is neither liable for the proper and complete transmission > of the information contained in this communication nor for any delay in its > receipt. > > > > > > _____________________________________________________________ > > The information contained in this communication is intended solely for the > use of the individual or entity to whom it is addressed and others > authorized to receive it. It may contain confidential or legally privileged > information. If you are not the intended recipient you are hereby notified > that any disclosure, copying, distribution or taking any action in reliance > on the contents of this information is strictly prohibited and may be > unlawful. If you have received this communication in error, please notify > us immediately by responding to this email and then delete it from your > system. The firm is neither liable for the proper and complete transmission > of the information contained in this communication nor for any delay in its > receipt. > This message, and any files/attachments transmitted together with it, is > intended for the use only of the person (or persons) to whom it is > addressed. It may contain information which is confidential and/or > protected by legal privilege. Accordingly, any dissemination, distribution, > copying or use of this message, or any part of it or anything sent together > with it, other than by intended recipients, may constitute a breach of > civil or criminal law and is hereby prohibited. Unless otherwise stated, > any views expressed in this message are those of the person sending it and > not the sender's employer. No responsibility, legal or otherwise, of > whatever nature, is accepted as to the accuracy of the contents of this > message or for the completeness of the message as received. Anyone who is > not the intended recipient of this message is advised to make no use of it > and is requested to contact Featurespace Limited as soon as possible. Any > recipient of this message who has knowledge or suspects that it may have > been the subject of unauthorised interception or alteration is also > requested to contact Featurespace Limited. > -- Javier González Nicolini
