I would push the incoming tuples into a stack, and have a background thread
take the stack, process the top element, and discard the rest.
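In plain Java (no Storm API; the class and method names here are illustrative), that coalescing idea boils down to a single overwrite-on-write slot, which behaves like "take the stack, keep the top, discard the rest" with O(1) memory:

```java
import java.util.concurrent.atomic.AtomicReference;

// Coalescing slot: writers overwrite the previous value; the background
// worker takes whatever is freshest, and everything older is discarded.
final class LatestHolder<T> {
    private final AtomicReference<T> slot = new AtomicReference<>();

    // Called for every incoming tuple; never blocks.
    void push(T value) {
        slot.set(value);
    }

    // Called by the worker; returns null when nothing new has arrived.
    T take() {
        return slot.getAndSet(null);
    }
}
```

The worker simply loops on take(), processing non-null results; anything pushed while it was busy collapses into the one freshest value.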
On Jul 5, 2016 10:33 AM, "Marco Nicolini" <[email protected]> wrote:

> Hello guys,
>
> I'm evaluating Storm for a project at work.
>
> I've just scratched the surface of Storm and I'd like to ask for opinions
> on how you would tackle a typical issue that pops up in the domain I'm
> working in (financial real-time market data).
>
> I have some real-time data pushed into Storm (I have written a spout for
> that), and I need to group it on certain keys (fields grouping seems to
> take care of that nicely), do some processing on it, and write it to a
> queue somewhere (think JMS or other middleware here...)
>
> Thing is... the processing takes a long time (with respect to the
> frequency of the input data) and I'm not really interested in building up
> a queue of data to process (as I would never be able to keep up with it).
> What I'm really after is a way to coalesce tuples at the bolt level so
> that the processing bolt always works on the freshest possible data. In
> other words, I'm looking for a way to skip tuples that came in while my
> processor bolt was working.
>
> Note that I'm not concerned about parallelizing the processing, as
> processing data for the same group in parallel would break the ordering
> requirement I have on processed outputs. At most I'm aiming at one
> "processor bolt per group" (we can assume the number of groups is known in
> advance).
>
> I think I can implement this behaviour in the spout: emit a new tuple (for
> a given grouping) only when the last emitted tuple (for that grouping) has
> been fully processed (i.e. ack is called on the spout). However, this
> complicates the spout code a bit (as it would have to group things before
> emitting them) and in general ends up putting too much complexity on one
> component, imho.
>
> If a bolt could know when a tuple it emitted becomes fully processed
> (similarly to what ack does for the spout), the problem would be simpler
> and the complexity per component would be lower.
>
> Or the processor bolt could just process things asynchronously (i.e. not
> in the thread where execute() is called on the bolt), always picking up
> the "latest" data from some state local to the bolt. This state would be
> overwritten every time a fresh tuple arrives via execute().
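That last option can be sketched without any Storm types (a plain-Java illustration; CoalescingProcessor and its methods are hypothetical): execute() just overwrites a slot, and a single worker thread repeatedly grabs whatever is newest once it finishes the previous item.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Bolt-local coalescing: the execute() thread overwrites `latest`; one
// worker thread processes whatever is freshest when it becomes free.
final class CoalescingProcessor {
    private final AtomicReference<String> latest = new AtomicReference<>();
    private volatile boolean running = true;

    // Would be called from the bolt's execute(): never blocks.
    void onTuple(String data) {
        latest.set(data);
    }

    // Starts the single worker; `process` is the slow per-group work.
    void start(Consumer<String> process) {
        Thread worker = new Thread(() -> {
            while (running) {
                String d = latest.getAndSet(null);
                if (d != null) {
                    process.accept(d);  // slow processing happens here
                } else {
                    Thread.yield();     // or park/sleep briefly
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    void stop() {
        running = false;
    }
}
```

One worker per group key would preserve the ordering requirement, since each group's tuples are processed by exactly one thread.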
>
> Or...
>
> Is there a 'configurable way' to achieve that?
>
> Any other way you would tackle this?
>
> Sorry for the "wall of text". Any idea/help/advice is greatly
> appreciated...
>
> Regards,
>
> Marco
>
