Ah cool, Jungtaek. From the looks of it this might help alleviate the problem, but I think it won't eliminate it... Keep up the good work :).
Warm regards,
A.

On Wed, Feb 1, 2017 at 12:48 AM, Jungtaek Lim <[email protected]> wrote:

> Thomas,
>
> I ported back STORM-1742 <https://issues.apache.org/jira/browse/STORM-1742> to the 1.0.x branch, since I got many questions regarding complete latency from user@, dev@, and Stack Overflow.
>
> The release vote for Storm 1.0.3 RC1 is starting today (link <http://mail-archives.apache.org/mod_mbox/storm-dev/201701.mbox/%3C1BD83321-658D-42E7-A88E-7A75D8105240%40gmail.com%3E>), so you can wait for 1.0.3, or even use 1.0.3 RC1 before it is released officially. I'd encourage you to help test 1.0.3 RC1 if you have time.
>
> Andrew,
> STORM-1742 <https://issues.apache.org/jira/browse/STORM-1742> changes the way complete latency is calculated:
> - start time: the Acker receiving ACK_INIT from the Spout
> - end time: the Acker receiving the ACK message which makes the tuple tree complete
>
> Yes, I was also aware of the clock differences between nodes, which is why I picked this approach, though it doesn't account for the latency from the Spout to the Acker. If you're interested, please refer to the issue and the relevant discussion link in its description. Feedback is always appreciated!
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Wed, Feb 1, 2017 at 8:12 AM, Andrew Xor <[email protected]> wrote:
>
> > Hello,
> >
> > Unfortunately this is a bit more complicated than it might initially seem. First of all, where acks are processed, if at all, depends on the level of flow consistency you have implemented for your Spout, which applies to Trident. In regular Storm it might be perfectly OK to discard missed messages, thereby ignoring the "acks" for each of the tuples. At the other extreme, Trident might even stall all of the incoming batches if a tuple fails to be ack'ed; it will have to be replayed and ack'ed before the computation resumes -- this is done to enforce in-order, exactly-once processing. Additional factors might also impact your latency, such as the network speed used, packet size and much more. I think Nick from UPenn did a similar study and had a more thorough investigation regarding latency tracking, so if he's still subscribed to this list maybe he can chip in and give you more details.
> >
> > From my experience the timing is vastly impacted by each node's clock, so it cannot really be "trusted", because this can cause big deviations in the latency measurements -- and, to be fair, the Storm devs do say it's an approximation!
> >
> > Hope this helped...
> >
> > On Tue, Jan 31, 2017 at 3:55 PM, Thomas Cooper (PGR) <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > First, a bit of background:
> > >
> > > I am a PhD student working on modelling the performance of Storm topologies. I am having reasonable success in modelling the complete latency; however, depending on the load (throughput), I can be up to 50% out.
> > >
> > > After exhausting all other sources of possible latency (mostly remote transfer delays between workers on separate machines), it seems the final path of ack tuples from the end component to the Acker and through to the Spout is the last source of unknown latency. Under heavy load and with a relatively low number of spout tasks, each task will be busy calling next_tuple and acking, so ack complete messages may back up at the spout. This will artificially extend the complete latency. As the spout does not report metrics for the delay/processing of acks, I cannot account for this effect in my models.
> > >
> > > I thought I might have to resort to implementing my own spout (a custom Storm fork is something I would prefer to avoid).
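As a rough illustration of the measurement window STORM-1742 moves to, here is a plain-Java sketch. This is not Storm's actual acker code and the class and method names are invented; the point is only that both timestamps are taken on the acker's own clock, so cross-node clock skew never enters the figure, though the Spout-to-Acker hop is left out, as Jungtaek notes above.

    import java.util.HashMap;
    import java.util.Map;

    /**
     * Rough sketch of the STORM-1742 measurement window: start the clock when
     * ACK_INIT reaches the acker, stop it when the ack that completes the
     * tuple tree arrives. Not Storm's real acker; names are invented.
     */
    public class AckerLatencySketch {

        private static class Entry {
            long xorOfAnchors;  // running XOR of ack values seen for this tree
            long initTimeMs;    // acker-local clock when ACK_INIT arrived
        }

        private final Map<Long, Entry> pending = new HashMap<>();

        /** The spout announced a new tuple tree: the clock starts here, not at the spout. */
        public void onAckInit(long rootId, long initialAckVal) {
            Entry e = new Entry();
            e.xorOfAnchors = initialAckVal;
            e.initTimeMs = System.currentTimeMillis();
            pending.put(rootId, e);
        }

        /** A downstream ack arrived; returns the complete latency once the tree XORs to zero. */
        public Long onAck(long rootId, long ackVal) {
            Entry e = pending.get(rootId);
            if (e == null) {
                return null;              // already completed, failed, or timed out
            }
            e.xorOfAnchors ^= ackVal;     // tree is fully acked when the XOR reaches zero
            if (e.xorOfAnchors == 0) {
                pending.remove(rootId);
                return System.currentTimeMillis() - e.initTimeMs;
            }
            return null;
        }
    }

The XOR bookkeeping mirrors how Storm's ackers track tuple trees in constant space; the only point of the sketch is that the start and end of the measurement window live in the same process.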
> > > However, after seeing issue 1742 (https://issues.apache.org/jira/browse/STORM-1742), it seems Jungtaek Lim and the Storm devs have already spotted this problem and implemented a solution in the master and 1.x branches. Having the Ackers stop the complete latency clock makes more sense (particularly under heavy load) and makes the complete latency match more closely the sojourn time (spout to final component) through the whole topology.
> > >
> > > However, I was hoping to get these models working with the latest Storm release (1.0.2). It doesn't appear that these changes have been backported to the 1.0.x branch yet?
> > >
> > > My Question (TL;DR):
> > >
> > > Where in the 1.0.x codebase does the ack_ack message to the spout tasks get processed? I know that implementations of ISpout have an ack() method that gets called. However, in my test topologies, when I leave this method unimplemented the system still reports a complete latency for that spout. The timestamp in the ack_ack message must be getting processed somewhere, but I am struggling to identify where.
> > >
> > > Any help locating this would be most appreciated.
> > >
> > > Regards,
> > >
> > > Thomas Cooper
> > > PhD Student
> > > Newcastle University, School of Computer Science
> > > W: http://www.tomcooper.org.uk | A: 4th Floor, The Core, Science Central, Bath Lane, Newcastle upon Tyne, NE4 5TF
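On the ISpout.ack() hook Thomas mentions: a spout can at least measure the full emit-to-ack round trip on its own clock by keying a timestamp on the message id. That sidesteps cross-node clock skew, but it still folds in any time the ack spends queued behind nextTuple() calls, so it does not isolate the spout-side ack delay he is after. A minimal sketch against the 1.x spout API (the class name and payload are invented for illustration):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Values;

    /** Illustrative spout that times emit-to-ack round trips on its own clock. */
    public class RoundTripTimingSpout extends BaseRichSpout {

        private SpoutOutputCollector collector;
        private final Map<Long, Long> emitTimesMs = new ConcurrentHashMap<>();
        private long nextId = 0;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            long msgId = nextId++;
            emitTimesMs.put(msgId, System.currentTimeMillis());
            // Emitting with a message id makes the tuple tree tracked by the ackers.
            collector.emit(new Values("payload-" + msgId), msgId);
        }

        @Override
        public void ack(Object msgId) {
            // Called on the spout executor once the acker reports the tree complete;
            // under heavy load this call may have queued behind nextTuple() invocations.
            Long emittedAt = emitTimesMs.remove(msgId);
            if (emittedAt != null) {
                long roundTripMs = System.currentTimeMillis() - emittedAt;
                System.out.println("emit-to-ack round trip: " + roundTripMs + " ms");
            }
        }

        @Override
        public void fail(Object msgId) {
            emitTimesMs.remove(msgId);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("payload"));
        }
    }

(As for where the built-in figure comes from: it appears to be maintained by the spout executor itself around the call into the user's ack(), which would explain why a complete latency is still reported when ack() is left empty.)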
