Ah cool, Jungtaek. From the looks of it this might help alleviate the problem, but I think it won't eliminate it... Keep up the good work :).
Warm regards,
A.

On Wed, Feb 1, 2017 at 12:48 AM, Jungtaek Lim <[email protected]> wrote:

> Thomas,
>
> I ported back STORM-1742 <https://issues.apache.org/jira/browse/STORM-1742> to the 1.0.x branch, since I got many questions regarding complete latency from user@, dev@, and Stack Overflow.
>
> The release vote for Storm 1.0.3 RC1 is starting today (link <http://mail-archives.apache.org/mod_mbox/storm-dev/201701.mbox/%3C1BD83321-658D-42E7-A88E-7A75D8105240%40gmail.com%3E>), so you can wait for 1.0.3, or even use 1.0.3 RC1 before it is released officially. I'd encourage you to help test 1.0.3 RC1 if you have time.
>
> Andrew,
> STORM-1742 <https://issues.apache.org/jira/browse/STORM-1742> changes the way complete latency is calculated:
> - start time: the Acker receiving ACK_INIT from the Spout
> - end time: the Acker receiving the ACK message which makes the tuple tree complete
>
> Yes, I was also aware of the clock differences between nodes, which is why I picked this approach, though it doesn't account for the latency from the Spout to the Acker. If you're interested, please refer to the issue and the relevant discussion link in its description. Feedback is always appreciated!
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Wed, Feb 1, 2017 at 8:12 AM, Andrew Xor <[email protected]> wrote:
>
> > Hello,
> >
> > Unfortunately this is a bit more complicated than it might initially seem. First of all, where acks are processed, if at all, depends on the level of flow consistency you have implemented for your Spout, which applies to Trident. In regular Storm it might be perfectly OK to discard missed messages, thereby ignoring the "acks" for each of the tuples. At the other extreme, Trident might even stall all of the incoming batches if a tuple fails to be ack'ed; it will have to be replayed and ack'ed before the computation resumes -- this is done to enforce in-order, exactly-once processing. Additional factors might also impact your latency, such as the network speed used, packet size and much more. I think Nick from UPenn did a similar study and had a more thorough investigation regarding latency tracking, so if he's still subscribed to this list maybe he can chip in and give you more details.
> >
> > From my experience the timing is vastly impacted by each node's clock, so it cannot really be "trusted", because this can cause big deviations in the latency measurements -- and, to be fair, the Storm devs do say it's an approximation!
> >
> > Hope this helped...
> >
> > On Tue, Jan 31, 2017 at 3:55 PM, Thomas Cooper (PGR) <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > First, a bit of background:
> > >
> > > I am a PhD student working on modelling the performance of Storm topologies. I am having reasonable success in modelling the complete latency; however, depending on the load (throughput), I can be up to 50% out.
> > >
> > > After exhausting all other sources of possible latency (mostly remote transfer delays between workers on separate machines), it seems the final path of ack tuples from the end component to the Acker and through to the Spout is the last source of unknown latency. Under heavy load and with a relatively low number of spout tasks, each task will be busy calling next_tuple and acking, so ack complete messages may back up at the spout. This will artificially extend the complete latency. As the spout does not report metrics for the delay/processing of acks, I cannot account for this effect in my models.
> > >
> > > I thought I might have to resort to implementing my own spout (a custom Storm fork is something I would prefer to avoid).
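As a rough illustration of the measurement window STORM-1742 moves to, here is a plain-Java sketch. This is not Storm's actual acker code and the class and method names are invented; the point is only that both timestamps are taken on the acker's own clock, so cross-node clock skew never enters the figure, though the Spout-to-Acker hop is left out, as Jungtaek notes above.

    import java.util.HashMap;
    import java.util.Map;

    /**
     * Rough sketch of the STORM-1742 measurement window: start the clock when
     * ACK_INIT reaches the acker, stop it when the ack that completes the
     * tuple tree arrives. Not Storm's real acker; names are invented.
     */
    public class AckerLatencySketch {

        private static class Entry {
            long xorOfAnchors;  // running XOR of ack values seen for this tree
            long initTimeMs;    // acker-local clock when ACK_INIT arrived
        }

        private final Map<Long, Entry> pending = new HashMap<>();

        /** The spout announced a new tuple tree: the clock starts here, not at the spout. */
        public void onAckInit(long rootId, long initialAckVal) {
            Entry e = new Entry();
            e.xorOfAnchors = initialAckVal;
            e.initTimeMs = System.currentTimeMillis();
            pending.put(rootId, e);
        }

        /** A downstream ack arrived; returns the complete latency once the tree XORs to zero. */
        public Long onAck(long rootId, long ackVal) {
            Entry e = pending.get(rootId);
            if (e == null) {
                return null;              // already completed, failed, or timed out
            }
            e.xorOfAnchors ^= ackVal;     // tree is fully acked when the XOR reaches zero
            if (e.xorOfAnchors == 0) {
                pending.remove(rootId);
                return System.currentTimeMillis() - e.initTimeMs;
            }
            return null;
        }
    }

The XOR bookkeeping mirrors how Storm's ackers track tuple trees in constant space; the only point of the sketch is that the start and end of the measurement window live in the same process.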
> > > However, after seeing issue 1742 (https://issues.apache.org/jira/browse/STORM-1742), it seems Jungtaek Lim and the Storm devs have already spotted this problem and implemented a solution in the master and 1.x branches. Having the Ackers stop the complete latency clock makes more sense (particularly under heavy load) and makes the complete latency match more closely the sojourn time (spout to final component) through the whole topology.
> > >
> > > However, I was hoping to get these models working with the latest Storm release (1.0.2). It doesn't appear that these changes have been backported to the 1.0.x branch yet?
> > >
> > > My Question (TL;DR):
> > >
> > > Where in the 1.0.x codebase does the ack_ack message to the spout tasks get processed? I know that implementations of ISpout have an ack() method that gets called. However, in my test topologies, when I leave this method unimplemented the system still reports a complete latency for that spout. The timestamp in the ack_ack message must be getting processed somewhere, but I am struggling to identify where.
> > >
> > > Any help locating this would be most appreciated.
> > >
> > > Regards,
> > >
> > > Thomas Cooper
> > > PhD Student
> > > Newcastle University, School of Computer Science
> > > W: http://www.tomcooper.org.uk | A: 4th Floor, The Core, Science Central, Bath Lane, Newcastle upon Tyne, NE4 5TF
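On the ISpout.ack() hook Thomas mentions: a spout can at least measure the full emit-to-ack round trip on its own clock by keying a timestamp on the message id. That sidesteps cross-node clock skew, but it still folds in any time the ack spends queued behind nextTuple() calls, so it does not isolate the spout-side ack delay he is after. A minimal sketch against the 1.x spout API (the class name and payload are invented for illustration):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Values;

    /** Illustrative spout that times emit-to-ack round trips on its own clock. */
    public class RoundTripTimingSpout extends BaseRichSpout {

        private SpoutOutputCollector collector;
        private final Map<Long, Long> emitTimesMs = new ConcurrentHashMap<>();
        private long nextId = 0;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            long msgId = nextId++;
            emitTimesMs.put(msgId, System.currentTimeMillis());
            // Emitting with a message id makes the tuple tree tracked by the ackers.
            collector.emit(new Values("payload-" + msgId), msgId);
        }

        @Override
        public void ack(Object msgId) {
            // Called on the spout executor once the acker reports the tree complete;
            // under heavy load this call may have queued behind nextTuple() invocations.
            Long emittedAt = emitTimesMs.remove(msgId);
            if (emittedAt != null) {
                long roundTripMs = System.currentTimeMillis() - emittedAt;
                System.out.println("emit-to-ack round trip: " + roundTripMs + " ms");
            }
        }

        @Override
        public void fail(Object msgId) {
            emitTimesMs.remove(msgId);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("payload"));
        }
    }

(As for where the built-in figure comes from: it appears to be maintained by the spout executor itself around the call into the user's ack(), which would explain why a complete latency is still reported when ack() is left empty.)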
