We have a cluster where workers are distributed across multiple data centers.  
Supervisor nodes in the remote DC (remote meaning the DC where Nimbus is not 
running) pull the topology at a very slow rate.  It's not failing, it's not 
retrying, it's not timing out, it's just very slow.  This seems to be specific 
to netty.  Transferring the same amount of data by other means, such as just 
scp'ing the topology across the network, is fine; it copies in seconds.  Has 
anyone seen a situation where netty was limiting throughput from worker nodes 
with higher latency?  We're seeing 50 - 60 ms of latency from the remote data 
center.  Local workers load a 350 MB topology in about 15 - 20 seconds.  
Remote workers take about 10 - 15 minutes.

This is Apache Storm 0.9.2-incubating.

I can actually just sit there and 'du' the topology path and watch it 
pulling, just very, very slowly.

There are many options to adjust buffers, but I've tried tuning them all with 
no change in the behavior.  It almost feels as if netty is setting some kind 
of TCP window size to throttle the throughput.
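
For reference, these are the netty-related knobs I've been adjusting in 
storm.yaml (the option names are the standard Storm 0.9.x netty settings; the 
values shown are just ones I've experimented with, not recommendations):

```yaml
# storm.yaml -- netty transport settings (values illustrative)
storm.messaging.transport: "backtype.storm.messaging.netty.Context"
storm.messaging.netty.buffer_size: 5242880     # socket buffer, bytes (5 MB)
storm.messaging.netty.max_retries: 30
storm.messaging.netty.min_wait_ms: 100         # reconnect backoff floor
storm.messaging.netty.max_wait_ms: 1000        # reconnect backoff ceiling
```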

Thank You
-jeremy
