Jeremy,
          I haven't run a cluster across data centers, but I feel like
          ZooKeeper might be an issue here, since Nimbus receives
          heartbeats from supervisors and workers via ZooKeeper. Have you
          looked in the ZooKeeper logs for any misbehavior?

From your description it seems to me that your Nimbus is in one dc and
all your supervisors are in the other dc; is that right?

"Doing transfers of the same amount of data, such as just scp’ing the
topology across the network is just fine."
What kind of spout are you using, and is it trying to read this data
from the remote dc?

-Harsha


On Fri, Oct 10, 2014, at 01:19 AM, Jeremy Hansen wrote:
> 
> We have a cluster where workers are distributed across multiple data
> centers.  Supervisor nodes in the remote dc (remote meaning the dc where
> nimbus is not running) pull the topology at a very slow rate.  It’s not
> failing, it’s not retrying, it’s not timing out, it’s just very slow. 
> This seems to be specific to netty.  Doing transfers of the same amount
> of data, such as just scp’ing the topology across the network is just
> fine.  Copies in seconds.  Has anyone seen a situation where netty was
> limiting throughput from worker nodes with higher latency?  We’re seeing
> 50 - 60 ms in latency from the remote data center.  Local workers load a
> 350M topology in about 15 - 20 seconds.  Remote workers take about 10 -
> 15 minutes.
> 
> This is apache storm 0.9.2 incubating.  
> 
> I can actually just sit there and ‘du’ the topology path to see that it’s
> pulling, just very very slowly.
> 
> There are many options to adjust buffers, but I’ve tried tuning them all
> with no change in the behavior.  It almost feels as if netty is setting
> some kind of TCP window size to throttle the throughput.
> 
> Thank You
> -jeremy
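
[A quick sanity check of the numbers in the message supports the TCP-window
suspicion. This is only a back-of-the-envelope sketch using the figures
Jeremy reported (~350M topology, ~10-15 min remote download, ~55 ms RTT);
the implied window is an inference, not a measurement:]

```python
# Figures taken from the thread above.
topology_bytes = 350 * 1024 * 1024   # ~350M topology
remote_seconds = 12.5 * 60           # midpoint of the 10-15 min range
rtt = 0.055                          # midpoint of the 50-60 ms latency

# Observed single-stream throughput to the remote dc.
throughput = topology_bytes / remote_seconds   # bytes/sec

# For one TCP stream, throughput <= window / RTT, so the effective
# window consistent with this rate is roughly throughput * RTT.
effective_window = throughput * rtt

print(f"observed throughput ~{throughput / 1024:.0f} KB/s")
print(f"implied window      ~{effective_window / 1024:.0f} KB")
```

A window on the order of tens of KB, far below the bandwidth-delay product
of the link, would behave exactly as described: no failures or retries,
just a hard throughput ceiling that only shows up once latency rises.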