Here is a new theory: perhaps the worker port is already in use as a source port by another bolt. Should we modify our ephemeral port range so that this cannot happen?
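
One way to test this theory is to check whether any configured worker port falls inside the kernel's ephemeral (source) port range. The sketch below is only an illustration under stated assumptions: the hosts are Linux and expose the range in /proc/sys/net/ipv4/ip_local_port_range, and the worker ports are passed in by hand from supervisor.slots.ports in storm.yaml. The function names are hypothetical and not part of Storm.

```python
from pathlib import Path


def parse_port_range(text):
    """Parse the two whitespace-separated integers of an ephemeral port range."""
    low, high = (int(field) for field in text.split())
    return low, high


def ports_in_ephemeral_range(worker_ports, ephemeral_range):
    """Return, sorted, the worker ports that the OS could also hand out as source ports."""
    low, high = ephemeral_range
    return sorted(port for port in worker_ports if low <= port <= high)


def read_linux_ephemeral_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Read the current ephemeral range. Assumption: a Linux host exposing this file."""
    return parse_port_range(Path(path).read_text())
```

On a supervisor host, calling ports_in_ephemeral_range(ports, read_linux_ephemeral_range()) with the ports listed under supervisor.slots.ports would show which slots are exposed. An empty result would argue against this theory. If any ports come back, narrowing net.ipv4.ip_local_port_range or listing the worker ports in net.ipv4.ip_local_reserved_ports should keep the kernel from using them as source ports.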
On 3/28/14, 10:20 AM, "Luke Forehand" <[email protected]> wrote:

>Hello,
>
>We are experiencing an issue where multiple workers are being assigned to
>the same port, causing heartbeat timeout. Here is the gist showing the
>supervisor and worker log in time order so you can see the supervisor
>launching the worker and waiting, the worker jvm fails due to bind
>exception, supervisor gives up due to timeout.
>
>https://gist.github.com/anonymous/9835036
>
>It should be noted that we are seeing odd scheduling behavior where one
>machine is getting overloaded with workers. the logs in this gist are from
>the machine that gets overloaded. It typically gets 8 workers where other
>machines only get 2 workers. I am happy to provide more details.
>
>Thanks,
>Luke Forehand | Networked Insights | Software Engineer
>
