In the case of the EvenScheduler it is all of the free slots in the cluster. So it is however many slots there are across all of the nodes in the cluster that don't have anything scheduled on them.
It should be proportional to the number of nodes in your cluster.

- Bobby

On Wed, Jun 6, 2018 at 9:48 AM Mitchell Rathbun (BLOOMBERG/ 731 LEX) <
[email protected]> wrote:

> What determines the number of slots that we want to schedule on Nimbus
> startup? Is it the existing worker processes at the time Nimbus is brought
> up, or is it a config property like supervisor.slots.ports?
>
> From: [email protected] At: 06/06/18 10:37:32
> To: Mitchell Rathbun (BLOOMBERG/ 731 LEX ) <[email protected]>,
> [email protected]
> Subject: Re: Nimbus repeatedly crashing due to issue with disk/ZooKeeper
> resources
>
> The issue is that interleave-all is a recursive function.
>
> https://github.com/apache/storm/blob/e40d213de7067f7d3aa4d4992b81890d8ed6ff31/storm-core/src/clj/org/apache/storm/util.clj#L776-L784
>
> So the depth of the stack is the number of slots you want to schedule * 3
> because of how the recursion happens.
>
> Sadly the latest code has the same problem. It is in Java, so it is not
> * 3, but it is still bad.
>
> https://github.com/apache/storm/blob/3e098f12e2b09d4954eeeaaf807e4ff6006a6929/storm-server/src/main/java/org/apache/storm/utils/ServerUtils.java#L113-L130
>
> So if you want to file a JIRA for us to fix this, that would be great.
> Even better if you could look at making interleaveAll no longer recursive.
>
> Thanks,
>
> Bobby
>
> On Tue, Jun 5, 2018 at 10:43 PM Mitchell Rathbun (BLOOMBERG/ 731 LEX) <
> [email protected]> wrote:
>
>> From: Mitchell Rathbun (BLOOMBERG/ 731 LEX) At: 06/05/18 23:42:02
>> To: Mitchell Rathbun (BLOOMBERG/ 731 LEX ) <[email protected]>
>> Subject: Nimbus repeatedly crashing due to issue with disk/ZooKeeper
>> resources
>>
>> Recently, our Nimbus crashed with a stack overflow error, and we are
>> having some difficulty determining what the initial cause was. I have
>> attached the stack trace to help with the debugging. This same stack trace
>> occurred every time I ran Nimbus. I then deleted everything in the
>> directory specified by storm.local.dir and removed everything in ZooKeeper
>> under the storm.zookeeper.root path. I was then able to successfully run
>> Nimbus. So this points to there being an issue with the data/state that
>> Nimbus keeps. Has this issue been seen before, and how could the state
>> reach a point that would prevent Nimbus from running at all? Is it
>> possible that there was not enough disk/zk space, even though the logs
>> don't really point to this being the issue?
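For anyone picking up the JIRA mentioned above: the recursive interleaveAll could be replaced with an iterative version that uses constant stack depth regardless of how many slots are being scheduled. This is a minimal sketch, not the actual Storm code; the class name and use of plain java.util collections here are illustrative:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class InterleaveAll {
    // Interleaves elements round-robin from each input list, dropping a
    // list from the rotation once it is exhausted, until all input is
    // consumed. Uses a loop instead of recursion, so the stack depth stays
    // constant no matter how many elements (slots) there are.
    public static <T> List<T> interleaveAll(List<List<T>> lists) {
        List<Iterator<T>> iters = new ArrayList<>();
        for (List<T> list : lists) {
            if (!list.isEmpty()) {
                iters.add(list.iterator());
            }
        }
        List<T> result = new ArrayList<>();
        while (!iters.isEmpty()) {
            Iterator<Iterator<T>> it = iters.iterator();
            while (it.hasNext()) {
                Iterator<T> cur = it.next();
                result.add(cur.next());
                if (!cur.hasNext()) {
                    // This list is exhausted; remove it from the rotation.
                    it.remove();
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> input =
            List.of(List.of(1, 2, 3), List.of(4, 5), List.of(6));
        // Prints [1, 4, 6, 2, 5, 3]
        System.out.println(interleaveAll(input));
    }
}
```

With this shape a topology asking for tens of thousands of slots only grows the result list, not the call stack, so the stack overflow described in the thread cannot recur from this code path.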
