Re: Nimbus repeatedly crashing to issue with disk/ZooKeeper resources

Bobby Evans Wed, 06 Jun 2018 08:07:29 -0700

Yes, the config is supervisor.slots.ports.  If you only have one node, I
really have no idea why it would think there are so many free slots.


- Bobby

On Wed, Jun 6, 2018 at 10:02 AM Mitchell Rathbun (BLOOMBERG/ 731 LEX) <
[email protected]> wrote:

> These slots are controlled by the config property supervisor.slots.ports,
> right? We only have one node per cluster currently (Nimbus, Supervisor, and
> Worker processes all run on the same machine).
>
> From: [email protected] At: 06/06/18 10:58:35
> To: Mitchell Rathbun (BLOOMBERG/ 731 LEX ) <[email protected]>
> Cc: [email protected]
>
> Subject: Re: Nimbus repeatedly crashing to issue with disk/ZooKeeper
> resources
>
> In the case of the EvenScheduler it is all of the free slots in the
> cluster.  So it is however many slots, across all of the nodes in the
> cluster, don't have anything scheduled on them.
>
> It should be proportional to the number of nodes in your cluster.
>
> - Bobby
>
> On Wed, Jun 6, 2018 at 9:48 AM Mitchell Rathbun (BLOOMBERG/ 731 LEX) <
> [email protected]> wrote:
>
>> What determines the number of slots that we want to schedule on Nimbus
>> startup? Is it existing worker processes at the time Nimbus is brought up,
>> or is it a config property like supervisor.slots.ports?
>>
>> From: [email protected] At: 06/06/18 10:37:32
>> To: Mitchell Rathbun (BLOOMBERG/ 731 LEX ) <[email protected]>,
>> [email protected]
>> Subject: Re: Nimbus repeatedly crashing to issue with disk/ZooKeeper
>> resources
>>
>> The issue is that interleave-all is a recursive function.
>>
>>
>> https://github.com/apache/storm/blob/e40d213de7067f7d3aa4d4992b81890d8ed6ff31/storm-core/src/clj/org/apache/storm/util.clj#L776-L784
>>
>> So the depth of the stack is roughly 3 times the number of slots you want
>> to schedule on, because of how the recursion happens.
>>
>> Sadly the latest code has the same problem. It is now in Java, so the
>> depth is no longer multiplied by 3, but it is still bad.
>>
>>
>> https://github.com/apache/storm/blob/3e098f12e2b09d4954eeeaaf807e4ff6006a6929/storm-server/src/main/java/org/apache/storm/utils/ServerUtils.java#L113-L130
>>
>> So if you want to file a JIRA for us to fix this, that would be great.
>> Even better if you could look at making interleaveAll no longer recursive.
>>
>> Thanks,
>>
>> Bobby
>>
>> On Tue, Jun 5, 2018 at 10:43 PM Mitchell Rathbun (BLOOMBERG/ 731 LEX) <
>> [email protected]> wrote:
>>
>>>
>>>
>>> From: Mitchell Rathbun (BLOOMBERG/ 731 LEX) At: 06/05/18 23:42:02
>>> To: Mitchell Rathbun (BLOOMBERG/ 731 LEX ) <[email protected]>
>>> Subject: Nimbus repeatedly crashing to issue with disk/ZooKeeper
>>> resources
>>> Recently, our Nimbus crashed with a stack overflow error, and we are
>>> having some difficulty determining what the initial cause was. I have
>>> attached the stack trace to help with the debugging. This same stack trace
>>> occurred every time I ran Nimbus. I then deleted everything in the
>>> directory specified by storm.local.dir and removed everything in ZooKeeper
>>> under the storm.zookeeper.root path. I was then able to successfully run
>>> Nimbus. So this points to there being an issue with the data/state that
>>> Nimbus keeps. Has this issue been seen before, and how could the state
>>> reach a point that would prevent Nimbus from running at all? Is it possible
>>> that there was not enough disk/zk space, even though the logs don't really
>>> point to this being the issue?
>>>
>>
>>
>
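
For reference, the non-recursive interleaveAll asked for in the thread could
look something like the sketch below. This is not the Storm implementation and
not a proposed patch. The class name InterleaveSketch is hypothetical, and the
sketch assumes the intended behavior is plain round-robin: take the first
element of every list, then the second of every list, and so on, skipping
lists that have run out. It also returns an empty list for null input, which
may differ from what the real method does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Storm code: round-robin interleave without recursion.
final class InterleaveSketch {

    // Returns the elements of all lists in round-robin order.
    // Assumption: null input or null inner lists are treated as empty.
    static <T> List<T> interleaveAll(List<List<T>> lists) {
        List<T> out = new ArrayList<>();
        if (lists == null) {
            return out;
        }

        // The longest inner list decides how many rounds are needed.
        int longest = 0;
        for (List<T> list : lists) {
            if (list != null) {
                longest = Math.max(longest, list.size());
            }
        }

        // Round i takes element i from every list that still has one.
        // Stack depth stays constant no matter how many slots there are.
        for (int i = 0; i < longest; i++) {
            for (List<T> list : lists) {
                if (list != null && i < list.size()) {
                    out.add(list.get(i));
                }
            }
        }
        return out;
    }
}
```

With inputs [1, 4, 6], [2, 5] and [3], this yields [1, 2, 3, 4, 5, 6]. Because
the loops replace the recursion, a very large number of free slots costs time
and heap but cannot overflow the stack.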
