I would advise trying to run a single worker per machine/supervisor, as
that way communication between Storm components is cheaper (intra-JVM
communication is faster than communication between separate JVM
processes). I think we also configured the number of parallel tasks to
match the number of cores available on the machine, to avoid overhead
from thread context switching.
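
As a minimal sketch (not our exact code; the Storm wiring - setNumWorkers(1)
on the topology Config and a single entry in supervisor.slots.ports on each
supervisor - is omitted), the parallelism can be derived from the core count
the JVM reports:

```java
public class CoreCount {
    // Logical cores visible to this JVM; we used this as the parallelism
    // hint so that concurrently executing tasks roughly match the cores.
    public static int availableCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("parallelism hint: " + availableCores());
    }
}
```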

Regards,
Javier

On Sun, Jun 28, 2015 at 1:12 PM, Nick R. Katsipoulakis <
[email protected]> wrote:

> Hello again,
>
> Actually, I should give more info about the system load. On each
> supervisor machine, I have a number of workers (JVM processes), each
> executing a number of executors (Java threads). Therefore, each JVM
> memory usage percentage I get (through Java's Runtime class) really
> reflects how much JVM memory is used by all the executors running in
> that JVM process. So, even if one thread does not use many resources
> in one JVM, another JVM on the same machine may be taking up all the
> resources, and I end up in a congested environment.
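>
> For reference, the per-JVM measurement I take is essentially the
> following (a minimal sketch; the Bolt and logging code around it are
> omitted):

```java
public class HeapUsage {
    // Percentage of the JVM's currently allocated heap that is in use.
    // This is per-process: all executor threads in the same worker JVM
    // are reflected in this single number.
    public static double heapUsedPercent() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return 100.0 * used / rt.totalMemory();
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %.1f%%%n", heapUsedPercent());
    }
}
```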
>
> I will try to look into GC going on, but I guess I will have to do some
> research on the matter because I do not know many things about Java GC.
>
> Thank you for your time.
>
> Regards,
> Nick
>
>
> 2015-06-28 13:02 GMT-04:00 Javier Gonzalez <[email protected]>:
>
>> Perhaps you could enable explicit GC logging in the childopts so that
>> you can see whether you have "GC grinding" in the JVM running the worker
>> that gets disconnected. I suggested it since you mentioned that the
>> machine is under heavy load.
>>
>> Another thing that sometimes caused behavior like that was the machine
>> coming under heavy load from outside processes, since we were testing
>> on a shared machine. Is that your case?
>>
>> Regards,
>> JG
>>
>> On Sun, Jun 28, 2015 at 11:46 AM, Nick R. Katsipoulakis <
>> [email protected]> wrote:
>>
>>> Javier thank you for your response.
>>>
>>> So, do you suggest that I change "worker.childopts" to allocate more
>>> memory than I have now? Currently I have it set to 4 GB, and some of the
>>> executors do not use all of it (I monitor the JVM memory usage on each
>>> executor from the Bolt code). But, I guess I can try it and see if it works.
>>>
>>> Thank you again.
>>>
>>> Regards,
>>> Nick
>>>
>>> 2015-06-28 11:32 GMT-04:00 Javier Gonzalez <[email protected]>:
>>>
>>>> It could be that heavy usage of an executor's machine prevents the
>>>> executor from sending its heartbeats, so it appears "dead" to Nimbus
>>>> even though it is still working. I think we saw something like this at
>>>> some point during our PoC development, and it was fixed by allocating
>>>> more memory to our workers - insufficient memory was causing the
>>>> workers to incur heavy GC cycles.
>>>>
>>>> Regards,
>>>> Javier
>>>>
>>>> On Fri, Jun 26, 2015 at 3:53 PM, Nick R. Katsipoulakis <
>>>> [email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I have been running a sample topology and I can see on the nimbus.log
>>>>> messages like the following:
>>>>>
>>>>> 2015-06-26T19:46:35.556+0000 b.s.d.nimbus [INFO] Executor
>>>>> tpch-q5-top-1-1435347835:[5 5] not alive
>>>>> 2015-06-26T19:46:35.557+0000 b.s.d.nimbus [INFO] Executor
>>>>> tpch-q5-top-1-1435347835:[13 13] not alive
>>>>> 2015-06-26T19:46:35.557+0000 b.s.d.nimbus [INFO] Executor
>>>>> tpch-q5-top-1-1435347835:[21 21] not alive
>>>>> 2015-06-26T19:46:35.557+0000 b.s.d.nimbus [INFO] Executor
>>>>> tpch-q5-top-1-1435347835:[29 29] not alive
>>>>>
>>>>> So, my question is: when does Nimbus come to the above decision? By
>>>>> the way, none of the above machines has crashed, nor is there an
>>>>> exception in the code. The only problem is that resource utilization
>>>>> on those machines reaches high levels. Is that a case where Nimbus
>>>>> declares an executor as "not alive"?
>>>>>
>>>>> Thanks,
>>>>> Nick
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Javier González Nicolini
>>>>
>>>
>>>
>>>
>>> --
>>> Nikolaos Romanos Katsipoulakis,
>>> University of Pittsburgh, PhD candidate
>>>
>>
>>
>>
>> --
>> Javier González Nicolini
>>
>
>
>
> --
> Nikolaos Romanos Katsipoulakis,
> University of Pittsburgh, PhD candidate
>



-- 
Javier González Nicolini
