It could be that heavy load on an executor's machine prevents the executor
from sending heartbeats to nimbus, so it appears "dead" to nimbus even
though it is still working. I think we saw something like this during our
PoC development, and it was fixed by allocating more memory to our
workers - too little memory was causing the workers to incur heavy
GC cycles, and the resulting pauses delayed the heartbeats.
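
If GC pauses turn out to be the cause, two knobs that may help are the
worker heap size and the timeout nimbus applies before declaring an
executor dead. A sketch of the relevant storm.yaml entries - the values
are illustrative, and defaults vary between Storm versions, so check the
documentation for your release:

```yaml
# storm.yaml (illustrative values, not recommendations)

# Give each worker JVM more heap so full GCs are shorter and rarer.
worker.childopts: "-Xmx2048m"

# Seconds nimbus waits without an executor heartbeat before marking
# the executor "not alive" and rescheduling it.
nimbus.task.timeout.secs: 60
```

Raising the timeout only masks the symptom, though; fixing the memory
pressure on the workers is the better long-term solution.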

Regards,
Javier

On Fri, Jun 26, 2015 at 3:53 PM, Nick R. Katsipoulakis <
[email protected]> wrote:

> Hello,
>
> I have been running a sample topology and I can see on the nimbus.log
> messages like the following:
>
> 2015-06-26T19:46:35.556+0000 b.s.d.nimbus [INFO] Executor
> tpch-q5-top-1-1435347835:[5 5] not alive
> 2015-06-26T19:46:35.557+0000 b.s.d.nimbus [INFO] Executor
> tpch-q5-top-1-1435347835:[13 13] not alive
> 2015-06-26T19:46:35.557+0000 b.s.d.nimbus [INFO] Executor
> tpch-q5-top-1-1435347835:[21 21] not alive
> 2015-06-26T19:46:35.557+0000 b.s.d.nimbus [INFO] Executor
> tpch-q5-top-1-1435347835:[29 29] not alive
>
> So, my question is: when does nimbus come to the above decision? By the
> way, none of the above machines has crashed, nor is there an exception in
> the code. The only problem is that resource utilization on those machines
> reaches high levels. Is that a case where nimbus declares an executor
> as "not alive"?
>
> Thanks,
> Nick
>



-- 
Javier González Nicolini