Javier, thank you for your response. So, do you suggest that I raise the memory in "worker.childopts" beyond what I have now? Currently I have it set to 4 GB, and some of the executors do not use all of it (I monitor the JVM memory usage on each executor from the bolt code). But I guess I can try it and see if it works.
Thank you again.

Regards,
Nick

2015-06-28 11:32 GMT-04:00 Javier Gonzalez <[email protected]>:

> It could be that heavy usage of an executor's machine prevents the
> executor from communicating with nimbus, hence it appears "dead" to nimbus,
> even though it's still working. I think we saw something like this some
> time during our PoC development, and it was fixed by allocating more memory
> to our workers - not enough memory was causing the workers to incur
> heavy GC cycles.
>
> Regards,
> Javier
>
> On Fri, Jun 26, 2015 at 3:53 PM, Nick R. Katsipoulakis <[email protected]> wrote:
>
>> Hello,
>>
>> I have been running a sample topology and I can see in the nimbus.log
>> messages like the following:
>>
>> 2015-06-26T19:46:35.556+0000 b.s.d.nimbus [INFO] Executor tpch-q5-top-1-1435347835:[5 5] not alive
>> 2015-06-26T19:46:35.557+0000 b.s.d.nimbus [INFO] Executor tpch-q5-top-1-1435347835:[13 13] not alive
>> 2015-06-26T19:46:35.557+0000 b.s.d.nimbus [INFO] Executor tpch-q5-top-1-1435347835:[21 21] not alive
>> 2015-06-26T19:46:35.557+0000 b.s.d.nimbus [INFO] Executor tpch-q5-top-1-1435347835:[29 29] not alive
>>
>> So, my question is: when does nimbus come to the above decision? By the
>> way, none of the above machines has crashed, nor is there an exception in
>> the code. The only problem is that the resource utilization on those
>> machines reaches high levels. Is that a case where nimbus declares an
>> executor as "not alive"?
>>
>> Thanks,
>> Nick

--
Javier González Nicolini

--
Nikolaos Romanos Katsipoulakis, University of Pittsburgh, PhD candidate
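[For readers of the archive: the advice above — giving workers more heap so GC pauses stop starving heartbeats, and optionally loosening the liveness timeouts — can be sketched in `storm.yaml`. The keys `worker.childopts`, `nimbus.task.timeout.secs`, and `supervisor.worker.timeout.secs` are standard Storm settings; the values below are illustrative assumptions for this thread's scenario, not recommendations.]

```yaml
# storm.yaml (sketch) -- values are illustrative, tune to your cluster.

# Raise the worker JVM heap above the 4 GB mentioned in the thread,
# so full-GC pauses are less likely to delay executor heartbeats.
worker.childopts: "-Xmx6g -XX:+HeapDumpOnOutOfMemoryError"

# How long nimbus waits without an executor heartbeat (via ZooKeeper)
# before logging "Executor ... not alive" and rescheduling it.
nimbus.task.timeout.secs: 60

# How long a supervisor tolerates a silent worker before restarting it.
supervisor.worker.timeout.secs: 60
```

Raising the timeouts only masks the symptom on heavily loaded machines; fixing the GC pressure with a larger heap addresses the cause Javier described.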
