Re: Error handling

2019-03-26 Thread Prateek Maheshwari
Hi Tom, Yeah, these logs aren't sufficient to debug. To clarify, we're looking for logs from the classes: StreamProcessor, ZkJobCoordinator, JobCoordinatorListener, LocalApplicationRunner, ScheduleAfterDebounceTime (and others in org.apache.samza.zk) etc. Do you still have those available? - Prat
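For pulling just those loggers out of a large framework log, a minimal sketch follows. It assumes the line layout seen elsewhere in this thread ("<time> [<thread>] <LEVEL> <logger> - <message>"); the regex and the function names is_wanted and filter_log are illustrative, not part of Samza.

```python
import re

# Class names and package come from the message above.
WANTED_CLASSES = (
    "StreamProcessor",
    "ZkJobCoordinator",
    "JobCoordinatorListener",
    "LocalApplicationRunner",
    "ScheduleAfterDebounceTime",
)
WANTED_PACKAGE_PREFIX = "org.apache.samza.zk."

# Assumed layout: "<time> [<thread>] <LEVEL> <logger> - <message>"
LINE_RE = re.compile(
    r"^\S+ \[(?P<thread>[^\]]*)\] (?P<level>[A-Z]+)\s+(?P<logger>\S+) - "
)


def is_wanted(line):
    """Return True if the line starts a log record from a wanted logger."""
    m = LINE_RE.match(line)
    if m is None:
        return False
    logger = m.group("logger")
    if logger.startswith(WANTED_PACKAGE_PREFIX):
        return True
    # Compare the simple class name (last dotted segment) against the list.
    return logger.rsplit(".", 1)[-1] in WANTED_CLASSES


def filter_log(lines):
    """Yield wanted records together with their continuation lines
    (for example stack trace frames), dropping everything else."""
    keep = False
    for line in lines:
        if LINE_RE.match(line):
            # A new record starts here; decide whether to keep it.
            keep = is_wanted(line)
        if keep:
            yield line
```

Calling filter_log on an open log file and writing the result to a new file gives a smaller log to attach. If the logging pattern differs from the one assumed here, LINE_RE needs adjusting.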

Re: Error handling

2019-03-26 Thread Tom Davis
I have attached the full framework logs. It's basically the same stack trace a few times. 19:24:48.263 [Samza StreamProcessor Container Thread-0] ERROR org.apache.samza.task.AsyncRunLoop - Got callback failure for task Partition 0 org.apache.samza.SamzaException: Callback failed for task Partitio

Re: Error handling

2019-03-25 Thread Prateek Maheshwari
Hi Tom, Unfortunately, this exception only shows that the SamzaContainer tried to shut down a second time due to a processing timeout. This by itself is fine, and should be handled by the framework already. We'll need to look at the rest of the framework logs to tell what state the application was in

Re: Error handling

2019-03-25 Thread Tom Davis
I am using Samza 1.0, yes. The stacktrace is: 19:24:49.326 [Samza StreamProcessor Container Thread-0] ERROR org.apache.samza.processor.StreamProcessor - Container: org.apache.samza.container.SamzaContainer@3e923d9e failed with an exception. Stopping the stream processor: c13057a8-42c5-4b68-9

Re: Error handling

2019-03-22 Thread Prateek Maheshwari
Hi Tom, This sounds like a bug. ApplicationRunner should return the correct status when the processor has shut down. We fixed a similar standalone bug recently; are you already using Samza 1.0? If this is reproducible / happens again, a thread dump + logs would also be very helpful for debugging a

Re: Error handling

2019-03-22 Thread Tom Davis
Prateek Maheshwari writes: Hi Tom, This would depend on what your k8s container orchestration logic looks like. For example, in YARN, 'status' returns 'not running' after 'start' until all the containers requested from the AM are 'running'. We also leverage YARN to restart containers/job aut

Re: Error handling

2019-03-15 Thread Prateek Maheshwari
Hi Tom, This would depend on what your k8s container orchestration logic looks like. For example, in YARN, 'status' returns 'not running' after 'start' until all the containers requested from the AM are 'running'. We also leverage YARN to restart containers/job automatically on failures (within so