Re: Error handling

2019-03-26 Thread Prateek Maheshwari
Hi Tom, Yeah, these logs aren't sufficient to debug. To clarify, we're looking for logs from the classes: StreamProcessor, ZkJobCoordinator, JobCoordinatorListener, LocalApplicationRunner, ScheduleAfterDebounceTime (and others in org.apache.samza.zk) etc. Do you still have those available? - Prat

Re: Error handling

2019-03-26 Thread Tom Davis
I have attached the full framework logs. It's basically the same stack trace a few times. 19:24:48.263 [Samza StreamProcessor Container Thread-0] ERROR org.apache.samza.task.AsyncRunLoop - Got callback failure for task Partition 0 org.apache.samza.SamzaException: Callback failed for task Partitio

Re: Error handling

2019-03-25 Thread Prateek Maheshwari
Hi Tom, Unfortunately this exception only shows that the SamzaContainer tried to shut down a second time due to a processing timeout. This by itself is fine, and should be handled by the framework already. We'll need to look at the rest of the framework logs to tell what state the application was in

Re: Error handling

2019-03-25 Thread Tom Davis
I am using Samza 1.0, yes. The stacktrace is: 19:24:49.326 [Samza StreamProcessor Container Thread-0] ERROR org.apache.samza.processor.StreamProcessor - Container: org.apache.samza.container.SamzaContainer@3e923d9e failed with an exception. Stopping the stream processor: c13057a8-42c5-4b68-9

Re: Error handling

2019-03-22 Thread Prateek Maheshwari
Hi Tom, This sounds like a bug. ApplicationRunner should return the correct status when the processor has shut down. We fixed a similar standalone bug recently; are you already using Samza 1.0? If this is reproducible / happens again, a thread dump + logs would also be very helpful for debugging a

Re: Error handling

2019-03-22 Thread Tom Davis
Prateek Maheshwari writes: Hi Tom, This would depend on what your k8s container orchestration logic looks like. For example, in YARN, 'status' returns 'not running' after 'start' until all the containers requested from the AM are 'running'. We also leverage YARN to restart containers/job aut

Re: Error handling

2019-03-15 Thread Prateek Maheshwari
Hi Tom, This would depend on what your k8s container orchestration logic looks like. For example, in YARN, 'status' returns 'not running' after 'start' until all the containers requested from the AM are 'running'. We also leverage YARN to restart containers/job automatically on failures (within so

Error handling

2019-03-15 Thread Tom Davis
I'm using the LocalApplicationRunner and had added a liveness check around the `status` method. The app is running in Kubernetes so, in theory, it could be restarted if exceptions happened during processing. However, it seems that "container failure" is divorced from "app failure" because the app

[GitHub] samza pull request #124: SAMZA-1209: Improve error handling in LocalStoreMon...

2017-04-18 Thread asfgit
GitHub user asfgit closed the pull request at: https://github.com/apache/samza/pull/124

[GitHub] samza pull request #124: SAMZA-1209: Improve error handling in LocalStoreMon...

2017-04-14 Thread shanthoosh
GitHub user shanthoosh opened a pull request: https://github.com/apache/samza/pull/124 SAMZA-1209: Improve error handling in LocalStoreMonitor Changes: 1. Add opt-in configuration to continue garbage collection of local stores when there’s a failure in garbage