[jira] [Commented] (KAFKA-3752) Provide a way for KStreams to recover from unclean shutdown
[ https://issues.apache.org/jira/browse/KAFKA-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504535#comment-15504535 ] Guozhang Wang commented on KAFKA-3752: -- [~theduderog] Could you try to validate if this issue is already fixed? There are a couple of tickets related to this issue that is just merged recently. > Provide a way for KStreams to recover from unclean shutdown > --- > > Key: KAFKA-3752 > URL: https://issues.apache.org/jira/browse/KAFKA-3752 > Project: Kafka > Issue Type: Sub-task > Components: streams >Affects Versions: 0.10.0.0 >Reporter: Roger Hoover > Labels: architecture > > If a KStream application gets killed with SIGKILL (e.g. by the Linux OOM > Killer), it may leave behind lock files and fail to recover. > It would be useful to have an options (say --force) to tell KStreams to > proceed even if it finds old LOCK files. > {noformat} > [2016-05-24 17:37:52,886] ERROR Failed to create an active task #0_0 in > thread [StreamThread-1]: > (org.apache.kafka.streams.processor.internals.StreamThread:583) > org.apache.kafka.streams.errors.ProcessorStateException: Error while creating > the state manager > at > org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:71) > at > org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:86) > at > org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550) > at > org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577) > at > org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68) > at > org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658) > at > org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:977) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937) > at >
[jira] [Commented] (KAFKA-3752) Provide a way for KStreams to recover from unclean shutdown
[ https://issues.apache.org/jira/browse/KAFKA-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15416351#comment-15416351 ] Greg Fodor commented on KAFKA-3752: --- Oh apologies for mis-reading the ticket, but in our case it's a recoverable condition. > Provide a way for KStreams to recover from unclean shutdown > --- > > Key: KAFKA-3752 > URL: https://issues.apache.org/jira/browse/KAFKA-3752 > Project: Kafka > Issue Type: Sub-task > Components: streams >Affects Versions: 0.10.0.0 >Reporter: Roger Hoover > Labels: architecture > > If a KStream application gets killed with SIGKILL (e.g. by the Linux OOM > Killer), it may leave behind lock files and fail to recover. > It would be useful to have an options (say --force) to tell KStreams to > proceed even if it finds old LOCK files. > {noformat} > [2016-05-24 17:37:52,886] ERROR Failed to create an active task #0_0 in > thread [StreamThread-1]: > (org.apache.kafka.streams.processor.internals.StreamThread:583) > org.apache.kafka.streams.errors.ProcessorStateException: Error while creating > the state manager > at > org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:71) > at > org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:86) > at > org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550) > at > org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577) > at > org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68) > at > org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658) > at > org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:977) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937) > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:295) > at >
[jira] [Commented] (KAFKA-3752) Provide a way for KStreams to recover from unclean shutdown
[ https://issues.apache.org/jira/browse/KAFKA-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415664#comment-15415664 ] Greg Fodor commented on KAFKA-3752: --- Hey [~guozhang], we're hitting this issue as well during a rebalance and I think also during failures during startup due to KAFKA-3559. The job eventually recovers. We are set to running 32 threads per instance and have 2 instances (though the issue happens more often when we run on a single instance.) Log here: https://gist.github.com/gfodor/bac65bff38233193b70836b78c701e7b > Provide a way for KStreams to recover from unclean shutdown > --- > > Key: KAFKA-3752 > URL: https://issues.apache.org/jira/browse/KAFKA-3752 > Project: Kafka > Issue Type: Sub-task > Components: streams >Affects Versions: 0.10.0.0 >Reporter: Roger Hoover > Labels: architecture > > If a KStream application gets killed with SIGKILL (e.g. by the Linux OOM > Killer), it may leave behind lock files and fail to recover. > It would be useful to have an options (say --force) to tell KStreams to > proceed even if it finds old LOCK files. > {noformat} > [2016-05-24 17:37:52,886] ERROR Failed to create an active task #0_0 in > thread [StreamThread-1]: > (org.apache.kafka.streams.processor.internals.StreamThread:583) > org.apache.kafka.streams.errors.ProcessorStateException: Error while creating > the state manager > at > org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:71) > at > org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:86) > at > org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550) > at > org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577) > at > org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68) > at > org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658) > at > org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345) > at >
[jira] [Commented] (KAFKA-3752) Provide a way for KStreams to recover from unclean shutdown
[ https://issues.apache.org/jira/browse/KAFKA-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406893#comment-15406893 ] Guozhang Wang commented on KAFKA-3752: -- [~theduderog] Do you still have the logs on the server side? > Provide a way for KStreams to recover from unclean shutdown > --- > > Key: KAFKA-3752 > URL: https://issues.apache.org/jira/browse/KAFKA-3752 > Project: Kafka > Issue Type: Sub-task > Components: streams >Affects Versions: 0.10.0.0 >Reporter: Roger Hoover > Labels: architecture > > If a KStream application gets killed with SIGKILL (e.g. by the Linux OOM > Killer), it may leave behind lock files and fail to recover. > It would be useful to have an options (say --force) to tell KStreams to > proceed even if it finds old LOCK files. > {noformat} > [2016-05-24 17:37:52,886] ERROR Failed to create an active task #0_0 in > thread [StreamThread-1]: > (org.apache.kafka.streams.processor.internals.StreamThread:583) > org.apache.kafka.streams.errors.ProcessorStateException: Error while creating > the state manager > at > org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:71) > at > org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:86) > at > org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550) > at > org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577) > at > org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68) > at > org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658) > at > org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:977) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937) > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:295) > at >
[jira] [Commented] (KAFKA-3752) Provide a way for KStreams to recover from unclean shutdown
[ https://issues.apache.org/jira/browse/KAFKA-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404439#comment-15404439 ] Roger Hoover commented on KAFKA-3752: - [~guozhang] You're assessment seems correct. It happened again when I restarted after a clean shutdown (SIGTERM + wait for exit). 1. We have a single KafkaStreams instance with 8 threads. 2. Here's the full log: https://gist.github.com/theduderog/f9ab4767cd3b098d404f5513a7e1c27e > Provide a way for KStreams to recover from unclean shutdown > --- > > Key: KAFKA-3752 > URL: https://issues.apache.org/jira/browse/KAFKA-3752 > Project: Kafka > Issue Type: Sub-task > Components: streams >Affects Versions: 0.10.0.0 >Reporter: Roger Hoover > Labels: architecture > > If a KStream application gets killed with SIGKILL (e.g. by the Linux OOM > Killer), it may leave behind lock files and fail to recover. > It would be useful to have an options (say --force) to tell KStreams to > proceed even if it finds old LOCK files. > {noformat} > [2016-05-24 17:37:52,886] ERROR Failed to create an active task #0_0 in > thread [StreamThread-1]: > (org.apache.kafka.streams.processor.internals.StreamThread:583) > org.apache.kafka.streams.errors.ProcessorStateException: Error while creating > the state manager > at > org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:71) > at > org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:86) > at > org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550) > at > org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577) > at > org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68) > at > org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658) > at > org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:977) > at >
[jira] [Commented] (KAFKA-3752) Provide a way for KStreams to recover from unclean shutdown
[ https://issues.apache.org/jira/browse/KAFKA-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313385#comment-15313385 ] Guozhang Wang commented on KAFKA-3752: -- [~theduderog] I looked through the file locking issue. After the application was crashed, all resources including the file locks should be released by the OS automatically. But there is a race condition during rebalance which could cause one thread re-joining the group much earlier than others to trying to grab the lock of its migrated tasks while the other threads have not released the tasks, hence the lock. But to make sure your encountered issues is the same as I described (KAFKA-3758), I have a couple of questions for you: 1. In your case, how many KafkaStreams instances are you running, and how many threads per each instance? 2. When you have time, could you re-run your application and upload the full log to this ticket upon seeing this issue again? I tried a simple dummy application with SIGKILL but that cannot re-produce this issue. > Provide a way for KStreams to recover from unclean shutdown > --- > > Key: KAFKA-3752 > URL: https://issues.apache.org/jira/browse/KAFKA-3752 > Project: Kafka > Issue Type: Improvement > Components: streams >Affects Versions: 0.10.0.0 >Reporter: Roger Hoover >Assignee: Guozhang Wang > Labels: architecture > > If a KStream application gets killed with SIGKILL (e.g. by the Linux OOM > Killer), it may leave behind lock files and fail to recover. > It would be useful to have an options (say --force) to tell KStreams to > proceed even if it finds old LOCK files. > {noformat} > [2016-05-24 17:37:52,886] ERROR Failed to create an active task #0_0 in > thread [StreamThread-1]: > (org.apache.kafka.streams.processor.internals.StreamThread:583) > org.apache.kafka.streams.errors.ProcessorStateException: Error while creating > the state manager > at > org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:71) > at > org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:86) > at > org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550) > at > org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577) > at > org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68) > at > org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658) > at > org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) > at >
[jira] [Commented] (KAFKA-3752) Provide a way for KStreams to recover from unclean shutdown
[ https://issues.apache.org/jira/browse/KAFKA-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298949#comment-15298949 ] Guozhang Wang commented on KAFKA-3752: -- We are adding locks to the directory only for the case where there are multiple stream threads on the same machine, when one thread is accessing (hence owning) this directory and the other is cleaning it. We should revisit this issue and see if there is a better solution than using locks here. > Provide a way for KStreams to recover from unclean shutdown > --- > > Key: KAFKA-3752 > URL: https://issues.apache.org/jira/browse/KAFKA-3752 > Project: Kafka > Issue Type: Improvement > Components: streams >Affects Versions: 0.10.0.0 >Reporter: Roger Hoover >Assignee: Guozhang Wang > Labels: architecture > > If a KStream application gets killed with SIGKILL (e.g. by the Linux OOM > Killer), it may leave behind lock files and fail to recover. > It would be useful to have an options (say --force) to tell KStreams to > proceed even if it finds old LOCK files. > {noformat} > [2016-05-24 17:37:52,886] ERROR Failed to create an active task #0_0 in > thread [StreamThread-1]: > (org.apache.kafka.streams.processor.internals.StreamThread:583) > org.apache.kafka.streams.errors.ProcessorStateException: Error while creating > the state manager > at > org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:71) > at > org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:86) > at > org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550) > at > org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577) > at > org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68) > at > org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658) > at > org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at > org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345) > at >