This was on 0.10.1.0. What happened was that a Kafka broker went down, and then this happened on the Kafka Streams instance that had a connection to that broker. I can send you all the logs I have.
Ara.

On Oct 24, 2016, at 10:41 PM, Guozhang Wang <wangg...@gmail.com> wrote:

Hello Ara,

The issue you encountered seems to be KAFKA-3812 <https://issues.apache.org/jira/browse/KAFKA-3812> and KAFKA-3938 <https://issues.apache.org/jira/browse/KAFKA-3938>. Could you try upgrading to the newly released 0.10.1.0 version and see if this issue goes away? If not, I would love to investigate this issue further with you.

Guozhang

On Sun, Oct 23, 2016 at 1:45 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com> wrote:

And then this on a different node:

2016-10-23 13:43:57 INFO StreamThread:286 - stream-thread [StreamThread-3] Stream thread shutdown complete
2016-10-23 13:43:57 ERROR StreamPipeline:169 - An exception has occurred
org.apache.kafka.streams.errors.StreamsException: stream-thread [StreamThread-3] Failed to rebalance
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:401)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:235)
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error while creating the state manager
    at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:72)
    at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:90)
    at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:622)
    at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:649)
    at org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:69)
    at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:120)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259)
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:398)
    ... 1 more
Caused by: java.io.IOException: task [7_1] Failed to lock the state directory: /tmp/kafka-streams/argyle-streams/7_1
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:98)
    at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:69)
    ... 13 more

Ara.
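(For context on the path in that last cause: in 0.10.x the per-task state directory is <state.dir>/<application.id>/<task id>, so /tmp/kafka-streams/argyle-streams/7_1 is the default state.dir of /tmp/kafka-streams plus the application id and task 7_1. Below is a minimal configuration sketch that would produce these paths; the class name and broker address are illustrative assumptions, the thread count is the "4 streaming threads" mentioned later in the thread, and pointing state.dir at a dedicated local directory is one commonly suggested way to keep /tmp cleanup or a shared mount from interfering with the per-task lock files.)

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class ArgyleStreamsConfigSketch {
    // Sketch: reconstructs the configuration implied by the paths in the trace above.
    static Properties baseConfig() {
        Properties props = new Properties();
        // "argyle-streams" and task id "7_1" come from the logged path
        // <state.dir>/<application.id>/<task id> = /tmp/kafka-streams/argyle-streams/7_1
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "argyle-streams");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);             // 4 streaming threads per node
        // Default state.dir is /tmp/kafka-streams; a dedicated local directory avoids
        // /tmp cleanup or a shared mount invalidating the task/RocksDB lock files.
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams");
        return props;
    }
}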
On Oct 23, 2016, at 1:24 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com> wrote:

Hi,

This happens when I hammer our 5 Kafka Streams nodes (each with 4 streaming threads) hard enough for an hour or so:

2016-10-23 13:04:17 ERROR StreamThread:324 - stream-thread [StreamThread-2] Failed to flush state for StreamTask 3_8:
org.apache.kafka.streams.errors.ProcessorStateException: task [3_8] Failed to flush state store streams-data-record-stats-avro-br-store
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:322)
    at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:181)
    at org.apache.kafka.streams.processor.internals.StreamThread$4.apply(StreamThread.java:360)
    at org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:322)
    at org.apache.kafka.streams.processor.internals.StreamThread.flushAllState(StreamThread.java:357)
    at org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:295)
    at org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:262)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:245)
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error opening store streams-data-record-stats-avro-br-store-201505160000 at location /tmp/kafka-streams/argyle-streams/3_8/streams-data-record-stats-avro-br-store/streams-data-record-stats-avro-br-store-201505160000
    at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:196)
    at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:158)
    at org.apache.kafka.streams.state.internals.RocksDBWindowStore$Segment.openDB(RocksDBWindowStore.java:72)
    at org.apache.kafka.streams.state.internals.RocksDBWindowStore.getOrCreateSegment(RocksDBWindowStore.java:402)
    at org.apache.kafka.streams.state.internals.RocksDBWindowStore.putAndReturnInternalKey(RocksDBWindowStore.java:310)
    at org.apache.kafka.streams.state.internals.RocksDBWindowStore.put(RocksDBWindowStore.java:292)
    at org.apache.kafka.streams.state.internals.MeteredWindowStore.put(MeteredWindowStore.java:101)
    at org.apache.kafka.streams.state.internals.CachingWindowStore$1.apply(CachingWindowStore.java:87)
    at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:117)
    at org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:100)
    at org.apache.kafka.streams.state.internals.CachingWindowStore.flush(CachingWindowStore.java:118)
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:320)
    ... 7 more
Caused by: org.rocksdb.RocksDBException: IO error: lock /tmp/kafka-streams/argyle-streams/3_8/streams-data-record-stats-avro-br-store/streams-data-record-stats-avro-br-store-201505160000/LOCK: No locks available
    at org.rocksdb.RocksDB.open(Native Method)
    at org.rocksdb.RocksDB.open(RocksDB.java:184)
    at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:189)
    ... 18 more

Some sort of a locking bug? Note that when this happens, this node stops processing anything, and the other nodes seem to want to pick up the load, which brings the whole streaming cluster to a standstill. That's very worrying. Is there a document somewhere describing *in detail* how failover works for streaming?

Ara.
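(On the instance stalling once its threads have died: in 0.10.x a stream thread that hits an unhandled StreamsException is not restarted automatically. One way to at least surface the failure and release the instance's task locks promptly is sketched below; it assumes a placeholder topology and broker address rather than the real StreamPipeline code, and simply registers an uncaught exception handler before start() and closes the instance once a thread dies so its tasks can migrate to the remaining instances.)

import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class StreamsFailureSurfacingSketch {
    public static void main(String[] args) throws InterruptedException {
        KStreamBuilder builder = new KStreamBuilder();
        // ... topology omitted: placeholder for the actual pipeline ...

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "argyle-streams");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder

        KafkaStreams streams = new KafkaStreams(builder, new StreamsConfig(props));
        CountDownLatch threadDied = new CountDownLatch(1);

        // Must be registered before start(). Fires when a StreamThread dies with a
        // StreamsException like the ones quoted above, instead of the instance
        // quietly carrying on with fewer (or zero) live threads.
        streams.setUncaughtExceptionHandler((thread, error) -> {
            System.err.println("Stream thread " + thread.getName() + " died: " + error);
            threadDied.countDown();
        });

        streams.start();
        threadDied.await(); // block until some stream thread has died
        streams.close();    // release task/state locks so other instances can take over
    }
}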
--
Guozhang