Hi,

We have been running a clustered Kafka Streams application, and after about three months of uninterrupted running, a few threads on a couple of instances failed. We checked the logs and found these two common stack traces, which point to RocksDB flush and put operations as the underlying cause.

Cause 1 - flush

Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error while executing flush from store key-table-201709080400
    at org.apache.kafka.streams.state.internals.RocksDBStore.flushInternal(RocksDBStore.java:354) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBStore.flush(RocksDBStore.java:345) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.state.internals.Segments.flush(Segments.java:134) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStore.flush(RocksDBSegmentedBytesStore.java:114) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.state.internals.WrappedStateStore$AbstractWrappedStateStore.flush(WrappedStateStore.java:80) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.state.internals.MeteredSegmentedBytesStore.flush(MeteredSegmentedBytesStore.java:111) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBWindowStore.flush(RocksDBWindowStore.java:91) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:323) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    ...
Caused by: org.rocksdb.RocksDBException:
    at org.rocksdb.RocksDB.flush(Native Method) ~[rocksdbjni-5.0.1.jar:na]
    at org.rocksdb.RocksDB.flush(RocksDB.java:1642) ~[rocksdbjni-5.0.1.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBStore.flushInternal(RocksDBStore.java:352) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    ...

Cause 2 - put

ERROR 2017-09-08 09:40:47,305 [StreamThread-1]:
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error while executing put key .... and value [...] from store key-table-201709080410
    at org.apache.kafka.streams.state.internals.RocksDBStore.putInternal(RocksDBStore.java:257) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBStore.put(RocksDBStore.java:232) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBSegmentedBytesStore.put(RocksDBSegmentedBytesStore.java:74) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.state.internals.ChangeLoggingSegmentedBytesStore.put(ChangeLoggingSegmentedBytesStore.java:54) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.state.internals.MeteredSegmentedBytesStore.put(MeteredSegmentedBytesStore.java:101) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBWindowStore.put(RocksDBWindowStore.java:109) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.kstream.internals.KStreamWindowAggregate$KStreamWindowAggregateProcessor.process(KStreamWindowAggregate.java:112) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:48) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:134) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:83) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.kstream.internals.KStreamFilter$KStreamFilterProcessor.process(KStreamFilter.java:44) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:48) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:134) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:83) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:70) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:197) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    ...
Caused by: org.rocksdb.RocksDBException:
    at org.rocksdb.RocksDB.put(Native Method) ~[rocksdbjni-5.0.1.jar:na]
    at org.rocksdb.RocksDB.put(RocksDB.java:488) ~[rocksdbjni-5.0.1.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBStore.putInternal(RocksDBStore.java:254) ~[kafka-streams-0.10.2.1-SNAPSHOT.jar:na]
    ...

So I have a few questions:

1. Can we tell anything from the stack traces about what caused RocksDB to fail at these operations?
2. Is there a way to learn more about the failure by looking at the RocksDB logs?
3. Are these known issues, and will upgrading to 0.11.x fix them?

Thanks
Sachin