Hello Ara,

The issue you encountered looks like KAFKA-3812
<https://issues.apache.org/jira/browse/KAFKA-3812> and KAFKA-3938
<https://issues.apache.org/jira/browse/KAFKA-3938>. Could you try
upgrading to the newly released 0.10.1.0 and see if the issue goes
away? If not, I would be happy to investigate it further with you.
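
In the meantime, it may be worth double-checking your state directory
setting, since both stack traces point at lock files under the default
/tmp/kafka-streams path, which tmp cleaners may remove while the application
is running and which any other instance on the same machine would share.
Below is a minimal sketch of overriding it via StreamsConfig; the broker
address and target directory are placeholder assumptions, not values from
your setup:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StateDirConfigSketch {
    public static Properties streamsConfig() {
        Properties props = new Properties();
        // Application id as it appears in your state paths.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "argyle-streams");
        // Placeholder broker address.
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        // Move state off the default /tmp/kafka-streams; Streams creates
        // per-task subdirectories (e.g. argyle-streams/7_1) below this path.
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams");
        return props;
    }
}

This is only a possible mitigation on the environment side; the lock
contention itself is what the JIRAs above address.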


Guozhang

On Sun, Oct 23, 2016 at 1:45 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com>
wrote:

> And then this on a different node:
>
> 2016-10-23 13:43:57 INFO  StreamThread:286 - stream-thread [StreamThread-3] Stream thread shutdown complete
> 2016-10-23 13:43:57 ERROR StreamPipeline:169 - An exception has occurred
> org.apache.kafka.streams.errors.StreamsException: stream-thread [StreamThread-3] Failed to rebalance
> at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:401)
> at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:235)
> Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error while creating the state manager
> at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:72)
> at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:90)
> at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:622)
> at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:649)
> at org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:69)
> at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:120)
> at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228)
> at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313)
> at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277)
> at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259)
> at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013)
> at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979)
> at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:398)
> ... 1 more
> Caused by: java.io.IOException: task [7_1] Failed to lock the state directory: /tmp/kafka-streams/argyle-streams/7_1
> at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:98)
> at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:69)
> ... 13 more
>
> Ara.
>
> On Oct 23, 2016, at 1:24 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com> wrote:
>
> Hi,
>
> This happens when I hammer our 5 Kafka Streams nodes (each with 4
> streaming threads) hard enough for an hour or so:
>
> 2016-10-23 13:04:17 ERROR StreamThread:324 - stream-thread [StreamThread-2] Failed to flush state for StreamTask 3_8:
> org.apache.kafka.streams.errors.ProcessorStateException: task [3_8] Failed to flush state store streams-data-record-stats-avro-br-store
> at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:322)
> at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:181)
> at org.apache.kafka.streams.processor.internals.StreamThread$4.apply(StreamThread.java:360)
> at org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:322)
> at org.apache.kafka.streams.processor.internals.StreamThread.flushAllState(StreamThread.java:357)
> at org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:295)
> at org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:262)
> at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:245)
> Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error opening store streams-data-record-stats-avro-br-store-201505160000 at location /tmp/kafka-streams/argyle-streams/3_8/streams-data-record-stats-avro-br-store/streams-data-record-stats-avro-br-store-201505160000
> at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:196)
> at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:158)
> at org.apache.kafka.streams.state.internals.RocksDBWindowStore$Segment.openDB(RocksDBWindowStore.java:72)
> at org.apache.kafka.streams.state.internals.RocksDBWindowStore.getOrCreateSegment(RocksDBWindowStore.java:402)
> at org.apache.kafka.streams.state.internals.RocksDBWindowStore.putAndReturnInternalKey(RocksDBWindowStore.java:310)
> at org.apache.kafka.streams.state.internals.RocksDBWindowStore.put(RocksDBWindowStore.java:292)
> at org.apache.kafka.streams.state.internals.MeteredWindowStore.put(MeteredWindowStore.java:101)
> at org.apache.kafka.streams.state.internals.CachingWindowStore$1.apply(CachingWindowStore.java:87)
> at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:117)
> at org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:100)
> at org.apache.kafka.streams.state.internals.CachingWindowStore.flush(CachingWindowStore.java:118)
> at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:320)
> ... 7 more
> Caused by: org.rocksdb.RocksDBException: IO error: lock /tmp/kafka-streams/argyle-streams/3_8/streams-data-record-stats-avro-br-store/streams-data-record-stats-avro-br-store-201505160000/LOCK: No locks available
> at org.rocksdb.RocksDB.open(Native Method)
> at org.rocksdb.RocksDB.open(RocksDB.java:184)
> at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:189)
> ... 18 more
>
> Some sort of a locking bug?
>
> Note that when this happens, this node stops processing anything and the
> other nodes seem to want to pick up the load, which brings the whole
> streaming cluster to a standstill. That’s very worrying. Is there a document
> somewhere describing *in detail* how failover for streaming works?
>
> Ara.
>



-- 
-- Guozhang
