Hello Ara,

I have looked through the logs you sent (thanks!), and I suspect the
"file does not exist" issue is because your state directory is left at
the default value, the `/tmp/kafka-streams` folder. As for the "LOCK: No
locks available" and "Failed to lock the state directory" issues, it
looks like you are doing a rolling bounce of the stream instances (based
on the observed "shutdown" log entries), and I suspect the state
directories have been auto-deleted, so a thread tries to re-create them
while the locks are still being held during the shutdown process.

Could you set the "state.dir" config to another directory and see if this
issue goes away?

http://docs.confluent.io/3.0.0/streams/developer-guide.html#optional-configuration-parameters
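
For example, here is a minimal sketch of what that override could look like when
constructing the app (the "/var/lib/kafka-streams" path and the broker address are
placeholders to adapt to your deployment; the application id matches the
"argyle-streams" name visible in your log paths):

import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "argyle-streams");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // placeholder broker
// Keep RocksDB state out of /tmp so nothing cleans it up underneath the running threads.
props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams"); // placeholder path

KStreamBuilder builder = new KStreamBuilder();
// ... define your topology on the builder here ...
KafkaStreams streams = new KafkaStreams(builder, props);
streams.start();

With that, each task's lock file lives under <state.dir>/argyle-streams/<task-id>
on a directory that survives tmp cleanup, which should at least rule out the
auto-deletion theory.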


As for a detailed description of failover, I'd point you to this doc
section:

http://docs.confluent.io/3.0.1/streams/architecture.html#fault-tolerance


As well as this blog post:

http://www.confluent.io/blog/elastic-scaling-in-kafka-streams/

Guozhang


On Tue, Oct 25, 2016 at 8:34 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> Logs would be very helpful; I can look further into it.
>
> Thanks Ara.
>
> On Mon, Oct 24, 2016 at 11:04 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com> wrote:
>
>> This was in 0.10.1.0. What happened was that a Kafka broker went down and
>> then this happened on the Kafka Streams instance that had a connection to
>> that broker. I can send you all the logs I have.
>>
>> Ara.
>>
>> On Oct 24, 2016, at 10:41 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>>
>> Hello Ara,
>>
>> The issue you encountered seems to be KAFKA-3812
>> <https://issues.apache.org/jira/browse/KAFKA-3812> and KAFKA-3938
>> <https://issues.apache.org/jira/browse/KAFKA-3938>. Could you try to
>> upgrade to the newly released 0.10.1.0 version and see if the issue goes
>> away? If not, I would love to investigate it further with you.
>>
>>
>> Guozhang
>>
>>
>>
>> On Sun, Oct 23, 2016 at 1:45 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com> wrote:
>>
>> And then this on a different node:
>>
>> 2016-10-23 13:43:57 INFO  StreamThread:286 - stream-thread [StreamThread-3] Stream thread shutdown complete
>> 2016-10-23 13:43:57 ERROR StreamPipeline:169 - An exception has occurred
>> org.apache.kafka.streams.errors.StreamsException: stream-thread [StreamThread-3] Failed to rebalance
>> at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:401)
>> at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:235)
>> Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error while creating the state manager
>> at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:72)
>> at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:90)
>> at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:622)
>> at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:649)
>> at org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:69)
>> at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:120)
>> at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228)
>> at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313)
>> at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277)
>> at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259)
>> at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013)
>> at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979)
>> at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:398)
>> ... 1 more
>> Caused by: java.io.IOException: task [7_1] Failed to lock the state directory: /tmp/kafka-streams/argyle-streams/7_1
>> at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:98)
>> at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:69)
>> ... 13 more
>>
>> Ara.
>>
>> On Oct 23, 2016, at 1:24 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com> wrote:
>>
>> Hi,
>>
>> This happens when I hammer our 5 Kafka Streams nodes (each running 4
>> stream threads) hard enough for an hour or so:
>>
>> 2016-10-23 13:04:17 ERROR StreamThread:324 - stream-thread [StreamThread-2] Failed to flush state for StreamTask 3_8:
>> org.apache.kafka.streams.errors.ProcessorStateException: task [3_8] Failed to flush state store streams-data-record-stats-avro-br-store
>> at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:322)
>> at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:181)
>> at org.apache.kafka.streams.processor.internals.StreamThread$4.apply(StreamThread.java:360)
>> at org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:322)
>> at org.apache.kafka.streams.processor.internals.StreamThread.flushAllState(StreamThread.java:357)
>> at org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:295)
>> at org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:262)
>> at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:245)
>> Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error opening store streams-data-record-stats-avro-br-store-201505160000 at location /tmp/kafka-streams/argyle-streams/3_8/streams-data-record-stats-avro-br-store/streams-data-record-stats-avro-br-store-201505160000
>> at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:196)
>> at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:158)
>> at org.apache.kafka.streams.state.internals.RocksDBWindowStore$Segment.openDB(RocksDBWindowStore.java:72)
>> at org.apache.kafka.streams.state.internals.RocksDBWindowStore.getOrCreateSegment(RocksDBWindowStore.java:402)
>> at org.apache.kafka.streams.state.internals.RocksDBWindowStore.putAndReturnInternalKey(RocksDBWindowStore.java:310)
>> at org.apache.kafka.streams.state.internals.RocksDBWindowStore.put(RocksDBWindowStore.java:292)
>> at org.apache.kafka.streams.state.internals.MeteredWindowStore.put(MeteredWindowStore.java:101)
>> at org.apache.kafka.streams.state.internals.CachingWindowStore$1.apply(CachingWindowStore.java:87)
>> at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:117)
>> at org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:100)
>> at org.apache.kafka.streams.state.internals.CachingWindowStore.flush(CachingWindowStore.java:118)
>> at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:320)
>> ... 7 more
>> Caused by: org.rocksdb.RocksDBException: IO error: lock /tmp/kafka-streams/argyle-streams/3_8/streams-data-record-stats-avro-br-store/streams-data-record-stats-avro-br-store-201505160000/LOCK: No locks available
>> at org.rocksdb.RocksDB.open(Native Method)
>> at org.rocksdb.RocksDB.open(RocksDB.java:184)
>> at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:189)
>> ... 18 more
>>
>> Some sort of a locking bug?
>>
>> Note that when this happens, this node stops processing anything and the
>> other nodes seem to try to pick up the load, which brings the whole
>> streaming cluster to a standstill. That's very worrying. Is there a document
>> somewhere describing *in detail* how failover for streaming works?
>>
>> Ara.
>>
>>
>>
>> --
>> -- Guozhang
>>
>>
>>
>
>
>
> --
> -- Guozhang
>



-- 
-- Guozhang
