[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception
[ https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949774#comment-16949774 ] Nishkam Ravi commented on KAFKA-8970: - We upgraded to 2.3.0 and didn't see the race condition. Thanks for the input [~guozhang] [~ableegoldman] [~vvcephei] [~bchen225242] > StateDirectory creation fails with Exception > > > Key: KAFKA-8970 > URL: https://issues.apache.org/jira/browse/KAFKA-8970 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Nishkam Ravi >Priority: Major > > When two threads try to create KafkaStreams simultaneously, one of them > succeeds while the other fails with the following exception: > org.apache.kafka.streams.errors.StreamsException: > org.apache.kafka.streams.errors.ProcessorStateException: base state directory > [/tmp/kafka-streams] doesn't exist and couldn't be created > Quick investigation suggests that this is because the code at/around: > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82] > is not synchronized and can lead to race conditions. > Specifying different values for state.dir can be a workaround for this issue > but a bit cumbersome. Can we just make this synchronized? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception
[ https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944931#comment-16944931 ] Sophie Blee-Goldman commented on KAFKA-8970: Actually I just checked the code and I believe you're right, we do create a subdirectory in the state.dir that includes the app.id. So it doesn't seem unsafe, just unusual. There's no real reason to not allow it anyways (as far as I can tell). As for the problem you describe, it seems like you could just create the state.dir ahead of time and then let each app's subdirectories be created by Streams? > StateDirectory creation fails with Exception > > > Key: KAFKA-8970 > URL: https://issues.apache.org/jira/browse/KAFKA-8970 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Nishkam Ravi >Priority: Major > > When two threads try to create KafkaStreams simultaneously, one of them > succeeds while the other fails with the following exception: > org.apache.kafka.streams.errors.StreamsException: > org.apache.kafka.streams.errors.ProcessorStateException: base state directory > [/tmp/kafka-streams] doesn't exist and couldn't be created > Quick investigation suggests that this is because the code at/around: > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82] > is not synchronized and can lead to race conditions. > Specifying different values for state.dir can be a workaround for this issue > but a bit cumbersome. Can we just make this synchronized? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception
[ https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944892#comment-16944892 ] Nishkam Ravi commented on KAFKA-8970: - Just so we are on the same page-- even when two apps have different app.id's we want their state dirs to be different (although they are using different subfolders as you point out)? In that case, wouldn't it make sense for KStreams to create state.dir's by appending _appId as a suffix? > StateDirectory creation fails with Exception > > > Key: KAFKA-8970 > URL: https://issues.apache.org/jira/browse/KAFKA-8970 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Nishkam Ravi >Priority: Major > > When two threads try to create KafkaStreams simultaneously, one of them > succeeds while the other fails with the following exception: > org.apache.kafka.streams.errors.StreamsException: > org.apache.kafka.streams.errors.ProcessorStateException: base state directory > [/tmp/kafka-streams] doesn't exist and couldn't be created > Quick investigation suggests that this is because the code at/around: > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82] > is not synchronized and can lead to race conditions. > Specifying different values for state.dir can be a workaround for this issue > but a bit cumbersome. Can we just make this synchronized? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception
[ https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944889#comment-16944889 ] Sophie Blee-Goldman commented on KAFKA-8970: Ah ok, thanks for clarifying! I agree it wouldn't make much sense :) In that case though I believe they definitely should have different state directories. The reason being, they shouldn't really be sharing any state. But they could end up overwriting each other's state and/or checkpoint files, pretty much corrupting everything. The reasoning above still sort of applies here: each instance keeps track of its own task subdirectories and locks/owners, so threads from different apps could each "own" a subdirectory. And they're named according to the task naming scheme, like "1_0" or "2_1", so its pretty likely each app would be looking at/using the same task subdirectories. > StateDirectory creation fails with Exception > > > Key: KAFKA-8970 > URL: https://issues.apache.org/jira/browse/KAFKA-8970 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Nishkam Ravi >Priority: Major > > When two threads try to create KafkaStreams simultaneously, one of them > succeeds while the other fails with the following exception: > org.apache.kafka.streams.errors.StreamsException: > org.apache.kafka.streams.errors.ProcessorStateException: base state directory > [/tmp/kafka-streams] doesn't exist and couldn't be created > Quick investigation suggests that this is because the code at/around: > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82] > is not synchronized and can lead to race conditions. > Specifying different values for state.dir can be a workaround for this issue > but a bit cumbersome. Can we just make this synchronized? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception
[ https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944874#comment-16944874 ] Nishkam Ravi commented on KAFKA-8970: - [~ableegoldman] Ahh, I see the reason for the disconnect. These streams are initialized with different app.ids (I should have mentioned that sooner). Having two streams initialized with the same app.id running on a single machine wouldn't make much sense. > StateDirectory creation fails with Exception > > > Key: KAFKA-8970 > URL: https://issues.apache.org/jira/browse/KAFKA-8970 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Nishkam Ravi >Priority: Major > > When two threads try to create KafkaStreams simultaneously, one of them > succeeds while the other fails with the following exception: > org.apache.kafka.streams.errors.StreamsException: > org.apache.kafka.streams.errors.ProcessorStateException: base state directory > [/tmp/kafka-streams] doesn't exist and couldn't be created > Quick investigation suggests that this is because the code at/around: > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82] > is not synchronized and can lead to race conditions. > Specifying different values for state.dir can be a workaround for this issue > but a bit cumbersome. Can we just make this synchronized? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception
[ https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944838#comment-16944838 ] Sophie Blee-Goldman commented on KAFKA-8970: Maybe it would be better to explicitly not allow this case, either by literally failing in the code or just by explaining it's not safe in the docs. (You can of course still do so if you give each a separate state.dir). But I'm still curious as to why you might want to do this in the first place? One thing I could come up with off the top of my head is that IQ currently is blocked until all tasks have been completely restored by all threads. If one thread was significantly slower to restore than another, I suppose it could reduce the availability gap for IQ. That's definitely something we want to fix more holistically, so people don't need to circumvent the usual way of operating. > StateDirectory creation fails with Exception > > > Key: KAFKA-8970 > URL: https://issues.apache.org/jira/browse/KAFKA-8970 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Nishkam Ravi >Priority: Major > > When two threads try to create KafkaStreams simultaneously, one of them > succeeds while the other fails with the following exception: > org.apache.kafka.streams.errors.StreamsException: > org.apache.kafka.streams.errors.ProcessorStateException: base state directory > [/tmp/kafka-streams] doesn't exist and couldn't be created > Quick investigation suggests that this is because the code at/around: > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82] > is not synchronized and can lead to race conditions. > Specifying different values for state.dir can be a workaround for this issue > but a bit cumbersome. Can we just make this synchronized? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception
[ https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944835#comment-16944835 ] Sophie Blee-Goldman commented on KAFKA-8970: What is the motivation for running two instances instead of using two threads (I assume they're both running with the same app.id, or is that incorrect?). I agree that having each use a separate state directory is not ideal, since they then won't be able to share state and will have to rebuild from scratch if a task is migrated from one instance to another. That said, I agree it is actually not safe to share the state dir between different KafkaStreams. The reason is that each task has a subdirectory that is locked by the owning thread to prevent access by others, and each KafkaStreams has a StateDirectory object keeping track of these with a Map. If you have two KafkaStreams, they will each have a separate StateDirectory object (even if the actual underlying state.dir is the same) and each one will be completely unaware of the locks owned by the other thread (from the other KafkaStreams) > StateDirectory creation fails with Exception > > > Key: KAFKA-8970 > URL: https://issues.apache.org/jira/browse/KAFKA-8970 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Nishkam Ravi >Priority: Major > > When two threads try to create KafkaStreams simultaneously, one of them > succeeds while the other fails with the following exception: > org.apache.kafka.streams.errors.StreamsException: > org.apache.kafka.streams.errors.ProcessorStateException: base state directory > [/tmp/kafka-streams] doesn't exist and couldn't be created > Quick investigation suggests that this is because the code at/around: > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82] > is not synchronized and can lead to race conditions. > Specifying different values for state.dir can be a workaround for this issue > but a bit cumbersome. Can we just make this synchronized? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception
[ https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944828#comment-16944828 ] Nishkam Ravi commented on KAFKA-8970: - [~bchen225242] Just want to make sure we are on the same page. If we don't want two streams to share the same state dir, we could add a check in StateDirectory initialization and throw an exception when that happens? Or append a randomly generated suffix to the dir name? This Jira is pointing out a thread-safety issue that is somewhat different. > StateDirectory creation fails with Exception > > > Key: KAFKA-8970 > URL: https://issues.apache.org/jira/browse/KAFKA-8970 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Nishkam Ravi >Priority: Major > > When two threads try to create KafkaStreams simultaneously, one of them > succeeds while the other fails with the following exception: > org.apache.kafka.streams.errors.StreamsException: > org.apache.kafka.streams.errors.ProcessorStateException: base state directory > [/tmp/kafka-streams] doesn't exist and couldn't be created > Quick investigation suggests that this is because the code at/around: > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82] > is not synchronized and can lead to race conditions. > Specifying different values for state.dir can be a workaround for this issue > but a bit cumbersome. Can we just make this synchronized? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception
[ https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944806#comment-16944806 ] Boyang Chen commented on KAFKA-8970: Thanks for the report Nishkam. I suspect your unusual way of initializing two stream instances on the same physical host could be triggering this problem, as natively we assume single reader-writer from stream JVM to state dir. If both instances are reading from the same state dir, they probably overlap task ownership during assignment which is also not a truly ideal scenario. I also agree that using different state dir is a good temporary solution, but let's see if the upgrade shall work first. > StateDirectory creation fails with Exception > > > Key: KAFKA-8970 > URL: https://issues.apache.org/jira/browse/KAFKA-8970 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Nishkam Ravi >Priority: Major > > When two threads try to create KafkaStreams simultaneously, one of them > succeeds while the other fails with the following exception: > org.apache.kafka.streams.errors.StreamsException: > org.apache.kafka.streams.errors.ProcessorStateException: base state directory > [/tmp/kafka-streams] doesn't exist and couldn't be created > Quick investigation suggests that this is because the code at/around: > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82] > is not synchronized and can lead to race conditions. > Specifying different values for state.dir can be a workaround for this issue > but a bit cumbersome. Can we just make this synchronized? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception
[ https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944340#comment-16944340 ] Nishkam Ravi commented on KAFKA-8970: - Hi [~guozhang] we're using 2.1.1 (so a bit behind master). We can try and see if we are able to update to a more recent version and reproduce this problem. Based on what I've seen in the code, we should be able to. Full Stack trace: ```org.apache.kafka.streams.errors.StreamsException: org.apache.kafka.streams.errors.ProcessorStateException: base state directory [/tmp/kafka-streams] doesn't exist and couldn't be created at org.apache.kafka.streams.KafkaStreams.(KafkaStreams.java:649) at org.apache.kafka.streams.KafkaStreams.(KafkaStreams.java:623) at org.apache.kafka.streams.KafkaStreams.(KafkaStreams.java:607) at org.apache.kafka.streams.KafkaStreams.(KafkaStreams.java:597)``` [~vvcephei] running different KStreams on the same machine works just fine (and is quite useful for us). As long as we initialize both from the same thread, there are no issues (this is what we've been doing for quite some time now). > StateDirectory creation fails with Exception > > > Key: KAFKA-8970 > URL: https://issues.apache.org/jira/browse/KAFKA-8970 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Nishkam Ravi >Priority: Major > > When two threads try to create KafkaStreams simultaneously, one of them > succeeds while the other fails with the following exception: > org.apache.kafka.streams.errors.StreamsException: > org.apache.kafka.streams.errors.ProcessorStateException: base state directory > [/tmp/kafka-streams] doesn't exist and couldn't be created > Quick investigation suggests that this is because the code at/around: > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82] > is not synchronized and can lead to race conditions. > Specifying different values for state.dir can be a workaround for this issue > but a bit cumbersome. Can we just make this synchronized? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception
[ https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944039#comment-16944039 ] John Roesler commented on KAFKA-8970: - Also, do you mean you are trying to run two instances of KafkaStreams on the same machine, with the same state directory? Depending on the guarantees of your filesystem, this might or might not work. In any case, I'm not 100% sure it's recommended. Can you explain why this is preferable to, say, just using more threads in one instance? Or have I misunderstood you? > StateDirectory creation fails with Exception > > > Key: KAFKA-8970 > URL: https://issues.apache.org/jira/browse/KAFKA-8970 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Nishkam Ravi >Priority: Major > > When two threads try to create KafkaStreams simultaneously, one of them > succeeds while the other fails with the following exception: > org.apache.kafka.streams.errors.StreamsException: > org.apache.kafka.streams.errors.ProcessorStateException: base state directory > [/tmp/kafka-streams] doesn't exist and couldn't be created > Quick investigation suggests that this is because the code at/around: > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82] > is not synchronized and can lead to race conditions. > Specifying different values for state.dir can be a workaround for this issue > but a bit cumbersome. Can we just make this synchronized? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception
[ https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943730#comment-16943730 ] Guozhang Wang commented on KAFKA-8970: -- Hi Nishkam, which Kafka version are you using? There are a couple of known race condition issues that have been fixed lately, especially KAFKA-5998. As for the line you pointed it out, creating a File object should be fine, and `mkdirs()` is essentially an atomic operation. If you are on latest version and sees this problem, could you upload the whole stack trace? > StateDirectory creation fails with Exception > > > Key: KAFKA-8970 > URL: https://issues.apache.org/jira/browse/KAFKA-8970 > Project: Kafka > Issue Type: Bug >Reporter: Nishkam Ravi >Priority: Major > > When two threads try to create KafkaStreams simultaneously, one of them > succeeds while the other fails with the following exception: > org.apache.kafka.streams.errors.StreamsException: > org.apache.kafka.streams.errors.ProcessorStateException: base state directory > [/tmp/kafka-streams] doesn't exist and couldn't be created > Quick investigation suggests that this is because the code at/around: > [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82] > is not synchronized and can lead to race conditions. > Specifying different values for state.dir can be a workaround for this issue > but a bit cumbersome. Can we just make this synchronized? -- This message was sent by Atlassian Jira (v8.3.4#803005)