[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception

2019-10-11 Thread Nishkam Ravi (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949774#comment-16949774
 ] 

Nishkam Ravi commented on KAFKA-8970:
-

We upgraded to 2.3.0 and didn't see the race condition. Thanks for the input 
[~guozhang] [~ableegoldman] [~vvcephei] [~bchen225242]

> StateDirectory creation fails with Exception
> 
>
> Key: KAFKA-8970
> URL: https://issues.apache.org/jira/browse/KAFKA-8970
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Nishkam Ravi
>Priority: Major
>
> When two threads try to create KafkaStreams simultaneously, one of them 
> succeeds while the other fails with the following exception:
> org.apache.kafka.streams.errors.StreamsException: 
> org.apache.kafka.streams.errors.ProcessorStateException: base state directory 
> [/tmp/kafka-streams] doesn't exist and couldn't be created
> Quick investigation suggests that this is because the code at/around:
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82]
> is not synchronized and can lead to race conditions.
> Specifying different values for state.dir can be a workaround for this issue 
> but a bit cumbersome. Can we just make this synchronized?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception

2019-10-04 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944931#comment-16944931
 ] 

Sophie Blee-Goldman commented on KAFKA-8970:


Actually I just checked the code and I believe you're right, we do create a 
subdirectory in the state.dir that includes the app.id. So it doesn't seem 
unsafe, just unusual. There's no real reason to not allow it anyways (as far as 
I can tell). As for the problem you describe, it seems like you could just 
create the state.dir ahead of time and then let each app's subdirectories be 
created by Streams?

> StateDirectory creation fails with Exception
> 
>
> Key: KAFKA-8970
> URL: https://issues.apache.org/jira/browse/KAFKA-8970
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Nishkam Ravi
>Priority: Major
>
> When two threads try to create KafkaStreams simultaneously, one of them 
> succeeds while the other fails with the following exception:
> org.apache.kafka.streams.errors.StreamsException: 
> org.apache.kafka.streams.errors.ProcessorStateException: base state directory 
> [/tmp/kafka-streams] doesn't exist and couldn't be created
> Quick investigation suggests that this is because the code at/around:
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82]
> is not synchronized and can lead to race conditions.
> Specifying different values for state.dir can be a workaround for this issue 
> but a bit cumbersome. Can we just make this synchronized?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception

2019-10-04 Thread Nishkam Ravi (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944892#comment-16944892
 ] 

Nishkam Ravi commented on KAFKA-8970:
-

Just so we are on the same page-- even when two apps have different app.id's we 
want their state dirs to be different (although they are using different 
subfolders as you point out)? In that case, wouldn't it make sense for KStreams 
to create state.dir's by appending _appId as a suffix?

> StateDirectory creation fails with Exception
> 
>
> Key: KAFKA-8970
> URL: https://issues.apache.org/jira/browse/KAFKA-8970
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Nishkam Ravi
>Priority: Major
>
> When two threads try to create KafkaStreams simultaneously, one of them 
> succeeds while the other fails with the following exception:
> org.apache.kafka.streams.errors.StreamsException: 
> org.apache.kafka.streams.errors.ProcessorStateException: base state directory 
> [/tmp/kafka-streams] doesn't exist and couldn't be created
> Quick investigation suggests that this is because the code at/around:
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82]
> is not synchronized and can lead to race conditions.
> Specifying different values for state.dir can be a workaround for this issue 
> but a bit cumbersome. Can we just make this synchronized?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception

2019-10-04 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944889#comment-16944889
 ] 

Sophie Blee-Goldman commented on KAFKA-8970:


Ah ok, thanks for clarifying! I agree it wouldn't make much sense :) 

In that case though I believe they definitely should have different state 
directories. The reason being, they shouldn't really be sharing any state. But 
they could end up overwriting each other's state and/or checkpoint files, 
pretty much corrupting everything.

The reasoning above still sort of applies here: each instance keeps track of 
its own task subdirectories and locks/owners, so threads from different apps 
could each "own" a subdirectory. And they're named according to the task naming 
scheme, like "1_0" or "2_1", so its pretty likely each app would be looking 
at/using the same task subdirectories.

> StateDirectory creation fails with Exception
> 
>
> Key: KAFKA-8970
> URL: https://issues.apache.org/jira/browse/KAFKA-8970
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Nishkam Ravi
>Priority: Major
>
> When two threads try to create KafkaStreams simultaneously, one of them 
> succeeds while the other fails with the following exception:
> org.apache.kafka.streams.errors.StreamsException: 
> org.apache.kafka.streams.errors.ProcessorStateException: base state directory 
> [/tmp/kafka-streams] doesn't exist and couldn't be created
> Quick investigation suggests that this is because the code at/around:
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82]
> is not synchronized and can lead to race conditions.
> Specifying different values for state.dir can be a workaround for this issue 
> but a bit cumbersome. Can we just make this synchronized?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception

2019-10-04 Thread Nishkam Ravi (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944874#comment-16944874
 ] 

Nishkam Ravi commented on KAFKA-8970:
-

[~ableegoldman] Ahh, I see the reason for the disconnect. These streams are 
initialized with different app.ids (I should have mentioned that sooner). 
Having two streams initialized with the same app.id running on a single machine 
wouldn't make much sense. 

> StateDirectory creation fails with Exception
> 
>
> Key: KAFKA-8970
> URL: https://issues.apache.org/jira/browse/KAFKA-8970
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Nishkam Ravi
>Priority: Major
>
> When two threads try to create KafkaStreams simultaneously, one of them 
> succeeds while the other fails with the following exception:
> org.apache.kafka.streams.errors.StreamsException: 
> org.apache.kafka.streams.errors.ProcessorStateException: base state directory 
> [/tmp/kafka-streams] doesn't exist and couldn't be created
> Quick investigation suggests that this is because the code at/around:
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82]
> is not synchronized and can lead to race conditions.
> Specifying different values for state.dir can be a workaround for this issue 
> but a bit cumbersome. Can we just make this synchronized?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception

2019-10-04 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944838#comment-16944838
 ] 

Sophie Blee-Goldman commented on KAFKA-8970:


Maybe it would be better to explicitly not allow this case, either by literally 
failing in the code or just by explaining it's not safe in the docs. (You can 
of course still do so if you give each a separate state.dir).

But I'm still curious as to why you might want to do this in the first place? 
One thing I could come up with off the top of my head is that IQ currently is 
blocked until all tasks have been completely restored by all threads. If one 
thread was significantly slower to restore than another, I suppose it could 
reduce the availability gap for IQ. That's definitely something we want to fix 
more holistically, so people don't need to circumvent the usual way of 
operating.

> StateDirectory creation fails with Exception
> 
>
> Key: KAFKA-8970
> URL: https://issues.apache.org/jira/browse/KAFKA-8970
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Nishkam Ravi
>Priority: Major
>
> When two threads try to create KafkaStreams simultaneously, one of them 
> succeeds while the other fails with the following exception:
> org.apache.kafka.streams.errors.StreamsException: 
> org.apache.kafka.streams.errors.ProcessorStateException: base state directory 
> [/tmp/kafka-streams] doesn't exist and couldn't be created
> Quick investigation suggests that this is because the code at/around:
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82]
> is not synchronized and can lead to race conditions.
> Specifying different values for state.dir can be a workaround for this issue 
> but a bit cumbersome. Can we just make this synchronized?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception

2019-10-04 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944835#comment-16944835
 ] 

Sophie Blee-Goldman commented on KAFKA-8970:


What is the motivation for running two instances instead of using two threads 
(I assume they're both running with the same app.id, or is that incorrect?). I 
agree that having each use a separate state directory is not ideal, since they 
then won't be able to share state and will have to rebuild from scratch if a 
task is migrated from one instance to another. 

That said, I agree it is actually not safe to share the state dir between 
different KafkaStreams. The reason is that each task has a subdirectory that is 
locked by the owning thread to prevent access by others, and each KafkaStreams 
has a StateDirectory object keeping track of these with a Map. If you have two KafkaStreams, they will each have a separate 
StateDirectory object (even if the actual underlying state.dir is the same) and 
each one will be completely unaware of the locks owned by the other thread 
(from the other KafkaStreams)

> StateDirectory creation fails with Exception
> 
>
> Key: KAFKA-8970
> URL: https://issues.apache.org/jira/browse/KAFKA-8970
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Nishkam Ravi
>Priority: Major
>
> When two threads try to create KafkaStreams simultaneously, one of them 
> succeeds while the other fails with the following exception:
> org.apache.kafka.streams.errors.StreamsException: 
> org.apache.kafka.streams.errors.ProcessorStateException: base state directory 
> [/tmp/kafka-streams] doesn't exist and couldn't be created
> Quick investigation suggests that this is because the code at/around:
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82]
> is not synchronized and can lead to race conditions.
> Specifying different values for state.dir can be a workaround for this issue 
> but a bit cumbersome. Can we just make this synchronized?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception

2019-10-04 Thread Nishkam Ravi (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944828#comment-16944828
 ] 

Nishkam Ravi commented on KAFKA-8970:
-

[~bchen225242] Just want to make sure we are on the same page. If we don't want 
two streams to share the same state dir, we could add a check in StateDirectory 
initialization and throw an exception when that happens? Or append a randomly 
generated suffix to the dir name? This Jira is pointing out a thread-safety 
issue that is somewhat different.

> StateDirectory creation fails with Exception
> 
>
> Key: KAFKA-8970
> URL: https://issues.apache.org/jira/browse/KAFKA-8970
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Nishkam Ravi
>Priority: Major
>
> When two threads try to create KafkaStreams simultaneously, one of them 
> succeeds while the other fails with the following exception:
> org.apache.kafka.streams.errors.StreamsException: 
> org.apache.kafka.streams.errors.ProcessorStateException: base state directory 
> [/tmp/kafka-streams] doesn't exist and couldn't be created
> Quick investigation suggests that this is because the code at/around:
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82]
> is not synchronized and can lead to race conditions.
> Specifying different values for state.dir can be a workaround for this issue 
> but a bit cumbersome. Can we just make this synchronized?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception

2019-10-04 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944806#comment-16944806
 ] 

Boyang Chen commented on KAFKA-8970:


Thanks for the report Nishkam. I suspect your unusual way of initializing two 
stream instances on the same physical host could be triggering this problem, as 
natively we assume single reader-writer from stream JVM to state dir. If both 
instances are reading from the same state dir, they probably overlap task 
ownership during assignment which is also not a truly ideal scenario. I also 
agree that using different state dir is a good temporary solution, but let's 
see if the upgrade shall work first.

> StateDirectory creation fails with Exception
> 
>
> Key: KAFKA-8970
> URL: https://issues.apache.org/jira/browse/KAFKA-8970
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Nishkam Ravi
>Priority: Major
>
> When two threads try to create KafkaStreams simultaneously, one of them 
> succeeds while the other fails with the following exception:
> org.apache.kafka.streams.errors.StreamsException: 
> org.apache.kafka.streams.errors.ProcessorStateException: base state directory 
> [/tmp/kafka-streams] doesn't exist and couldn't be created
> Quick investigation suggests that this is because the code at/around:
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82]
> is not synchronized and can lead to race conditions.
> Specifying different values for state.dir can be a workaround for this issue 
> but a bit cumbersome. Can we just make this synchronized?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception

2019-10-04 Thread Nishkam Ravi (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944340#comment-16944340
 ] 

Nishkam Ravi commented on KAFKA-8970:
-

Hi [~guozhang] we're using 2.1.1 (so a bit behind master). We can try and see 
if we are able to update to a more recent version and reproduce this problem. 
Based on what I've seen in the code, we should be able to.

Full Stack trace:

```org.apache.kafka.streams.errors.StreamsException: 
org.apache.kafka.streams.errors.ProcessorStateException: base state directory 
[/tmp/kafka-streams] doesn't exist and couldn't be created at 
org.apache.kafka.streams.KafkaStreams.(KafkaStreams.java:649) at 
org.apache.kafka.streams.KafkaStreams.(KafkaStreams.java:623) at 
org.apache.kafka.streams.KafkaStreams.(KafkaStreams.java:607) at 
org.apache.kafka.streams.KafkaStreams.(KafkaStreams.java:597)```

[~vvcephei] running different KStreams on the same machine works just fine (and 
is quite useful for us). As long as we initialize both from the same thread, 
there are no issues (this is what we've been doing for quite some time now). 

> StateDirectory creation fails with Exception
> 
>
> Key: KAFKA-8970
> URL: https://issues.apache.org/jira/browse/KAFKA-8970
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Nishkam Ravi
>Priority: Major
>
> When two threads try to create KafkaStreams simultaneously, one of them 
> succeeds while the other fails with the following exception:
> org.apache.kafka.streams.errors.StreamsException: 
> org.apache.kafka.streams.errors.ProcessorStateException: base state directory 
> [/tmp/kafka-streams] doesn't exist and couldn't be created
> Quick investigation suggests that this is because the code at/around:
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82]
> is not synchronized and can lead to race conditions.
> Specifying different values for state.dir can be a workaround for this issue 
> but a bit cumbersome. Can we just make this synchronized?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception

2019-10-03 Thread John Roesler (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944039#comment-16944039
 ] 

John Roesler commented on KAFKA-8970:
-

Also, do you mean you are trying to run two instances of KafkaStreams on the 
same machine, with the same state directory? Depending on the guarantees of 
your filesystem, this might or might not work. In any case, I'm not 100% sure 
it's recommended. Can you explain why this is preferable to, say, just using 
more threads in one instance?

Or have I misunderstood you?

> StateDirectory creation fails with Exception
> 
>
> Key: KAFKA-8970
> URL: https://issues.apache.org/jira/browse/KAFKA-8970
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Nishkam Ravi
>Priority: Major
>
> When two threads try to create KafkaStreams simultaneously, one of them 
> succeeds while the other fails with the following exception:
> org.apache.kafka.streams.errors.StreamsException: 
> org.apache.kafka.streams.errors.ProcessorStateException: base state directory 
> [/tmp/kafka-streams] doesn't exist and couldn't be created
> Quick investigation suggests that this is because the code at/around:
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82]
> is not synchronized and can lead to race conditions.
> Specifying different values for state.dir can be a workaround for this issue 
> but a bit cumbersome. Can we just make this synchronized?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8970) StateDirectory creation fails with Exception

2019-10-03 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943730#comment-16943730
 ] 

Guozhang Wang commented on KAFKA-8970:
--

Hi Nishkam, which Kafka version are you using? There are a couple of known race 
condition issues that have been fixed lately, especially KAFKA-5998. As for the 
line you pointed it out, creating a File object should be fine, and `mkdirs()` 
is essentially an atomic operation.

If you are on latest version and sees this problem, could you upload the whole 
stack trace?

> StateDirectory creation fails with Exception
> 
>
> Key: KAFKA-8970
> URL: https://issues.apache.org/jira/browse/KAFKA-8970
> Project: Kafka
>  Issue Type: Bug
>Reporter: Nishkam Ravi
>Priority: Major
>
> When two threads try to create KafkaStreams simultaneously, one of them 
> succeeds while the other fails with the following exception:
> org.apache.kafka.streams.errors.StreamsException: 
> org.apache.kafka.streams.errors.ProcessorStateException: base state directory 
> [/tmp/kafka-streams] doesn't exist and couldn't be created
> Quick investigation suggests that this is because the code at/around:
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L82]
> is not synchronized and can lead to race conditions.
> Specifying different values for state.dir can be a workaround for this issue 
> but a bit cumbersome. Can we just make this synchronized?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)