[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629504#comment-16629504 ]

Ted Yu commented on ZOOKEEPER-1936:
-----------------------------------

Can you outline how you plan to fix this? Thanks.

> Server exits when unable to create data directory due to race
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1936
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6, 3.5.0
>            Reporter: Harald Musum
>            Assignee: Ted Yu
>            Priority: Minor
>             Fix For: 3.6.0, 3.5.5
>
>         Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch,
> ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch,
> ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch
>
>
> We sometimes see the ZooKeeper server fail to start, with this error in the log:
> [2014-05-27 09:29:48.248] ERROR : -
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
>
> The stack trace from the JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable [0x7f55d7dc7000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.createDirectory(Native Method)
>         at java.io.File.mkdir(File.java:1310)
>         at java.io.File.mkdirs(File.java:1337)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
>         at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
>         at org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
>         at java.util.TimerThread.mainLoop(Timer.java:555)
>         at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable [0x7f55d7ed8000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.createDirectory(Native Method)
>         at java.io.File.mkdir(File.java:1310)
>         at java.io.File.mkdirs(File.java:1337)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
>
> So it seems that when autopurge is used (as it is in our case), the purge task
> might run at the same time as the server itself is starting. FileTxnSnapLog()
> checks whether the directory exists and creates it if not. The two threads do
> this at the same time, mkdir fails in one of them, and the server exits the JVM.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
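The race described in the report can be sketched in a few lines of Java. This is an illustration only, not the ZooKeeper source: the class name `DataDirRaceSketch` and the helpers `checkThenCreate`, `ensureDirectory` and `countFailures` are hypothetical. The first helper shows the check-then-create shape that the report attributes to FileTxnSnapLog(); the second treats a `false` return from `mkdirs()` as an error only if the directory is still missing afterwards, so a concurrent creator is tolerated.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class DataDirRaceSketch {

    // Fragile shape, kept for contrast: if another thread creates the directory
    // between exists() and mkdirs(), mkdirs() returns false and the caller
    // would report a failure although the directory is there.
    static boolean checkThenCreate(File dir) {
        if (!dir.exists()) {
            return dir.mkdirs();
        }
        return true;
    }

    // Tolerant shape: a false return from mkdirs() only counts as an error
    // when the directory is still absent afterwards.
    static void ensureDirectory(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Unable to create data directory " + dir);
        }
    }

    // Starts the given number of threads that all call ensureDirectory() on the
    // same path at roughly the same time and returns how many of them failed.
    static int countFailures(File dir, int threads) throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger failures = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                    ensureDirectory(dir);
                } catch (IOException | InterruptedException e) {
                    failures.incrementAndGet();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            worker.join();
        }
        return failures.get();
    }

    public static void main(String[] args) throws Exception {
        File base = Files.createTempDirectory("zk-race-sketch").toFile();
        File dataDir = new File(base, "version-2");
        System.out.println("failures: " + countFailures(dataDir, 2));
    }
}
```

The two worker threads stand in for the "PurgeTask" and "zookeeper server" threads from the stack trace; whether the fragile variant actually fails on a given run depends on timing, which is why the report describes the problem as intermittent.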
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028392#comment-16028392 ]

Enrico Olivelli commented on ZOOKEEPER-1936:
--------------------------------------------

[~fpj] [~yuzhih...@gmail.com] [~cnauroth]
I can pick up the issue and propose my local patch.
This issue is quite annoying in the JUnit test cases of projects that use ZK and spawn ZK servers.
I would like to provide a patch for the 3.5 branch and for 3.6.

> Server exits when unable to create data directory due to race
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1936
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6, 3.5.0
>            Reporter: Harald Musum
>            Assignee: Ted Yu
>            Priority: Minor
>             Fix For: 3.5.4, 3.6.0
>
>         Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch,
> ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch,
> ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028326#comment-16028326 ]

Flavio Junqueira commented on ZOOKEEPER-1936:
---------------------------------------------

It also looks like the diff was broken as the number of commits listed is large. I haven't looked closely but it seems that merges weren't done appropriately.

> Server exits when unable to create data directory due to race
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1936
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6, 3.5.0
>            Reporter: Harald Musum
>            Assignee: Ted Yu
>            Priority: Minor
>             Fix For: 3.5.4, 3.6.0
>
>         Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch,
> ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch,
> ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028323#comment-16028323 ]

Flavio Junqueira commented on ZOOKEEPER-1936:
---------------------------------------------

I think it was simply closed; I had a few comments there that were never addressed.

> Server exits when unable to create data directory due to race
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1936
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6, 3.5.0
>            Reporter: Harald Musum
>            Assignee: Ted Yu
>            Priority: Minor
>             Fix For: 3.5.4, 3.6.0
>
>         Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch,
> ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch,
> ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028186#comment-16028186 ]

Enrico Olivelli commented on ZOOKEEPER-1936:
--------------------------------------------

This issue is marked with fix version 3.5.4, and PR #75 has been closed.
I cannot find any commits in the 3.5 branch. What is the actual status?
I am very interested in this fix.

> Server exits when unable to create data directory due to race
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1936
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6, 3.5.0
>            Reporter: Harald Musum
>            Assignee: Ted Yu
>            Priority: Minor
>             Fix For: 3.5.4, 3.6.0
>
>         Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch,
> ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch,
> ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930873#comment-15930873 ]

ASF GitHub Bot commented on ZOOKEEPER-1936:
-------------------------------------------

Github user Humbedooh closed the pull request at:

    https://github.com/apache/zookeeper/pull/75

> Server exits when unable to create data directory due to race
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1936
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6, 3.5.0
>            Reporter: Harald Musum
>            Assignee: Ted Yu
>            Priority: Minor
>             Fix For: 3.5.4, 3.6.0
>
>         Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch,
> ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch,
> ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551351#comment-15551351 ]

Flavio Junqueira commented on ZOOKEEPER-1936:
---------------------------------------------

I'd expect [~cnauroth] to +1 it. In the meantime, I've had another look, and there are a couple of things I don't understand:

- With the 3.4 patch, we have this:
{noformat}
    if (!this.dataDir.exists()) {
        if (!this.dataDir.mkdirs() && !this.dataDir.exists()) {
{noformat}
Why do we need the first call to {{this.dataDir.exists()}} and the enclosing if block? It looks like we don't need the outer if block.

- In the 3.5 patch, I'm not sure why we need this if:
{noformat}
    if (!this.snapDir.exists())
{noformat}
If {{Files.createDirectories}} fails to create the directory, it throws an exception, so the two possible outcomes are: 1) the directory is created just fine; 2) an exception is thrown. Consequently, it doesn't look like we need that last if, but maybe I'm missing something.

> Server exits when unable to create data directory due to race
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1936
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6, 3.5.0
>            Reporter: Harald Musum
>            Assignee: Ted Yu
>            Priority: Minor
>             Fix For: 3.4.10, 3.5.3
>
>         Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch,
> ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch,
> ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch
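The two review points in the comment above can be illustrated with a short Java sketch. It is written under assumptions and does not reproduce any attached patch: the class `CreateDirSketch` and its two methods are hypothetical names. The `java.io.File` variant drops the outer `exists()` check, since `mkdirs()` already returns `false` for an existing directory and the remaining `exists()` call covers that case. The `java.nio.file` variant relies on `Files.createDirectories` either returning normally or throwing, so no follow-up existence check is made.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class CreateDirSketch {

    // java.io.File variant without the outer exists() block: mkdirs() returns
    // false both on real failure and when the directory is already present,
    // and the exists() call tells the two cases apart.
    static void createWithMkdirs(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.exists()) {
            throw new IOException("Unable to create directory " + dir);
        }
    }

    // java.nio.file variant: createDirectories() returns normally when the
    // directory exists afterwards and throws an IOException otherwise, so the
    // method has only those two outcomes.
    static void createWithNio(Path dir) throws IOException {
        Files.createDirectories(dir);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("zk-create-sketch");
        File dataDir = new File(base.toFile(), "version-2");
        createWithMkdirs(dataDir);
        createWithMkdirs(dataDir); // second call: directory already exists
        Path snapDir = base.resolve("snap").resolve("version-2");
        createWithNio(snapDir);
        createWithNio(snapDir);    // second call: directory already exists
        System.out.println(dataDir.isDirectory() && Files.isDirectory(snapDir));
    }
}
```

Calling each method twice mirrors the situation where another thread has already created the directory: in both variants the second call completes without an exception.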
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550505#comment-15550505 ]

Ted Yu commented on ZOOKEEPER-1936:
-----------------------------------

Is there anything I can do to move this forward?

> Server exits when unable to create data directory due to race
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1936
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6, 3.5.0
>            Reporter: Harald Musum
>            Assignee: Ted Yu
>            Priority: Minor
>             Fix For: 3.4.10, 3.5.3
>
>         Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch,
> ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch,
> ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536734#comment-15536734 ]

Flavio Junqueira commented on ZOOKEEPER-1936:
---------------------------------------------

Would it be better to catch {{FileAlreadyExistsException}} around the {{Files.createDirectories}} call to be safe, in case the directory is created concurrently?

I'm actually wondering why we added that {{DatadirException}}. I'd much rather just keep it as {{IOException}} instead. I understand this isn't being introduced in this patch, although it may be for 3.4 if we merge there. Actually, what are the fix versions for this issue?

> Server exits when unable to create data directory due to race
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1936
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6, 3.5.0
>            Reporter: Harald Musum
>            Assignee: Andrew Purtell
>            Priority: Minor
>         Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch
>
> We sometimes see the ZooKeeper server fail to start, with this error in the log:
>
> [2014-05-27 09:29:48.248] ERROR : - .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception, exiting abnormally\nexception=\njava.io.IOException: Unable to create data directory /home/y/var/zookeeper/version-2\n\tat org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
>
> Stack trace from JVM gives this:
>
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable [0x7f55d7dc7000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.createDirectory(Native Method)
>         at java.io.File.mkdir(File.java:1310)
>         at java.io.File.mkdirs(File.java:1337)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
>         at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
>         at org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
>         at java.util.TimerThread.mainLoop(Timer.java:555)
>         at java.util.TimerThread.run(Timer.java:505)
>
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable [0x7f55d7ed8000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.createDirectory(Native Method)
>         at java.io.File.mkdir(File.java:1310)
>         at java.io.File.mkdirs(File.java:1337)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
>
> So it seems that when autopurge is used (as it is in our case), a purge might run at the same time as the server itself is starting. The FileTxnSnapLog constructor checks whether the directory exists and creates it if not. When these two tasks do this at the same time, one mkdir fails and the server exits the JVM.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
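The race in the issue description, and the question about {{Files.createDirectories}}, can be illustrated with a minimal Java sketch. This is not the code from any attached patch: the class and method names (DataDirSketch, ensureDir, ensureDirNio) are hypothetical, and only standard JDK calls are used. The idea is to attempt the creation first and treat "the directory already exists" as success, instead of checking for existence and then creating.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;

// Hypothetical illustration only; not taken from the ZOOKEEPER-1936 patches.
final class DataDirSketch {

    // java.io.File variant: mkdirs() returns false both on real failure and
    // when another thread created the directory first, so re-check afterwards.
    static void ensureDir(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Unable to create data directory " + dir);
        }
    }

    // java.nio.file variant: createDirectories() does not fail when the
    // directory already exists. The catch is the extra safety net asked
    // about in the comment: rethrow only if the path is not a directory.
    static void ensureDirNio(File dir) throws IOException {
        try {
            Files.createDirectories(dir.toPath());
        } catch (FileAlreadyExistsException e) {
            if (!dir.isDirectory()) {
                throw e;
            }
        }
    }
}
```

Either method can be called from several threads on the same path; a caller that loses the race sees an existing directory and returns normally. Whether the explicit catch is needed depends on the JDK's own handling of concurrent creation, so treat it as a defensive assumption.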
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536684#comment-15536684 ]

Flavio Junqueira commented on ZOOKEEPER-1936:
---------------------------------------------

I was looking at this pull request:

https://github.com/apache/zookeeper/pull/75.patch

but it looks like it doesn't correspond to the v4 patch attached. What is it precisely that we are proposing to merge?
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534026#comment-15534026 ]

Chris Nauroth commented on ZOOKEEPER-1936:
------------------------------------------

[~rgs] or [~fpj], could I please trouble one of you to do one last review pass to make sure we're in agreement before I commit? Thank you.
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15531396#comment-15531396 ]

Matt Foley commented on ZOOKEEPER-1936:
---------------------------------------

[~cnauroth], please do commit. +1. Thanks.
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505015#comment-15505015 ]

Chris Nauroth commented on ZOOKEEPER-1936:
------------------------------------------

This patch has been stalled, because there is no easy way to write a unit test for it, and no one has been able to produce a consistent repro in a live environment.

I have good news. I was able to find a consistent repro with an environment that could reproduce the problem in approximately 80% of ZooKeeper server starts. FWIW, the OS was SUSE11sp3, and it was running ZooKeeper 3.4.6. I applied the v3 patch, deployed it in this environment, and we could no longer repro.

Based on successful manual testing, I am now +1 to commit patch v4 to trunk and branch-3.5, and commit patch v3 to branch-3.4. I will wait until later in the week in case other committers who have been watching the issue would like to discuss further.
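The comment above notes that there is no easy way to write a unit test for this race. As a rough illustration only, the following Java sketch shows one way a stress harness might try to provoke it: several threads are released at once against the same not-yet-existing directory, and the harness counts how many of them fail. All names here (MkdirRaceSketch, Creator, racyCreate, tolerantCreate, countFailures) are hypothetical, and racyCreate is a reconstruction of the check-then-create pattern described in the issue, not a copy of FileTxnSnapLog.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stress harness; not part of ZooKeeper or its test suite.
final class MkdirRaceSketch {

    interface Creator {
        void create(File dir) throws IOException;
    }

    // Check-then-create, as described in the issue: two threads can both see
    // "does not exist", and the loser's mkdirs() then returns false.
    static void racyCreate(File dir) throws IOException {
        if (!dir.exists()) {
            if (!dir.mkdirs()) {
                throw new IOException("Unable to create data directory " + dir);
            }
        }
    }

    // Create first, then accept an already existing directory as success.
    static void tolerantCreate(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Unable to create data directory " + dir);
        }
    }

    // Runs `rounds` rounds; in each, `threads` threads start together and
    // call the creator on one fresh path. Returns the number of IOExceptions.
    static int countFailures(Creator creator, int threads, int rounds) throws Exception {
        File base = Files.createTempDirectory("zk1936").toFile();
        AtomicInteger failures = new AtomicInteger();
        for (int r = 0; r < rounds; r++) {
            File target = new File(base, "round-" + r);
            CountDownLatch start = new CountDownLatch(1);
            Thread[] workers = new Thread[threads];
            for (int t = 0; t < threads; t++) {
                workers[t] = new Thread(() -> {
                    try {
                        start.await();
                        creator.create(target);
                    } catch (IOException e) {
                        failures.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                workers[t].start();
            }
            start.countDown();
            for (Thread w : workers) {
                w.join();
            }
        }
        return failures.get();
    }
}
```

Calling countFailures(MkdirRaceSketch::racyCreate, 8, 1000) may or may not report failures on a given machine, since the race is timing dependent; the tolerant variant is expected to report none. A harness like this can raise confidence in a fix but is not a deterministic unit test.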
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398742#comment-15398742 ]

ASF GitHub Bot commented on ZOOKEEPER-1936:
-------------------------------------------

GitHub user nddipiazza opened a pull request:

    https://github.com/apache/zookeeper/pull/75

    https://issues.apache.org/jira/browse/ZOOKEEPER-1936

    https://issues.apache.org/jira/browse/ZOOKEEPER-1936 port fix to 3.4

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nddipiazza/zookeeper ZOOKEEPER-1936

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/75.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #75

commit bdd8798895e21bf3158c63c1d00aa99fba5e9f34
Author: Nicholas DiPiazza
Date:   2016-07-29T05:32:03Z

    https://issues.apache.org/jira/browse/ZOOKEEPER-1936 port fix to 3.4
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398609#comment-15398609 ]

Nicholas DiPiazza commented on ZOOKEEPER-1936:
----------------------------------------------

The v1 and v2 patches apply with no changes. v3 doesn't:

{code}
branch-3.4
{code}

{code}
patch -p0 < ZOOKEEPER-1936.v3.patch
patching file src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
Hunk #1 FAILED at 101.
Hunk #2 FAILED at 117.
2 out of 2 hunks FAILED -- saving rejects to file src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java.rej
{code}
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398125#comment-15398125 ]

Chris Nauroth commented on ZOOKEEPER-1936:
------------------------------------------

Hello [~nicholas.dipiazza]. The v3 patch attachment is similar logic that is compatible with the 3.4 release line. If you can test your 3.4-based environment with that patch, then that would be interesting for us. Thanks!
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398047#comment-15398047 ]

Nicholas DiPiazza commented on ZOOKEEPER-1936:
----------------------------------------------

Is it going to be viciously hard to test against the 3.4 release? Not able to go 3.5 quite yet.
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397919#comment-15397919 ] Chris Nauroth commented on ZOOKEEPER-1936: -- Hello [~mumrah]. At this point, I don't believe any of us have a repro, so bringing in this patch was not prioritized. I'm going to update fix version to 3.5.3 to indicate that as the next potential release to contain the patch. I'm curious if you are running the 3.5 release line of ZooKeeper, and if so, do you have the ability to apply the latest proposed patch? If we can get confirmation that the patch fixes the problem you're seeing in your environment, then that would help build confidence in the patch. > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...]
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397533#comment-15397533 ] Hadoop QA commented on ZOOKEEPER-1936: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12785224/ZOOKEEPER-1936.v4.patch against trunk revision 1754188. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3298//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3298//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3298//console This message is automatically generated. 
> Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] 
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397513#comment-15397513 ] David Arthur commented on ZOOKEEPER-1936: - Are there any known scenarios that will trigger this race? We've seen it intermittently in an EC2 environment, but have yet to figure out why it happens there and not other environments. Also, any updates on the status of this issue? > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] 
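As an illustration of the race described in the issue, here is a minimal Java sketch. It is an assumption-laden paraphrase, not ZooKeeper's actual code: `createRacy` mirrors the "check if the directory exists and create it if not" pattern from the report, `ensureTolerant` is a hypothetical variant that treats "someone else already created it" as success, and `simulateLoser` reproduces deterministically what the losing thread observes by calling `mkdirs()` on a directory that already exists.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

public class MkdirsRaceSketch {

    // Paraphrase of the check-then-create pattern from the report (not the real source).
    static void createRacy(File dir) throws IOException {
        if (!dir.exists()) {
            // Another thread can create dir between exists() and mkdirs();
            // mkdirs() then returns false and this thread throws.
            if (!dir.mkdirs()) {
                throw new IOException("Unable to create data directory " + dir);
            }
        }
    }

    // Hypothetical tolerant variant: a false return from mkdirs() is only an
    // error if the directory is still missing afterwards.
    static boolean ensureTolerant(File dir) {
        return dir.mkdirs() || dir.isDirectory();
    }

    // Simulates the loser of the race without threads: the directory already
    // exists by the time the second mkdirs() call runs.
    static boolean[] simulateLoser() {
        File dir;
        try {
            File base = Files.createTempDirectory("zk-mkdirs-sketch").toFile();
            dir = new File(base, "version-2");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        boolean winner = dir.mkdirs();               // creates the directory
        boolean loserRaw = dir.mkdirs();             // directory exists, nothing created
        boolean loserTolerant = ensureTolerant(dir); // existing directory counts as success
        return new boolean[] { winner, loserRaw, loserTolerant };
    }

    public static void main(String[] args) {
        boolean[] r = simulateLoser();
        System.out.println("winner=" + r[0] + " loserRaw=" + r[1] + " loserTolerant=" + r[2]);
    }
}
```

The point of the sketch is only that a `false` return from `File.mkdirs()` does not distinguish "creation failed" from "already there", which is why a re-check after the call is one plausible way to close the window.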
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15123906#comment-15123906 ] Chris Nauroth commented on ZOOKEEPER-1936: -- [~yuzhih...@gmail.com], my understanding is that you only can repro this in standalone mode, not when deploying a full ensemble. Is that correct? Did you get to a point where you had a consistent repro and were able to verify that this patch helped fix it? The logic change looks correct to me, but I'm trying to figure out if there is really something more going on, as suggested in earlier comments. Thanks! > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] 
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15123961#comment-15123961 ] Chris Nauroth commented on ZOOKEEPER-1936: -- It's a tough call then. I don't see a way to write a deterministic JUnit test to prove the fix. I'm reluctant to accept a code change without a test or a manual verification, at least on a stable maintenance line. Here is my take on it. Even without a consistent repro, I see the theoretical problem in the code. The logic change in the patch looks correct to me, even if there might have been something more happening when Ted reported it. Let's put a fix into trunk and branch-3.5, but not branch-3.4. In trunk and branch-3.5, we also can make the switch to [{{Files#createDirectories}}|http://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#createDirectories(java.nio.file.Path,%20java.nio.file.attribute.FileAttribute...)] that I mentioned earlier, because those branches are compiling to JDK 7. That way, if we see another repro, we'll get additional debugging information to help with any subsequent patches. Would some other committers like to comment on that plan? Thanks! 
> Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] 
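The switch to Files#createDirectories proposed in the comment above could look roughly like the following Java sketch. This is not the attached patch: `ensureDataDir`, `countFailures`, the temporary directory name, and the thread count are all hypothetical, chosen only to show that several threads can call `Files.createDirectories` on the same path concurrently, and that a genuine failure surfaces as an `IOException` with a descriptive message instead of a bare `false`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class CreateDirectoriesSketch {

    // Hypothetical helper: createDirectories does not throw if the directory
    // already exists, so no separate exists() check is needed.
    static void ensureDataDir(Path dir) throws IOException {
        Files.createDirectories(dir);
    }

    // Starts threadCount threads that all try to create the same directory
    // at once and returns how many of them saw an exception.
    static int countFailures(int threadCount) {
        final Path target;
        try {
            target = Files.createTempDirectory("zk-race-sketch").resolve("version-2");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        final CountDownLatch start = new CountDownLatch(1);
        final AtomicInteger failures = new AtomicInteger();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();          // line all threads up on the same instant
                    ensureDataDir(target);
                } catch (IOException | InterruptedException e) {
                    failures.incrementAndGet();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("failures: " + countFailures(8));
    }
}
```

As the comment notes, this only applies to branches compiled against JDK 7 or later; branch-3.4 would need a `java.io.File`-based approach.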
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15123934#comment-15123934 ] Ted Yu commented on ZOOKEEPER-1936: --- Haven't got a chance to reproduce the bug. After some QE fix, hbase un-secure deployment works reliably. > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] 
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124017#comment-15124017 ] Hadoop QA commented on ZOOKEEPER-1936: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12785223/ZOOKEEPER-1936.v4.patch against trunk revision 1726354. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3021//console This message is automatically generated. > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > 
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...]
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124054#comment-15124054 ] Hadoop QA commented on ZOOKEEPER-1936: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12785224/ZOOKEEPER-1936.v4.patch against trunk revision 1726354. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3022//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3022//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3022//console This message is automatically generated. 
> Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] 
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124117#comment-15124117 ] Chris Nauroth commented on ZOOKEEPER-1936: -- There was a test failure in {{WatcherTest#testWatcherAutoResetWithLocal}}, but I can't reproduce it. > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] 
> Stack trace from JVM gives this: > "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable > [0x7f55d7dc7000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68) > at > org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable > [0x7f55d7ed8000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > [...] > So it seems that when autopurge is used (as it is in our case), it might > happen at the same time as starting the server itself. In FileTxnSnapLog() it > will check if the directory exists and create it if not. These two tasks do > this at the same time, and mkdir fails and server exits the JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
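The check-then-create pattern described in the quoted report can be sketched as follows. This is a minimal illustration, not the actual ZooKeeper source: the class and method names are hypothetical. It relies on the documented behaviour that java.io.File.mkdirs() returns false when the directory already exists, so a thread that loses the race between exists() and mkdirs() throws even though the directory is present.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch of the racy pattern; names are not taken from ZooKeeper.
final class RacyCreateSketch {

    // Check-then-act: between exists() and mkdirs() another thread may
    // create the directory. mkdirs() then returns false and this method
    // throws, although the directory is actually there.
    static void createRacy(File dir) throws IOException {
        if (!dir.exists()) {
            if (!dir.mkdirs()) {
                throw new IOException("Unable to create data directory " + dir);
            }
        }
    }

    // Simulates the losing thread: the directory is created first (standing
    // in for the other thread), then mkdirs() is called on it again.
    static boolean mkdirsAfterOtherThreadWon(File dir) {
        dir.mkdirs();          // the "other thread" wins the race
        return dir.mkdirs();   // returns false: nothing was created this time
    }
}
```

In a single-threaded run createRacy() works, which is presumably why the problem only shows up occasionally when the purge task and server start-up overlap.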
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108940#comment-15108940 ] Ted Yu commented on ZOOKEEPER-1936:

The test failure at https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3010//testReport/org.apache.zookeeper.test/AsyncHammerTest/testHammer/ doesn't seem to be related to the patch.
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108068#comment-15108068 ] Ted Yu commented on ZOOKEEPER-1936:

The previous patch was generated for branch-3.4. Attached a patch for trunk.
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108083#comment-15108083 ] Rakesh R commented on ZOOKEEPER-1936:

To make the target branch clear, consider including the branch in the patch name, for example {{ZOOKEEPER-1936-br-3-4.patch}} for branch-3.4 and {{ZOOKEEPER-1936.patch}} for trunk.
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108039#comment-15108039 ] Raul Gutierrez Segales commented on ZOOKEEPER-1936:

[~te...@apache.org]: mind generating the patch with something like:

{code}
git diff --no-prefix HEAD~1.. > ZOOKEEPER-1936.patch
{code}

The latest one you uploaded didn't apply cleanly. Patch lgtm, +1.

[~fpj], [~cnauroth]: mind giving it one last look?
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108088#comment-15108088 ] Hadoop QA commented on ZOOKEEPER-1936:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12783285/ZOOKEEPER-1936.v3.patch
against trunk revision 1720227.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3010//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3010//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3010//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101984#comment-15101984 ] Flavio Junqueira commented on ZOOKEEPER-1936:

What happens if the call to mkdir legitimately fails? It looks like we would assume that the directory exists and move on. I think we need to differentiate the directory already existing from other failures when creating it. Does that sound reasonable?
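One way to make the distinction raised above is to attempt the creation unconditionally and treat a false return as an error only if the path is still not a directory afterwards. The sketch below is an assumption about how such a check could look, not the content of any attached patch; ensureDirectory is a hypothetical helper name.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper; not taken from the ZooKeeper code base or the patches.
final class EnsureDirSketch {

    // mkdirs() returns false both when the directory already exists and when
    // creation genuinely failed. The follow-up isDirectory() check separates
    // the two cases: only a path that is still not a directory is an error.
    static void ensureDirectory(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Unable to create directory " + dir);
        }
    }
}
```

With this shape, a directory created concurrently by another thread is accepted, while a genuine failure (for example a regular file occupying the path, or missing permissions on the parent) still raises an exception. Checking isDirectory() rather than exists() also rejects the case where the path exists but is a regular file.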
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102249#comment-15102249 ] Chris Nauroth commented on ZOOKEEPER-1936:

I'm in favor of the approach in patch v2. This would be a deterministic fix. Adding a delay like the first patch might still not work if we got unlucky in the way the OS scheduled the threads. What do others think?

[~tedyu], could you please do the following?
# Make the same fix for {{snapDir}}, which is right after the code you already changed in {{FileTxnSnapLog}}. Otherwise, we might get past the {{dataDir}} creation only to fail again on {{snapDir}}.
# Post 2 patch files: one that applies to trunk and one that applies to branch-3.4.
# Generate the patch files with {{git diff --no-prefix}} for compatibility with our pre-commit automation.

Thank you!
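To illustrate why a create-then-verify check is deterministic where a delay is not, the following sketch lets several threads create a data directory and a snapshot directory at the same moment and counts how many of them fail. It is only a sketch under stated assumptions: ensureDirectory, countFailures and the directory arguments are hypothetical names, and the code is not taken from patch v2 or from FileTxnSnapLog.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stress sketch; names are illustrative only.
final class ConcurrentCreateSketch {

    // Succeeds if the directory exists afterwards, regardless of which
    // thread actually created it.
    static void ensureDirectory(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Unable to create directory " + dir);
        }
    }

    // Starts `threads` workers that all try to create dataDir and snapDir at
    // the same moment; returns how many of them hit an exception.
    static int countFailures(File dataDir, File snapDir, int threads)
            throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger failures = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();            // line all workers up
                    ensureDirectory(dataDir); // same check for both directories
                    ensureDirectory(snapDir);
                } catch (IOException | InterruptedException e) {
                    failures.incrementAndGet();
                }
            });
            workers[i].start();
        }
        start.countDown();                    // release all workers at once
        for (Thread t : workers) {
            t.join();
        }
        return failures.get();
    }
}
```

Because the outcome does not depend on which thread wins, no scheduling order makes a worker fail as long as the parent directory exists and is writable. A sleep-based workaround only makes the losing interleaving less likely.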
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102276#comment-15102276 ]

Ted Yu commented on ZOOKEEPER-1936:
-----------------------------------

Patch v3 addresses comments from Chris and Rakesh.
The same patch applies cleanly on branch-3.4.
Let me know if a separate patch for branch-3.4 should be attached.

> Server exits when unable to create data directory due to race
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1936
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6, 3.5.0
>            Reporter: Harald Musum
>            Assignee: Andrew Purtell
>            Priority: Minor
>         Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch,
>                      ZOOKEEPER-1936.v3.patch
>
>
> We sometimes see the ZooKeeper server fail to start, with this error in the log:
> [2014-05-27 09:29:48.248] ERROR : -
> .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable [0x7f55d7dc7000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.createDirectory(Native Method)
>         at java.io.File.mkdir(File.java:1310)
>         at java.io.File.mkdirs(File.java:1337)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
>         at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
>         at org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
>         at java.util.TimerThread.mainLoop(Timer.java:555)
>         at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable [0x7f55d7ed8000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.createDirectory(Native Method)
>         at java.io.File.mkdir(File.java:1310)
>         at java.io.File.mkdirs(File.java:1337)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), the purge task
> might run at the same time as the server itself is starting. The FileTxnSnapLog
> constructor checks whether the directory exists and creates it if not. Both tasks
> do this at the same time, so mkdir fails for one of them and the server exits the JVM.
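The interleaving suspected in the description can be replayed on a single thread, which makes it deterministic. The following is only a sketch under that assumption: the class and method names are hypothetical, it uses a temporary directory rather than a real ZooKeeper data directory, and it does not reproduce ZooKeeper's actual constructor code. It relies on the documented behavior that File#mkdirs returns false when the directory was not created by that call.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

/** Hypothetical demo class for illustration; not part of ZooKeeper. */
final class MkdirsRaceSketch {

    /**
     * Replays the suspected interleaving sequentially:
     * "thread A" sees no directory, "thread B" creates it,
     * then "thread A" calls mkdirs() and gets false.
     */
    static boolean losingMkdirsReturnsFalse() {
        try {
            File parent = Files.createTempDirectory("zk-race-sketch").toFile();
            File dataDir = new File(parent, "version-2");

            boolean seenByA = dataDir.exists();     // A: directory is missing, so A decides to create it
            boolean createdByB = dataDir.mkdirs();  // B: wins the race and creates the directory
            boolean createdByA = dataDir.mkdirs();  // A: too late, mkdirs() reports false

            // Best-effort cleanup of the temporary directories.
            dataDir.delete();
            parent.delete();

            return !seenByA && createdByB && !createdByA;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("late mkdirs() returned false: " + losingMkdirsReturnsFalse());
    }
}
```

If the caller treats that false as a fatal error, as the reported IOException suggests, the losing thread fails even though the directory is present.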
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102301#comment-15102301 ]

Hadoop QA commented on ZOOKEEPER-1936:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12782581/ZOOKEEPER-1936.v3.patch
  against trunk revision 1720227.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
                       Please justify why no new tests are needed for this patch.
                       Also please list what manual steps were performed to verify this patch.

    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3009//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102294#comment-15102294 ]

Flavio Junqueira commented on ZOOKEEPER-1936:
---------------------------------------------

But [~ted_yu] said that even with his patch {{dataDir}} wasn't created, and if what you suggest in step 1 fixed it, then the directory would be there, no? I'm actually wondering if there is something else causing trouble.
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102307#comment-15102307 ]

Chris Nauroth commented on ZOOKEEPER-1936:
------------------------------------------

Yeah, my comments got crossed up, because I hadn't refreshed the page to see the latest.

Unfortunately, [{{File#mkdirs}}|http://docs.oracle.com/javase/7/docs/api/java/io/File.html#mkdirs()] only gives us a boolean response with no further information about the root cause. [~tedyu], could you try, as a troubleshooting step, switching it to call [{{Files#createDirectories}}|http://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#createDirectories(java.nio.file.Path,%20java.nio.file.attribute.FileAttribute...)]? That might give us more detailed information about the error.

We can't use the JDK 1.7 file APIs in the 3.4 maintenance line, so this would just be a temporary troubleshooting step. We can use those APIs in trunk/branch-3.5, though.
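The difference between the two APIs mentioned above can be illustrated with a small sketch. It is only an illustration, not the patch under review: the class and method names are hypothetical, and the failure is provoked artificially by pointing both calls at a path already occupied by a regular file, an assumption chosen because it fails the same way on common platforms.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

/** Hypothetical helper class for illustration; not part of ZooKeeper. */
final class CreateDirDiagnostics {

    /** Returns a path that cannot become a directory because a regular file occupies it. */
    static File pathBlockedByRegularFile() {
        try {
            File blocker = Files.createTempFile("zk-blocked", ".tmp").toFile();
            blocker.deleteOnExit();
            return blocker;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** java.io API: the only signal is a boolean. */
    static boolean viaMkdirs(File dir) {
        return dir.mkdirs();
    }

    /** java.nio.file API (Java 7+): failures surface as a typed IOException with a message. */
    static String viaCreateDirectories(File dir) {
        try {
            Files.createDirectories(dir.toPath());
            return "created or already present";
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        File blocked = pathBlockedByRegularFile();
        System.out.println("File#mkdirs             -> " + viaMkdirs(blocked));
        System.out.println("Files#createDirectories -> " + viaCreateDirectories(blocked));
    }
}
```

Per its Javadoc, Files#createDirectories also does not throw when the directory already exists, so a caller that loses a creation race would not see an error from it.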
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099001#comment-15099001 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1936:
---------------------------------------------------

cc: [~cnauroth], [~rakeshr]
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099184#comment-15099184 ]

Hadoop QA commented on ZOOKEEPER-1936:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12782377/ZOOKEEPER-1936.v2.patch
  against trunk revision 1720227.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
                       Please justify why no new tests are needed for this patch.
                       Also please list what manual steps were performed to verify this patch.

    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3008//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101270#comment-15101270 ]

Rakesh R commented on ZOOKEEPER-1936:
-------------------------------------

Thanks [~yuzhih...@gmail.com] for the fix. +1 for the additional {{#exists()}} check.

I have a few comments:
# Could you cover {{snapDir}} creation too?
{code}
if (!this.snapDir.mkdirs()) {
    throw new DatadirException("Unable to create snap directory " + this.snapDir);
}
{code}
# Can we reduce the nested calls? How about using the AND operator, like:
{code}
if (!this.dataDir.mkdirs() && !this.dataDir.exists())
{code}
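The two review points above (treat both directories the same way, and fold the check into a single condition) could be combined in one helper. This is a minimal sketch, not the attached patch: the class name, the helper name ensureDir, and the temporary paths are hypothetical, and a plain IOException stands in for the DatadirException used in the comment because that class belongs to ZooKeeper.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

/** Hypothetical sketch for illustration; not the ZooKeeper patch itself. */
final class DirCreationSketch {

    /**
     * Creates the directory if needed. A false result from mkdirs() is tolerated
     * when the directory exists anyway, for example because another thread created it.
     */
    static void ensureDir(File dir, String label) throws IOException {
        if (!dir.mkdirs() && !dir.exists()) {
            throw new IOException("Unable to create " + label + " directory " + dir);
        }
    }

    /** Applies the same check to a data directory and a snap directory under a temp root. */
    static boolean demo() {
        try {
            File root = Files.createTempDirectory("zk-dirs-sketch").toFile();
            File dataDir = new File(root, "data/version-2");
            File snapDir = new File(root, "snap/version-2");

            ensureDir(dataDir, "data");
            ensureDir(snapDir, "snap");
            // Second call: mkdirs() returns false, exists() is true, so no exception is thrown.
            ensureDir(dataDir, "data");

            return dataDir.isDirectory() && snapDir.isDirectory();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("both directories present: " + demo());
    }
}
```

One design note: exists() is also true for a regular file at that path, so a stricter variant might check isDirectory() instead. The sketch keeps exists() to match the condition quoted in the comment.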
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098999#comment-15098999 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1936:
---------------------------------------------------

Thanks [~te...@apache.org] - looking. Maybe we can get this in for 3.4.8 as well.
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089636#comment-15089636 ]

Ted Yu commented on ZOOKEEPER-1936:
-----------------------------------

[~fpj]:
Can you take a look?

Thanks
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089595#comment-15089595 ]

Ted Yu commented on ZOOKEEPER-1936:
---

We encountered this issue intermittently during testing.
Can the fix be committed? [~shralex] [~phunt]
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089628#comment-15089628 ]

Hadoop QA commented on ZOOKEEPER-1936:
--

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12663176/ZOOKEEPER-1936.patch
  against trunk revision 1720227.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3005//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105744#comment-14105744 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1936:
---

lgtm, +1.

cc: [~shralex], [~phunt]
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104625#comment-14104625 ]

Hadoop QA commented on ZOOKEEPER-1936:
--

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12663176/ZOOKEEPER-1936.patch
  against trunk revision 1619166.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2292//console

This message is automatically generated.