[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397513#comment-15397513 ] David Arthur commented on ZOOKEEPER-1936: - Are there any known scenarios that will trigger this race? We've seen it intermittently in an EC2 environment, but have yet to figure out why it happens there and not other environments. Also, any updates on the status of this issue? > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] > Stack trace from JVM gives this: > "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable > [0x7f55d7dc7000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68) > at > org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable > [0x7f55d7ed8000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:84) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > [...] > So it seems that when autopurge is used (as it is in our case), it might > happen at the same time as starting the server itself. In FileTxnSnapLog() it > will check if the directory exists and create it if not. These two tasks do > this at the same time, and mkdir fails and server exits the JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340429#comment-14340429 ] David Arthur commented on ZOOKEEPER-1621: - I actually like [~shralex]'s suggestion. However, if this is going to be the way you recommended recovering a corrupt log file, there should be a script that does it for users: zk-recover.sh or some such. In this world of deployment automation, it's not a nice thing to say go delete the most recent log segment from ZK's data dir. Much better for the application to handle it through a script or command. ZooKeeper does not recover from crash when disk was full Key: ZOOKEEPER-1621 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.3 Environment: Ubuntu 12.04, Amazon EC2 instance Reporter: David Arthur Assignee: Michi Mutsuzaki Fix For: 3.5.1 Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz The disk that ZooKeeper was using filled up. During a snapshot write, I got the following exception 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting java.io.IOException: No space left on device at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:282) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306) at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484) at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162) at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101) Then many subsequent exceptions like: 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was partial. 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:504) at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259) at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386) at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138) at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112) at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) It seems to me that writing the transaction log should be fully atomic to avoid such situations. Is this not the case? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
David Arthur created ZOOKEEPER-1621: --- Summary: ZooKeeper does not recover from crash when disk was full Key: ZOOKEEPER-1621 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.3 Environment: Ubuntu 12.04, Amazon EC2 instance Reporter: David Arthur The disk that ZooKeeper was using filled up. During a snapshot write, I got the following exception 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting java.io.IOException: No space left on device at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:282) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306) at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484) at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162) at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101) Then many subsequent exceptions like: 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was partial. 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:504) at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259) at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386) at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138) at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112) at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) It seems to me that writing the transaction log should be fully atomic to avoid such situations. Is this not the case? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13555110#comment-13555110 ] David Arthur commented on ZOOKEEPER-1621: - I was able to workaround the issue by deleting the partially written snapshot file ZooKeeper does not recover from crash when disk was full Key: ZOOKEEPER-1621 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.3 Environment: Ubuntu 12.04, Amazon EC2 instance Reporter: David Arthur The disk that ZooKeeper was using filled up. During a snapshot write, I got the following exception 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting java.io.IOException: No space left on device at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:282) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306) at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484) at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162) at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101) Then many subsequent exceptions like: 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was partial. 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:504) at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259) at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386) at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138) at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112) at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) It seems to me that writing the transaction log should be fully atomic to avoid such situations. Is this not the case? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13555215#comment-13555215 ] David Arthur commented on ZOOKEEPER-1621: - Here is the full sequence of events (sorry for the confusion): * Noticed disk was full * Cleaned up disk space * Tried zkCli.sh, got errors * Checked ZK log, loop of: 2013-01-16 15:01:35,194 - ERROR [main:Util@239] - Last transaction was partial. 2013-01-16 15:01:35,196 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:504) at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259) at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386) at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138) at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112) at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) * Stopped ZK * Listed ZK data directory ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ ls -lat total 18096 drwxr-xr-x 2 zookeeper zookeeper 4096 Jan 16 06:41 . -rw-r--r-- 1 zookeeper zookeeper0 Jan 16 06:41 log.19a3e -rw-r--r-- 1 zookeeper zookeeper 585377 Jan 16 06:41 snapshot.19a3d -rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 16 03:11 log.19a2a -rw-r--r-- 1 zookeeper zookeeper 585911 Jan 16 03:11 snapshot.19a29 -rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 16 03:11 log.11549 -rw-r--r-- 1 zookeeper zookeeper 585190 Jan 15 17:28 snapshot.11547 -rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 15 17:28 log.1 -rw-r--r-- 1 zookeeper zookeeper 296 Jan 14 16:44 snapshot.0 drwxr-xr-x 3 zookeeper zookeeper 4096 Jan 14 16:44 .. * Removed log.19a3e and snapshot.19a3d ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ sudo rm log.19a3e ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ sudo rm snapshot.19a3d * Started ZK * Back to normal ZooKeeper does not recover from crash when disk was full Key: ZOOKEEPER-1621 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.3 Environment: Ubuntu 12.04, Amazon EC2 instance Reporter: David Arthur Fix For: 3.5.0 Attachments: zookeeper.log.gz The disk that ZooKeeper was using filled up. During a snapshot write, I got the following exception 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting java.io.IOException: No space left on device at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:282) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306) at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484) at
[jira] [Updated] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Arthur updated ZOOKEEPER-1621: Attachment: zookeeper.log.gz Attaching zookeeper.log ZooKeeper does not recover from crash when disk was full Key: ZOOKEEPER-1621 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.3 Environment: Ubuntu 12.04, Amazon EC2 instance Reporter: David Arthur Fix For: 3.5.0 Attachments: zookeeper.log.gz The disk that ZooKeeper was using filled up. During a snapshot write, I got the following exception 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting java.io.IOException: No space left on device at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:282) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306) at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484) at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162) at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101) Then many subsequent exceptions like: 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was partial. 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:504) at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259) at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386) at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138) at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112) at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) It seems to me that writing the transaction log should be fully atomic to avoid such situations. Is this not the case? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira