Re: ZK recovery questions
I did try a quick test on Windows (yes, some of us use Windows :). I thought simply pointing the dataDir at the Windows equivalent of /dev/null would do the trick. It didn't work. It looks like a Java issue, because I noticed inconsistencies in the File API regarding this. I wrote about it here: http://javaforu.blogspot.com/2010/07/devnull-on-windows.html. BTW, the Windows equivalent is nul. This is the error I got on Windows (below). The mkdirs() call returns false, although, as noted on my blog, it returns true in some cases.

2010-07-20 22:25:47,851 - FATAL [main:zookeeperserverm...@62] - Unexpected exception, exiting abnormally
java.io.IOException: Unable to create data directory nul:\version-2
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.init(FileTxnSnapLog.java:79)
    at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:102)
    at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:85)
    at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:51)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:108)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)

Ashwin.
-- 
View this message in context: http://zookeeper-user.578899.n2.nabble.com/ZK-recovery-questions-tp5310116p5319775.html
Sent from the zookeeper-user mailing list archive at Nabble.com.
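To make the failure above concrete, here is a minimal sketch in plain JDK code (no ZooKeeper classes; the class and method names are hypothetical). It assumes, as the stack trace suggests, that the server needs to create a version-2 subdirectory inside dataDir. The null device (nul on Windows, /dev/null elsewhere) happily accepts writes, but it is not a directory, so that step cannot be expected to succeed reliably:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Locale;

// Sketch: the null device is a byte sink, not a directory, so it is a poor
// candidate for a dataDir that must contain a "version-2" subdirectory.
final class NullDeviceCheck {

    // Hypothetical helper: pick the null device name for a given os.name value.
    static String nullDevicePath(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows") ? "nul" : "/dev/null";
    }

    // Mirrors the failing step from the stack trace: creating <dataDir>/version-2.
    static boolean canCreateVersionDir(File dataDir) {
        File versionDir = new File(dataDir, "version-2");
        return versionDir.mkdirs() || versionDir.isDirectory();
    }

    // Writing bytes to the null device works; the bytes are simply discarded.
    static void writeDiscarded(String nullDevice, byte[] data) throws IOException {
        try (OutputStream out = new FileOutputStream(nullDevice)) {
            out.write(data);
        }
    }

    public static void main(String[] args) throws IOException {
        String nul = nullDevicePath(System.getProperty("os.name"));
        writeDiscarded(nul, new byte[] {1, 2, 3});
        System.out.println("null device: " + nul);
        System.out.println("can create version-2 under it: "
                + canCreateVersionDir(new File(nul)));
    }
}
```

On Linux, canCreateVersionDir(new File("/dev/null")) returns false because mkdir fails when a path component is not a directory; on Windows the mkdirs() result for nul may differ, which matches the inconsistencies described in the blog post.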
Re: ZK recovery questions
Hi Mahadev,

I'd love to, but I don't have access to server-class machines in my personal time. Let me see if I can squeeze in some time to get something running on EC2; I need to learn how to do that first. I will certainly let you know if/when I get this done. So far, all I've done with ZK is run the simple test described here: http://javaforu.blogspot.com/2010/07/weekend-at-zookeeper.html.

Regards,
Ashwin.

On Tue, Jul 20, 2010 at 6:54 AM, Mahadev Konar [via zookeeper-user] <ml-node+5316726-986685134-462...@n2.nabble.com> wrote:

Hi Ashwin,
We have seen people wanting something like ZooKeeper without the reliability of permanent storage, who are willing to work with looser guarantees than the current ZooKeeper provides. What you mention about log files is certainly a valid use case. It would be great to see how much throughput you can get in a scenario where we never log to a permanent store. Do you want to try this out and see what kind of throughput difference you get?
Thanks,
mahadev

On 7/19/10 8:35 PM, Ashwin Jayaprakash [hidden email] wrote:

Cool. I've only tried the single-node server so far. I didn't know a server could sync from other, senior servers.

Server/Cluster addresses: I read somewhere in the docs/todo list that the bootstrap server list for the clients should be the same. So what happens when a replacement server has to be brought in on a different IP/hostname? Do the older clients autodetect the new server, or is this even supported? I suppose not.

Log files: I'm not at all confusing ZK with a database (very tempting, tho'), but running ZK servers without log files does not seem unusual, especially since you said new servers can sync directly from senior servers without relying on log files. In that case, I'm curious to see what happens if you just redirect the log files to /dev/null. Has anyone tried this?
Regards,
Ashwin Jayaprakash.
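Mahadev's question boils down to how much the synchronous write to permanent storage costs. As a rough first step (plain JDK code, not ZooKeeper's own transaction log; the class and method names, record size and record count are all arbitrary assumptions), one could compare appending records with an fsync after each one against discarding them, which is roughly what redirecting the log to /dev/null would do:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Rough micro-benchmark sketch (not ZooKeeper code): fsync-per-record
// appends versus writing the same bytes to a discarding sink.
final class LogSinkBenchmark {

    // Appends `records` records of `recordSize` bytes, forcing each one to disk.
    static long writeSynced(Path file, int records, int recordSize) throws IOException {
        byte[] record = new byte[recordSize];
        long written = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (int i = 0; i < records; i++) {
                ByteBuffer buf = ByteBuffer.wrap(record);
                while (buf.hasRemaining()) {
                    written += ch.write(buf);
                }
                ch.force(false); // push the file data to the storage device
            }
        }
        return written;
    }

    // Writes the same records to a sink that discards everything, like /dev/null.
    static long writeDiscarded(int records, int recordSize) throws IOException {
        byte[] record = new byte[recordSize];
        long written = 0;
        try (OutputStream out = new OutputStream() {
            @Override public void write(int b) { /* discard */ }
            @Override public void write(byte[] b, int off, int len) { /* discard */ }
        }) {
            for (int i = 0; i < records; i++) {
                out.write(record);
                written += record.length;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        int records = 200, size = 512;
        Path tmp = Files.createTempFile("txnlog", ".bin");
        try {
            long t0 = System.nanoTime();
            writeSynced(tmp, records, size);
            long t1 = System.nanoTime();
            writeDiscarded(records, size);
            long t2 = System.nanoTime();
            System.out.printf("synced:    %.1f records/s%n", records / ((t1 - t0) / 1e9));
            System.out.printf("discarded: %.1f records/s%n", records / ((t2 - t1) / 1e9));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

A real server may batch several writes per sync, so a per-record fsync like this is closer to a worst case; a meaningful number would still have to come from running an actual ensemble with and without a persistent log, as Mahadev suggests.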
ZK recovery questions
Hi,

I've been reading the docs and trying out some basic ZooKeeper examples. I have a few simple questions related to recovery. It would be good to have questions like these on the Wiki/docs so that noobs like me don't ask the same thing over and over.

- If 1 out of 3 servers crashes and its log files are unrecoverable, how do we provision a replacement server?
- If the server log is recoverable but provisioning takes a long time, what happens if the old log file is far behind the current state? The docs say that recovery is based on fuzzy checkpointing and snapshots, but I wasn't clear on how long catching up would take.
- What happens in the client-side code if a server quorum is lost? Does the ZK service freeze, or does it continue to serve just reads?
- If there is a temporary glitch (network or GC) and the replica to which the client is connected breaks away from the quorum, does the client get notified? Does the replica stop processing client requests? Does it rejoin the cluster without manual intervention?
- Now, if even the client cannot connect to the other servers (split brain)... well, I suppose this question is moot.
- Do the servers really have to run with file-based persistence? I saw that someone wanted an in-memory mode for unit testing (ZOOKEEPER-694, https://issues.apache.org/jira/browse/ZOOKEEPER-694), but there are cases where only a transient ZK service is needed. Most enterprise systems have replicated databases anyway, so the fear of data loss is minimal. If ZK logs are the only means of recovery, then this might be harder to implement.
- A client example with full-fledged error handling would be very useful for starters. I'm not sure if http://github.com/sgroschupf/zkclient and http://code.google.com/p/cages/ have everything, but they do look promising. The plain ZK API is a bit overwhelming :)

Thanks,
Ashwin.
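For the first and third questions, the arithmetic behind "losing a quorum" can be sketched as below. This assumes ZooKeeper's default majority quorums (weighted or hierarchical quorum configurations are not covered), and the class and method names are illustrative only:

```java
// Majority-quorum arithmetic for an ensemble of n voting servers (sketch).
final class QuorumMath {

    // Smallest number of servers that forms a strict majority.
    static int quorumSize(int ensembleSize) {
        if (ensembleSize < 1) {
            throw new IllegalArgumentException("ensemble must have at least one server");
        }
        return ensembleSize / 2 + 1;
    }

    // How many servers may fail while a majority is still reachable.
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    // True if the servers that are still up can form a quorum.
    static boolean hasQuorum(int ensembleSize, int serversUp) {
        return serversUp >= quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n++) {
            System.out.printf("ensemble=%d quorum=%d tolerated failures=%d%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
        System.out.println(hasQuorum(3, 2)); // 1 of 3 crashed: prints true
        System.out.println(hasQuorum(3, 1)); // 2 of 3 crashed: prints false
    }
}
```

With 3 servers, one crash leaves 2 of 3, which is still a majority, so the ensemble keeps going while the replacement is provisioned. A 4-server ensemble tolerates no more failures than a 3-server one, which is why odd ensemble sizes are usually recommended.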