Re: ZK recovery questions

2010-07-21 Thread Ashwin Jayaprakash

I did try a quick test on Windows (yes, some of us use Windows :)

I thought simply changing the dataDir to the /dev/null equivalent on
Windows would do the trick. It didn't work. It looks like a Java issue,
because I noticed inconsistencies in the File API around this. I wrote
about it here: http://javaforu.blogspot.com/2010/07/devnull-on-windows.html

BTW, the Windows equivalent is nul.

This is the error I got on Windows (below): mkdirs() returns false. As
noted on my blog, it does return true in some cases.

2010-07-20 22:25:47,851 - FATAL [main:zookeeperserverm...@62] - Unexpected exception, exiting abnormally
java.io.IOException: Unable to create data directory nul:\version-2
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.init(FileTxnSnapLog.java:79)
    at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:102)
    at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:85)
    at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:51)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:108)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)
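A simplified sketch of the kind of check that produces this error, assuming FileTxnSnapLog.init does roughly "create dataDir/version-2 if it is missing, and fail if mkdirs() returns false". This is not the actual ZooKeeper source; ensureDataDir is a hypothetical helper. On Linux the same failure can be reproduced portably by pointing the data directory at a path underneath a regular file, which plays the role that nul plays on Windows.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class DataDirCheck {

    /**
     * Hypothetical helper: creates dataDir/version-2 if needed and fails the
     * same way as the log above. Not ZooKeeper's actual code.
     */
    static File ensureDataDir(File dataDir) throws IOException {
        File versionDir = new File(dataDir, "version-2");
        // mkdirs() also returns false when the directory already exists,
        // so only treat it as an error if the directory is really missing.
        if (!versionDir.isDirectory() && !versionDir.mkdirs()) {
            throw new IOException("Unable to create data directory " + versionDir);
        }
        return versionDir;
    }

    public static void main(String[] args) throws IOException {
        // A normal directory works.
        File ok = Files.createTempDirectory("zkdata").toFile();
        System.out.println("created: " + ensureDataDir(ok));

        // A directory "inside" a regular file cannot be created, so mkdirs()
        // returns false, much like a device name such as nul on Windows.
        File regularFile = File.createTempFile("not-a-dir", ".tmp");
        try {
            ensureDataDir(regularFile);
        } catch (IOException e) {
            System.out.println("failed as expected: " + e.getMessage());
        }
    }
}
```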


Ashwin.
-- 
View this message in context: 
http://zookeeper-user.578899.n2.nabble.com/ZK-recovery-questions-tp5310116p5319775.html
Sent from the zookeeper-user mailing list archive at Nabble.com.


Re: ZK recovery questions

2010-07-21 Thread Benjamin Reed
I did a benchmark a while back to see the effect of turning off the
disk (it wasn't as big as you would think); I had to modify the code to do it.
There is an option in the config to turn off the sync that will get you
most of the performance you would get by turning off the disk entirely.
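A minimal sketch of a standalone config with the sync turned off, assuming the option Ben refers to is forceSync (described in the ZooKeeper Administrator's Guide, Java system property zookeeper.forceSync; "no" means updates are not forced to the transaction log's media before they complete). The paths and port are placeholders, and the noSyncConfig helper is hypothetical. The trade-off: a crash can lose updates that clients were already told had succeeded.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class NoSyncConfig {

    /** Builds a standalone zoo.cfg with syncing of the txn log disabled. */
    static Properties noSyncConfig(String dataDir) {
        Properties cfg = new Properties();
        cfg.setProperty("tickTime", "2000");   // milliseconds per tick
        cfg.setProperty("dataDir", dataDir);   // snapshots and txn logs still go here
        cfg.setProperty("clientPort", "2181");
        // Assumed to be the option Ben means: do not force the txn log to disk.
        cfg.setProperty("forceSync", "no");
        return cfg;
    }

    public static void main(String[] args) throws IOException {
        Path cfgFile = Files.createTempFile("zoo", ".cfg");
        try (Writer out = Files.newBufferedWriter(cfgFile)) {
            noSyncConfig("/var/zookeeper").store(out, "standalone, forceSync disabled");
        }
        System.out.println(new String(Files.readAllBytes(cfgFile)));
    }
}
```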


ben



Re: ZK recovery questions

2010-07-21 Thread Ted Dunning
My own experiments, in an environment where ZK is used purely for
coordination at a fairly low transaction rate (tens to hundreds of ops per
second, mostly status updates), suggest that disk throughput only becomes a
detectable issue for pretty massively abused ZK applications. The impact of
disk writes is surprisingly small even for fairly high-throughput cases, and
for moderate or low throughput it is simply not detectable.

Those low-rate coordination applications also seem to be the ones that
benefit most from being able to restart servers efficiently from a disk
snapshot and log, and from being able to restart the entire cluster with its
previous state.
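A toy model of why restarting from disk is cheap: the server loads its latest snapshot and then replays only the logged transactions whose zxid is newer than the snapshot. The Txn class, the recover helper, and the map-based state are hypothetical simplifications, not ZooKeeper's data structures. In this model, re-applying a "set" is harmless, which is the same idempotence property that lets ZooKeeper take fuzzy snapshots.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnapshotReplay {

    /** Hypothetical logged transaction: set path to value at a given zxid. */
    static final class Txn {
        final long zxid;
        final String path;
        final String value;

        Txn(long zxid, String path, String value) {
            this.zxid = zxid;
            this.path = path;
            this.value = value;
        }
    }

    /**
     * Rebuilds state from a snapshot taken at snapshotZxid plus the txn log.
     * Only transactions newer than the snapshot are applied, in log order.
     */
    static Map<String, String> recover(Map<String, String> snapshot, long snapshotZxid,
                                       List<Txn> log) {
        Map<String, String> state = new HashMap<>(snapshot);
        for (Txn t : log) {
            if (t.zxid > snapshotZxid) {
                state.put(t.path, t.value);
            }
        }
        return state;
    }

    public static void main(String[] args) {
        Map<String, String> snap = new HashMap<>();
        snap.put("/config", "v1");
        List<Txn> log = new ArrayList<>();
        log.add(new Txn(1, "/config", "v1"));  // already covered by the snapshot
        log.add(new Txn(2, "/config", "v2"));  // newer: replayed
        log.add(new Txn(3, "/status", "up"));  // newer: replayed
        System.out.println(recover(snap, 1, log));
    }
}
```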



Re: ZK recovery questions

2010-07-20 Thread Mahadev Konar
Hi Ashwin,
 We have seen people who want something like ZooKeeper without the
reliability of permanent storage and who are willing to work with looser
guarantees than current ZooKeeper provides. What you mention about log files
is certainly a valid use case.

It would be great to see how much throughput you can get in a scenario
where we never log to a permanent store. Do you want to try this out and
see what kind of throughput difference you get?
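One rough way to estimate the disk cost in question without touching ZooKeeper itself: time a loop of small appends to a file, once forcing each write to the device (roughly what a synced transaction log pays per commit) and once without. This is only an approximation under stated assumptions: ZooKeeper can group several transactions into one sync, so real numbers will differ, and the record size and count here are arbitrary. The SyncCost class is hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyncCost {

    /** Appends `count` records of `size` bytes and returns appends per second. */
    static double appendsPerSecond(Path file, int count, int size, boolean sync)
            throws IOException {
        ByteBuffer record = ByteBuffer.allocate(size);
        long start = System.nanoTime();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (int i = 0; i < count; i++) {
                record.clear();
                while (record.hasRemaining()) {
                    ch.write(record);
                }
                if (sync) {
                    ch.force(false); // flush file data (not metadata) to the device
                }
            }
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return count / seconds;
    }

    public static void main(String[] args) throws IOException {
        int count = 2000;
        int size = 100; // arbitrary: small, status-update-sized records
        Path synced = Files.createTempFile("log-sync", ".bin");
        Path unsynced = Files.createTempFile("log-nosync", ".bin");
        System.out.printf("synced:   %.0f appends/s%n", appendsPerSecond(synced, count, size, true));
        System.out.printf("unsynced: %.0f appends/s%n", appendsPerSecond(unsynced, count, size, false));
    }
}
```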

Thanks
mahadev


On 7/19/10 8:35 PM, Ashwin Jayaprakash ashwin.jayaprak...@gmail.com
wrote:

 
 Cool. I've only tried the single node server so far. I didn't know it could
 sync from other senior servers.
 
 Server/Cluster addresses: I read somewhere in the docs/todo list that the
 bootstrap server list for the clients should be the same. So, what happens
 when a new replacement server has to be brought in on a different
 IP/hostname? Do the older clients autodetect the new server or is this even
 supported? I suppose not.
 
 Log files: I have absolutely no confusion between ZK and databases (very
 tempting tho'), but running ZK servers without log files does not seem
 unusual. Especially since you said new servers can sync directly from senior
 servers without relying on log files. In that case, I'm curious to see what
 happens if you just redirect log files to /dev/null. Anyone tried this?
 
 Regards,
 Ashwin Jayaprakash.



Re: ZK recovery questions

2010-07-20 Thread Ashwin Jayaprakash

Hi Mahadev, I'd love to, but I don't have access to server-class machines in
my personal time.

Let me see if I can squeeze in some time to get something running on EC2. I
need to learn how to do that first. I will certainly let you know if/when I
can get this done.

So far, all I've done with ZK is run a simple test, described here:
http://javaforu.blogspot.com/2010/07/weekend-at-zookeeper.html

Regards,
Ashwin.




Re: ZK recovery questions

2010-07-19 Thread Ted Dunning
They don't auto-detect.

What is usually done is that the configuration on every server is changed
and the servers are restarted one at a time.
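Since nothing is auto-detected, a replacement host has to appear in the server.N lines of every server's zoo.cfg, and clients need it in their connection string too. A small sketch that renders both from one host list, so they cannot drift apart. The hostnames and the EnsembleConfig helpers are hypothetical; 2888/3888/2181 are the conventional example ports used in the ZooKeeper docs, not requirements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EnsembleConfig {

    /** Renders "server.N=host:peerPort:electionPort" lines, numbering from 1. */
    static List<String> serverLines(List<String> hosts, int peerPort, int electionPort) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < hosts.size(); i++) {
            lines.add("server." + (i + 1) + "=" + hosts.get(i) + ":" + peerPort + ":" + electionPort);
        }
        return lines;
    }

    /** Renders the comma-separated host:port list a client connects with. */
    static String connectString(List<String> hosts, int clientPort) {
        List<String> parts = new ArrayList<>();
        for (String h : hosts) {
            parts.add(h + ":" + clientPort);
        }
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        // zk3-new replaces a failed zk3 and keeps its position (and so its id).
        // Every server gets these same lines, then each is restarted in turn.
        List<String> hosts = Arrays.asList("zk1.example.com", "zk2.example.com", "zk3-new.example.com");
        serverLines(hosts, 2888, 3888).forEach(System.out::println);
        System.out.println(connectString(hosts, 2181));
    }
}
```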

On Mon, Jul 19, 2010 at 8:35 PM, Ashwin Jayaprakash 
ashwin.jayaprak...@gmail.com wrote:

 So, what happens
 when a new replacement server has to be brought in on a different
 IP/hostname? Do the older clients autodetect the new server or is this even
 supported? I suppose not.



ZK recovery questions

2010-07-18 Thread Ashwin Jayaprakash
Hi, I've been reading the docs and trying out some basic ZooKeeper examples.
I have a few simple questions related to recovery.

It would be good to have questions like these on the wiki/docs to keep
noobs like me from asking the same thing over and over.


   - If 1 out of 3 servers crashes and its log files are unrecoverable, how
   do we provision a replacement server?


   - If the server log is recoverable but provisioning takes a long time,
   what happens if the old log file is far behind the current state? The
   docs say that recovery is based on fuzzy checkpointing and snapshots, but
   I wasn't clear on how long catching up would take.


   - What happens on the client side if quorum is lost? Does the ZK service
   freeze, or does it continue to serve just reads?
      - If there is a temporary glitch (network or GC) and the replica to
      which the client is connected breaks away from the quorum, does the
      client get notified? Does the replica stop processing client requests?
      Does it rejoin the cluster without manual intervention?
      - And if the client cannot connect to any other server either (split
      brain)... well, I suppose this question is moot.


   - Do the servers really have to run with file-based persistence? I saw
   that someone wanted this in-memory mode for unit testing (ZOOKEEPER-694,
   https://issues.apache.org/jira/browse/ZOOKEEPER-694),
   but there are cases where only a transient ZK service is needed. Most
   enterprise systems have replicated databases anyway, so the fear of data
   loss is minimal. If ZK logs are the only means of recovery, then this
   might be harder to implement.


   - A client example with full-fledged error handling would be very useful
   for starters. I'm not sure if http://github.com/sgroschupf/zkclient and
   http://code.google.com/p/cages/ have everything, but they do look
   promising. The plain ZK API is a bit overwhelming :)


Thanks,
Ashwin.


Re: ZK recovery questions

2010-07-18 Thread Ted Dunning
On Sun, Jul 18, 2010 at 3:34 PM, Ashwin Jayaprakash 
ashwin.jayaprak...@gmail.com wrote:


   - If 1 out of 3 servers crashes and the log files are unrecoverable, how
   do we provision a replacement server?


Just start it and it will download a snapshot from the other servers.
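In a multi-server ensemble the new box still needs the usual provisioning before it can join and fetch that snapshot: the same zoo.cfg as its peers, and a myid file in its dataDir containing its server number. A sketch of preparing an otherwise empty dataDir that way; the directory, the id 3, and the writeMyId helper are placeholders.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ProvisionReplacement {

    /**
     * Prepares an empty dataDir for a replacement server: only the myid file,
     * no snapshots or logs. The state itself comes from the other servers
     * once this one starts and joins.
     */
    static Path writeMyId(Path dataDir, int serverId) throws IOException {
        Files.createDirectories(dataDir);
        Path myid = dataDir.resolve("myid");
        Files.write(myid, (serverId + "\n").getBytes(StandardCharsets.US_ASCII));
        return myid;
    }

    public static void main(String[] args) throws IOException {
        Path dataDir = Files.createTempDirectory("zk-replacement");
        // Reuse the failed server's id so its server.N line still matches.
        Path myid = writeMyId(dataDir, 3);
        String content = new String(Files.readAllBytes(myid), StandardCharsets.US_ASCII).trim();
        System.out.println(myid + " -> " + content);
    }
}
```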



   - If the server log is recoverable but provisioning takes a long time,
   then what happens if the old log file is far behind the current state?


If a server is very far behind, it will download a snapshot as if it knows
nothing.  This rarely takes long.
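"Very far behind" can be pictured as a comparison against the range of recently committed transactions the leader still keeps in memory: if the follower's last zxid falls inside that range, the leader can send just the missing transactions; otherwise it sends a full snapshot. This is a simplified sketch of that idea with hypothetical names (SyncDecision, choose, Mode); ZooKeeper's actual sync code handles more cases.

```java
public class SyncDecision {

    enum Mode { DIFF, SNAP, TRUNC }

    /**
     * Hypothetical simplification: the leader remembers committed transactions
     * with zxids in [minCommitted, maxCommitted].
     */
    static Mode choose(long followerZxid, long minCommitted, long maxCommitted) {
        if (followerZxid > maxCommitted) {
            return Mode.TRUNC;   // follower has txns the leader never committed
        }
        if (followerZxid >= minCommitted) {
            return Mode.DIFF;    // close enough: send only the missing txns
        }
        return Mode.SNAP;        // too far behind: ship a full snapshot
    }

    public static void main(String[] args) {
        long min = 500, max = 1000;
        System.out.println(choose(900, min, max));  // DIFF
        System.out.println(choose(10, min, max));   // SNAP
        System.out.println(choose(0, min, max));    // SNAP (a blank replacement server)
        System.out.println(choose(1200, min, max)); // TRUNC
    }
}
```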


   - If there was a temporary glitch (n/w or GC) and the replica to which
   the client is connected breaks away from the quorum does the client get
   notified? Does it stop processing client requests? Does it rejoin the
   cluster without manual intervention?


Failures like this are normally invisible to the client.
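Part of why this is usually invisible is that the client library knows the whole server list and, if its current server drops out, reconnects to another one; as long as that happens within the session timeout, the session survives (the real client does report Disconnected and then SyncConnected events to its default watcher along the way). A toy model of that failover loop follows. The Connector interface and connectToAny are hypothetical, not the ZooKeeper client API, and the shuffle is just one way to spread clients across servers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class FailoverSketch {

    /** Hypothetical stand-in for opening a connection to one server. */
    interface Connector {
        boolean tryConnect(String hostPort);
    }

    /**
     * Tries the servers in shuffled order and returns the first one that
     * accepts the connection, or null if none do.
     */
    static String connectToAny(List<String> servers, Connector connector, Random rnd) {
        List<String> order = new ArrayList<>(servers);
        Collections.shuffle(order, rnd); // spread clients across the ensemble
        for (String s : order) {
            if (connector.tryConnect(s)) {
                return s;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181");
        // Pretend zk2 has broken away from the quorum and refuses clients.
        Connector fake = hostPort -> !hostPort.startsWith("zk2");
        System.out.println("connected to " + connectToAny(servers, fake, new Random(42)));
    }
}
```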


   - Do the servers really have to run with file based persistence? I saw
   that someone wanted this in-memory mode for unit testing (ZOOKEEPER-694,
   https://issues.apache.org/jira/browse/ZOOKEEPER-694)
   but there are cases where only a transient ZK service is needed. Most
   enterprise systems have replicated Databases anyway. So, the fear of data
   loss is minimal. If ZK logs are the only means of recovery, then this might
   be harder to implement


ZK is not a replacement for your database and it is really, really nice to
be able to stop it and start it again.  Disk persistence helps with this
enormously.

  promising. Plain ZK API is a bit overwhelming :)


In practice, it is really pretty simple.  Try it out.
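A common building block for the error handling asked about above is retrying an operation when the connection is lost, backing off between attempts. The sketch below is generic: RetriableException and withRetries are hypothetical placeholders, and in real ZooKeeper client code the retriable case would typically be KeeperException.ConnectionLossException. Only retry operations that are safe to repeat; a create() that failed with connection loss may in fact have succeeded on the server.

```java
import java.util.concurrent.Callable;

public class RetryLoop {

    /** Hypothetical marker for errors worth retrying (e.g. connection loss). */
    static class RetriableException extends Exception {
        RetriableException(String msg) {
            super(msg);
        }
    }

    /**
     * Runs op up to maxAttempts times, sleeping baseDelayMs, then 2x, 4x, ...
     * between attempts. Non-retriable exceptions propagate immediately.
     */
    static <T> T withRetries(Callable<T> op, int maxAttempts, long baseDelayMs)
            throws Exception {
        long delay = baseDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (RetriableException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delay);
                delay *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice with a retriable error, then succeeds.
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new RetriableException("connection lost");
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```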