Re: problems on EC2?

2009-04-16 Thread Ted Dunning
Yes. I had seen that before, but it is worth reading about once a month.
On Thu, Apr 16, 2009 at 11:45 AM, Patrick Hunt wrote:
> Ted Dunning wrote:
>> On a related note, what is best practice for handling session expiration? Just deal with it as if it is a new start?
>
> See this re han

Re: problems on EC2?

2009-04-16 Thread Patrick Hunt
Ted Dunning wrote:
> On a related note, what is best practice for handling session expiration? Just deal with it as if it is a new start?
See this re handling the errors ZK can throw at you: http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
Patrick
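For readers following the session-expiration question, here is a minimal sketch of the "treat it as a new start" approach. It is not the ZooKeeper client API: `SessionManager`, `connect`, and the state strings are hypothetical names chosen for illustration. It relies only on the documented behaviour that a disconnect is recoverable by the client library, while an expired session takes its ephemeral nodes and watches with it and needs a fresh handle.

```python
from typing import Callable, List, Optional


class SessionManager:
    """Rebuilds the client handle and session-bound state after expiry.

    Hypothetical wrapper: `connect` stands in for whatever creates a new
    client handle in your application.
    """

    def __init__(self, connect: Callable[[], object]):
        self._connect = connect
        self._on_start: List[Callable[[object], None]] = []
        self.handle: Optional[object] = None
        self.starts = 0  # how many sessions have been started so far

    def on_start(self, callback: Callable[[object], None]) -> None:
        # Callbacks recreate ephemeral nodes and watches for a fresh session.
        self._on_start.append(callback)

    def start(self) -> None:
        # Open a new handle and rebuild everything that depends on it.
        self.handle = self._connect()
        self.starts += 1
        for callback in self._on_start:
            callback(self.handle)

    def handle_event(self, state: str) -> None:
        if state == "DISCONNECTED":
            # The session may still be alive; wait for the client to reconnect.
            return
        if state == "EXPIRED":
            # The old session is gone for good: treat it as a new start.
            self.start()
```

Keeping all session-bound setup in the `on_start` callbacks means the first start and every restart after an expiry run the same code path.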

Re: problems on EC2?

2009-04-16 Thread Ted Dunning
Once we have a bit more experience, that would be fine. Best would be to present solutions as well as non-specific problems.
On Thu, Apr 16, 2009 at 11:41 AM, Patrick Hunt wrote:
> ps. please consider presenting your "experiences running ZK inside EC2" at an upcoming Hadoop social or even at

Re: problems on EC2?

2009-04-16 Thread Patrick Hunt
ps. please consider presenting your "experiences running ZK inside EC2" at an upcoming Hadoop social or even at the summit. I know I'd really be interested to hear your experiences and I think it would be useful for both new and existing ZK users.
Patrick
Patrick Hunt wrote:
> Well that's good

Re: problems on EC2?

2009-04-16 Thread Patrick Hunt
Well that's good - 300ms max latency means that the server can round trip any requests pretty quickly. It would lead me to look at the client VMs or (intermittent) network problems... Keep in mind though that's one of your servers (unless you are saying you checked all X of the servers in the

Re: problems on EC2?

2009-04-16 Thread Ted Dunning
Patrick, Thanks enormously. This hasn't helped yet, but that is just because it was a very large bite of the apple. Once I digest it, I can tell that it will be very helpful. I did have a chance to look at the "stat" output and maximum latency was <300ms. How that connects with what you are sa
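The "stat" check mentioned here can be scripted so that every server in the ensemble is polled, not just one. The sketch below is an illustration under stated assumptions: it sends the four-letter command `stat` over a plain socket and looks for a line of the form `Latency min/avg/max: a/b/c`. The function names are made up for this example, port 2181 is only the conventional client port, and the exact output format may differ between ZooKeeper versions.

```python
import re
import socket
from typing import Optional, Tuple


def fetch_stat(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the four-letter command 'stat' and return the raw reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stat")
        chunks = []
        while True:
            # The server closes the connection once the reply is sent.
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_latency(stat_text: str) -> Optional[Tuple[float, float, float]]:
    """Return (min, avg, max) latency in ms, or None if no such line exists."""
    match = re.search(
        r"Latency min/avg/max:\s*([\d.]+)/([\d.]+)/([\d.]+)", stat_text
    )
    if match is None:
        return None
    low, avg, high = (float(group) for group in match.groups())
    return low, avg, high
```

Calling `parse_latency(fetch_stat(host))` for each server and comparing the maximum values is one way to tell whether a single server is slow or whether the problem sits on the client or network side.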

Re: problems on EC2?

2009-04-16 Thread Patrick Hunt
Take a look at this section to start: http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_commonProblems What type of monitoring are you doing on your cluster? You could monitor at both the host and at the java (jmx) level. That will give you some insight on where to look; cp

Re: problems on EC2?

2009-04-14 Thread Mahadev Konar
Hi Ted,
> These problems seem to manifest around getting lots of anomalous disconnects and session expirations even though we have the timeout values set to 2 seconds on the server side and 5 seconds on the client side.
Your scenario might be a little different from what Nitay (HBase) is s

Re: problems on EC2?

2009-04-14 Thread Nitay
Yes, we are. We currently don't handle SessionExpired very well at all in HBase. There are two things going on in parallel to fix it: 1) Reinitialize the ZooKeeper handler (and everything else that depends on it) on the node in question when a SessionExpired event occurs. 2) Reduce the number of S

Re: problems on EC2?

2009-04-14 Thread Ted Dunning
Very good pointer. Thanks. Are you still having your problems?
On Tue, Apr 14, 2009 at 6:09 PM, Nitay wrote:
> Hi Ted,
>
> Fellow user coming from HBase. We were recently seeing lots of SessionExpired events as well. Check out this mail thread:
>
> http://markmail.org/search/?q=SessionExpi

Re: problems on EC2?

2009-04-14 Thread Nitay
Hi Ted, Fellow user coming from HBase. We were recently seeing lots of SessionExpired events as well. Check out this mail thread: http://markmail.org/search/?q=SessionExpired#query:SessionExpired+page:1+mid:gt4c2kn4n4f5s5kw+state:results Perhaps this might have something to do with what you're s

problems on EC2?

2009-04-14 Thread Ted Dunning
We have been using EC2 as a substrate for our search cluster with zookeeper as our coordination layer and have been seeing some strange problems. These problems seem to manifest around getting lots of anomalous disconnects and session expirations even though we have the timeout values set to 2 sec