Re: Confused about KeeperState.Disconnected and KeeperState.Expired

2009-06-24 Thread Gustavo Niemeyer
 Ben's opinion is that it should not belong in the default API but in the
 common client that another recent thread was about. My opinion is just that
 I need such functionality, wherever it is.

Understood, sorry.  I just meant that it feels like something that
would likely be useful to other people too, so it might have a role in
the default API to ensure it gets done properly, considering the
details that Ben brought up.

 If the node gets the exception (or has its own timer), as I wrote, it will
 shut itself down to release HDFS leases as fast as possible. If ZK is really
 down and it's not a network partition, then HBase is down and this is fine
 because it won't be able to work anyway.

Right, that's mostly what I was wondering.  I was pondering under
which circumstances the node would be unable to talk to the
ZooKeeper server but would still be holding the HDFS lease in a way
that prevented the rest of the system from going on.  If I understand
you correctly: if ZooKeeper is down entirely, HBase would be down for
good, and if the machine was partitioned off entirely, the HDFS side
of things would also be disconnected, so shutting the node down won't
help the rest of the system recover.
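
To make the failure handling concrete, here is a minimal sketch of a
watcher that tells the two states apart, using the plain ZooKeeper Java
API; shutdownAndReleaseLeases() is a hypothetical placeholder for
whatever cleanup (closing HDFS files, restarting) the region server
actually does:

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

public class SessionWatcher implements Watcher {
    public void process(WatchedEvent event) {
        switch (event.getState()) {
            case Disconnected:
                // Transient: the client library keeps trying to reconnect.
                // A local timer could be started here to act before the
                // server-side expiry is ever reported.
                break;
            case Expired:
                // The session is gone for good and the ephemeral nodes are
                // deleted; a new ZooKeeper handle is needed, so clean up.
                shutdownAndReleaseLeases();
                break;
            default:
                break;
        }
    }

    private void shutdownAndReleaseLeases() {
        // hypothetical: close open HDFS files and exit so the master can
        // reassign regions without waiting for the NN lease to expire
    }
}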

-- 
Gustavo Niemeyer
http://niemeyer.net


Re: Confused about KeeperState.Disconnected and KeeperState.Expired

2009-06-24 Thread Jean-Daniel Cryans
If the machine was completely partitioned, as far as I know, it would lose
its lease, so the only thing we have to make sure of is clearing the
state of the region server by doing a restart so that it's ready to come
back into the cluster. If ZK is down but the rest is up, closing the files
in HDFS should ensure that we lose very little data, if any.

I think that in a multi-rack setup it is possible to be unable to talk to
ZK but still able to talk to the Namenode, since machines can be anywhere.
Especially in HBase 0.20, the master can fail over to any node that has a
backup Master ready. So in that case, the region server should consider
itself gone from the cluster, close any connections it has, and restart.

Those are very legitimate questions, Gustavo, thanks for asking.

J-D


Re: Confused about KeeperState.Disconnected and KeeperState.Expired

2009-06-24 Thread Benjamin Reed

sorry to jump in late.

If I understand the scenario correctly, you are partitioned from ZK, but
you still have access to the NN on which you are holding leases to
files. The problem is that even though your ephemeral nodes may time out,
you are still holding a lease on the NN, and recovery would go faster if
you actually closed the file. Right? Or is it deeper than that? Can you
open a file in such a way that you stomp the lease? Or make sure that
the lease timeout is smaller than the session timeout and only renew if
you are still connected to ZK?
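
Just to make that last option concrete, a rough sketch (renewLease() is
a hypothetical hook; real HDFS lease renewal happens inside the DFS
client, and the renewal interval would have to stay comfortably below
both the lease and session timeouts):

import org.apache.zookeeper.ZooKeeper;

public class GuardedLeaseRenewer implements Runnable {
    private final ZooKeeper zk;
    private final long renewIntervalMs;
    private volatile boolean running = true;

    public GuardedLeaseRenewer(ZooKeeper zk, long renewIntervalMs) {
        this.zk = zk;
        this.renewIntervalMs = renewIntervalMs;
    }

    public void run() {
        while (running) {
            // Only keep the NN lease alive while the ZK session looks
            // healthy; if we are partitioned from ZK, let the lease lapse
            // so the NN can recover the file without waiting on us.
            if (zk.getState() == ZooKeeper.States.CONNECTED) {
                renewLease();   // hypothetical hook, see note above
            }
            try {
                Thread.sleep(renewIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public void stop() {
        running = false;
    }

    private void renewLease() {
        // placeholder for whatever mechanism keeps the HDFS lease alive
    }
}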


thanx
ben


Next Bay Area Hadoop User Group - Focus on Hadoop 0.20 and Core Project Split

2009-06-24 Thread Christophe Bisciglia
Bay Area Hadoop Fans,

We're excited to hold our first Hadoop User Group at Cloudera's office
in Burlingame (just south of SFO). We pushed the start time back 30
minutes to allow a little extra time to drive further north, and we
hope the mid-way location brings more users from San Francisco.

Since meetup.com seems to be the norm for HUGs around the country, we
created a meetup group for the Bay Area
(http://www.meetup.com/Bay-Area-Hadoop-User-Group-HUG). Join this
group to stay up to date with additional meetings and locations -
we're hoping to move the location around, potentially alternating
between the North Bay and South Bay.

We've scheduled the next meetup for July 15th at 6:30 PM. Our office
isn't huge, but we do have room for 40 friendly people:
http://www.meetup.com/Bay-Area-Hadoop-User-Group-HUG/calendar/10728923/

We'll focus this meeting on Hadoop 0.20 and the split of core into
mapreduce, hdfs and common projects. Specifically, we'll go over new
features, API changes, upgrade experiences and more. If you'd like to
present about your experience, please let me know. If you'd like to
present about something else altogether, also let me know, and we'll
see what we can do at this or a later meetup.

We'll provide beer, drinks and snacks, and if there are any board game
fans in the house, we won't kick you out afterwards :-) On a more
serious note, after the meetup is a great opportunity to meet
Cloudera's engineering team and get advice about any headaches you
might be having.

We'll post the agenda to the meetup group as soon as we hear from
potential presenters and nail things down.

Christophe

-- 
get hadoop: cloudera.com/hadoop
online training: cloudera.com/hadoop-training
blog: cloudera.com/blog
twitter: twitter.com/cloudera