Re: How to reestablish a session

2010-11-19 Thread Gustavo Niemeyer
> is_unrecoverable() means exactly that: the session is toast. nothing you do
> will get it back.

OK, I was indeed wondering what exactly was unrecoverable.

> zookeeper_init is almost never used with a non-null client_id. the main use
> case for it is crash recovery. i've rarely seen it used, but you can start a
> session, save off the client_id to disk, create ephemerals etc., then if
> your program crashes, you can restart and recover the session and pick back
> up where you left off. in this case we don't worry about the session being
> closed by the previous instance of the program because it crashed. it's
> pretty tricky to use.

Understood.  I agree this is a pretty unique case, and a very hard one
to get right by itself (how to get the app in the proper state to
receive watches after the whole application has crashed?).
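The quoted reply describes the crash-recovery use of zookeeper_init(): save the client id to disk, then hand it back after a restart. Below is a minimal sketch of the save/load half. It assumes the credentials are a 64-bit session id plus a 16-byte password, which mirrors what the C client's clientid_t holds; `saved_session_t`, `save_session()` and `load_session()` are hypothetical names. In real code the struct would come from zoo_client_id() and would be passed as the clientid argument of zookeeper_init().

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the session credentials the C client exposes
 * (a 64-bit session id plus a 16-byte password). Check zookeeper.h for
 * the real clientid_t before relying on this layout. */
typedef struct {
    int64_t client_id;
    char passwd[16];
} saved_session_t;

/* Persist the credentials so a restarted process can try to resume the
 * session. Raw struct bytes are only portable to the same build and
 * architecture, which is enough for a restart on the same machine.
 * Returns 0 on success, -1 on any I/O error. */
int save_session(const char *path, const saved_session_t *s)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(s, sizeof *s, 1, f);
    /* fclose can report a delayed write error, so check it as well. */
    if (fclose(f) != 0 || written != 1)
        return -1;
    return 0;
}

/* Load credentials written by save_session(). Returns 0 on success,
 * -1 if the file is missing or truncated (the output is then zeroed). */
int load_session(const char *path, saved_session_t *s)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(s, sizeof *s, 1, f);
    fclose(f);
    if (got != 1) {
        memset(s, 0, sizeof *s);
        return -1;
    }
    return 0;
}
```

After a crash, the restarted process would call load_session() and, if it succeeds, pass the credentials to zookeeper_init() instead of 0. As noted above, the file handling is the easy part; getting the application back into a state where it can receive watches is the hard one.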

-- 
Gustavo Niemeyer
http://niemeyer.net
http://niemeyer.net/blog
http://niemeyer.net/twitter


Re: Session events

2010-11-18 Thread Gustavo Niemeyer
> 2) Session events can *only* be observed by setting a watch
> explicitly, so one has to do something similar to wexists("/",
> observe_change_func) in their preferred client API to set a hook which
> will purposefully wait on a session change.

I see I already got this part wrong.  I missed the code in
zk_hashtable.c which manually grabs the default watcher from the handle
in the case of ZOO_SESSION_EVENT:

    if (type == ZOO_SESSION_EVENT) {
        watcher_object_t defWatcher;
        defWatcher.watcher = zh->watcher;
        defWatcher.context = zh->context;
        /* ... */
    }

Further comments on the overall context are still very welcome.
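The quoted snippet is what makes statement (2) of the original "Session events" message wrong: on a session event the library itself picks up the default watcher stored on the handle, so no explicit watch is needed. Below is a toy model of that routing, not the library's code. It additionally assumes statement (3) of that message holds (every registered watch also sees session events), and it ignores the one-shot semantics of real watches. All names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Event codes for this toy model only. */
enum { EV_SESSION = -1, EV_CHANGED = 3 };

typedef void (*watch_fn)(int type, const char *path, void *ctx);

typedef struct {
    const char *path;   /* node the watch was set on */
    watch_fn fn;
    void *ctx;
} watch_t;

typedef struct {
    watch_fn default_fn;   /* plays the role of zh->watcher */
    void *default_ctx;     /* plays the role of zh->context */
    watch_t watches[8];
    size_t nwatches;
} toy_handle_t;

/* Session events go to the default watcher plus every registered watch;
 * node events go only to watches on the same path.
 * Returns how many callbacks were invoked. */
size_t dispatch(toy_handle_t *h, int type, const char *path)
{
    size_t called = 0;
    if (type == EV_SESSION && h->default_fn != NULL) {
        h->default_fn(type, path, h->default_ctx);
        called++;
    }
    for (size_t i = 0; i < h->nwatches; i++) {
        watch_t *w = &h->watches[i];
        if (type == EV_SESSION || strcmp(w->path, path) == 0) {
            w->fn(type, path, w->ctx);
            called++;
        }
    }
    return called;
}

/* Example callback: counts invocations through the int pointed to by ctx. */
void count_event(int type, const char *path, void *ctx)
{
    (void)type;
    (void)path;
    (*(int *)ctx)++;
}
```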


Re: How to reestablish a session

2010-11-18 Thread Gustavo Niemeyer
> why don't you let the client library do the move for you?

Maybe there's no need to reestablish the session manually, but a few
details in the API hint that this should be supported.  The strongest
one is that zookeeper_init() has a parameter for reestablishing an
existing session.  Without a way to reliably close a previous
connection without killing the existing session, how can we use this
parameter and the function that retrieves the existing client id?
Another hint is in is_unrecoverable(), which says the application must
close the zhandle and try to reconnect if it returns true.  Maybe I
misinterpreted it, and it actually means the *session* is dead, rather
than just the connection?
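The reply quoted earlier in this archive settles this: is_unrecoverable() means the session itself is toast. Under that reading, the decision a client has to make after a state change can be sketched as below. The state and action names are hypothetical stand-ins, not the C client's ZOO_*_STATE constants.

```c
#include <stdbool.h>

/* Hypothetical connection states for this sketch only. */
typedef enum {
    ST_CONNECTING,   /* library is retrying, possibly on another server */
    ST_CONNECTED,
    ST_EXPIRED,      /* the server has discarded the session */
    ST_AUTH_FAILED
} conn_state_t;

typedef enum {
    ACT_NONE,        /* connected: nothing to do */
    ACT_WAIT,        /* let the library reconnect with the same session */
    ACT_NEW_SESSION  /* close the handle and start a brand new session */
} action_t;

/* Same meaning as is_unrecoverable() per the thread: the session,
 * not just the connection, is gone. */
bool unrecoverable(conn_state_t s)
{
    return s == ST_EXPIRED || s == ST_AUTH_FAILED;
}

action_t next_action(conn_state_t s)
{
    if (unrecoverable(s))
        return ACT_NEW_SESSION;  /* ephemerals and watches must be rebuilt */
    return s == ST_CONNECTED ? ACT_NONE : ACT_WAIT;
}
```

ACT_WAIT is the important case: while the session is still alive, the library moves it to another server on its own, and closing the handle at that point would end the session.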


Re: How to reestablish a session

2010-11-18 Thread Gustavo Niemeyer
> Right now, if you have a partition between client and server A, I would not 
> expect
> server A to see a clean close from the client, but one of the various 
> exceptions
> that cause the socket to close.

Please don't get me wrong, but I find it rather odd to rely on the
stability of a network partition to avoid having a session killed.

Either way, that's not a big deal for me, now that I understand the
problem.  Knowing about it, I can simply postpone the close() until a
safe time.  It just felt worth pointing out, since this will arguably
be *very* hard to track down in practice.


Re: How to reestablish a session

2010-11-18 Thread Gustavo Niemeyer
Hi Ben,

> that quote is a bit out of context. it was with respect to a proposed
> change.

My point was just that your reasoning for why killing ephemerals in
that old instance wasn't a good approach also applies to the new
cases I'm pointing out.  I wasn't suggesting you agreed with my new
reasoning upfront.

> in your scenario can you explain step 4)? what are you closing?

I'm closing the old ZooKeeper handle (zh), after a new one was
established with the same client id.


Re: Session events

2010-11-18 Thread Gustavo Niemeyer
Hi Camille,

> Check out ZKClient: https://github.com/sgroschupf/zkclient
>
> The way this client deals with sessions is pretty nice and clean and I ended 
> up using a lot of this code as the basis for my Java client.

Looking at the code base, it feels like a pretty thin wrapper on top
of standard ZK.  For instance:

    public boolean exists(String path, boolean watch)
            throws KeeperException, InterruptedException {
        return _zk.exists(path, watch) != null;
    }

So I'm curious why, in the context of the questions raised here, you
feel it's significant.


Session events

2010-11-18 Thread Gustavo Niemeyer
Hello again,

Even though we've been using ZooKeeper for some time, only now are we
stopping to think through how best to deal with session events in the
context of a whole application.

To ensure I'm not off track, my understanding of session and session
event handling is:

1) It is an error to attempt to use a zk connection before a session
is established.

2) Session events can *only* be observed by setting a watch
explicitly, so one has to do something similar to wexists("/",
observe_change_func) in their preferred client API to set a hook which
will purposefully wait on a session change.

3) *All* the existing watches receive session events.

4) Reconnections within the grace period will not cause a session event.

First, are these statements correct?

Then, how are people generally architecting reliable applications when
facing these facts?  Are you simply checking for type == SESSION_EVENT
on *all* the watching functions?  Are you using a single watching
function and perhaps redirecting the flow to a different location
after freeing the used resources?  Are you ignoring the issue entirely
and hoping for the best? :-)

Any insights into existing practices will be welcome.
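One of the options asked about above, a single watching function that redirects the flow on session events, could look like the sketch below. It is a hypothetical application-side dispatcher: the event codes, `app_t`, and the handler names are made up for illustration. In the C client this function would sit behind the watcher_fn callback, with the application state passed as the watcher context.

```c
#include <string.h>

/* Event codes for this sketch; real code would use the client's constants. */
enum { EVT_SESSION = -1, EVT_CREATED = 1, EVT_DELETED = 2, EVT_CHANGED = 3 };

typedef struct app app_t;
typedef void (*node_handler)(app_t *app, const char *path);

struct app {
    int session_events;   /* session changes seen so far */
    int needs_resync;     /* set when cached state may be stale */
    int node_events;      /* used by the example handler below */
    struct {
        const char *prefix;
        node_handler fn;
    } routes[4];
    int nroutes;
};

/* The one watcher every call registers. Session events are handled in a
 * single place; node events are routed to a handler by path prefix. */
void central_watcher(app_t *app, int type, const char *path)
{
    if (type == EVT_SESSION) {
        app->session_events++;
        app->needs_resync = 1;
        return;
    }
    for (int i = 0; i < app->nroutes; i++) {
        size_t n = strlen(app->routes[i].prefix);
        if (strncmp(path, app->routes[i].prefix, n) == 0) {
            app->routes[i].fn(app, path);
            return;
        }
    }
}

/* Example handler: just counts the node events it receives. */
void count_node(app_t *app, const char *path)
{
    (void)path;
    app->node_events++;
}
```

Whether needs_resync leads to a full re-read or a restart is up to the application; the point is that only one function has to know about session events.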


How to reestablish a session

2010-11-18 Thread Gustavo Niemeyer
Greetings,

As some of you already know, we've been using ZooKeeper at Canonical
for a project we've been pushing (Ensemble, http://j.mp/dql6Fu).
We've already written txzookeeper (http://j.mp/d3Zx7z), which
integrates the Python bindings with Twisted, and we're also in the
process of creating a Go binding for the C ZooKeeper library (to be
released soon).

Yesterday, while working on the Go bindings, a test made me wonder
about the correct way to reestablish a session with ZooKeeper.

In another thread a couple of months ago, Ben mentioned:

> i'm a bit skeptical that this is going to work out properly. a server may
> receive a socket reset even though the client is still alive:
>
> 1) client sends a request to a server
> 2) client is partitioned from the server
> 3) server starts trying to send response
> 4) client reconnects to a different server
> 5) partition heals
> 6) server gets a reset from client
>
> at step 6 i don't think you want to delete the ephemeral nodes.

I also don't think it should delete ephemeral nodes.  While performing
some tests, though, I noticed that something similar to this may
happen.

The following sequence was performed in the test:

1) Establish connection A to ZK
2) Create an ephemeral node with A
3) Establish connection B to ZK, reusing the session from A
4) Close connection A
5) The ephemeral node from (2) got deleted.
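The test sequence above can be reproduced with a toy model of the server's view of a session. In this model, as observed in step (5), closing a handle is a request to end the session itself, so it removes the ephemerals regardless of which connection currently carries the session. `close_if_owner()` is a hypothetical alternative, not ZooKeeper behavior, showing one rule under which the stale close from A would be harmless.

```c
#include <stdbool.h>

/* Toy server-side state for one session; not ZooKeeper's code. */
typedef struct {
    long id;
    bool alive;
    int ephemerals;     /* ephemeral nodes owned by the session */
    int attached_conn;  /* connection currently carrying the session */
} session_t;

/* Steps 1 and 3: a connection attaches to the session (3 moves it). */
void attach(session_t *s, int conn) { s->attached_conn = conn; }

/* Step 2: create an ephemeral owned by the session. */
void create_ephemeral(session_t *s)
{
    if (s->alive)
        s->ephemerals++;
}

/* Step 4 as observed: closing ends the session, whichever connection
 * sent the request, and its ephemerals go with it (step 5). */
void close_handle(session_t *s, int conn)
{
    (void)conn;  /* deliberately not consulted */
    s->alive = false;
    s->ephemerals = 0;
}

/* Hypothetical alternative: ignore a close that arrives over a
 * connection which no longer carries the session. */
void close_if_owner(session_t *s, int conn)
{
    if (s->attached_conn == conn)
        close_handle(s, conn);
}
```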

So, this made me wonder what the proper way is to reestablish a
session in practice when facing partitions.  Imagine that the
reconnection in (3) was an attempt by the client to restore
communication with the ZK cluster during a partition.  Once the
connection succeeded, the old resources from connection A should be
disposed of, but how can this be done without risking killing the
healthy connection on B (imagine that the network comes back between
(3) and (4))?

Does anyone have thoughts on that?


Re: Parent nodes & multi-step transactions

2010-08-24 Thread Gustavo Niemeyer
> Every functionality added to ZK will make it harder to maintain. The use case

Definitely, but it's hard to debate about features at that level.  If
we delete the whole code base, we have nothing to maintain, so given
this r.

> recursiveDelete, recursiveCreate: If you want to create /A/C/D-1 just use
> recursiveCreate and you will end up with  /A/C/D-1, even if the full parent
> path did not exist before.

You're missing the actual problem. Recursive create and delete are
non-issues per se.  They become issues once you want to use the ZK
filesystem state for coordination, which is the only advised use case
for ZK.  Other messages in this thread have already described the
problems related to intermediate state visibility, and some techniques
to deal with them.  The problem is that as the number of dynamic
pieces increases, the cost of maintaining all of that logic increases
too, and it becomes impractical.

ZK is great at what it does, and these compound atomic operations
target real use cases for what it's most useful at.  In my view, the
additional code complexity needed for this feature would not be that
great, and it would be negligible compared to the additional logic
which these realistic use cases require to deal with intermediate
states.


Re: Parent nodes & multi-step transactions

2010-08-24 Thread Gustavo Niemeyer
Hi Thomas,

> I have a very strong feeling against more complex operations in the ZK server.

Can you please describe a little better what that feeling is about?

> These are things that should be provided by a ZK client helper library. The

Which things should be provided by client helper libraries?  Client
libraries cannot provide atomic operations, which means that the
reasoning and logic which must happen on top of ZK to avoid
half-initialized states and observation of structure setup and
teardown must continue to be taken into account.  It basically means that to
avoid having a relatively simple batch operation, the reasoning which
must happen around ZK gets significantly more complex, or has to be
avoided entirely.

> zkclient library from 101tec for example gives you exactly that.

It's not clear to me what "exactly that" is in this context.  I've
looked at the code and couldn't find an answer or alternative to the
issues discussed in this thread.

> If you're planning to write another layer on top of the ZK API please have a
> look at https://issues.apache.org/jira/browse/ZOOKEEPER-835

Looked there as well, but also couldn't find anything relevant to this discussion.

> I'm planning to provide an alternative java client API for 3.4.0 and would
> then propose to deprecate the current one in the long run.
> You can preview the new API at
> http://github.com/thkoch2001/zookeeper/tree/operation_classes

And this is a full branch of ZK.  I tried checking the commit
messages to get an idea of what you mean, but was also unable to find
answers to these problems.

If you actually have/know of solutions for the suggested problems
which were not yet covered here, I'm very interested in knowing about
them, but will need slightly more precise information.


Re: Parent nodes & multi-step transactions

2010-08-24 Thread Gustavo Niemeyer
> My own opinion is that lots of these structure sorts of problems are solved
> by putting the structure into a single znode.  Atomic creation and update
> come for free at that point and we can even make the node ephemeral which we
> can't really do if there are children.

Sure, it makes sense that using a single znode gets rid of some of the
problems, after all we'd be effectively getting an atomic operation.
It also gets rid of many of the advantages of using ZooKeeper, though.
Independent changes become conflicts, watches fire more frequently
than they should, clients have to parse the whole blob to know what
has changed and filter locally, etc.

> The natural representation is to have the nodes signal that they are
> handling a particular node by creating an ephemeral file under a per shard
> directory.  This is nice because node failures cause automagical update of
> the data.  The dual is also natural ... we can create shard files under node
> directories.  That dual is a serious mistake, however, and it is much better
> to put all the dual information in a single node file that the node itself
> creates.  This allows ephemerality to maintain a correct view for us.

Interesting indeed.
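The quoted suggestion (a node puts its whole view into a single znode it creates itself) and the cost noted above (readers parse the blob and filter locally) can both be seen in a small sketch: the node encodes its shard list as one payload, and a reader diffs the old and new lists to learn what changed. The comma-separated format and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Encode a shard list as one payload such as "1,4,7".
 * Returns the payload length, or -1 if buf is too small. */
int encode_shards(const int *shards, int n, char *buf, size_t cap)
{
    if (cap == 0)
        return -1;
    size_t used = 0;
    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        int w = snprintf(buf + used, cap - used, i ? ",%d" : "%d", shards[i]);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}

/* Parse a payload produced by encode_shards into out (at most max items).
 * Returns the number of shards read, or -1 on malformed input. */
int decode_shards(const char *payload, int *out, int max)
{
    int n = 0;
    const char *p = payload;
    while (*p != '\0' && n < max) {
        char *end;
        long v = strtol(p, &end, 10);
        if (end == p)
            return -1;
        out[n++] = (int)v;
        p = (*end == ',') ? end + 1 : end;
    }
    return n;
}

static bool contains(const int *xs, int n, int x)
{
    for (int i = 0; i < n; i++)
        if (xs[i] == x)
            return true;
    return false;
}

/* The watch fires on any change to the znode, so the reader has to work
 * out for itself which shards were added and which were removed. */
void diff_shards(const int *old, int nold, const int *cur, int ncur,
                 int *added, int *removed)
{
    *added = *removed = 0;
    for (int i = 0; i < ncur; i++)
        if (!contains(old, nold, cur[i]))
            (*added)++;
    for (int i = 0; i < nold; i++)
        if (!contains(cur, ncur, old[i]))
            (*removed)++;
}
```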

(...)
> This doesn't eliminate all desire for transactions, but it gets rid of LOTs
> of them.

Thanks for these ideas.


Re: Parent nodes & multi-step transactions

2010-08-23 Thread Gustavo Niemeyer
Hi Mahadev,

>  Usually the paradigm I like to suggest is to have something like
>
> /A/init
>
> Every client watches for the existence of this node and this node is only
> created after /A has been initialized with the creation of /A/C or other
> stuff.
>
> Would that work for you?

Yeah, this is what I referred to as "liveness nodes" in my prior
ramblings, but I'm a bit sad about the amount of boilerplate work that
will have to be done to put something like this to use.  It feels like
as the size of the problem increases, it might become hard to keep
the whole picture in mind.
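The /A/init paradigm suggested above can be sketched with an in-memory stand-in for the tree: the writer creates the structure and publishes the marker last, and readers treat /A as usable only once the marker exists. The tree type and functions are hypothetical. The sketch also shows the cost discussed here: if the writer dies before creating the marker, the half-created nodes stay behind.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 16

/* Toy stand-in for the ZooKeeper tree: just the set of existing paths. */
typedef struct {
    const char *paths[MAX_NODES];
    int n;
} tree_t;

bool node_exists(const tree_t *t, const char *path)
{
    for (int i = 0; i < t->n; i++)
        if (strcmp(t->paths[i], path) == 0)
            return true;
    return false;
}

/* Fails if the node already exists or the toy tree is full. */
bool node_create(tree_t *t, const char *path)
{
    if (t->n == MAX_NODES || node_exists(t, path))
        return false;
    t->paths[t->n++] = path;
    return true;
}

/* Writer: build A's structure, then create the marker last.
 * die_before_marker simulates the client crashing half-way. */
bool init_a(tree_t *t, bool die_before_marker)
{
    if (!node_create(t, "/A") || !node_create(t, "/A/B") ||
        !node_create(t, "/A/C"))
        return false;
    if (die_before_marker)
        return false;
    return node_create(t, "/A/init");
}

/* Reader: A counts as initialized only once /A/init exists. */
bool a_is_ready(const tree_t *t)
{
    return node_exists(t, "/A/init");
}
```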

Here is a slightly more realistic example (still significantly
reduced), to give you an idea of the problem size:

/services/wordpress/settings
/services/wordpress/units/wordpress-0/agent-connected
/services/wordpress/units/wordpress-1
/machines/machine-0/agent-connected
/machines/machine-0/units/wordpress-1
/machines/machine-1/units/wordpress-0

There are quite a few dynamic nodes here which are created and
initialized on demand.  If we use these liveness nodes, we'll not only
have to set watches in several places, but also run some kind of
recovery daemon to heal half-created state, and filter user-oriented
feedback to avoid showing nodes which may be dead.  All of that would
be avoided if there were a way to perform multi-step atomic actions.
I'm even pondering a journal-like system on top of the basic API, to
avoid having to deal with this manually.
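The journal-like idea could work roughly as below, assuming the whole list of pending operations is written as one journal entry (a single znode write is atomic) before any of them is applied. Creates are applied idempotently, so a recovery process can replay an unfinished entry in full. This is a hypothetical in-memory sketch with made-up names; it does not stop readers from seeing partial state while an entry is pending, unless they also check for pending entries.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 16
#define MAX_OPS 8

/* Toy tree: the set of existing paths. */
typedef struct {
    const char *paths[MAX_NODES];
    int n;
} tree_t;

/* One journal entry: every create of the multi-step change, recorded
 * before the first of them is applied. */
typedef struct {
    const char *ops[MAX_OPS];
    int nops;
    bool pending;
} journal_t;

static bool has_node(const tree_t *t, const char *p)
{
    for (int i = 0; i < t->n; i++)
        if (strcmp(t->paths[i], p) == 0)
            return true;
    return false;
}

/* Idempotent create, so replaying an entry is harmless. */
static void ensure_node(tree_t *t, const char *p)
{
    if (!has_node(t, p) && t->n < MAX_NODES)
        t->paths[t->n++] = p;
}

/* Record the intent; returns false if there are too many operations. */
bool journal_begin(journal_t *j, const char *const *ops, int nops)
{
    if (nops > MAX_OPS)
        return false;
    for (int i = 0; i < nops; i++)
        j->ops[i] = ops[i];
    j->nops = nops;
    j->pending = true;
    return true;
}

/* Apply the first `limit` operations; limit < nops simulates a crash.
 * The entry is retired only after every operation has been applied. */
void journal_apply(journal_t *j, tree_t *t, int limit)
{
    for (int i = 0; i < j->nops && i < limit; i++)
        ensure_node(t, j->ops[i]);
    if (limit >= j->nops)
        j->pending = false;
}

/* Recovery daemon: replay any unfinished entry in full. */
void journal_recover(journal_t *j, tree_t *t)
{
    if (j->pending)
        journal_apply(j, t, j->nops);
}
```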


Parent nodes & multi-step transactions

2010-08-23 Thread Gustavo Niemeyer
Greetings,

We (a development team at Canonical) are stumbling into a situation
here, and I'm curious about what the general practice is, since I'm
sure this is a somewhat common issue.

It's quite easy to describe it: say there's a parent node A somewhere
in the tree.  That node was created dynamically over the course of
running the system, because it's associated with some resource which
has its own life-span.  Now, under this node we put some control nodes
for different reasons (say, A/B), and we also want to track some
information which is related to a sequence of nodes (say, A/C/D-0,
A/C/D-1, etc).

So, we end up with something like this:

A/B
A/C/D-0
A/C/D-1

The question here is about best-practices for taking care of nodes
like A/C.  It'd be fantastic to be able to create A's structure
together with A itself, otherwise we risk getting in a situation where
a client can see the node A before its "initialization" has been
finished (A/C doesn't exist yet).  In fact, A/C may never exist, since
it is possible for a client to die between the creation of A and C.

Anyway, I'm sure you all understand the problem.  The question here
is: this is pretty common, and quite boring to deal with properly on
every single client.  Is there any feature in the roadmap to deal with
this, and any common practice besides the obvious "check for
half-initialization and wait for A/C to be created or deal with
timeouts and whatnot" on every client?

I'm about to start writing another layer on top of Zookeeper's API, so
it'd be great to have some additional insight into this issue.


Re: Securing ZooKeeper connections

2010-05-27 Thread Gustavo Niemeyer
>> actually pat hunt took over that issue: ZOOKEEPER-733. pat has made a
>> lot of progress and the patch looks close to being ready.
>
> This is just the server side though, still need to make similar changes on
> the client. That will likely be a separate jira. But yes, it's coming along.

Oh, that's great news Patrick.  Thanks for pushing this forward!

Do you think the client side might see some attention soon as well?
Or, in other words, do you plan to shift over to the client side once
you're done with the server?


Re: Dynamic adding/removing ZK servers on client

2010-05-03 Thread Gustavo Niemeyer
> I've got a situation where I essentially need dynamic cluster
> membership, which has been talked about in ZOOKEEPER-107 but doesn't
> look like it's going to happen any time soon.

Wow, perfect timing!  Vishal K just commented in the ticket moments
ago that he's interested in writing it. :-)

I'm also quite interested in the outcome of this, since we'll need the
feature pretty soon.


Re: ZooKeeper packages for Ubuntu

2010-02-16 Thread Gustavo Niemeyer
> Actually, the package is in Debian and Ubuntu copied it from there:
> http://packages.debian.org/sid/zookeeper
>
> But thanks to Matthias for helping me with the python binding.
>
> Thomas Koch, http://www.koch.ro

Oh, I'm really sorry about this.  I didn't know the package was based
on something else.

I just got in touch with Matthias to get it rolling because I didn't
find the Debian one, and he got back to me in the last few days
pointing out that it was available in his PPA.  I naively assumed
that it had been created by them; leaving out its Debian origin was
not intentional.

I apologize for the confusion.


ZooKeeper packages for Ubuntu

2010-02-16 Thread Gustavo Niemeyer
Hello everyone,

Thanks to Matthias Klose and Thierry Carrez, we've got ZooKeeper
packaged for Ubuntu:

https://launchpad.net/~ttx/+archive/ppa

This is a Personal Package Archive at the moment, but these packages
may end up being promoted depending on how relevant they are.

Please let me know if these work or do not work for you.


Re: Dependency on JBoss JMX

2010-01-28 Thread Gustavo Niemeyer
> Sorry, that was JBoss JMX, more specifically, and I actually don't see
> it being used explicitly either.  Probably an indirect dependency.

Yeah, it's indeed just the JMX implementation, without any direct
links.  Apologies for the false alarm. I'll talk to Matthias about
this and hopefully we'll have it packaged and available soon.


Re: Dependency on JBoss JMX

2010-01-28 Thread Gustavo Niemeyer
> there aren't any dependencies on jboss. can you clarify the dependency that
> you are seeing?

Sorry, that was JBoss JMX, more specifically, and I actually don't see
it being used explicitly either.  Probably an indirect dependency.

I'll check back with Matthias Klose, who's doing the favor of
packaging it for us.


Dependency on JBoss JMX

2010-01-28 Thread Gustavo Niemeyer
Hello there,

Is the dependency on JBoss a hard one, or is there a way to not use
it?  Perhaps an alternative package providing the same interface?

I'm trying to get it included in Ubuntu and am being asked about this.

Thanks in advance,


Re: Authentication, encryption, and dynamic membership

2009-11-10 Thread Gustavo Niemeyer
Hey Henry,

> I can't speak as to the other JIRAs, but ZK-107 (dynamic membership) is
> still being worked on by me. This is a very large change to the ZK codebase,
> so I can't see it getting in really before 4.0, although the committers may
> view things differently.

That's great to know, thank you.

> If you have a pressing need for the feature, the mailing list archives
> contain suggestions of how to change your cluster on the fly by doing a
> rolling restart of your nodes with a new configuration.

I've actually brought the topic up before, and have been following
discussions here, so this is indeed an alternative.  That said,
dynamic membership is what would really help in this case.  I
wouldn't mind restarting the ensemble myself to add/remove nodes, but
it's slightly trickier if ZooKeeper is being provided as one of the
pieces of some kind of platform.

Thanks for these details.


Authentication, encryption, and dynamic membership

2009-11-10 Thread Gustavo Niemeyer
Dear ZooKeepers,

I'm quite interested in the features related to inter-server
authentication, encryption, and dynamic membership.  I *think* the
right JIRAs are 107 and 236.  Are these features likely to see some
activity in the upcoming releases, according to existing roadmaps?

Thanks in advance,


Re: zookeeper on ec2

2009-07-06 Thread Gustavo Niemeyer
Hi again,

(...)
> ZK seemed pretty darned stable through all of this.

Sounds like a nice test, and it's great to hear that ZooKeeper works well there.

> The only instability that I saw was caused by excessive amounts of data in
> ZK itself.  As I neared the (small) amount of memory I had allocated for Zk
> use, I would see servers go into paroxysms of GC, but the cluster
> functionality was impaired to a very surprisingly small degree.

Cool, makes sense.

> No.  I considered it, but I wanted fewer moving parts rather than more.
>
> Doing that would make the intricate and unlikely failure mode that Henry
> asked about even less likely, but I don't know if it would increase or
> decrease the probability of any kind of failure.

Yeah, I guess it depends a bit on the system architecture too.  If the
system is designed so that ZK keeps track of coordination data which
must be resumed after a full stop of the system, keeping it on
persistent storage would prevent the loss of important information.
If ZK is really just coordinating ephemeral data (e.g. locks), then if
the whole system goes down, it's OK to just let it start up again in
an empty state.

> The observed failure modes for ZK in EC2 were completely dominated by our
> (my) own failings (such as letting too much data accumulate).

Details always take a few iterations to get really right.

Thanks for this data Ted.


Re: zookeeper on ec2

2009-07-06 Thread Gustavo Niemeyer
Hi Ted,

> b) EC2 interconnect has a lot more going on than in a dedicated VLAN.  That
> can make the ZK servers appear a bit less connected.  You have to plan for
> ConnectionLoss events.

Interesting.

> c) for highest reliability, I switched to large instances.  On reflection, I
> think that was helpful, but less important than I thought at the time.

Besides the fact that there are more resources for ZooKeeper, this
likely helps as well because it reduces the number of systems
competing for the real hardware.

> d) increasing and decreasing cluster size is nearly painless and is easily
> scriptable.  To decrease, do a rolling update on the survivors to update
(...)

Quite interesting indeed.  I guess the work that Henry is pushing on
these couple of JIRA tickets will greatly facilitate this.

Do you mind if I ask you a couple of questions on this:

Do you have any kind of performance data about how much load ZK can
take under this environment?

Have you tried to put the log and snapshot files under EBS?


Re: Dynamic servers addition and persistent storage.

2009-07-01 Thread Gustavo Niemeyer
Hi again Henry,

> I hope to have a patch for both fairly soon. I should at least get ZK-368 to
> a workable position this week, and ZK-107 will hopefully not be an enormous
> amount of work on top of that. However, there doubtless be some slack time
> for picking up bugs etc. before it gets committed as it will be a reasonably
> sized patch.

That's great news.  I've added myself as a watcher on both tickets.

> Out of interest, what's your application for this?

I guess the same basic idea that everyone else has for it: bringing
additional systems up and down dynamically to control scalability and
reliability vs. cost of having many machines running at once.  I
understand it is possible to restart the servers one at a time to
change the server list without a full stop, but if we end up bundling
this in some open source framework for people to use in the wild, the
less manual interaction and procedural maintenance, the better.  Having
ZooKeeper clients learning about the server list dynamically will help
a lot in this scenario too.


Re: Dynamic servers addition and persistent storage.

2009-07-01 Thread Gustavo Niemeyer
Hey Henry,

> We (and myself in particular) are working on dynamic cluster membership, see
> https://issues.apache.org/jira/browse/ZOOKEEPER-107 and the related
> https://issues.apache.org/jira/browse/ZOOKEEPER-368.

That's fantastic news!  How do you feel this is going so far?  We
might have an application for this pretty soon.


Re: General Question about Zookeeper

2009-06-25 Thread Gustavo Niemeyer
Hey Harold,

> I am interested in a security aspect of zookeeper, where the clients and the 
> servers don't necessarily belong to the same "group". If a client creates a 
> znode in the zookeeper? Can the person, who owns the zookeeper server, simply 
> look at its filesystem and read the data (out-of-band, not using a client, 
> simply browsing the file system of the machine hosting the zookeeper server)?

Yes, absolutely.  You could certainly encrypt the data that goes
through the ZooKeeper server, but since ZooKeeper is supposed to be
doing coordination work, I think that if you don't trust the server,
the whole situation might get a bit awkward.  I'm curious about your
use case, since I'm pondering doing something where clients
don't necessarily trust other clients or machines in the same network
(or even different users on the same machine), which thus might require
additional tightening up, but if you don't trust the server itself, that
may be tricky.  Please note that ZooKeeper isn't meant to be used just
as a distributed filesystem for storage, but that's probably not your
intention anyway.


Re: Confused about KeeperState.Disconnected and KeeperState.Expired

2009-06-24 Thread Gustavo Niemeyer
> Ben's opinion is that it should not belong in the default API but in the
> common client that another recent thread was about. My opinion is just that
> I need such a functionality, wherever it is.

Understood, sorry.  I just meant that it feels like something that
would likely be useful to other people too, so might have a role in
the default API to ensure it gets done properly considering the
details that Ben brought up.

> If the node gets the exception (or has it's own timer), as I wrote, it will
> shut itself down to release HDFS leases as fast as possible. If ZK is really
> down and it's not a network partition, then HBase is down and this is fine
> because it won't be able to work anyway.

Right, that's mostly what I was wondering.  I was pondering under
which circumstances the node would be unable to talk to the ZooKeeper
server but would still be holding the HDFS lease in a way that
prevented the rest of the system from going on.  If I understand what
you mean, if ZooKeeper is down entirely, HBase would be down for good.
If the machine was partitioned off entirely, the HDFS side of things
will also be disconnected, so shutting the node down won't help the
rest of the system recover.
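The node's "own timer" mentioned in the quote could be as simple as the guard below: remember when the connection was lost, and have the node stop serving once it has been disconnected for longer than a chosen limit, rather than waiting for an expiration notice that only arrives once the client reaches a server again. The names and the limit are hypothetical; picking a limit below the session timeout is one possible choice, not something the thread prescribes.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical disconnect guard for a node that must give up its
 * resources (e.g. leases) if it stays cut off from ZooKeeper too long. */
typedef struct {
    bool disconnected;
    time_t since;       /* when the current disconnect was first seen */
    double limit_secs;  /* how long to tolerate being disconnected */
} dc_guard_t;

void guard_disconnected(dc_guard_t *g, time_t now)
{
    if (!g->disconnected) {  /* keep the first timestamp on repeats */
        g->disconnected = true;
        g->since = now;
    }
}

void guard_connected(dc_guard_t *g)
{
    g->disconnected = false;
}

/* True once the node has been disconnected for at least limit_secs. */
bool guard_should_shut_down(const dc_guard_t *g, time_t now)
{
    return g->disconnected && difftime(now, g->since) >= g->limit_secs;
}
```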


Re: Confused about KeeperState.Disconnected and KeeperState.Expired

2009-06-24 Thread Gustavo Niemeyer
Hi Jean-Daniel,

> I understand, maybe the common client is the best place.

It sounds like something useful to have in the default API, FWIW.

> In our situation, if a HBase region server is in the state of being
> disconnected for too long, the regions it's holding cannot be reached so
> this is a major problem. Also, if the HMaster node gets the event that an

Out of curiosity, what do you intend to do when you get the exception?
I mean, if you didn't get the expiration exception, it means that the
reconnection isn't working in any case, so how do you plan to recover?


Re: Authentification for Zookeeper Server

2009-06-16 Thread Gustavo Niemeyer
> Remember that the patch is almost trivial.  Add a configuration option
> acceptConnectionsOnlyFromLocalHost, and then in the server connect logic
> reject non-localhost attempts (and log a security note).

Sorry, I was actually pondering about it in comparison with the
investment in implementing some kind of plugin system to allow
server-wide access restrictions.  This shouldn't be too hard to hack
in either, but it'd be best to have some kind of agreement on how to
do it "correctly" so that the work can be integrated upstream, and
this would require some additional involvement to get the APIs right.
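The acceptConnectionsOnlyFromLocalHost option proposed above (a suggested name, not an existing setting) boils down to an address test on each accepted connection. The ZooKeeper server is written in Java, so a real patch would look different; this C sketch only illustrates the check itself for IPv4, IPv6, and IPv4-mapped IPv6 peers, using the standard POSIX socket address types. `is_loopback_peer()` is a hypothetical name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* True if the peer is on the loopback interface: 127.0.0.0/8, ::1, or an
 * IPv4-mapped ::ffff:127.x.y.z address. Other families are rejected. */
bool is_loopback_peer(const struct sockaddr *sa)
{
    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)sa;
        return (ntohl(in4->sin_addr.s_addr) >> 24) == 127;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)sa;
        if (IN6_IS_ADDR_LOOPBACK(&in6->sin6_addr))
            return true;
        if (IN6_IS_ADDR_V4MAPPED(&in6->sin6_addr))
            return in6->sin6_addr.s6_addr[12] == 127;
    }
    return false;
}
```

A server would call this on the address returned by accept() and close the socket (logging a security note, as suggested) when it returns false.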


Re: Authentification for Zookeeper Server

2009-06-16 Thread Gustavo Niemeyer
> I think that the stunnel suggestion actually covers what you want here.
>
> You can set stunnel up so that it listens to a known port and it decrypts
> and forwards traffic to the local zookeeper client port.  You can guarantee
> that no direct connections are possible to the zookeeper in a variety of
> ways, the simplest being a change to zookeeper to allow it to insist that
> all connections be from localhost.
>
> Stunnel can also insist on client certificates so that only approved clients
> would be able to connect.

Indeed, this would cover it reasonably well.  I'd still prefer to have
ZooKeeper itself protecting against unauthorized access to its service
so that the deployment would be simpler, but the stunnel solution
should give me a good path without having to invest in patching
ZooKeeper for a while.

Thanks again for the suggestions.


Re: Authentification for Zookeeper Server

2009-06-16 Thread Gustavo Niemeyer
[Mahadev Konar]
> The auth plugin  works at the znode level . The server side authentication
> I was talking about is just to verify the authentication for a zookeeper
> client for creating/reading/changing znodes in ZooKeeper.

Ok, understood.  Thanks for these details.

[Ted Dunning]
> For cluster wide security, I think it is also important to use networking
> hardware security.  In EC2, this corresponds to the security groups.  For
> Linux itself, you do this using iptables.

That's the impression I had as well.  Do you think it'd be too tricky
to implement an equivalent pluggable authentication scheme which would
operate at the server level?  E.g. something that would allow using a
shared secret safely, or certificates.

I'm pondering the possibility of offering ZooKeeper embedded in
another system, so it'd be best if the system's security didn't
depend on the network setup, which is left to the user who deploys
the packaged system.

> The basic idea is that you can lock down the network access to the cluster
> so that to access your ZK cluster, you actually have to be running on a
> correct machine.
>
> This doesn't satisfy the original need, but is an important defense in depth
> adjunct to it.

Makes perfect sense.

> Another way to get connection level security on ZK access would be to use
> something like ssh or stunnel  to allow access to the cluster which is
> otherwise completely locked down except for the ZK nodes talking to each
> other.  This approach does meet the original requirements (I think).

I think so as well.  For the same reasons outlined above, it'd be
fantastic to have the authentication system be independent of the
specific deployment environment.  But this is definitely a viable
alternative otherwise.  It also brings encryption as a plus.

Thanks for these ideas,


Re: Authentification for Zookeeper Server

2009-06-16 Thread Gustavo Niemeyer
Hello there,

I'm an interested newcomer to ZooKeeper, so please forgive me if I
miss some important basic detail.

I actually had the same high-level question as the original poster,
so I'm interested in the response too.

>  There is a jira open to document this in our forrest docs -
>
> http://issues.apache.org/jira/browse/ZOOKEEPER-329.
>
> Ill try and explain how to do in the email, feel free to respond with more
> questions. The c and java api both have a call called add_auth/addAuth to
> add authentication data for a client. Also, you can write pulgins at the
> server side to verify this authentication. Take a look at files in
> src/java/main/org/apache/zookeeper/server/auth/.

Oh, interesting.  So the auth plugin API works both at the node level
and at the server level, or is the idea that you simply allow the
client to connect, but prevent it from touching any node at all using
ACLs?
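The second option in the question (let the client connect, but deny it every node through ACLs) rests on a per-node check like the toy one below: a request is allowed only if some ACL entry on the node grants the permission to everyone ("world:anyone") or to one of the identities the client added with the add_auth/addAuth call mentioned above. The permission names follow ZooKeeper's ACL model (READ, WRITE, CREATE, DELETE, ADMIN), but the bit values, types, and exact-string matching are simplifications for this sketch; real schemes such as digest or ip match identities in their own ways. Nothing here prevents the connection itself.

```c
#include <stdbool.h>
#include <string.h>

/* Permission bits named after ZooKeeper's ACL permissions; the values
 * are chosen for this sketch. */
enum { P_READ = 1, P_WRITE = 2, P_CREATE = 4, P_DELETE = 8, P_ADMIN = 16 };

typedef struct {
    const char *scheme;  /* e.g. "world", "digest", "ip" */
    const char *id;
    int perms;           /* bitwise OR of P_* values */
} acl_entry_t;

typedef struct {
    const char *scheme;
    const char *id;
} auth_id_t;

/* True if some entry grants `perm` to everyone or to one of the
 * client's authenticated identities (exact match in this toy). */
bool acl_allows(const acl_entry_t *acl, int nacl,
                const auth_id_t *ids, int nids, int perm)
{
    for (int i = 0; i < nacl; i++) {
        if ((acl[i].perms & perm) != perm)
            continue;
        if (strcmp(acl[i].scheme, "world") == 0 &&
            strcmp(acl[i].id, "anyone") == 0)
            return true;
        for (int j = 0; j < nids; j++)
            if (strcmp(acl[i].scheme, ids[j].scheme) == 0 &&
                strcmp(acl[i].id, ids[j].id) == 0)
                return true;
    }
    return false;
}
```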
