On Aug 30, 2012, at 5:10 PM, Henry Robinson <[email protected]> wrote:

> FWIW, this is my only reservation about a pure Python client - there isn't
> a spec, and three separate implementations that might have subtly different
> behaviours can be a nightmare to maintain. Ben - if you're able to turn any
> of your efforts towards documenting your observations about how the
> protocol actually works, that would be awesome.

Given that most ppl don't use the client directly and instead program against 
'helpers' such as Curator, I'm not sure that's such a big deal. The C style of 
completion callbacks already wasn't terribly useful, given that Python has its 
own basic threading model and many ppl use gevent for an async style (or 
twisted, which is also quite different).

The C binding falls apart badly with gevent, since gevent has issues talking 
across the C thread to the gevent hub. This necessitates a lot of very hacky 
Python code to try and bridge it, which isn't needed when it's pure Python. And 
trying to finagle the zookeeper logging stream into proper Python logging was 
another set of hacks.
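
For illustration, here is a minimal sketch of the kind of logging bridge described above, using only the standard library. It assumes the C binding can be pointed at a writable file object; that hookup is left out so the sketch runs on its own. The name `start_log_bridge` is hypothetical, and this is not kazoo's actual code.

```python
import logging
import os
import threading


def start_log_bridge(logger):
    """Forward lines written to a pipe into a Python logger.

    Returns (writer, thread). The writer is an ordinary file object;
    the assumption is that a C library could be told to log to it.
    """
    read_fd, write_fd = os.pipe()
    reader = os.fdopen(read_fd, "r")
    writer = os.fdopen(write_fd, "w")

    def pump():
        # Runs until the write end of the pipe is closed (EOF).
        with reader:
            for line in reader:
                line = line.rstrip("\n")
                if line:
                    logger.info(line)

    thread = threading.Thread(target=pump, daemon=True)
    thread.start()
    return writer, thread
```

In real use the writer would be handed to the C binding rather than written to from Python. A pure Python client needs none of this, since it can call the logging module directly.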

Right now, the code for the protocol handling looks very close to the Java 
client (Alan Cabrera did some great work on pookeeper which I refactored into 
kazoo). I don't see any reason to deviate too heavily, though of course there 
are some things that are nicer to implement for a more Pythonic feel. Keeping 
the 'feel' from a user perspective similar to the existing Zookeeper 
Programmers Guide is best just to avoid having to replicate docs.

On a happy note, Kazoo actually has a good amount of docs:
- http://kazoo.readthedocs.org/

Unfortunately the C lib only has doc strings, which make the Java API docs look 
like a documentation heaven in comparison. And you can't even get to the C API 
docs online... I finally got tired of digging them out of the source and using 
doxygen so I copied them up to my own server here:
http://groovie.org/zkdocs/

Some notes on the implementation...

Pure Python Zookeeper implementation:
- Approx. 280 lines of code for the socket response/request handling
- Approx. 250 lines of code for the request/response 
serialization/deserialization
- Anyone that knows Python reasonably well can trouble-shoot and contribute
- Can be used in gevent, Pypy, Jython (Jython doesn't even have a GIL!)
- Worst case... an exception bubbles up that you might have to catch
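
As a rough sketch of why the serialization side can stay this small: the example below assumes each packet is a 4-byte big-endian length prefix followed by fields packed in order, with strings sent as a 4-byte length plus UTF-8 bytes. It encodes an 'exists' request under those assumptions. The opcode value 3 and all helper names are illustrative assumptions, not taken from kazoo's source.

```python
import struct


def write_string(value):
    """Encode a string as a 4-byte big-endian length plus UTF-8 bytes."""
    data = value.encode("utf-8")
    return struct.pack(">i", len(data)) + data


def read_string(buf, offset):
    """Decode a string at offset; return (value, new_offset)."""
    (length,) = struct.unpack_from(">i", buf, offset)
    offset += 4
    if length < 0:  # a negative length is assumed to mean "null"
        return None, offset
    return buf[offset:offset + length].decode("utf-8"), offset + length


def frame(payload):
    """Prefix a payload with its length so the peer knows how much to read."""
    return struct.pack(">i", len(payload)) + payload


def serialize_exists(xid, path, watch, opcode=3):
    """Build a framed request: header (xid, opcode), then path and watch flag."""
    body = struct.pack(">ii", xid, opcode)
    body += write_string(path)
    body += struct.pack(">?", watch)
    return frame(body)
```

For example, `serialize_exists(1, "/a", False)` yields a 19-byte packet: 4 bytes of length prefix followed by a 15-byte body.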

zkpython (Python binding to C lib):
- Approx. 1500 lines of C (Not including the C lib itself, which is another ~ 
8000 lines of code)
- Anyone that knows and wants to read the C library *and* Python *and* the 
Python C binding tricks can trouble-shoot and contribute. And maybe the patches 
will actually be accepted and incorporated at some point...
- Only usable with CPython
- Worst case... Python segfaults

I really don't know many (hardly any) Python developers that know C well enough 
to debug it or dive into it. If their only Zookeeper experience is marred by 
bugs in the 'black box' of C, they'll move on to something else. Which saddens 
me because I think Zookeeper is pretty awesome.

It took a week for us to figure out why our test suite failed on rare 
occasions. This wasn't helped by the fact that when you supply a bad session 
id/password, Zookeeper doesn't tell you what every other app known to mankind 
would (bad password or username)... it tells you SESSION_EXPIRED. 
Which is insanely confusing when you see your other client using that 
id/password happily connected still. We had to debug the Java server, use gdb 
and such to debug two C libs, etc. I really really don't want to ever repeat 
that experience, it was that bad. :)

On a side-note, why on Earth does Zookeeper not give you an AUTH_FAILED when 
you fail the auth for the session ID/password on connect?
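
One plausible explanation, sketched here under assumptions about the connect handshake: if the reply to a connect request carries only a protocol version, a negotiated timeout, a session id and a password, with no separate error code, then a client can see that the session was rejected (a timeout of zero) but not why. The field layout and the helper names below are assumptions for illustration, not a documented spec.

```python
import struct


def parse_connect_response(payload):
    """Parse an assumed connect reply: int version, int timeout,
    long session id, then a length-prefixed password buffer."""
    protocol_version, timeout_ms, session_id = struct.unpack_from(">iiq", payload, 0)
    offset = 16  # 4 + 4 + 8 bytes consumed so far
    (passwd_len,) = struct.unpack_from(">i", payload, offset)
    offset += 4
    passwd = payload[offset:offset + passwd_len] if passwd_len > 0 else b""
    return {
        "protocol_version": protocol_version,
        "timeout_ms": timeout_ms,
        "session_id": session_id,
        "passwd": passwd,
    }


def session_rejected(response):
    """With no error field, a non-positive timeout is the only signal.

    An expired session and a wrong password would look identical here.
    """
    return response["timeout_ms"] <= 0
```

Under this layout, telling the two failures apart would need a change on the server side, not just in the client.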

I'd be happy to document and post more implementation details I've found about 
the actual protocol. I think it makes sense that powerful dynamic languages 
implement the protocol directly in a manner that's documented by the Zookeeper 
project rather than being crippled by using the C lib, and suffering segfaults 
as a result. Already for kazoo, I've been posting implementation details about 
how kazoo handles the C lib and bridging it to gevent/threads to help avoid 
common errors:
http://kazoo.readthedocs.org/en/latest/implementation.html

I can update that for protocol details, though it'd probably be more useful to 
have a page on the Zookeeper site itself that discusses and documents the 
protocol and how it should be implemented for consistency.

> And as regards the unapplied Python patches - that's my bad, I should be
> committing them much more often. Can you give me a list of those you've
> found useful, and in return for your excellent work I'll get them committed
> as soon as I can?


Well, we've been maintaining a static zookeeper python library here:
https://github.com/python-zk/zc-zookeeper-static/

We've been adding critical patches to it as we've found them on Jira and in our 
own tests. Each Jira bug ticket is linked to on there. Several of those are 
patched in the custom Ubuntu-compiled distro of the python-zookeeper bindings 
as well.

But obviously at some point it becomes futile. We'd like to use the read-only 
feature, but there's no hope of that getting into the Python binding since it's 
still not in the C lib: https://issues.apache.org/jira/browse/ZOOKEEPER-827

There have been patches for that since 2010... and still it's not resolved. That's 
pretty discouraging, and given the lack of online generated C docs there's 
definitely a "we don't care much about the C lib" message being broadcast. It's 
very obvious the Java client is what gets the support. Searching Jira for 
'zkpython' and seeing the various unresolved memory leaks and segfault issues 
is also sad. Kazoo is already getting use at several companies, and we all want 
this thing to be solid, to not seg-fault our Python, and to be able to easily 
trouble-shoot it without going through C gymnastics. :)

Cheers,
Ben
