Ok, so, I switched to the single-threaded zookeeper lib and am pumping the event loop "by hand," essentially replicating what zookeeper_mt does, but in a Ruby thread. Only one thread ever touches the zh. I've been trying to get fork() working again without losing my session in the parent. What I thought would work is:

* parent process quiesces the event thread so there are no pending completions.
* parent exits event thread, keeps handle open
* fork()
* child *immediately* calls zookeeper_close
* parent resumes event thread

Unfortunately, the parent gets:

Assertion failed: (cptr), function zookeeper_process, file src/zookeeper.c, line 1959.
Abort trap: 6

(the relevant line from the source: http://is.gd/F4sIkq)

This is confusing. From what I can understand from the source (and I am no C programmer), this case is hit when there are "in flight" requests, but for some reason, dequeue_completion is coming back with a NULL. Is this somehow being tripped up in the parent by the child calling zookeeper_close?

One last thing, there's no way to close your connection and resume your session later (using the API)?

I'm a bit out of my depth here, any help would be really appreciated.

On Fri, May 11, 2012 at 2:41 AM, Martin Kou <[email protected]> wrote:
> Jon,
>
> Hmm... I'm not sure when you're going to fork() though. What I've done
> before was to do the fork() before each zookeeper_init(). The simplest
> scheme is for each process to hold one event loop. Each event loop can then
> be shared by as many single-threaded Zookeeper sessions as you see fit. So
> let's say you pre-fork() 4 processes, and each process runs 4 Zookeeper
> sessions - you'll be able to run 16 Zookeeper sessions in parallel.
>
> If you do the fork() after zookeeper_init() - then it will get messier. As
> I understand each forked process will increment the reference count on any
> opened file descriptors. So you'll have to take care to close the "shared"
> file descriptors in every "other" process before you call zookeeper_close().
>
> Best Regards,
> Martin Kou
>
> On Thu, May 10, 2012 at 9:19 PM, Jonathan Simms <[email protected]> wrote:
>
>> Well afaict SO_NOSIGPIPE doesn't exist in linux, which kinda sucks, as
>> I need this to be cross platform. I even tried hacking the source to
>> allow an option to not send the last message (diff is here:
>> http://is.gd/NptC0n and yes, I know this is an incredibly naive
>> attempt).
>>
>> This would be an incredibly useful feature in my case (disconnect the
>> client and resume the session within the negotiated timeout).
>>
>> BTW, is it possible to fork safely when using the st library?
>>
>> Thanks
>>
>> On Thu, May 10, 2012 at 9:46 PM, Jonathan Simms <[email protected]> wrote:
>> > wow
>> >
>> > That's scary, but, probably also useful. :)
>> >
>> > I'm considering rewriting this using the st library, considering all
>> > the craziness necessary to use the mt lib.
>> >
>> > I'm gonna go try that out.
>> >
>> > On Thu, May 10, 2012 at 7:53 PM, Martin Kou <[email protected]> wrote:
>> >> If you don't mind the hackish-ness, I think you can just grab the file
>> >> descriptor from a Zookeeper handle like this for mt -
>> >>
>> >> int fd = ((int *)zhandle)[0];
>> >>
>> >> This works because the fd is the first field in the _zhandle struct.
>> >>
>> >> Best Regards,
>> >> Martin Kou
>> >>
>> >> On Thu, May 10, 2012 at 4:51 PM, Martin Kou <[email protected]> wrote:
>> >>
>> >>> I've had a similar problem as well, but I've been using the single
>> >>> threaded async library - I actually find it simpler to use than the mt
>> >>> library.
>> >>>
>> >>> The way I do it is this:
>> >>>
>> >>> During session connect -
>> >>> 1. Grab the file descriptor from the C library via zookeeper_interest()
>> >>> 2. If this is the first time I saw this file descriptor, and it's valid,
>> >>> do a setsockopt() on it to set SO_NOSIGPIPE to 1.
>> >>>
>> >>> When I need to "suspend" the session
>> >>> 1. close() the file descriptor
>> >>> 2. call zookeeper_close() on the handle
>> >>>
>> >>> zookeeper_close() will try to send the close session message at step 2
>> >>> here. Normally, that would cause a SIGPIPE and your app would crash - but
>> >>> this time it won't because you've set SO_NOSIGPIPE on the socket. Instead,
>> >>> the Zookeeper library will see a regular error from its send operation and
>> >>> it'll free up the handle peacefully without closing the session.
>> >>>
>> >>> Best Regards,
>> >>> Martin Kou
>> >>>
>> >>>
>> >>> On Thu, May 10, 2012 at 4:11 PM, Jonathan Simms <[email protected]> wrote:
>> >>>
>> >>>> Michi, fair point, I actually just looked into it, there doesn't seem
>> >>>> to be a way through the api to re-establish the session. If you call
>> >>>> zookeeper_close on the handle:
>> >>>>
>> >>>> "After this call, the client session will no longer be valid. The
>> >>>> function will flush any outstanding send requests before return. As a
>> >>>> result it may block."
>> >>>>
>> >>>> I tried:
>> >>>>
>> >>>> * establish session with handle A
>> >>>> * copy clientid_t from handle A
>> >>>> * zookeeper_close handle A
>> >>>> * construct handle B using clientid_t values from handle A
>> >>>>
>> >>>> I get back a SESSION_EXPIRED from the server. (debug from mt lib here:
>> >>>> https://gist.github.com/3b7e4060746d03cef287)
>> >>>>
>> >>>> It would be *really* useful if i could basically "suspend" a session
>> >>>> while i forked, then reconnect and pick up where i left off. Is this
>> >>>> not possible?
>> >>>>
>> >>>> On Thu, May 10, 2012 at 6:41 PM, Michi Mutsuzaki <[email protected]> wrote:
>> >>>> > Hi Jonathan,
>> >>>> >
>> >>>> > It would be very difficult to share multi-threaded zk handle with
>> >>>> > child process. I'm surprised it actually works on mac. I think saving
>> >>>> > session id/password and re-establishing the session in the child
>> >>>> > process is more robust and platform independent.
>> >>>> >
>> >>>> > Thanks!
>> >>>> > --Michi
>> >>>> >
>> >>>> > On Thu, May 10, 2012 at 12:45 PM, Jonathan Simms <[email protected]> wrote:
>> >>>> >> Hi all,
>> >>>> >>
>> >>>> >> I'm the maintainer of the ruby zookeeper library, and I'm having
>> >>>> >> trouble getting consistent behavior when a user calls fork(). When
>> >>>> >> developing it on MacOS (using 3.3.5), I was able to fork, then
>> >>>> >> immediately call zookeeper_close() in the child, and then create a new
>> >>>> >> handle. Testing on Linux, the behavior is much more unpredictable.
>> >>>> >> Regularly, it seems there are segfaults when calling zookeeper_close.
>> >>>> >> https://gist.github.com/22338464cd47e0e50970
>> >>>> >>
>> >>>> >>
>> >>>> >> So I guess my question is, is there any safe way to fork() while the
>> >>>> >> client is running?
>> >>>> >>
>> >>>> >> Another possibility i thought of is to note the session id/passwd,
>> >>>> >> close the client, fork, then re-open with the same id/passwd to
>> >>>> >> re-establish the session in the parent.
>> >>>> >>
>> >>>> >> Any recommendations?

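
On Martin's point about forked processes and reference counts on open file descriptors: the sketch below illustrates, on a plain socketpair rather than a real ZooKeeper handle, that a child which only close()s its inherited descriptors drops its own references and leaves the parent's connection usable. By contrast, per the documentation quoted in the thread, zookeeper_close flushes outstanding send requests, so anything it writes from the child would go out on the same shared connection. The function name is hypothetical and this is a minimal illustration under those assumptions, not a fix for the assertion failure.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the parent's socket still carries data
 * after a forked child closed its inherited copies of the descriptors,
 * 0 if it does not, and -1 if setup (socketpair/fork/waitpid) failed. */
static int parent_socket_survives_child_close(void)
{
    int sv[2];
    int status = 0;
    int ok;
    char buf = 0;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: drop the inherited references without writing anything
         * to the shared connection, then exit immediately. */
        close(sv[0]);
        close(sv[1]);
        _exit(0);
    }

    /* Parent: wait until the child has closed its copies and exited. */
    if (waitpid(pid, &status, 0) != pid) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* The parent's own descriptors still refer to an open connection. */
    ok = write(sv[0], "x", 1) == 1 &&
         read(sv[1], &buf, 1) == 1 &&
         buf == 'x';

    close(sv[0]);
    close(sv[1]);
    return ok ? 1 : 0;
}
```

The function should return 1 on both Linux and Mac OS X, since close() in one process does not tear down a connection that another process still holds open.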