I guess what you're hitting here is one of those basic laws of physics we hit when building distributed systems. Stuff doesn't happen right away. You'll get the same issue on an unauthenticated socket if you e.g. exit the sender before the message can be fully sent. It's more visible in ZeroMQ in general due to the async background I/O, and perhaps more dramatic when we add extra hops, like the async authentication, yet we have to deal with this in any realistic application.
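To make the timing point concrete, here is a minimal sketch (not code from this thread) of a sender that waits for a short acknowledgement before it tears down its connection, instead of exiting right after the write. It uses plain TCP sockets from the Python standard library rather than ZeroMQ or CZMQ, so it only shows the shape of the idea and does not reproduce ZeroMQ's background I/O or the CURVE handshake. The names recv_exact, sender, receiver and run_demo, the one-byte ACK, and the 5 second timeout are all assumptions made for illustration.

```python
import socket
import threading

ACK = b"K"  # hypothetical one-byte acknowledgement, chosen for this sketch


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read until n bytes have arrived or the peer closes the connection."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            break
        data += chunk
    return data


def sender(listener: socket.socket, payload: bytes, result: dict) -> None:
    """Send one message, then wait for the peer's ACK before closing."""
    conn, _ = listener.accept()
    with conn:
        conn.settimeout(5.0)  # bound the wait so a dead peer cannot hang us
        conn.sendall(payload)
        try:
            result["ack"] = recv_exact(conn, len(ACK))
        except socket.timeout:
            result["ack"] = b""  # no ACK: delivery is unconfirmed


def receiver(port: int, size: int, result: dict) -> None:
    """Read the whole message, then tell the sender it may shut down."""
    with socket.create_connection(("127.0.0.1", port), timeout=5.0) as sock:
        result["data"] = recv_exact(sock, size)
        sock.sendall(ACK)


def run_demo(payload: bytes = b"hello") -> dict:
    """Run sender and receiver in two threads and return what each side saw."""
    result: dict = {}
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.settimeout(5.0)
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        threads = [
            threading.Thread(target=sender, args=(listener, payload, result)),
            threading.Thread(target=receiver, args=(port, len(payload), result)),
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return result


if __name__ == "__main__":
    print(run_demo())
```

The sender only closes once the receiver has confirmed it read the whole payload, so the teardown depends on an event rather than on a guessed delay. In a real application that confirmation would normally be part of the protocol itself rather than a bolt-on byte.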
There are a few classic answers. One is to not exit the sender; this is valid for most use cases, where nodes start and then run as long as possible. Two is to handshake every message. This becomes chatty and slow. Three is to handshake the end of connection. This leads us to "real" protocols, which is what you want to aim for in any significant architecture.

Adding a sleep is valid when you're exploring things. We also use them in some test cases to force synchronization. However, it's not a valid answer for real applications, and thus in the examples I don't use sleeps unless it's to represent busy "work".

-Pieter

On Fri, Apr 18, 2014 at 8:05 AM, Steve Murphy <[email protected]> wrote:
> OK, I found it.
>
> I started tracing how the Curve works underneath, and discovered
> something I hadn't thought of: that the encryption process is
> asynchronous. It occurred to me that I was terminating the server
> immediately after it wrote the data to the client. So, I added
> a 1 sec sleep time between writing the "hello" to the client, and
> destroying everything and exiting.
>
> And everything works with cli-ih3.c and ser-ih3.c.
>
> I then timed how long I was waiting for recv in ironhouse3.c. I see that
> it averaged 2 msec, with a few at 7 msec on the high end (1000 runs).
> That wait held the server back from terminating the connection, and
> allowed the data to reach the client. When I split the client and server
> into two processes, that forced wait was lost.
>
> The equivalent would be to open another socket, and send a message to the
> server after the client got its data. If the server were to wait for
> such an acknowledgement to arrive before destroying the auth, cert,
> and context, then it would not be chopping off the current data. I used
> zclock_sleep(4), and over 1000 runs, no problems occurred.
> Of course, there's all sorts of ways to hold back the server, and a fixed
> wait is the least dependable way to do it, but, in a pinch, with a
> generous-enough wait, you can raise the probability of success to
> somewhere near 100%. Or, at least, you HOPE it will!
>
> So, my apologies for my jumping to conclusions; perhaps if anyone else
> hits these problems, they will be lucky enough to find this correspondence.
> Perhaps my fixed-up versions of cli-ih3, and ser-ih3, could be included
> somewhere as an example, with a well commented line including the
> zclock_sleep(), to explain why the wait is necessary? A missed little
> detail like this really tripped me up, maybe 1 in a 1000 will follow my
> path....
>
> murf
>
> On Tue, Apr 15, 2014 at 11:06 AM, Steve Murphy <[email protected]> wrote:
>>
>> Pieter--
>>
>> Basically, ironhouse3.c runs fine as a single executable, but if you
>> split it into a client and server (cli-ih3.c, ser-ih3.c), it doesn't
>> work. The data is sent unencrypted, if it is sent at all.
>>
>> Sorry about the formatting, I can't predict the end-viewer experience.
>>
>> murf
>>
>> On Tue, Apr 15, 2014 at 10:56 AM, Steve Murphy <[email protected]> wrote:
>>>
>>> Pieter--
>>>
>>> If you want to concentrate, then do it on the cli-ih3.c and ser-ih3.c,
>>> and ignore the rest. If you can figure out what's going on there,
>>> great. I tend to send too much, in favor of not having lots of
>>> back-and-forth messages, answering questions. The *-ih3.c files
>>> correspond to the ironhouse3.c file, also there, if you need it.
>>>
>>> murf
>>>
>>> On Tue, Apr 15, 2014 at 9:44 AM, Pieter Hintjens <[email protected]> wrote:
>>>>
>>>> Sorry, Steve, that's too much to read and do and digest in the random
>>>> fractions of time we have to share here.
>>>>
>>>> Can you cut this down to a single C program that does the client and
>>>> server and sets the keys and reproduces the issue?
>>>> You can use CZMQ or the lower level API. You'll see examples of
>>>> multi-thread client/server security test cases in the zauth.c
>>>> selftest method.
>>>>
>>>> -Pieter
>>>>
>>>> Ps. your fixed font makes the email wrap weirdly and be harder to read.
>>>>
>>>> On Tue, Apr 15, 2014 at 4:44 PM, Steve Murphy <[email protected]> wrote:
>>>> > Hello!
>>>> >
>>>> > (This message looks best if viewed in a WIDE reader window!)
>>>> >
>>>> > It looks to me like there's a subtle bug in zmq, but... I've
>>>> > thought that generally many times before, only to find it was ME
>>>> > the whole time.
>>>> >
>>>> > I've reproduced the lack of encryption between two of my apps, into
>>>> > simpler files serveriron and clientiron. The way they use encryption
>>>> > is modeled after the ironhouse.c that is used as example by the zmq
>>>> > folks.
>>>> >
>>>> > I can reproduce the problems in these simple test cases, and I
>>>> > think I have enough evidence to report a bug, but... I'm
>>>> > very "human". Maybe you folks can spot what I'm doing wrong.
>>>> >
>>>> > I'm getting strangeness when I try to work with encrypted
>>>> > communications (via curve).
>>>> >
>>>> > Find attached my simple test cases. To run it, untar the attached
>>>> > tar file "security-blog.tar.gz", and cd into the newly
>>>> > created security-blog dir, and type "make test". It will use the
>>>> > currently installed libzmq and libczmq's to build the apps.
>>>> > serveriron and clientiron are just the two halves you get
>>>> > when you rip the ironhouse example apart to run in separate
>>>> > executables, and put the certs in the executables, instead of
>>>> > generating new certs every time. Oh, and you use the certstore
>>>> > also for those two public cert files. The Makefile will set things up.
>>>> >
>>>> > I also run a 'littletest' that just calls the zauth self-test.
>>>> > (Had probs with CentOS 6.5)
>>>> >
>>>> > I also run "ironhouse2" which is the same as the ironhouse
>>>> > example, except it uses fixed certs for both client and server,
>>>> > and grabs the public server cert and feeds that to the client.
>>>> > It basically mirrors all the actions of the serveriron/clientiron
>>>> > processes, and runs them together in the same process, like
>>>> > ironhouse does.
>>>> >
>>>> > I've tried it with various combinations of czmq and zeromq/libzmq.
>>>> > The file "testscript" will build and install libzmq in versions
>>>> > 4.0.3, 4.0.4, and the current git version; and also czmq in versions
>>>> > 2.0.3, 2.1.0, and the current git latest version. It will run all
>>>> > 9 permutations of these.
>>>> >
>>>> > I have run testscript on a CentOS 6.5 system, and an Ubuntu 13.10
>>>> > system. There are some differences in behavior, but generally, they
>>>> > give the same results.
>>>> >
>>>> > Here's what I see:
>>>> >
>>>> > for each czmq/zmq combo, "split" means ironhouse split into
>>>> > separate client/server processes, with both certstore and
>>>> > compiled-in certs.
>>>> >
>>>> > "lt" is "littletest" and represents running just the zauth self-test
>>>> > routine.
>>>> >
>>>> > "i2" is "ironhouse2", which is basically "split" remerged into a
>>>> > single process.
>>>> >
>>>> > "i3" is "ironhouse3", which is basically "split" remerged into a
>>>> > single process, but the client and server have their own
>>>> > contexts/zauth.
>>>> >
>>>> > "i3x" is "ironhouse3x", which is ironhouse3, except we swap the order
>>>> > of socket instantiation, so the client socket is instantiated before
>>>> > the server, which mimics real-life (split) behavior. After all, we do
>>>> > want to get the message from the server, so we have to start the
>>>> > client first, so when the server sends the hello, we are up and ready
>>>> > to receive it on the client side.
>>>> > Waiting 5 sec between running the client and server seems to be
>>>> > optimal. Longer yields no better result, shorter yields less (or so
>>>> > it seems).
>>>> >
>>>> > "i3s" is where I copy ironhouse3.c into cli-ih3.c and ser-ih3.c, and,
>>>> > in the cli-ih3.c, I remove all the server code, and in ser-ih3.c, I
>>>> > remove all the client code. This is like split, but arrived at step
>>>> > by step (just in case I missed something in split).
>>>> >
>>>> >                  LIBZMQ 4.0.3                        LIBZMQ 4.0.4                        libzmq-git-latest
>>>> >
>>>> > CZMQ 2.0.3       split: runs OK, but no encryption!  split: runs OK, but no encryption!  czmq 2.0.3 doesn't compile on this libzmq.
>>>> >                  lt: assert. fail (CentOS)           lt: assert. fail (CentOS)
>>>> >                  i2: runs, but no encryption!        i2: runs, but no encryption!
>>>> >                  i3: runs, but no encryption!        i3: runs, but no encryption!
>>>> >                  i3x: runs w/o encryption            i3x: runs w/o encryption
>>>> >                  i3s: runs w/o encryption            i3s: runs w/o encryption
>>>> >
>>>> > CZMQ 2.1.0       split: client hangs                 split: client hangs                 czmq 2.1.0 doesn't compile on this libzmq.
>>>> >                  lt: runs OK                         lt: runs OK
>>>> >                  i2: runs OK                         i2: runs OK
>>>> >                  i3: runs OK                         i3: runs OK
>>>> >                  i3x: runs OK                        i3x: runs OK
>>>> >                  i3s: client hangs                   i3s: client hangs
>>>> >
>>>> > CZMQ git latest  split: client hangs                 split: client hangs                 split: client hangs
>>>> >                  lt: OK                              lt: OK                              lt: OK
>>>> >                  i2: OK                              i2: OK                              i2: OK
>>>> >                  i3: OK                              i3: OK                              i3: OK
>>>> >                  i3x: OK                             i3x: OK                             i3x: OK
>>>> >                  i3s: client hangs                   i3s: client hangs                   i3s: client hangs
>>>> >
>>>> > Notes:
>>>> >
>>>> > When I say "client hangs" it means that I wait in zstr_recv() forever.
>>>> > I can run the server multiple times. The server never says
>>>> > "I: "-anything, so even if the client gets the message, I'd
>>>> > expect no encryption.
>>>> > I have found that if I repeat the tests, I can occasionally
>>>> > have the client get the hello; but this is not very frequent, and
>>>> > when it does, I get no encryption. Probability <20%, maybe <10%. I
>>>> > played with the sleep time between firing off the client and starting
>>>> > the server, and it *seems* I get better probability with
>>>> > sleep = 5 sec, but... I have not done a large statistics exercise, it
>>>> > could just be anecdotal, serendipitous type stuff. I did notice that
>>>> > binding to 127.0.0.1 instead of * increases the probability of the
>>>> > split/ih3 cases having the client receive the unencrypted message.
>>>> > But never at 100%. And never encrypted.
>>>> >
>>>> > zauth seems to have noticeable problems in czmq-2.0.3. How irontest
>>>> > works there, but irontest2 does not, would be extremely interesting
>>>> > to resolve. (This is on CentOS 6.5; Ubuntu didn't seem to notice.
>>>> > This might be a compiler/linker problem, as CentOS is NOT cutting
>>>> > edge versions!)
>>>> >
>>>> > In 2.1.0 (and higher), split's client hangs on the recv call, but
>>>> > ironhouse2's client does not. The only difference I can spot is that
>>>> > in ironhouse2, both client and server share the same context and
>>>> > process...
>>>> >
>>>> > So, I created ironhouse3, which is like ironhouse2, but server and
>>>> > client each use a different zctx. Since this also works, then I have
>>>> > to conclude that split and i3 have only 2 differences: order that the
>>>> > connect/binds are done, and different processes.
>>>> >
>>>> > So, I created ironhouse3x, which swaps the order. Still works. What's
>>>> > left? Different processes. Is that crazy?
>>>> >
>>>> > But, perhaps, I'm missing something. Maybe it's a bug in my stuff,
>>>> > right under my own nose, but I can't spot it.
>>>> >
>>>> > So, at the moment, I think I've ruled out:
>>>> > A. My "REAL" application uses ROUTER sockets and has these problems,
>>>> >    but these test sets use PUSH/PULL, and demonstrate the problem, so
>>>> >    it doesn't look socket-type related.
>>>> > B. not connect/bind order dependent.
>>>> > C. not common/different zctx dependent.
>>>> > D. not compiled-in vs file cert dependent.
>>>> >
>>>> > Any help that anyone can give, will be highly appreciated!
>>>> >
>>>> > murf
>>>> >
>>>> > --
>>>> >
>>>> > Steve Murphy
>>>> > ParseTree Corporation
>>>> > 57 Lane 17
>>>> > Cody, WY 82414
>>>> > ✉ murf at parsetree dot com
>>>> > ☎ 307-899-5535
>>>> >
>>>> > _______________________________________________
>>>> > zeromq-dev mailing list
>>>> > [email protected]
>>>> > http://lists.zeromq.org/mailman/listinfo/zeromq-dev
