On Thu, 2018-08-23 at 16:33 -0400, Bill Torpey wrote:
> Thanks, James!
> 
> That’s a very informative thread.  This whole business with
> process_commands and the way ZeroMQ handles resources seems to be a
> classic case of a “leaky abstraction”: i.e., it all “just works” —
> except when it doesn’t.
> 
> In my particular case, this problem turns out to be a bit of a “red
> herring” — it’s an edge case that was exposed in a specific test
> program, under an unusual set of conditions (e.g., peer processes
> connecting and disconnecting repeatedly combined with code that only
> receives, but never sends, messages).  I wouldn’t expect this
> situation to come up in production — on the other hand, we need to
> understand what the behavior is under unusual conditions, and do
> something about it if that behavior can have negative
> consequences.  In any event, I’ve implemented a (presumably
> superfluous) workaround in my library code that avoids this problem
> by calling zmq_getsockopt, and it appears to work.  (Of course, for
> others this is not necessarily a red herring, but a real problem.)
> 
> But it took a while, and a fair amount of effort, to understand what
> was going on here.  It would be nice if there was some middle ground
> between just treating ZeroMQ as a “black box” and stepping through
> the code line-by-line to figure out how it works.  So far, I haven’t
> seen anything like that, but if anyone in the community knows of any
> resources that might help peel back the cover a little bit, I would
> be very grateful for any recommendations.  (FWIW, the best I’ve found
> so far is http://zeromq.org/whitepapers:architecture, but that
> doesn’t address the whole process_commands business). 
> 
> Last but not least, ZeroMQ is amazing stuff, and I don’t mean to
> sound ungrateful to or critical of the smart people who built and
> maintain it, but it’s part of my job to beat the stuffing out of any
> software that the business is going to depend on and expose any
> problems. 

This is a very old issue that has its roots in some of the commands
needing to be processed in the application thread. The solution is to
refactor and move them to the I/O thread instead.

But it's of course much easier said than done. It's quite complex,
with many unknown ramifications.

So if anybody wants to help with that, the very first thing would be to
add a ton of unit tests around that area. We can now do internal
per-class unit tests (libzmq/unittests), so it's possible.
This will never be doable safely without tests.
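
Until then, the workaround mentioned in the quoted messages below
(periodically calling zmq_getsockopt with ZMQ_EVENTS on an otherwise
idle socket) could be sketched roughly as follows. This is only an
illustration under some assumptions: idle_poker_t, idle_poker_tick and
stub_getsockopt are hypothetical names, not part of libzmq, and the
getsockopt call is passed in as a function pointer with the same
signature as zmq_getsockopt so that the sketch compiles without the
library.

```c
#include <stddef.h>
#include <stdint.h>

/* Same signature as libzmq's zmq_getsockopt(); pass that function in
 * real code. */
typedef int (*getsockopt_fn)(void *socket, int option,
                             void *value, size_t *len);

/* Hypothetical bookkeeping for one otherwise-idle socket. */
typedef struct {
    void   *socket;        /* socket handle, e.g. from zmq_socket() */
    int     events_option; /* assumption: ZMQ_EVENTS from <zmq.h> */
    int64_t interval_ms;   /* minimum time between two pokes */
    int64_t last_poke_ms;  /* timestamp of the last successful poke */
} idle_poker_t;

/* Queries the events option if at least interval_ms has elapsed since
 * the last poke. Returns 1 if the socket was poked, 0 if it was not
 * due yet, and -1 if the getsockopt call failed. */
static int idle_poker_tick(idle_poker_t *p, getsockopt_fn get,
                           int64_t now_ms)
{
    int events = 0;             /* ZMQ_EVENTS yields an int bitmask */
    size_t len = sizeof events;

    if (now_ms - p->last_poke_ms < p->interval_ms)
        return 0;
    if (get(p->socket, p->events_option, &events, &len) != 0)
        return -1;
    p->last_poke_ms = now_ms;
    return 1;
}

/* Stand-in used only to exercise the sketch without libzmq. */
static int stub_calls = 0;

static int stub_getsockopt(void *socket, int option,
                           void *value, size_t *len)
{
    (void)socket; (void)option; (void)len;
    *(int *)value = 0;          /* report "no events pending" */
    stub_calls++;
    return 0;
}
```

In a real program one would pass zmq_getsockopt instead of the stub,
set events_option to ZMQ_EVENTS, and call idle_poker_tick from the
event loop of the thread that owns the socket, since ordinary ZeroMQ
sockets must not be shared between threads.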

> > On Aug 23, 2018, at 10:58 AM, James Harvey
> > <jamesdillonhar...@gmail.com> wrote:
> > 
> > As a side note, having some method to call process_commands while
> > idle would also fix the memory usage issues encountered when using
> > ZMQ_CONFLATE and not reading from the socket.
> > 
> > https://github.com/zeromq/libzmq/issues/3171
> > 
> > I added documentation to periodically call getsockopt with
> > ZMQ_EVENTS but that still requires work on the user's side.
> > 
> > On Thu, Aug 23, 2018 at 3:29 PM Bill Torpey <wallstp...@gmail.com>
> > wrote:
> > I’m posting this here since not everyone on the list will
> > necessarily see the Github issue, and I’m interested in getting as
> > much feedback as possible.
> > 
> > The issue in question (https://github.com/zeromq/libzmq/issues/3186)
> > has to do with finding a good way to trigger process_commands on
> > inactive sockets.  In our tests, we see real-time memory utilization
> > steadily increase for processes that only subscribe to data when
> > other processes connect and disconnect from their publisher
> > sockets.  The root cause of the problem seems to be that the
> > publisher sockets never get a chance to clean up if we never call
> > zmq_send etc.
> > 
> > The Github issue goes into some detail on potential workarounds,
> > along with their drawbacks.  I would very much appreciate any
> > suggestions that the group may have on how to deal with this
> > problem — I can’t believe that we’re the first to run into it.
> > 
> > Thanks in advance for any suggestions!
> > 
> > _______________________________________________
> > zeromq-dev mailing list
> > zeromq-dev@lists.zeromq.org <mailto:zeromq-dev@lists.zeromq.org>
> > https://lists.zeromq.org/mailman/listinfo/zeromq-dev
> > <https://lists.zeromq.org/mailman/listinfo/zeromq-dev>
> 

-- 
Kind regards,
Luca Boccassi


