When I was having issues (with a gnarly pre-existing code base), I looked at where each socket call was coming from and verified that each socket was only being used on one thread. Basically, I just wrote a message to the console for each socket call, printed the thread ID, and then analyzed the output.
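
A minimal sketch of that kind of instrumentation, in case it helps. `SocketCallLog`, `record`, and `thread_count` are hypothetical names for illustration only; they are not libzmq APIs and not the code from the original application. The sketch assumes you can add one call next to each place the code touches a socket.

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <mutex>
#include <set>
#include <string>
#include <thread>

// Hypothetical helper (not part of libzmq): records which threads touch
// which socket, so cross-thread use shows up in the console output.
class SocketCallLog {
public:
    // Call this right before each socket operation.
    // `socket` is the socket handle, `call` is a label such as "zmq_send".
    void record(const void* socket, const std::string& call) {
        const std::thread::id tid = std::this_thread::get_id();
        std::lock_guard<std::mutex> lock(mutex_);
        threads_[socket].insert(tid);
        std::cerr << "socket=" << socket << " call=" << call
                  << " thread=" << tid << '\n';
    }

    // Number of distinct threads seen so far for `socket`.
    // Anything above 1 means the socket is shared between threads.
    std::size_t thread_count(const void* socket) const {
        std::lock_guard<std::mutex> lock(mutex_);
        const auto it = threads_.find(socket);
        return it == threads_.end() ? 0 : it->second.size();
    }

private:
    mutable std::mutex mutex_;
    std::map<const void*, std::set<std::thread::id>> threads_;
};
```

Any socket whose `thread_count` ends up greater than 1 is a candidate for the kind of crash discussed in this thread.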
In cases where refactoring to limit each socket to one thread would be problematic, I was able to use a mutex to allow exclusive access. This worked for me since there were no performance implications. If you're not certain, it might be worth confirming that things are indeed thread safe.

FWIW, here's the thread where I muddled through this stuff previously:
http://lists.zeromq.org/pipermail/zeromq-dev/2015-December/029445.html

My crashes were also happening in encoder.hpp.

Josh

On Wed, Apr 20, 2016 at 4:58 PM, Joshua Strickon <[email protected]> wrote:

> It's mostly single threaded but there could be multiple threads for
> different modules and dlls that it uses. It is a bit of a mess and I don’t
> think the original developer fully tested it in the production
> environment. I was hoping it would be something that upgrading to a later
> version of zmq addresses without having to dig into the application code.
>
> Thanks
>
> Josh
>
> On Apr 20, 2016, at 4:52 PM, josh knox <[email protected]> wrote:
>
> Hi Josh,
>
> Is your app multi-threaded? Could there be more than one thread hitting
> the socket?
>
> The times that I've had random memory errors with zmq were due to multiple
> threads using a socket.
>
> In my case, either isolating 1 thread per socket, or using other thread
> synchronization to prevent concurrent socket use has solved those issues for
> me.
>
> Josh
>
> On Wed, Apr 20, 2016 at 4:34 PM, Joshua Strickon <[email protected]>
> wrote:
>
>> I know this is old. I am working on getting an old project up and
>> running for a client who built it on 2.0.2 and we are seeing these same
>> errors. We are getting access violation errors and the app is crashing
>> randomly. The windows dump files are pointing to these same lines of
>> code as described below. What was the resolution on this issue?
>>
>> thanks
>>
>> Josh
>>
>> From: Martin Sustrik <sustrik <at> 250bpm.com>
>> Subject: Re: frequent ZeroMQ crashes - how to diagnose?
>> <http://news.gmane.org/find-root.php?message_id=4C1C6BF7.6080006%40250bpm.com>
>> Newsgroups: gmane.network.zeromq.devel
>> <http://news.gmane.org/gmane.network.zeromq.devel>
>> Date: 2010-06-19 07:04:23 GMT
>>
>> Nick,
>>
>> > ZeroMQ crashed today.
>> >
>> > This is a Win32 build of both ZMQ and myApp.
>> > myApp was running fine with several thousand messages, when the memcpy
>> > code line below threw the following exception.
>> >
>> > "Unhandled exception at 0x6404edd6 (msvcr90d.dll) in myApp.exe:
>> > 0xC0000005: Access violation reading location 0xfeeefeee."
>> >
>> > debugging shows the following values:
>> > -  buffer     0x00d9b570 "%"        unsigned char *
>> >    pos        2                     unsigned int
>> > +  write_pos  0xfeeefeee <Bad Ptr>  unsigned char *
>> >    to_copy    8190                  unsigned int
>> >
>> > looks like a bad pointer.
>> >
>> > encoder.hpp
>> >
>> >     // If there are no data in the buffer yet and we are able to
>> >     // fill whole buffer in a single go, let's use zero-copy.
>> >     // There's no disadvantage to it as we cannot stuck multiple
>> >     // messages into the buffer anyway. Note that subsequent
>> >     // write(s) are non-blocking, thus each single write writes
>> >     // at most SO_SNDBUF bytes at once not depending on how large
>> >     // is the chunk returned from here.
>> >     // As a consequence, large messages being sent won't block
>> >     // other engines running in the same I/O thread for excessive
>> >     // amounts of time.
>> >     if (!pos && !*data_ && to_write >= buffersize) {
>> >         *data_ = write_pos;
>> >         *size_ = to_write;
>> >         write_pos = NULL;
>> >         to_write = 0;
>> >         return;
>> >     }
>> >
>> >     // Copy data to the buffer. If the buffer is full, return.
>> >     size_t to_copy = std::min (to_write, buffersize - pos);
>> > =======> memcpy (buffer + pos, write_pos, to_copy);
>> >     pos += to_copy;
>> >     write_pos += to_copy;
>> >     to_write -= to_copy;
>> >     if (pos == buffersize) {
>> >         *data_ = buffer;
>> >         *size_ = pos;
>> >         return;
>> >     }
>>
>> It looks like a memory overwrite either in 0MQ or the application. Do
>> you have a test program to reproduce the problem?
>>
>> > Let me know what the error was so that I can fix it in the trunk.
>>
>> Have you managed to find out what the error code is?
>>
>> Martin
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
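
P.S. A minimal sketch of the mutex approach mentioned at the top of this message, for cases where one socket per thread is not practical. `LockedSocket`, `with_socket`, `FakeSocket`, and `send_from_many_threads` are hypothetical names; `FakeSocket` only stands in for a real socket handle so the sketch compiles without libzmq. This is an illustration under those assumptions, not the code from the original application.

```cpp
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical wrapper (not part of libzmq): owns a socket-like object and
// only exposes it while a mutex is held, so calls never overlap.
template <typename Socket>
class LockedSocket {
public:
    explicit LockedSocket(Socket socket) : socket_(std::move(socket)) {}

    // Runs fn(socket) with exclusive access and returns whatever fn returns.
    template <typename Fn>
    auto with_socket(Fn&& fn) -> decltype(fn(std::declval<Socket&>())) {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::forward<Fn>(fn)(socket_);
    }

private:
    std::mutex mutex_;
    Socket socket_;
};

// Stand-in for a real socket; it just counts "sends".
struct FakeSocket {
    int sends = 0;
};

// Usage sketch: several threads share one socket, but every access goes
// through with_socket, so the counter is never updated concurrently.
int send_from_many_threads(int thread_count, int sends_per_thread) {
    LockedSocket<FakeSocket> socket{FakeSocket{}};
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&socket, sends_per_thread] {
            for (int i = 0; i < sends_per_thread; ++i) {
                socket.with_socket([](FakeSocket& s) { ++s.sends; });
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return socket.with_socket([](FakeSocket& s) { return s.sends; });
}
```

Every socket call is serialized by the lock, so this only makes sense where throughput on that socket is not a concern, as was the case for me.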
