ZeroMQ crashed today.

This is a Win32 build of both ZMQ and myApp.
myApp had been running fine for several thousand messages when the memcpy
line marked below raised the following exception:

"Unhandled exception at 0x6404edd6 (msvcr90d.dll) in myApp.exe: 0xC0000005: 
Access violation reading location 0xfeeefeee."

Debugging shows the following values:
                buffer      0x00d9b570 "%"          unsigned char *
                pos         2                       unsigned int
                write_pos   0xfeeefeee <Bad Ptr>    unsigned char *
                to_copy     8190                    unsigned int

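This looks like a bad pointer. 0xfeeefeee is the fill pattern the Windows
debug heap writes into freed memory, so write_pos appears to have been read
from a block that had already been freed.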
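
A few fill values of this kind are worth recognising when they turn up as a pointer in the debugger. The helper below is only a sketch: the function name is hypothetical and the table covers just the common patterns used by the Windows heap and the MSVC debug runtime.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: returns a short description of a well-known
// Windows / MSVC debug fill pattern, or an empty string if the value
// is not one of the patterns listed here.
inline std::string describe_fill_pattern(std::uint32_t value) {
    switch (value) {
    case 0xFEEEFEEEu: return "heap memory freed via HeapFree";
    case 0xDDDDDDDDu: return "memory freed by the debug CRT heap";
    case 0xCDCDCDCDu: return "uninitialized debug CRT heap memory";
    case 0xCCCCCCCCu: return "uninitialized stack memory";
    case 0xFDFDFDFDu: return "debug CRT guard bytes around an allocation";
    default:          return "";
    }
}
```

Applied to the values above, write_pos (0xfeeefeee) matches the freed-heap pattern, while buffer (0x00d9b570) matches none of them.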

From encoder.hpp:

                //  If there are no data in the buffer yet and we are able to
                //  fill whole buffer in a single go, let's use zero-copy.
                //  There's no disadvantage to it as we cannot stuck multiple
                //  messages into the buffer anyway. Note that subsequent
                //  write(s) are non-blocking, thus each single write writes
                //  at most SO_SNDBUF bytes at once not depending on how large
                //  is the chunk returned from here.
                //  As a consequence, large messages being sent won't block
                //  other engines running in the same I/O thread for excessive
                //  amounts of time.
                if (!pos && !*data_ && to_write >= buffersize) {
                    *data_ = write_pos;
                    *size_ = to_write;
                    write_pos = NULL;
                    to_write = 0;
                    return;
                }

                //  Copy data to the buffer. If the buffer is full, return.
                size_t to_copy = std::min (to_write, buffersize - pos);
=======>        memcpy (buffer + pos, write_pos, to_copy); 
                pos += to_copy;
                write_pos += to_copy;
                to_write -= to_copy;
                if (pos == buffersize) {
                    *data_ = buffer;
                    *size_ = pos;
                    return;
                }
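
The arithmetic of the copy step can be checked in isolation. The model below is a simplified stand-in for the encoder state, not the ZeroMQ class itself: the struct name and the 8192-byte buffer size are assumptions, chosen because pos = 2 and to_copy = 8190 add up to that value.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical, simplified model of the encoder's batching state.
struct batch_model_t {
    std::vector<unsigned char> buffer;         // output batch buffer
    std::size_t pos = 0;                       // bytes already in the buffer
    const unsigned char *write_pos = nullptr;  // next byte of pending data
    std::size_t to_write = 0;                  // pending bytes still to copy

    explicit batch_model_t(std::size_t buffersize) : buffer(buffersize) {}

    // Copies as much pending data as fits into the buffer and returns the
    // number of bytes copied. Returns 0 if there is nothing to copy.
    std::size_t copy_step() {
        if (!write_pos || to_write == 0)
            return 0;
        std::size_t to_copy = std::min(to_write, buffer.size() - pos);
        std::memcpy(buffer.data() + pos, write_pos, to_copy);
        pos += to_copy;
        write_pos += to_copy;
        to_write -= to_copy;
        return to_copy;
    }
};
```

With an 8192-byte buffer, pos = 2 and at least 8190 pending bytes, copy_step copies exactly 8190 bytes and fills the buffer, matching the debugger values. Note that the null check in this sketch would not catch a dangling pointer such as 0xfeeefeee; it only guards against an unset one.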


Hi Nick,

> We are a small financial startup using messaging for communication 
> between our various applications.
> 
> We chose ZeroMQ because of speed and flexibility - both the set of 
> languages and number of distributed systems we have is growing and yet 
> to be determined.

Understood.

> What should our diagnosis strategy be to chase this difficult bug down?

The only problem here seems to be that Windows returns an error we did not
expect. All that needs to be done is to find out what the error is and add
it to the list (the long if statement in the code you've sent).

wsa_assert should print the error to stderr -- can you check it in the 
console?
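
For illustration, the "list of expected errors" pattern can be sketched as follows. This is not the ZeroMQ code: the function names and the choice of tolerated codes are hypothetical, and the Winsock error values are written out as plain constants so that the sketch compiles on any platform.

```cpp
#include <cstdio>

// Winsock error codes, written out numerically so this compiles anywhere.
enum wsa_code_t {
    wsa_enetdown     = 10050,
    wsa_enetreset    = 10052,
    wsa_econnaborted = 10053,
    wsa_econnreset   = 10054,
    wsa_etimedout    = 10060,
    wsa_econnrefused = 10061
};

// Hypothetical list of errors treated as an ordinary connection failure.
// Adding a newly observed error means adding one more comparison here.
inline bool is_tolerated_error(int code) {
    return code == wsa_enetdown || code == wsa_enetreset ||
           code == wsa_econnaborted || code == wsa_econnreset ||
           code == wsa_etimedout;
}

// Returns true if the code is 0 (success) or tolerated; otherwise reports
// the unexpected code on stderr and returns false.
inline bool check_socket_error(int code) {
    if (code == 0 || is_tolerated_error(code))
        return true;
    std::fprintf(stderr, "Unexpected socket error: %d\n", code);
    return false;
}
```

In this sketch, a code that is not yet on the list (here 10061) is printed to stderr, which is the number to report back.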

Let me know what the error was so that I can fix it in the trunk.

Thanks!
Martin
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
