As expected, SUE (Stupid User Error).  

It was due to a structure alignment issue.  In the
memcpy(&pReply->findReply.service,pServiceEntry,pServiceEntry->serviceLen)
statement, the "service" address was padded out by an extra 4 bytes of
alignment.  The part of the message in front of it is only 12 bytes long, but
the buffer was sized as if "service" started right after it.  Instead of being
at +12 from the beginning of the message, "service" resolved to +16, so the
copy overlaid 4 bytes past the end of the message buffer.
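
For anyone who hits the same thing, here is a minimal sketch of how the
mismatch can be made visible.  The struct definitions are hypothetical
stand-ins (the real LZSERVICEREPLYHDR and LZSERVICE are not shown in this
thread), and the helper names are made up for illustration; the point is only
that sizeof(header) and offsetof(reply, service) need not agree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for LZSERVICEREPLYHDR: three 32-bit fields, 12 bytes. */
typedef struct {
    uint32_t msgID;
    uint32_t flags;
    uint32_t rc;
} ReplyHdr;

/* Hypothetical stand-in for LZSERVICE: the pointer members raise the
 * alignment of the struct to 8 on a typical 64-bit target. */
typedef struct {
    char    *pServiceName;
    char    *pServiceURI;
    uint32_t serviceLen;
} Service;

typedef struct {
    ReplyHdr hdr;
    Service  service;   /* the compiler may insert padding before this member */
} Reply;

/* Size computed the way the original code did it: header plus payload. */
size_t naive_msg_size(size_t payload_len) {
    return sizeof(ReplyHdr) + payload_len;
}

/* Size computed from where the member really starts inside the reply. */
size_t safe_msg_size(size_t payload_len) {
    return offsetof(Reply, service) + payload_len;
}

/* Print the layout so the padding, if any, is visible. */
void print_layout(void) {
    assert(offsetof(Reply, service) >= sizeof(ReplyHdr));
    printf("header size:    %zu\n", sizeof(ReplyHdr));
    printf("service offset: %zu\n", offsetof(Reply, service));
    printf("padding:        %zu\n", offsetof(Reply, service) - sizeof(ReplyHdr));
}
```

Calling print_layout() from main() on a typical 64-bit Linux build, I would
expect a 12-byte header with the member at offset 16; on a 32-bit build the
two may well agree, which is how this kind of bug stays hidden.  Sizing the
buffer with something like safe_msg_size() keeps the memcpy inside the
allocation either way.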

I apologize for the noise on the list.

ty

On Dec 31, 2012, at 4:04 PM, Brad Taylor <[email protected]> wrote:

> I am having a problem with what looks like a double free in zmq.  I assume it 
> is something I am doing wrong, however I have looked fairly carefully and can 
> not find the problem.  I will document this from the results back to the 
> source code.
> 
> During execution I get:
> 
> ==============> error during executing <==============================================
> 
> *** glibc detected *** /home/brad/mavenSDK/lz_serviceRegistry/Debug/lz_serviceRegistry: double free or corruption (!prev): 0x00000000006096e0 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x36b287c80e]
> /home/brad/Downloads/zeromq-3.2.1/src/.libs/libzmq.so.3(+0x180b3)[0x7ffff7bcd0b3]
> /home/brad/Downloads/zeromq-3.2.1/src/.libs/libzmq.so.3(+0x2f9c3)[0x7ffff7be49c3]
> /home/brad/Downloads/zeromq-3.2.1/src/.libs/libzmq.so.3(+0x2fe7b)[0x7ffff7be4e7b]
> /home/brad/Downloads/zeromq-3.2.1/src/.libs/libzmq.so.3(+0x2637e)[0x7ffff7bdb37e]
> /home/brad/Downloads/zeromq-3.2.1/src/.libs/libzmq.so.3(+0x21512)[0x7ffff7bd6512]
> /home/brad/Downloads/zeromq-3.2.1/src/.libs/libzmq.so.3(+0x15194)[0x7ffff7bca194]
> /home/brad/Downloads/zeromq-3.2.1/src/.libs/libzmq.so.3(+0x1429e)[0x7ffff7bc929e]
> /home/brad/Downloads/zeromq-3.2.1/src/.libs/libzmq.so.3(+0x2aba6)[0x7ffff7bdfba6]
> 
> ================>Environment<======================
> 
> The service registry is a "daemon" that listens for beacons from service 
> providers and records those services.  It also listens for "find" requests 
> that look up the endpoints for those services.  
> 
> It is during this request processing, while constructing and sending the 
> reply, that the problem occurs.  The following two routines are where the 
> problem occurs.  The second routine is the driving routine; it receives 
> control when a "find" message is received in an upper polling loop.
> 
> The sendResponse routine is invoked because the requested service was 
> found.  The only odd thing I can think of that I am doing is that the 
> subroutine is the one that does the zmq_msg_init_size(), and the 
> zmq_msg_send and zmq_msg_close are done upon return.
> 
> I have numbered the statements in the order they are actually executed.  Any 
> thoughts on how to debug this problem would be appreciated.
> 
> ============>Source code<==============
> 
> 
> typedef struct MSGREPLY {
>       zmq_msg_t               *pPriorMessage;
>       zmq_msg_t               sPriorMessage;
> }MSGREPLY;
> 
> /**
>  * This routine will send a response to the find message.  This is a
>  * "delayed" response because we do not know if there will be any "more"
>  * responses to be sent.  We utilize the multi-part message facility of zmq
>  * to do this.  Each part of the message is a service that matches the
>  * requestor's specification; that is, for a find request there may be
>  * multiple service providers for the requested service.  For a list request,
>  * multiple services may exist.
>  */
> static int sendResponse(void * context, void *pReplySocket,
>                                               LZSERVICE *pServiceEntry,
>                                               MSGREPLY *pMsgReply,
>                                               LZSERVICEMSGS msgID) {
> 
>       int                                     msgSize;
>       LZSERVICEREPLY          *pReply;
> 
> ====>8        if (pMsgReply->pPriorMessage) {
>                       zmq_msg_send(pMsgReply->pPriorMessage,pReplySocket,ZMQ_SNDMORE);
>                       zmq_msg_close(pMsgReply->pPriorMessage);
>               }
> ====>9        pMsgReply->pPriorMessage = &pMsgReply->sPriorMessage;
> ====>10       msgSize = sizeof(LZSERVICEREPLYHDR)+pServiceEntry->serviceLen;
> 
> ====>11    zmq_msg_init_size (pMsgReply->pPriorMessage, msgSize);
> ====>12    pReply = zmq_msg_data(pMsgReply->pPriorMessage);
> ====>13    LZSERVICE_SET_REPLY(pReply);
> ====>14    pReply->replyHdr.rc = 0;
> ====>15    pReply->replyHdr.msghdr.msgID = msgID;
> ====>16    memcpy(&pReply->findReply.service,pServiceEntry,pServiceEntry->serviceLen);
> ====>17    pReply->findReply.service.pServiceName = NULL;
> ====>18    pReply->findReply.service.pServiceURI = NULL;
> 
> ====>19       return 0;
> }
> 
> /**
>  * This routine will find the requested service.
>  * This is a "request" message sent on a request socket; that is, it requires
>  * a response.  Since there may be multiple instances of a given service
>  * running on multiple blades or multiple processes on a blade, duplicate
>  * service name entries may exist.  We return all of the instances that match
>  * the requested name.  When there is more than one, we send subsequent
>  * service responses in separate message frames.
>  */
> static int processFindService(void * context, LZSERVICEMESSAGE *pFindMessage, 
> void *pReplySocket) {
> 
>       int                                     msgSize;
>       LZSERVICEREPLY          *pFindReply;
>       LZSERVICE                       *pServiceEntry;
>       char                            *pServiceName;
>       GList                           *iterator = NULL;
>       MSGREPLY                        sReply;
>       zmq_msg_t                       sReplyMessage;
>       char                            routineName[] = "lzServiceRegistry:processFindService";
> 
> ===>1 pServiceName = pFindMessage->findService.aServiceName;
> ===>2 sReply.pPriorMessage = NULL;
> 
>       /*--------------------------------------------------------------------------+
>        * Look for matching service names in the global service table             |
>        * for each found entry, construct a response and send it                  |
>        *-------------------------------------------------------------------------*/
> ====>3        for (iterator = pGlobalServices; iterator; iterator = iterator->next) {
> ====>4                pServiceEntry = iterator->data;
> ====>5                if (pServiceEntry->serviceNameLen == pFindMessage->findService.serviceNameLen) {
> ====>6                        if (!memcmp(pServiceEntry->pServiceName,pServiceName,pServiceEntry->serviceNameLen)) {
> ====>7                                sendResponse(context,pReplySocket,pServiceEntry,&sReply, LZSERVICEMSGS_FINDSERVICE);
>                               if (options.debugMode) {
>                                       fprintf(stderr,"%s: service %s was found\n",routineName,pServiceName);
>                               }
>                       }
>               }
>       }
> ====>20       if (sReply.pPriorMessage) {
> ====>21               zmq_msg_send(sReply.pPriorMessage,pReplySocket,0);     =============This is where I get the exception ===================
>               zmq_msg_close(sReply.pPriorMessage);
>       }
>       else {
>               msgSize = sizeof(LZSERVICEREPLY);
>           zmq_msg_init_size (&sReplyMessage, msgSize);
>           pFindReply = zmq_msg_data(&sReplyMessage);
>           LZSERVICE_SET_REPLY(pFindReply);
>           pFindReply->replyHdr.msghdr.msgID = LZSERVICEMSGS_FINDSERVICE;
>           pFindReply->replyHdr.rc = LZSERVICERC_SERVICENOTFOUND;
>               zmq_msg_send(&sReplyMessage,pReplySocket,0);
>               zmq_msg_close(&sReplyMessage);
>               if (options.debugMode) {
>                       fprintf(stderr,"%s: service name, %s, not found\n",routineName,pServiceName);
>               }
>       }
> 
> 
>       return 0;
> }

