Quoting r. Libor Michalek <[EMAIL PROTECTED]>:
> Subject: [PATCH][SDP] AIO buffer corruption
>
>
>   Patch to fix the problem a few people reported as ttcp.aio.c
> aborting with an error (-104) on longer AIO runs.
>
>   The bug is in the calculation of an AIO buffers starting address.
> It would cause data to potentially be written past the end of the
> AIO buffer corrupting whatever happen to be there. In the case of
> ttcp.aio.c this happen to be the iocb array, which once corrupted
> would generate this error when passed to io_submit.
>
> -Libor
>
> Signed-off-by: Libor Michalek <[EMAIL PROTECTED]>
>
> Index: sdp_recv.c
> ===================================================================
> --- sdp_recv.c	(revision 2220)
> +++ sdp_recv.c	(working copy)
> @@ -674,14 +674,16 @@
>  #ifndef _SDP_DATA_PATH_NULL
>                  memcpy((addr + offset), buff->data, copy);
>  #endif
> -
> +
>                  buff->data += copy;
>                  iocb->post += copy;
>                  iocb->len -= copy;
>
>                  offset += copy;
>                  offset &= (~PAGE_MASK);
> -
> +
> +                iocb->io_addr += copy;
> +
>                  sdp_kunmap(iocb->page_array[counter++]);
>          }
>          /*
> @@ -1443,7 +1445,8 @@
>                  iocb->size = size;
>                  iocb->req = req;
>                  iocb->key = req->ki_key;
> -                iocb->addr = (unsigned long)msg->msg_iov->iov_base;
> +                iocb->addr = ((unsigned long)msg->msg_iov->iov_base -
> +                              copied);
>
>                  req->ki_cancel = sdp_inet_read_cancel;
>
> Index: sdp_send.c
> ===================================================================
> --- sdp_send.c	(revision 2220)
> +++ sdp_send.c	(working copy)
> @@ -751,6 +751,7 @@
>                  buff->tail += copy;
>                  iocb->post += copy;
>                  iocb->len -= copy;
> +                iocb->io_addr += copy;
>
>                  offset += copy;
>                  offset &= (~PAGE_MASK);
> @@ -2195,7 +2196,7 @@
>                  iocb->size = size;
>                  iocb->req = req;
>                  iocb->key = req->ki_key;
> -                iocb->addr = (unsigned long)msg->msg_iov->iov_base;
> +                iocb->addr = (unsigned long)msg->msg_iov->iov_base - copied;
>
>                  req->ki_cancel = sdp_inet_write_cancel;
>

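For anyone following along, here is a minimal sketch of the kind of start-address miscalculation the patch description talks about. It is not the SDP code: the struct and both functions are hypothetical, and it assumes that the user pointer has already been advanced past `copied` bytes when the request is set up, and that the request's progress counter starts at `copied`. Under those assumptions, recording the advanced pointer as the buffer start pushes the last write `copied` bytes past the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the AIO request state; not the SDP iocb. */
struct aio_sketch {
    uintptr_t addr; /* address recorded as the start of the user buffer */
    size_t    post; /* bytes already placed into the buffer */
    size_t    len;  /* bytes still to be placed */
};

/*
 * cursor: current user pointer, assumed to be already advanced by `copied`
 * size:   total length of the user buffer
 * copied: bytes consumed before the request was queued
 * fixed:  nonzero to rewind the cursor by `copied`, as the patch does
 */
static struct aio_sketch aio_sketch_init(uintptr_t cursor, size_t size,
                                         size_t copied, int fixed)
{
    struct aio_sketch s;

    assert(copied <= size);
    s.addr = fixed ? cursor - copied : cursor;
    s.post = copied;        /* assumption: progress counts from the true start */
    s.len  = size - copied;
    return s;
}

/* One past the last byte the request would write: addr + post + len. */
static uintptr_t aio_sketch_end(const struct aio_sketch *s)
{
    return s->addr + s->post + s->len;
}
```

With a 4096-byte buffer and 100 bytes already copied, the unfixed variant ends 100 bytes beyond the buffer, while the rewound variant ends exactly at the buffer's end.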
Unfortunately, I still sometimes see data corruption with this patch applied.
The result for me is the server reporting a verification error and closing the
socket, and the client printing the 104 event.

I'm still debugging, but wanted to ask whether anyone else is seeing this too.

Libor, on an unrelated note, could you please generate diffs with the -p flag,
to make it easier to see which function was changed?

Thanks,

--
MST - Michael S. Tsirkin
_______________________________________________
openib-general mailing list
http://openib.org/mailman/listinfo/openib-general
