> Ok, I did a little digging in mdb. I'm just going to
> say up front that I've NEVER debugged a kernel, but I
> am the unix developer for my software company, so I
> have a little experience.
>
> From the stack trace, the last call was:
>
> ffffff0003aed880 xnb_copy_to_peer+0x32(ffffff016a34f000, ffffff013a9368a0)
>
> The first parameter doesn't evaluate to anything based on:
>
> ffffff016a34f000::dump
>                    \/ 1 2 3 4 5 6 7 8 9 a b c d e f  v123456789abcdef
> mdb: failed to read data at 0xffffff016a34f000: no mapping for address
That first parameter is a big data structure that gets dynamically
allocated / freed. Since there is "no mapping", the structure must have
been freed, but some part of the kernel is still calling
xnb_copy_to_peer with a pointer to the freed memory block as the first
argument.

> Is the function xnb_copy_to_peer supposed to assign
> the second parameter to the first? If so, that may
> explain the problem. The second parameter was NULL,
> and if that wasn't checked, it could be a NULL pointer
> exception when it was attempted to be used.

Source code for xnb_copy_to_peer can be found here:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/xen/io/xnb.c#926

    926 mblk_t *
    927 xnb_copy_to_peer(xnb_t *xnbp, mblk_t *mp)
    928 {
    929         mblk_t *free = mp, *mp_prev = NULL, *saved_mp = mp;
    930         mblk_t *ml, *ml_prev;
    931         gnttab_copy_t *gop_cp;
    932         boolean_t notify;
    933         RING_IDX loop, prod;
    934         int i;
    935
    936         if (!xnbp->xnb_hv_copy)
    937                 return (xnb_to_peer(xnbp, mp));
    938
    939         /*
    940          * For each packet the sequence of operations is:
    941          *
    942          * 1. get a request slot from the ring.
    943          * 2. set up data for hypercall (see NOTE below)
    944          * 3. have the hypervisor copy the data
    945          * 4. update the request slot.
    946          * 5. kick the peer.
    947          *
    948          * NOTE ad 2.
    949          * In order to reduce the number of hypercalls, we prepare
    950          * several packets (mp->b_cont != NULL) for the peer and
    951          * perform a single hypercall to transfer them.
    952          * We also have to set up a separate copy operation for
    953          * every page.
    954          *
    955          * If we have more than one message (mp->b_next != NULL),
    956          * we do this whole dance repeatedly.
    957          */
    958
    959         mutex_enter(&xnbp->xnb_tx_lock);

In vmcore.6 it is crashing at line 936, when trying to dereference an
invalid (freed?) pointer. The enabled heap checking has probably removed
the mmu mappings for the freed block, so we get a page fault when trying
to access the freed data.

In vmcore.5 it was crashing inside the mutex_enter call at line 959.
This was without heap checking; the xnbp pointer points to mapped
memory, but that memory has probably already been re-used by someone
else and now contains unexpected data (==> panic: bad mutex owner).

vmcore.6 looks very similar to the issue reported as bug 6600374 /
6657428 ...
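
To make those two failure modes concrete, here is a minimal user-space
analogue (not from this thread, and not the actual driver code; the
names fake_xnb_t and fake_copy_to_peer are made up for illustration) of
a structure containing a lock being freed while another code path still
holds a stale pointer to it:

/*
 * Minimal user-space analogue of the suspected use-after-free.
 * NOT the xnb driver code; names are invented for illustration only.
 * Compile with: cc uaf.c -lpthread
 */
#include <stdlib.h>
#include <pthread.h>

typedef struct fake_xnb {
        int             xnb_hv_copy;    /* stands in for xnbp->xnb_hv_copy */
        pthread_mutex_t xnb_tx_lock;    /* stands in for xnbp->xnb_tx_lock */
} fake_xnb_t;

/* Stands in for xnb_copy_to_peer(): dereference first, then take the lock. */
static void
fake_copy_to_peer(fake_xnb_t *xnbp)
{
        if (!xnbp->xnb_hv_copy)                 /* cf. xnb.c line 936 */
                return;

        pthread_mutex_lock(&xnbp->xnb_tx_lock); /* cf. xnb.c line 959 */
        /* ... the copy work would happen here ... */
        pthread_mutex_unlock(&xnbp->xnb_tx_lock);
}

int
main(void)
{
        fake_xnb_t *xnbp = malloc(sizeof (*xnbp));

        xnbp->xnb_hv_copy = 1;
        pthread_mutex_init(&xnbp->xnb_tx_lock, NULL);

        /* Teardown path destroys and frees the structure ... */
        pthread_mutex_destroy(&xnbp->xnb_tx_lock);
        free(xnbp);

        /*
         * ... but another path still calls in with the stale pointer.
         * If the allocator unmaps freed pages (as the kernel heap
         * checking apparently does here), the read of xnb_hv_copy
         * faults immediately: the vmcore.6 case.  If the memory stays
         * mapped and gets re-used, the lock field contains garbage and
         * the lock operation fails unpredictably: the "bad mutex owner"
         * panic seen in vmcore.5.  Either way this call is undefined
         * behaviour.
         */
        fake_copy_to_peer(xnbp);

        return (0);
}

(As far as I know, running that sort of test under libumem with
UMEM_DEBUG set is the closest user-space equivalent of the kernel heap
checking that turned the silent corruption into the clean page fault.)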
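
For what it's worth, and purely as a generic sketch (this is not the
actual fix in 6600374 / 6657428, which I haven't looked at), the usual
way to close this kind of race is to make callers and the teardown path
rendezvous on state that outlives the instance itself, for example a
registration protected by its own lock. Reusing the made-up types from
the sketch above:

/*
 * Generic pattern only, reusing fake_xnb_t / fake_copy_to_peer from the
 * sketch above; not the actual xnb fix.  Callers look the instance up
 * under a lock that is not part of the instance, and teardown
 * unregisters it before freeing, so nobody can call in with a stale
 * pointer.
 */
static pthread_mutex_t  registry_lock = PTHREAD_MUTEX_INITIALIZER;
static fake_xnb_t       *registered_xnbp;       /* NULL once torn down */

/* Caller side: only call in while the instance is still registered. */
static void
safe_copy_to_peer(void)
{
        pthread_mutex_lock(&registry_lock);
        if (registered_xnbp != NULL)
                fake_copy_to_peer(registered_xnbp);
        pthread_mutex_unlock(&registry_lock);
}

/* Teardown side: unregister first, then it is safe to free. */
static void
fake_teardown(void)
{
        fake_xnb_t *xnbp;

        pthread_mutex_lock(&registry_lock);
        xnbp = registered_xnbp;
        registered_xnbp = NULL;
        pthread_mutex_unlock(&registry_lock);

        if (xnbp != NULL) {
                pthread_mutex_destroy(&xnbp->xnb_tx_lock);
                free(xnbp);
        }
}

In a real driver this would more likely be a reference count or the
framework's own detach synchronization, but the idea is the same: the
instance has to become unreachable before it is freed.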
