Re: NFS, rl0 and Alpha
Bill Paul writes:

Of all the gin joints in all the towns in all the world, Gary Jennejohn had to walk into mine and say:

OK. Unfortunately, gdb core dumps when I try to analyze a crash dump with a debugging kernel :( Even worse, gdb core dumps when I try to run a debugging gdb in gdb to find out why gdb is core dumping when I try to debug a kernel with symbols :((

Wonderful. I suspect this may have something to do with the way packets sometimes wrap from the end of the RX buffer pool to the beginning. This might result in fragmentation across multiple mbufs in some cases (I think). If I squint hard enough, I can see a way for the data to end up misaligned in one of the additional mbufs.

Try this patch. It's an untested hack (I don't have a RealTek card in a test box right this second) but should fix the problem if it's what I think it is.

*** if_rl.c.orig	Sat Apr 29 14:15:10 2000
--- if_rl.c	Thu May  4 22:16:31 2000
***************
*** 913,919 ****
  		goto fail;
  	}
  
! 	sc->rl_cdata.rl_rx_buf = contigmalloc(RL_RXBUFLEN + 32, M_DEVBUF,
  		M_NOWAIT, 0, 0x, PAGE_SIZE, 0);
  
  	if (sc->rl_cdata.rl_rx_buf == NULL) {
--- 911,917 ----
  		goto fail;
  	}
  
! 	sc->rl_cdata.rl_rx_buf = contigmalloc(RL_RXBUFLEN + 1518, M_DEVBUF,
  		M_NOWAIT, 0, 0x, PAGE_SIZE, 0);
  
  	if (sc->rl_cdata.rl_rx_buf == NULL) {
***************
*** 1122,1129 ****
  		wrap = (sc->rl_cdata.rl_rx_buf + RL_RXBUFLEN) - rxbufpos;
  
  		if (total_len > wrap) {
  			m = m_devget(rxbufpos - RL_ETHER_ALIGN,
! 			    wrap + RL_ETHER_ALIGN, 0, ifp, NULL);
  			if (m == NULL) {
  				ifp->if_ierrors++;
  				printf("rl%d: out of mbufs, tried to "
--- 1120,1132 ----
  		wrap = (sc->rl_cdata.rl_rx_buf + RL_RXBUFLEN) - rxbufpos;
  
  		if (total_len > wrap) {
+ 			/*
+ 			 * Fool m_devget() into thinking we want to copy
+ 			 * the whole buffer so we don't end up fragmenting
+ 			 * the data.
+ 			 */
  			m = m_devget(rxbufpos - RL_ETHER_ALIGN,
! 			    total_len + RL_ETHER_ALIGN, 0, ifp, NULL);
  			if (m == NULL) {
  				ifp->if_ierrors++;
  				printf("rl%d: out of mbufs, tried to "
***************
*** 1132,1145 ****
  				m_adj(m, RL_ETHER_ALIGN);
  				m_copyback(m, wrap, total_len - wrap,
  					sc->rl_cdata.rl_rx_buf);
- 				if (m->m_len < sizeof(struct ether_header))
- 					m = m_pullup(m,
- 					    sizeof(struct ether_header));
- 				if (m == NULL) {
- 					printf("rl%d: m_pullup failed",
- 					    sc->rl_unit);
- 					ifp->if_ierrors++;
- 				}
  			}
  			cur_rx = (total_len - wrap + ETHER_CRC_LEN);
  		} else {
--- 1135,1140 ----

Yes, this patch fixes the problem. Thank you, Bill Paul!

---
Gary Jennejohn / [EMAIL PROTECTED] [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: NFS, rl0 and Alpha
Of all the gin joints in all the towns in all the world, Gary Jennejohn had to walk into mine and say:

[...]

Yes, this patch fixes the problem. Thank you, Bill Paul!

*sigh* It figures.

OK, I applied the patch to -current and -stable. We now return you to your regularly scheduled program. Please drive through.

-Bill

--
=
-Bill Paul (212) 854-6020 | System Manager, Master of Unix-Fu
Work: [EMAIL PROTECTED] | Center for Telecommunications Research
Home: [EMAIL PROTECTED] | Columbia University, New York City
=
"It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=
Re: NFS, rl0 and Alpha
Matthew Dillon writes:

:Thanks, but there is code in rl_rxeof() to align to a 32-bit boundary.
:If that weren't the case then I would expect the Alpha to panic with
:other IP applications, not just NFS.
:
:I don't know, NFS must be doing something weird.
:
:---
:Gary Jennejohn / [EMAIL PROTECTED] [EMAIL PROTECTED]

NFS will realign the data payload for misaligned packets. I agree it sounds like an issue in the NFS code somewhere. Something that is slipping through unnoticed. If someone can get a crash dump and do a stack backtrace, or even a simple DDB 'trace', it should be possible to track the problem down.

OK, I'll analyze my crash dump and send the results to -current later today (Thursday).

---
Gary Jennejohn / [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: NFS, rl0 and Alpha
Matthew Dillon writes:

:Thanks, but there is code in rl_rxeof() to align to a 32-bit boundary.
:If that weren't the case then I would expect the Alpha to panic with
:other IP applications, not just NFS.
:
:I don't know, NFS must be doing something weird.
:
:---
:Gary Jennejohn / [EMAIL PROTECTED] [EMAIL PROTECTED]

NFS will realign the data payload for misaligned packets. I agree it sounds like an issue in the NFS code somewhere. Something that is slipping through unnoticed. If someone can get a crash dump and do a stack backtrace, or even a simple DDB 'trace', it should be possible to track the problem down.

OK. Unfortunately, gdb core dumps when I try to analyze a crash dump with a debugging kernel :( Even worse, gdb core dumps when I try to run a debugging gdb in gdb to find out why gdb is core dumping when I try to debug a kernel with symbols :((

Wonderful. I've managed to produce 5 crash dumps so far. A trace in ddb shows that the kernel is panicking in various places, so Matt's thesis that it will be easy to pinpoint is apparently shot full of holes :(

I've tried various combinations of NFS mounting with TCP, nfsv2, nfsv3, w=1024 and r=1024. Using TCP mounts makes the panic happen less quickly, but as soon as I `ls' a "big" directory the kernel panics. "Big" seems to be more than 10 or 15 entries.

Anyway, here's some of the output from a trace in ddb:

panic() at panic+0x100
trap() at trap+0x610
XentUna() at XentUna+0x200

[here a list of various locations in the nfs code from various panics]

nfs_readdirrpc() at nfs_readdirrpc+0x10ec
nfs_readdirrpc() at nfs_readdirrpc+0x12bc
nfs_request() at nfs_request+0x79c
nfs3_access_otw() at nfs3_access_otw+0x744
nfs_lookup() at [I didn't write down the offset] _GLOBAL_OFFSET_TABLE_

Looking at a disassembly of e.g. nfs_readdirrpc tells me nothing at all. The Alpha's assembly is highly non-transparent. Trying to figure out where the corresponding line in the C code is located is pretty much impossible without debugging symbols - but see above.

Looks like I'll have to live without NFS. At least cvsup works so I can keep my src and ports up to date.

BTW I'm not using any off-the-wall options to compile the kernel. Just -O -pipe.

---
Gary Jennejohn / [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: NFS, rl0 and Alpha
Of all the gin joints in all the towns in all the world, Gary Jennejohn had to walk into mine and say:

OK. Unfortunately, gdb core dumps when I try to analyze a crash dump with a debugging kernel :( Even worse, gdb core dumps when I try to run a debugging gdb in gdb to find out why gdb is core dumping when I try to debug a kernel with symbols :((

Wonderful. I suspect this may have something to do with the way packets sometimes wrap from the end of the RX buffer pool to the beginning. This might result in fragmentation across multiple mbufs in some cases (I think). If I squint hard enough, I can see a way for the data to end up misaligned in one of the additional mbufs.

Try this patch. It's an untested hack (I don't have a RealTek card in a test box right this second) but should fix the problem if it's what I think it is.

-Bill

P.S.: Regardless, somebody should fix gdb.

--
=
-Bill Paul (212) 854-6020 | System Manager, Master of Unix-Fu
Work: [EMAIL PROTECTED] | Center for Telecommunications Research
Home: [EMAIL PROTECTED] | Columbia University, New York City
=
"It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=

*** if_rl.c.orig	Sat Apr 29 14:15:10 2000
--- if_rl.c	Thu May  4 22:16:31 2000
***************
*** 913,919 ****
  		goto fail;
  	}
  
! 	sc->rl_cdata.rl_rx_buf = contigmalloc(RL_RXBUFLEN + 32, M_DEVBUF,
  		M_NOWAIT, 0, 0x, PAGE_SIZE, 0);
  
  	if (sc->rl_cdata.rl_rx_buf == NULL) {
--- 911,917 ----
  		goto fail;
  	}
  
! 	sc->rl_cdata.rl_rx_buf = contigmalloc(RL_RXBUFLEN + 1518, M_DEVBUF,
  		M_NOWAIT, 0, 0x, PAGE_SIZE, 0);
  
  	if (sc->rl_cdata.rl_rx_buf == NULL) {
***************
*** 1122,1129 ****
  		wrap = (sc->rl_cdata.rl_rx_buf + RL_RXBUFLEN) - rxbufpos;
  
  		if (total_len > wrap) {
  			m = m_devget(rxbufpos - RL_ETHER_ALIGN,
! 			    wrap + RL_ETHER_ALIGN, 0, ifp, NULL);
  			if (m == NULL) {
  				ifp->if_ierrors++;
  				printf("rl%d: out of mbufs, tried to "
--- 1120,1132 ----
  		wrap = (sc->rl_cdata.rl_rx_buf + RL_RXBUFLEN) - rxbufpos;
  
  		if (total_len > wrap) {
+ 			/*
+ 			 * Fool m_devget() into thinking we want to copy
+ 			 * the whole buffer so we don't end up fragmenting
+ 			 * the data.
+ 			 */
  			m = m_devget(rxbufpos - RL_ETHER_ALIGN,
! 			    total_len + RL_ETHER_ALIGN, 0, ifp, NULL);
  			if (m == NULL) {
  				ifp->if_ierrors++;
  				printf("rl%d: out of mbufs, tried to "
***************
*** 1132,1145 ****
  				m_adj(m, RL_ETHER_ALIGN);
  				m_copyback(m, wrap, total_len - wrap,
  					sc->rl_cdata.rl_rx_buf);
- 				if (m->m_len < sizeof(struct ether_header))
- 					m = m_pullup(m,
- 					    sizeof(struct ether_header));
- 				if (m == NULL) {
- 					printf("rl%d: m_pullup failed",
- 					    sc->rl_unit);
- 					ifp->if_ierrors++;
- 				}
  			}
  			cur_rx = (total_len - wrap + ETHER_CRC_LEN);
  		} else {
--- 1135,1140 ----
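The wrap case described in this message can be illustrated outside the kernel. What follows is a minimal userland sketch, not the driver's code: `ring_copy` and its parameters are hypothetical names chosen for illustration. It shows only the end result the patch is after, namely a packet that straddles the end of a receive ring landing in one contiguous destination instead of being split. The driver itself works with m_devget() and m_copyback() on mbufs, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from if_rl.c): copy a packet of `len` bytes
 * that starts at offset `pos` inside a ring of `ring_len` bytes into one
 * contiguous destination, following the wrap back to the ring's start.
 */
static void
ring_copy(unsigned char *dst, const unsigned char *ring, size_t ring_len,
    size_t pos, size_t len)
{
	size_t tail;

	assert(pos < ring_len && len <= ring_len);
	tail = ring_len - pos;		/* bytes left before the ring ends */

	if (len <= tail) {
		/* No wrap: the packet is already contiguous in the ring. */
		memcpy(dst, ring + pos, len);
	} else {
		/* First part runs up to the end of the ring... */
		memcpy(dst, ring + pos, tail);
		/* ...and the remainder continues from the beginning. */
		memcpy(dst + tail, ring, len - tail);
	}
}
```

With an 8-byte ring holding 0..7, a 5-byte packet starting at offset 6 comes out as 6, 7, 0, 1, 2 in a single buffer.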
Re: NFS, rl0 and Alpha
On Tue, 2 May 2000, Matthew Dillon wrote:

:Is anyone else observing kernel panics in the NFS code with Alpha
:(pc164) and rl0 (the Alpha is running as a client only)?
:
:NFS worked just fine when I had a de0 in the box. After installing an
:rl0 (I know they suck, but they're so cheap :) I _always_ get an
:unaligned access panic when I try to access an NFS mounted FS, in any
:way.

This is almost certainly related to differences in how the packet is aligned in memory between de0 and rl0. If you are getting panics, it is probably at the same location every time. If you can get a kernel core dump and backtrace I'll bet we can find and fix this problem quickly.

Bill put workarounds for the Alpha's alignment restrictions into some of his drivers but it seems that he missed rl. Basically, the part of the packet which includes headers needs to have the start of the IP header aligned to a 4-byte boundary. Since the preceding Ethernet header is not padded to a multiple of 4 bytes, this often means copying the first part of the packet to another mbuf.

--
Doug Rabson Mail: [EMAIL PROTECTED]
Nonlinear Systems Ltd. Phone: +44 20 8442 9037
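The alignment arithmetic behind this explanation is short enough to write out. A minimal sketch, assuming the standard 14-byte Ethernet header (6-byte destination, 6-byte source, 2-byte type); `ip_align_pad` is a hypothetical helper, not a kernel function. It computes how many pad bytes must precede the frame so that the IP header behind the Ethernet header starts on a 4-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHER_HDR_LEN	14	/* dst MAC (6) + src MAC (6) + type (2) */

/*
 * Hypothetical helper: number of pad bytes to place in front of an
 * Ethernet frame stored at `buf_addr` so that the IP header, which
 * follows the 14-byte Ethernet header, is 4-byte aligned.
 */
static size_t
ip_align_pad(uintptr_t buf_addr)
{
	return (4 - (buf_addr + ETHER_HDR_LEN) % 4) % 4;
}
```

For a buffer that is itself 4-byte aligned the answer is 2, because 14 + 2 = 16 is a multiple of 4. That is why drivers commonly offset received frames by two bytes.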
Re: NFS, rl0 and Alpha
:Thanks, but there is code in rl_rxeof() to align to a 32-bit boundary.
:If that weren't the case then I would expect the Alpha to panic with
:other IP applications, not just NFS.
:
:I don't know, NFS must be doing something weird.
:
:---
:Gary Jennejohn / [EMAIL PROTECTED] [EMAIL PROTECTED]

NFS will realign the data payload for misaligned packets. I agree it sounds like an issue in the NFS code somewhere. Something that is slipping through unnoticed. If someone can get a crash dump and do a stack backtrace, or even a simple DDB 'trace', it should be possible to track the problem down.

-Matt
Matthew Dillon [EMAIL PROTECTED]
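For readers unfamiliar with what "realigning" a payload involves, here is a generic userland sketch. It is an illustration only and not the NFS code, which operates on mbuf chains; `is_aligned4`, `realign4` and the scratch-buffer parameters are hypothetical names. The idea is to test the low two bits of the address and, if they are not zero, copy the data into storage that is known to be 4-byte aligned before reading 32-bit words from it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if `p` sits on a 4-byte boundary. */
static int
is_aligned4(const void *p)
{
	return ((uintptr_t)p & 0x3) == 0;
}

/*
 * Hypothetical helper: return a 4-byte-aligned view of `payload`.
 * An aligned payload is returned unchanged; a misaligned one is copied
 * into `scratch` (aligned by virtue of its uint32_t type). Returns NULL
 * if the scratch buffer, `scratch_len` bytes long, is too small.
 */
static const void *
realign4(const void *payload, size_t len, uint32_t *scratch,
    size_t scratch_len)
{
	if (is_aligned4(payload))
		return payload;
	if (len > scratch_len)
		return NULL;
	memcpy(scratch, payload, len);
	return scratch;
}
```

A check of this kind only helps if every mbuf in a chain passes through it, which is consistent with the suspicion in this thread that something is slipping through unnoticed.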
NFS, rl0 and Alpha
Is anyone else observing kernel panics in the NFS code with Alpha (pc164) and rl0 (the Alpha is running as a client only)?

NFS worked just fine when I had a de0 in the box. After installing an rl0 (I know they suck, but they're so cheap :) I _always_ get an unaligned access panic when I try to access an NFS mounted FS, in any way.

Other network activities like telnet, ftp and cvsup cause no panics, so it doesn't seem to be a problem in the IP stack or the rl driver itself.

I have a crash dump, but I haven't analyzed it yet. Just looking for reports from other users.

BTW I've seen this panic with various kernels, including one with sources cvsup'd yesterday (about 10 AM MEST).

BTW2 the server (an x86 with rl0) is also running -current of the same vintage.

Gary Jennejohn / [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: NFS, rl0 and Alpha
:Is anyone else observing kernel panics in the NFS code with Alpha
:(pc164) and rl0 (the Alpha is running as a client only)?
:
:NFS worked just fine when I had a de0 in the box. After installing an
:rl0 (I know they suck, but they're so cheap :) I _always_ get an
:unaligned access panic when I try to access an NFS mounted FS, in any
:way.

This is almost certainly related to differences in how the packet is aligned in memory between de0 and rl0. If you are getting panics, it is probably at the same location every time. If you can get a kernel core dump and backtrace I'll bet we can find and fix this problem quickly.

-Matt