Re: NFS, rl0 and Alpha

2000-05-05 Thread Gary Jennejohn

Bill Paul writes:
Of all the gin joints in all the towns in all the world, Gary Jennejohn 
had to walk into mine and say:

> OK. Unfortunately, gdb core dumps when I try to analyze a crash dump
> with a debugging kernel :( Even worse, gdb core dumps when I try to
> run a debugging gdb in gdb to find out why gdb is core dumping when
> I try to debug a kernel with symbols :(( Wonderful.

I suspect this may have something to do with the way packets sometimes
wrap from the end of the RX buffer pool to the beginning. This might
result in fragmentation across multiple mbufs in some cases (I think).
If I squint hard enough, I can see a way for the data to end up misaligned
in one of the additional mbufs.

Try this patch. It's an untested hack (I don't have a RealTek card
in a test box right this second) but should fix the problem if it's
what I think it is.


*** if_rl.c.orig  Sat Apr 29 14:15:10 2000
--- if_rl.c  Thu May  4 22:16:31 2000
***************
*** 913,919 ****
          goto fail;
      }
  
!     sc->rl_cdata.rl_rx_buf = contigmalloc(RL_RXBUFLEN + 32, M_DEVBUF,
          M_NOWAIT, 0, 0x, PAGE_SIZE, 0);
  
      if (sc->rl_cdata.rl_rx_buf == NULL) {
--- 911,917 ----
          goto fail;
      }
  
!     sc->rl_cdata.rl_rx_buf = contigmalloc(RL_RXBUFLEN + 1518, M_DEVBUF,
          M_NOWAIT, 0, 0x, PAGE_SIZE, 0);
  
      if (sc->rl_cdata.rl_rx_buf == NULL) {
***************
*** 1122,1129 ****
          wrap = (sc->rl_cdata.rl_rx_buf + RL_RXBUFLEN) - rxbufpos;
  
          if (total_len > wrap) {
              m = m_devget(rxbufpos - RL_ETHER_ALIGN,
!                 wrap + RL_ETHER_ALIGN, 0, ifp, NULL);
              if (m == NULL) {
                  ifp->if_ierrors++;
                  printf("rl%d: out of mbufs, tried to "
--- 1120,1132 ----
          wrap = (sc->rl_cdata.rl_rx_buf + RL_RXBUFLEN) - rxbufpos;
  
          if (total_len > wrap) {
+             /*
+              * Fool m_devget() into thinking we want to copy
+              * the whole buffer so we don't end up fragmenting
+              * the data.
+              */
              m = m_devget(rxbufpos - RL_ETHER_ALIGN,
!                 total_len + RL_ETHER_ALIGN, 0, ifp, NULL);
              if (m == NULL) {
                  ifp->if_ierrors++;
                  printf("rl%d: out of mbufs, tried to "
***************
*** 1132,1145 ****
                  m_adj(m, RL_ETHER_ALIGN);
                  m_copyback(m, wrap, total_len - wrap,
                      sc->rl_cdata.rl_rx_buf);
-                 if (m->m_len < sizeof(struct ether_header))
-                     m = m_pullup(m,
-                         sizeof(struct ether_header));
-                 if (m == NULL) {
-                     printf("rl%d: m_pullup failed",
-                         sc->rl_unit);
-                     ifp->if_ierrors++;
-                 }
              }
              cur_rx = (total_len - wrap + ETHER_CRC_LEN);
          } else {
--- 1135,1140 ----

Yes, this patch fixes the problem. Thank you, Bill Paul !

---
Gary Jennejohn / [EMAIL PROTECTED] [EMAIL PROTECTED]




To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: NFS, rl0 and Alpha

2000-05-05 Thread Bill Paul

Of all the gin joints in all the towns in all the world, Gary Jennejohn 
had to walk into mine and say:

[...] 
> Yes, this patch fixes the problem. Thank you, Bill Paul !

*sigh* It figures. Ok, I applied the patch to -current and -stable. 
We now return you to your regularly scheduled program. Please drive
through.

-Bill

-- 
=
-Bill Paul (212) 854-6020 | System Manager, Master of Unix-Fu
Work: [EMAIL PROTECTED] | Center for Telecommunications Research
Home:  [EMAIL PROTECTED] | Columbia University, New York City
=
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=





Re: NFS, rl0 and Alpha

2000-05-04 Thread Gary Jennejohn

Matthew Dillon writes:
:Thanks, but there is code in rl_rxeof() to align to a 32 bit boundary.
:If that weren't the case then I would expect the Alpha to panic with
:other IP applications, not just NFS.
:
:I don't know, NFS must be doing something weird.
:
:---
:Gary Jennejohn / [EMAIL PROTECTED] [EMAIL PROTECTED]

NFS will realign the data payload for misaligned packets.

I agree it sounds like an issue in the NFS code somewhere.  Something
that is slipping through unnoticed.  If someone can get a crash dump
and do a stack backtrace, or even a simple DDB 'trace', it should be 
possible to track the problem down.

OK, I'll analyze my crash dump and send the results to -current later
today (Thursday).

---
Gary Jennejohn / [EMAIL PROTECTED] [EMAIL PROTECTED]







Re: NFS, rl0 and Alpha

2000-05-04 Thread Gary Jennejohn

Matthew Dillon writes:
:Thanks, but there is code in rl_rxeof() to align to a 32 bit boundary.
:If that weren't the case then I would expect the Alpha to panic with
:other IP applications, not just NFS.
:
:I don't know, NFS must be doing something weird.
:
:---
:Gary Jennejohn / [EMAIL PROTECTED] [EMAIL PROTECTED]

NFS will realign the data payload for misaligned packets.

I agree it sounds like an issue in the NFS code somewhere.  Something
that is slipping through unnoticed.  If someone can get a crash dump
and do a stack backtrace, or even a simple DDB 'trace', it should be 
possible to track the problem down.

OK. Unfortunately, gdb core dumps when I try to analyze a crash dump
with a debugging kernel :( Even worse, gdb core dumps when I try to
run a debugging gdb in gdb to find out why gdb is core dumping when
I try to debug a kernel with symbols :(( Wonderful.

I've managed to produce 5 crash dumps so far. Trace in ddb shows that
the kernel is panicking in various places, so Matt's thesis that it will
be easy to pinpoint is apparently shot full of holes :(

I've tried various combinations of nfs mounting with tcp, nfsv2, nfsv3,
w=1024 and r=1024. Using TCP mounts makes the panic happen less quickly,
but as soon as I `ls' a "big" directory the kernel panics. "Big" seems to
be more than 10 or 15 entries.

Anyway, here's some of the output from a trace in ddb:

panic() at panic+0x100
trap() at trap+0x610
XentUna() at XentUna+0x200
[here a list of various locations in the nfs code from various panics]
nfs_readdirrpc() at nfs_readdirrpc+0x10ec
nfs_readdirrpc() at nfs_readdirrpc+0x12bc
nfs_request() at nfs_request+0x79c
nfs3_access_otw() at nfs3_access_otw+0x744
nfs_lookup() at [I didn't write down the offset]
_GLOBAL_OFFSET_TABLE_

Looking at a disassembly of e.g. nfs_readdirrpc tells me nothing at all.
The Alpha's assembly is highly non-transparent. Trying to figure out where
the corresponding line in the C-code is located is pretty much impossible
without debugging symbols - but see above.

Looks like I'll have to live without NFS. At least cvsup works so I can
keep my src and ports up to date.

BTW I'm not using any off-the-wall options to compile the kernel. Just
-O -pipe.

---
Gary Jennejohn / [EMAIL PROTECTED] [EMAIL PROTECTED]







Re: NFS, rl0 and Alpha

2000-05-04 Thread Bill Paul

Of all the gin joints in all the towns in all the world, Gary Jennejohn 
had to walk into mine and say:

> OK. Unfortunately, gdb core dumps when I try to analyze a crash dump
> with a debugging kernel :( Even worse, gdb core dumps when I try to
> run a debugging gdb in gdb to find out why gdb is core dumping when
> I try to debug a kernel with symbols :(( Wonderful.

I suspect this may have something to do with the way packets sometimes
wrap from the end of the RX buffer pool to the beginning. This might
result in fragmentation across multiple mbufs in some cases (I think).
If I squint hard enough, I can see a way for the data to end up misaligned
in one of the additional mbufs.
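
For illustration, the wrap case described above can be sketched in plain
userland C. This is a minimal sketch, not the if_rl.c code: the helper
name ring_copy_frame and its parameters are hypothetical, and it copies
into a flat buffer rather than an mbuf chain. It shows a frame that
starts near the end of a circular receive buffer being copied out as one
contiguous block, so the consumer never sees it split in two.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a frame of `len` bytes that starts at offset `pos` in a circular
 * buffer of `ring_len` bytes into the contiguous buffer `out`.
 * Hypothetical helper for illustration only.
 * Returns the ring offset just past the frame.
 */
static size_t
ring_copy_frame(const unsigned char *ring, size_t ring_len, size_t pos,
    unsigned char *out, size_t len)
{
    size_t wrap;

    assert(pos < ring_len && len <= ring_len);

    wrap = ring_len - pos;          /* bytes left before the end of the ring */
    if (len > wrap) {
        /* Frame wraps: copy the tail of the ring, then the head. */
        memcpy(out, ring + pos, wrap);
        memcpy(out + wrap, ring, len - wrap);
    } else {
        /* Frame fits before the end of the ring. */
        memcpy(out, ring + pos, len);
    }
    return (pos + len) % ring_len;
}
```

Because the destination is a single buffer, both pieces keep the same
alignment relative to the start of the frame, which is the property the
driver needs when it hands the data up the stack.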

Try this patch. It's an untested hack (I don't have a RealTek card
in a test box right this second) but should fix the problem if it's
what I think it is.

-Bill

P.S.: Regardless, somebody should fix gdb.


-- 
=
-Bill Paul (212) 854-6020 | System Manager, Master of Unix-Fu
Work: [EMAIL PROTECTED] | Center for Telecommunications Research
Home:  [EMAIL PROTECTED] | Columbia University, New York City
=
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=

*** if_rl.c.orig  Sat Apr 29 14:15:10 2000
--- if_rl.c  Thu May  4 22:16:31 2000
***************
*** 913,919 ****
          goto fail;
      }
  
!     sc->rl_cdata.rl_rx_buf = contigmalloc(RL_RXBUFLEN + 32, M_DEVBUF,
          M_NOWAIT, 0, 0x, PAGE_SIZE, 0);
  
      if (sc->rl_cdata.rl_rx_buf == NULL) {
--- 911,917 ----
          goto fail;
      }
  
!     sc->rl_cdata.rl_rx_buf = contigmalloc(RL_RXBUFLEN + 1518, M_DEVBUF,
          M_NOWAIT, 0, 0x, PAGE_SIZE, 0);
  
      if (sc->rl_cdata.rl_rx_buf == NULL) {
***************
*** 1122,1129 ****
          wrap = (sc->rl_cdata.rl_rx_buf + RL_RXBUFLEN) - rxbufpos;
  
          if (total_len > wrap) {
              m = m_devget(rxbufpos - RL_ETHER_ALIGN,
!                 wrap + RL_ETHER_ALIGN, 0, ifp, NULL);
              if (m == NULL) {
                  ifp->if_ierrors++;
                  printf("rl%d: out of mbufs, tried to "
--- 1120,1132 ----
          wrap = (sc->rl_cdata.rl_rx_buf + RL_RXBUFLEN) - rxbufpos;
  
          if (total_len > wrap) {
+             /*
+              * Fool m_devget() into thinking we want to copy
+              * the whole buffer so we don't end up fragmenting
+              * the data.
+              */
              m = m_devget(rxbufpos - RL_ETHER_ALIGN,
!                 total_len + RL_ETHER_ALIGN, 0, ifp, NULL);
              if (m == NULL) {
                  ifp->if_ierrors++;
                  printf("rl%d: out of mbufs, tried to "
***************
*** 1132,1145 ****
                  m_adj(m, RL_ETHER_ALIGN);
                  m_copyback(m, wrap, total_len - wrap,
                      sc->rl_cdata.rl_rx_buf);
-                 if (m->m_len < sizeof(struct ether_header))
-                     m = m_pullup(m,
-                         sizeof(struct ether_header));
-                 if (m == NULL) {
-                     printf("rl%d: m_pullup failed",
-                         sc->rl_unit);
-                     ifp->if_ierrors++;
-                 }
              }
              cur_rx = (total_len - wrap + ETHER_CRC_LEN);
          } else {
--- 1135,1140 ----





Re: NFS, rl0 and Alpha

2000-05-03 Thread Doug Rabson

On Tue, 2 May 2000, Matthew Dillon wrote:

 
> :Is anyone else observing kernel panics in the NFS code with Alpha
> :(pc164) and rl0 (the Alpha is running as a client only) ?
> :
> :NFS worked just fine when I had a de0 in the box. After installing an
> :rl0 (I know they suck, but they're so cheap :) I _always_ get an
> :unaligned access panic when I try to access an NFS mounted FS, in any
> :way.
>
> This is almost certainly related to differences in how the
> packet is aligned in memory between de0 and rl0.
>
> If you are getting panics, it is probably at the same
> location every time.  If you can get a kernel core dump
> and backtrace I'll bet we can find and fix this problem
> quickly.

Bill put workarounds for the alpha's alignment restrictions into some of
his drivers but it seems that he missed out rl. Basically the part of the
packet which includes headers needs to have the start of the ip header
aligned to a 4-byte boundary. Since the preceding ethernet header is not
padded to 4 bytes, this often means copying the first part of the packet
to another mbuf.
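
The arithmetic behind that restriction can be sketched as follows. This
is a minimal illustration, assuming an untagged 14-byte Ethernet header;
the names eth_align_pad, ETH_HDR_LEN and IP_ALIGN are hypothetical and
are not taken from any driver. It computes how many pad bytes must
precede a frame so that the IP header following the Ethernet header
starts on a 4-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN 14u  /* dst MAC (6) + src MAC (6) + ethertype (2) */
#define IP_ALIGN     4u  /* alignment wanted for the IP header */

/*
 * Return the number of pad bytes to place in front of a frame stored
 * at `buf_addr` so that the IP header lands on a 4-byte boundary.
 * Hypothetical helper for illustration only.
 */
static size_t
eth_align_pad(uintptr_t buf_addr)
{
    uintptr_t ip_addr = buf_addr + ETH_HDR_LEN;

    return (size_t)((IP_ALIGN - (ip_addr % IP_ALIGN)) % IP_ALIGN);
}
```

For a buffer that itself starts on a 4-byte boundary the result is 2,
which is why drivers commonly offset received frames by two bytes.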

-- 
Doug Rabson Mail:  [EMAIL PROTECTED]
Nonlinear Systems Ltd.  Phone: +44 20 8442 9037







Re: NFS, rl0 and Alpha

2000-05-03 Thread Matthew Dillon

:Thanks, but there is code in rl_rxeof() to align to a 32 bit boundary.
:If that weren't the case then I would expect the Alpha to panic with
:other IP applications, not just NFS.
:
:I don't know, NFS must be doing something weird.
:
:---
:Gary Jennejohn / [EMAIL PROTECTED] [EMAIL PROTECTED]

NFS will realign the data payload for misaligned packets.

I agree it sounds like an issue in the NFS code somewhere.  Something
that is slipping through unnoticed.  If someone can get a crash dump
and do a stack backtrace, or even a simple DDB 'trace', it should be 
possible to track the problem down.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]





NFS, rl0 and Alpha

2000-05-02 Thread Gary Jennejohn

Is anyone else observing kernel panics in the NFS code with Alpha
(pc164) and rl0 (the Alpha is running as a client only) ?

NFS worked just fine when I had a de0 in the box. After installing an
rl0 (I know they suck, but they're so cheap :) I _always_ get an
unaligned access panic when I try to access an NFS mounted FS, in any
way.

Other network activities like telnet, ftp and cvsup cause no panics,
so it doesn't seem to be a problem in the IP stack or the rl driver itself.

I have a crash dump, but I haven't analyzed it yet. Just looking for
reports from other users.

BTW I've seen this panic with various kernels, including one with
sources cvsup'd yesterday (about 10 AM MEST).

BTW2 the server (an x86 with rl0) is also running -current of the
same vintage.


Gary Jennejohn / [EMAIL PROTECTED] [EMAIL PROTECTED]







Re: NFS, rl0 and Alpha

2000-05-02 Thread Matthew Dillon


:Is anyone else observing kernel panics in the NFS code with Alpha
:(pc164) and rl0 (the Alpha is running as a client only) ?
:
:NFS worked just fine when I had a de0 in the box. After installing an
:rl0 (I know they suck, but they're so cheap :) I _always_ get an
:unaligned access panic when I try to access an NFS mounted FS, in any
:way.

This is almost certainly related to differences in how the
packet is aligned in memory between de0 and rl0.
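
To illustrate why the alignment matters: a minimal sketch, not code from
the kernel, of reading a 32-bit network-order field one byte at a time
so that the address need not be 4-byte aligned. The name load_be32 is
hypothetical. Casting the same address to a uint32_t pointer and
dereferencing it is the kind of access that traps on a strict-alignment
CPU such as the Alpha when the address is misaligned.

```c
#include <stdint.h>

/*
 * Read a 32-bit big-endian (network order) value from `p`, which may
 * sit at any byte address. Only byte loads are used, so no alignment
 * is required. Hypothetical helper for illustration only.
 */
static uint32_t
load_be32(const void *p)
{
    const unsigned char *b = p;

    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
        ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```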

If you are getting panics, it is probably at the same
location every time.  If you can get a kernel core dump
and backtrace I'll bet we can find and fix this problem
quickly.

-Matt

