[OOPS] pegasus + MediaGX: Oops in khubd, the continuing story?

2001-05-04 Thread Frank de Lange

Well,

I got fed up with all those Oops'es, so I started scribbling one on a piece of
paper. This is what ksymoops makes of it:

ksymoops 2.4.1 on i586 2.4.4.  Options used
 -V (default)
 -k /var/log/ksymoops/20010504223943.ksyms (specified)
 -l /var/log/ksymoops/20010504223943.modules (specified)
 -o /lib/modules/2.4.3 (specified)
 -m /boot/System.map-2.4.3 (specified)

Warning (compare_maps): snd symbol pm_register not found in 
/usr/lib/alsa-modules/2.4.3/0.5/snd.o.  Ignoring /usr/lib/alsa-modules/2.4.3/0.5/snd.o 
entry
Warning (compare_maps): snd symbol pm_send not found in 
/usr/lib/alsa-modules/2.4.3/0.5/snd.o.  Ignoring /usr/lib/alsa-modules/2.4.3/0.5/snd.o 
entry
Warning (compare_maps): snd symbol pm_unregister not found in 
/usr/lib/alsa-modules/2.4.3/0.5/snd.o.  Ignoring /usr/lib/alsa-modules/2.4.3/0.5/snd.o 
entry
eip: c010f6f3
Oops: 
CPU: 0
EIP: 0010:[<c010f6f3>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010007
eax: c2667000   ebx:  ecx: c2686000   edx: 
esi: 0046   edi: fff8 ebp: c26c7ce8   esp: c26c7ccc
ds: 0018   es: 0018   ss: 0018
Process khubd (pid: 428, stackpage=c26c7000)
Stack: c2686000 c2686074 c283ee40 c26861d0 0001 0286 0001 c283ee40
   c4c840e5 c2686074 c4c7d222 c2686074 2f10 c2686074 0002 c4c7eccd
   c2686074 c4c88010 c4c88010 c2a6c000 0006 c2666000 
Call Trace: c4c840e5 c4c7d222 c4c7cccd c4c88010 c4c88010 c4c7fe9b c4c88014
c4c80857 c4c8000c c4c88000 c01077df c010813e c0106e60 c0115054
c0108171 c0106c60 c011196c c4c84213 c4c859c0 c4c84601 0006
c4c851a2 5f5f c4c85564 c4c86334 c4c8639c c4c86380 c4c8639c
c4c70ad2 c4c86334 c4c7b2e0 c4c70d5b c4c72988 c4c73dba c4c7b334
c4c73fa2 c4c7b36c c4c7b36c c4c74135 c010542c
Code: 8b 4f 04 8b 1b 8b 01 85 45 fc 74 51 31 c0 9c 5e fa c7 01 00

>>EIP; c010f6f3 <__wake_up+33/a8>   <=
Trace; c4c840e5 <[pegasus]__module_parm_desc_loopback+25/28>
Trace; c4c7d222 <[usb-ohci]sohci_return_urb+10e/118>
Trace; c4c7cccd <[usbcore]__kstrtab_usb_devfs_handle+1291/15c4>
Trace; c4c88010 <.data.end+1c51/>
Trace; c4c88010 <.data.end+1c51/>
Trace; c4c7fe9b <[usb-ohci]hc_release_ohci+4b/b0>
Trace; c4c88014 <.data.end+1c55/>
Code;  c010f6f3 <__wake_up+33/a8>
 <_EIP>:
Code;  c010f6f3 <__wake_up+33/a8>   <=
   0:   8b 4f 04              mov    0x4(%edi),%ecx   <=
Code;  c010f6f6 <__wake_up+36/a8>
   3:   8b 1b                 mov    (%ebx),%ebx
Code;  c010f6f8 <__wake_up+38/a8>
   5:   8b 01                 mov    (%ecx),%eax
Code;  c010f6fa <__wake_up+3a/a8>
   7:   85 45 fc              test   %eax,0xfffffffc(%ebp)
Code;  c010f6fd <__wake_up+3d/a8>
   a:   74 51                 je     5d <_EIP+0x5d> c010f750 <__wake_up+90/a8>
Code;  c010f6ff <__wake_up+3f/a8>
   c:   31 c0                 xor    %eax,%eax
Code;  c010f701 <__wake_up+41/a8>
   e:   9c                    pushf
Code;  c010f702 <__wake_up+42/a8>
   f:   5e                    pop    %esi
Code;  c010f703 <__wake_up+43/a8>
  10:   fa                    cli
Code;  c010f704 <__wake_up+44/a8>
  11:   c7 01 00 00 00 00     movl   $0x0,(%ecx)


3 warnings issued.  Results may not be reliable.

I may have made some transcription errors, but the main stuff is there.
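
For anyone reading along who is not used to ksymoops output: an annotation such as `c010f6f3 <__wake_up+33/a8>` gives the address, the nearest known symbol, the offset into that symbol and the symbol's size, all in hex. The following is only a rough sketch of how to pull those apart; the `decode` helper and its regular expression are made up for illustration and are not part of ksymoops.

```python
import re

# Matches a ksymoops-style annotation such as "c010f6f3 <__wake_up+33/a8>",
# with an optional "[module]" prefix in front of the symbol name.
ANNOTATION = re.compile(
    r"(?P<addr>[0-9a-f]{8}) <(?:\[(?P<module>[^\]]+)\])?"
    r"(?P<symbol>[^+>]+)\+(?P<offset>[0-9a-f]+)/(?P<size>[0-9a-f]+)>"
)


def decode(annotation):
    """Split an annotation into its parts and derive the symbol's address range."""
    m = ANNOTATION.search(annotation)
    if m is None:
        raise ValueError("not a symbol+offset/size annotation: %r" % annotation)
    addr = int(m.group("addr"), 16)
    offset = int(m.group("offset"), 16)
    size = int(m.group("size"), 16)
    start = addr - offset  # the symbol begins `offset` bytes before the address
    return {
        "module": m.group("module"),  # None for symbols in the kernel proper
        "symbol": m.group("symbol"),
        "start": start,
        "end": start + size,
        "offset": offset,
    }


eip = decode(">>EIP; c010f6f3 <__wake_up+33/a8>   <=")
print("%s spans %08x..%08x" % (eip["symbol"], eip["start"], eip["end"]))
```

With the EIP line above this puts the start of `__wake_up` at c010f6c0, which agrees with the `je` target in the disassembly (c010f750 is `__wake_up+90`). Keep in mind that ksymoops resolves to the nearest symbol it knows about, so the module entries in the trace need not name the function that was actually executing.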

This Oops (and others just like it) appears when the pegasus module is reloaded
into the system. Some info on the system and the circumstances:

MediaGXLV (200 MHz) + 5530 'kahlua' companion chip
 (so this is ohci usb)
60 MB RAM (+4MB for video)

SMC 2202 (pegasus chip) 10/100tx USB NIC on a 10baseT LAN

Oops also appears on 2.4.4

Cheers//Frank
-- 
  W  ___
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



pegasus + MediaGX: Oops in khubd, the continuing story?

2001-05-04 Thread Frank de Lange

Hi'all,

I'm experiencing loads of intermittent Oops'es when loading the pegasus driver
(for an SMC 2202) on my MediaGX-equipped (Webplayer) systems. A scan of the
lists turned up more problems with the MediaGX (which contains an OHCI
implementation in the 5530 companion chip) in combination with the pegasus
driver, so I'm not the only one it seems...

The Oops'es are mostly in the khubd process, but they sometimes appear in other
programs (insmod, ifconfig). They always lead to an immediate panic, and nothing
is ever written to any log. When I tried to copy the Oops by hand on a
notebook, the harddisk in that thing chose that specific moment to drop dead (I
was nearly finished typing in the last call trace address...). And there was no
rejoicing, and no call trace... Sorry...

Is this a known problem (MediaGX + pegasus == intermittent Oops on
load/reload), or am I telling something new? If I am, I'll create that call
trace and run it through ksymoops; if it is known, I'd rather spare myself the
chore of typing in loads and loads of hex code. I've done enough of that in my
Commodore-64 days...

Cheers//Frank
-- 
  W  ___
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: * Re: Severe trashing in 2.4.4

2001-05-01 Thread Frank de Lange

On Tue, May 01, 2001 at 04:00:53PM -0700, David S. Miller wrote:
> 
> Frank, thanks for doing all the legwork to resolve the networking
> side of this problem.

No problem...

I just diff'd the 'old' and 'new' kernel trees. The one which produced the
ravenous skb_hungry kernels was for all intents and purposes identical to the
one which produced the (working, bug_free(tm)) kernel I'm currently running...

Must be the weather...

Cheers//Frank
-- 
  W  ___
 ## o o\    /     Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



* Re: Severe trashing in 2.4.4

2001-05-01 Thread Frank de Lange

Well,

When a puzzled Alexey wondered whether the problems I was seeing with 2.4.4
might be related to a failure to execute 'make clean' before compiling the
kernel, I replied in the negative as I *always* clean up before compiling
anything. Yet, for the sake of science and such I moved the kernel tree and
started from scratch.

The problems I was seeing are no more; 2.4.4 behaves like a good kernel should.

Was it me? Was it reiserfs? Was it divine intervention? I will probably never
find out, but for now this thread, and the accompanying scare, can Requiescat In
Pace.

Cheers//Frank
-- 
  W  ___
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: Severe trashing in 2.4.4

2001-04-29 Thread Frank de Lange

On Sun, Apr 29, 2001 at 04:45:00PM -0700, David S. Miller wrote:
> 
> Frank de Lange writes:
>  > What do you want me to check for? /proc/net/netstat is a rather busy place...
> 
> Just show us the contents after you reproduce the problem.
> We just want to see if a certain event is being triggered.

Hm, 'twould be nice to know WHAT to look for (if only for educational
purposes), but ok:

 http://www.unternet.org/~frank/projects/linux2404/2404-meminfo/

it contains an extra set of files, named p_n_netstat.*. Same as before, the
.diff contains one-second interval diffs.

Cheers//Frank
-- 
  W  ___
 ## o o\    / Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: Severe trashing in 2.4.4

2001-04-29 Thread Frank de Lange

On Mon, Apr 30, 2001 at 12:06:52AM +0200, Manfred Spraul wrote:
> You could enable STATS in mm/slab.c, then the number of alloc and free
> calls would be printed in /proc/slabinfo.
> 
> > Yeah, those as well. I kinda guessed they were related...
> 
> Could you check /proc/sys/net/core/hot_list_length and skb_head_pool
> (not available in /proc, use gdb --core /proc/kcore)? I doubt that this
> causes your problems, but the skb_head code uses a special per-cpu
> linked list for even faster allocations.
> 
> Which network card do you use? Perhaps a bug in the zero-copy code of
> the driver?

I'll give it a go once I reboot into 2.4.4 again (now in 2.4.3 to get some
'work' done). Using the dreaded ne2k cards (two of them), which have caused me
more than one headache already...

I'll have a look at the driver for these cards.

Cheers//Frank

-- 
  W  ___________
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: Severe trashing in 2.4.4

2001-04-29 Thread Frank de Lange

On Sun, Apr 29, 2001 at 01:58:52PM -0400, Alexander Viro wrote:
> Hmm... I'd say that you also have a leak in kmalloc()'ed stuff - something
> in 1K--2K range. From your logs it looks like the thing never shrinks and
> grows pretty fast...

Same goes for buffer_head:

buffer_head        44236  48520     96 1188 1213    1 :  252  126

quite high I think. 2.4.3 shows this, after about the same time and activity:

buffer_head          891   2880     96   72   72    1 :  252  126
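
To put a number on "quite high", here is a small sketch that turns the two lines into a memory footprint. It assumes the 2.4-era /proc/slabinfo column order (name, active objects, total objects, object size, active slabs, total slabs, pages per slab, then the per-CPU limit and batch count after the colon) and 4 KiB pages. The two sample strings are my reading of the lines quoted above, where the columns run together in the archive; the split is chosen so that 48520 objects over 1213 slabs and 2880 objects over 72 slabs both come out at 40 objects per slab. The helper names are hypothetical.

```python
from collections import namedtuple

# Assumed 2.4-era /proc/slabinfo columns before the ':' separator.
SlabLine = namedtuple(
    "SlabLine",
    "name active_objs num_objs objsize active_slabs num_slabs pages_per_slab",
)


def parse_slab_line(line):
    """Parse one cache line; the per-CPU fields after ':' are ignored."""
    head = line.split(":")[0].split()
    return SlabLine(head[0], *(int(field) for field in head[1:7]))


def footprint_kib(slab, page_size=4096):
    """Memory held by the cache's slabs in KiB (page_size is an assumption)."""
    return slab.num_slabs * slab.pages_per_slab * page_size // 1024


v244 = parse_slab_line("buffer_head 44236 48520 96 1188 1213 1 : 252 126")
v243 = parse_slab_line("buffer_head 891 2880 96 72 72 1 : 252 126")
print("2.4.4: %d KiB, 2.4.3: %d KiB" % (footprint_kib(v244), footprint_kib(v243)))
```

Under those assumptions the buffer_head cache holds roughly 4.7 MiB on 2.4.4 against 288 KiB on 2.4.3.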

Cheers//Frank

-- 
  W  ___
 ## o o\    /     Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: Severe trashing in 2.4.4

2001-04-29 Thread Frank de Lange

On Sun, Apr 29, 2001 at 01:58:52PM -0400, Alexander Viro wrote:
> Hmm... I'd say that you also have a leak in kmalloc()'ed stuff - something
> in 1K--2K range. From your logs it looks like the thing never shrinks and
> grows pretty fast...

Yeah, those as well. I kinda guessed they were related...

Cheers//Frank
-- 
  W  ___
 ## o o\    /     Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: Severe trashing in 2.4.4

2001-04-29 Thread Frank de Lange
  499928 kB
SwapFree:   461132 kB

And the top-10 memory hogs:

 892 54696  2279 /usr/bin/X11/XFree86 -depth 16 -gamma 1.6 -auth /var/lib/gdm/:0
 632  2932 11363 ps -ax -o rss,vsz,pid,command
 600  8988  2785 gnome-terminal -t [EMAIL PROTECTED]
 368  7660  2685 multiload_applet --activate-goad-server multiload_applet --goad
 312  2100  4731 top
 308  7528  2675 gnomexmms --activate-goad-server gnomexmms --goad-fd 10
 244  7660  2701 multiload_applet --activate-goad-server multiload_applet --goad
 240  7436  2682 asclock_applet --activate-goad-server asclock_applet --goad-fd 
   4 11740  1110 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=my
   4 11740  1109 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=my

I've got a ton of logging from /proc/slabinfo, one entry a second. If someone
wants to peruse it, you can find it here:

 http://www.unternet.org/~frank/projects/linux2404/2404-meminfo/

 The .diff files are diffs between 'current' and 'previous' (one second
interval) snapshots. slabinfo and meminfo are self-explanatory I guess. The
'memhogs' entry is the top-10 memory users list for each second of logging.
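
For those who want to collect the same kind of data on their own box, a one-second snapshot-and-diff logger can be sketched roughly as below. This is not the script that produced the files at the URL above; the function names, the default of 60 samples and the use of Python's difflib are all assumptions made for illustration.

```python
import difflib
import time


def diff_snapshots(previous, current):
    """Return the changed lines between two snapshots as unified-diff lines."""
    return list(difflib.unified_diff(
        previous.splitlines(), current.splitlines(),
        fromfile="previous", tofile="current", lineterm="", n=0,
    ))


def log_diffs(path="/proc/slabinfo", interval=1.0, count=60):
    """Print the diff between consecutive snapshots of `path`, `count` times."""
    with open(path) as f:
        previous = f.read()
    for _ in range(count):
        time.sleep(interval)
        with open(path) as f:
            current = f.read()
        for line in diff_snapshots(previous, current):
            print(line)
        previous = current


# Two hand-made snapshots stand in for a live /proc file here.
before = "buffer_head 891 2880 96 72 72 1\nsize-2048 10 12 2048 5 6 1\n"
after = "buffer_head 950 2880 96 72 72 1\nsize-2048 10 12 2048 5 6 1\n"
changes = diff_snapshots(before, after)
print("\n".join(changes))
```

Calling `log_diffs()` on a live system would print one diff per second; with `n=0` only the lines that changed are shown, which keeps a busy file such as /proc/slabinfo readable.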

Cheers//Frank
-- 
  W  ___
 ## o o\    /     Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: Severe trashing in 2.4.4

2001-04-29 Thread Frank de Lange
 kB
SwapFree:   464420 kB

[frank@behemoth mp3]$ ps -xao rss,vsz,pid,command|sort -rn|head
2244 55304  1310 /usr/bin/X11/XFree86 -depth 16 -gamma 1.6 -auth /var/lib/gdm/:0
1644  5484  1401 sawfish --sm-client-id 11c0a801059849521860010240115 --
1252  9008  1438 gnome-terminal -t [EMAIL PROTECTED]
1172  2924  1796 ps -xao rss,vsz,pid,command
 956  7656  1413 tasklist_applet --activate-goad-server tasklist_applet --goad-f
 944  8388  1696 gnome-terminal --tclass=Remote -x ssh -v ostrogoth.localnet
 776  7588  1411 deskguide_applet --activate-goad-server deskguide_applet --goad
 556  3012  1797 sort -rn
 504  7436  1419 asclock_applet --activate-goad-server asclock_applet --goad-fd 
 464  8356  1405 panel --sm-config-prefix /panel.d/default-ZTNCVS/ --sm-client-i

 [ system just started thrashing again, had to sysrq-reboot ]

So, there's something wrong here... Wish I knew what...

2.4.3 runs fine on the same box with the same apps. 

Any clues?

Cheers//Frank
-- 
  W  ___
 ## o o\/     Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: Network error persists in 2.4.4

2001-04-28 Thread Frank de Lange

> (on problems with ne2k-pci on SMP-systems)

Seems you're experiencing the effects of the infamous IO-APIC problem
('erratum' in Intel-lingo). There's a patch for these problems by Maciej W.
Rozycki, which should (IMnsHO) really be accepted into the main kernel tree
since many people are experiencing these problems and the patch fixes them
quite well.

The patch has been submitted to the list several times now, but I'll do it
again. (attached to this message...)

Cheers//Frank
-- 
  W  ___
 ## o o\/     Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]


diff -up --recursive --new-file linux-2.4.1.macro/arch/i386/kernel/apic.c 
linux-2.4.1/arch/i386/kernel/apic.c
--- linux-2.4.1.macro/arch/i386/kernel/apic.c   Wed Dec 13 23:54:27 2000
+++ linux-2.4.1/arch/i386/kernel/apic.c Mon Feb 12 16:11:15 2001
@@ -23,6 +23,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -270,7 +271,13 @@ void __init setup_local_APIC (void)
 *   PCI Ne2000 networking cards and PII/PIII processors, dual
 *   BX chipset. ]
 */
-#if 0
+   /*
+* Actually disabling the focus CPU check just makes the hang less
+* frequent as it makes the interrupt distributon model be more
+* like LRU than MRU (the short-term load is more even across CPUs).
+* See also the comment in end_level_ioapic_irq().  --macro
+*/
+#if 1
/* Enable focus processor (bit==0) */
value &= ~(1<<9);
 #else
@@ -764,7 +771,7 @@ asmlinkage void smp_error_interrupt(void
apic_write(APIC_ESR, 0);
v1 = apic_read(APIC_ESR);
ack_APIC_irq();
-   irq_err_count++;
+   atomic_inc(&irq_err_count);
 
/* Here is what the APIC error bits mean:
   0: Send CS error
diff -up --recursive --new-file linux-2.4.1.macro/arch/i386/kernel/i8259.c linux-2.4.1/arch/i386/kernel/i8259.c
--- linux-2.4.1.macro/arch/i386/kernel/i8259.c  Mon Nov 20 18:01:58 2000
+++ linux-2.4.1/arch/i386/kernel/i8259.c        Sun Feb 11 19:54:33 2001
@@ -12,6 +12,7 @@
 #include <linux/init.h>
 #include <linux/kernel_stat.h>
 
+#include <asm/atomic.h>
 #include <asm/system.h>
 #include <asm/io.h>
 #include <asm/irq.h>
@@ -321,7 +322,7 @@ spurious_8259A_irq:
printk("spurious 8259A interrupt: IRQ%d.\n", irq);
spurious_irq_mask |= irqmask;
}
-   irq_err_count++;
+   atomic_inc(&irq_err_count);
/*
 * Theoretically we do not have to handle this IRQ,
 * but in Linux this does not cause problems and is
diff -up --recursive --new-file linux-2.4.1.macro/arch/i386/kernel/io_apic.c linux-2.4.1/arch/i386/kernel/io_apic.c
--- linux-2.4.1.macro/arch/i386/kernel/io_apic.c        Sat Feb  3 12:05:49 2001
+++ linux-2.4.1/arch/i386/kernel/io_apic.c  Tue Feb 13 19:59:55 2001
@@ -33,6 +33,8 @@
 #include <asm/smp.h>
 #include <asm/desc.h>
 
+#define APIC_LOCKUP_DEBUG
+
 static spinlock_t ioapic_lock = SPIN_LOCK_UNLOCKED;
 
 /*
@@ -122,8 +124,14 @@ static void add_pin_to_irq(unsigned int 
static void name##_IO_APIC_irq (unsigned int irq)   \
__DO_ACTION(R, ACTION, FINAL)
 
-DO_ACTION( __mask,0, |= 0x0001, io_apic_sync(entry->apic))/* mask = 1 */
-DO_ACTION( __unmask,  0, &= 0xfffe, )  /* mask = 0 */
+DO_ACTION( __mask, 0, |= 0x0001, io_apic_sync(entry->apic) )
+   /* mask = 1 */
+DO_ACTION( __unmask,   0, &= 0xfffe, )
+   /* mask = 0 */
+DO_ACTION( __mask_and_edge,0, = (reg & 0x7fff) | 0x0001, )
+   /* mask = 1, trigger = 0 */
+DO_ACTION( __unmask_and_level, 0, = (reg & 0xfffe) | 0x8000, )
+   /* mask = 0, trigger = 1 */
 
 static void mask_IO_APIC_irq (unsigned int irq)
 {
@@ -847,6 +855,8 @@ void /*__init*/ print_local_APIC(void * 
 
v = apic_read(APIC_EOI);
printk(KERN_DEBUG "... APIC EOI: %08x\n", v);
+   v = apic_read(APIC_RRR);
+   printk(KERN_DEBUG "... APIC RRR: %08x\n", v);
v = apic_read(APIC_LDR);
printk(KERN_DEBUG "... APIC LDR: %08x\n", v);
v = apic_read(APIC_DFR);
@@ -1191,12 +1201,61 @@ static unsigned int startup_level_ioapic
 #define enable_level_ioapic_irq    unmask_IO_APIC_irq
 #define disable_level_ioapic_irq   mask_IO_APIC_irq
 
-static void end_level_ioapic_irq (unsigned int i)
+static void end_level_ioapic_irq (unsigned int irq)
 {
+   unsigned long v;
+
+/*
+ * It appears there is an erratum which a

Re: 2.4.3: still experiencing APIC-related hangs

2001-03-30 Thread Frank de Lange

On Fri, Mar 30, 2001 at 08:32:39AM -0800, [EMAIL PROTECTED] wrote:
> On Fri, Mar 30, 2001 at 02:32:24PM +0200, Frank de Lange wrote:
> > 
> > Maciej, did you submit the patch to Linus? It really seems to solve the
> > (occurrence of the) problems with these boards...
> 
> Where is this patch found? I am not seeing it so far on kernel.org.

It is almost ancient history, from days long gone when men were men, women
were women and Linux had only reached 2.4.1...

I can send you a copy, if you need it...

Cheers//Frank
-- 
  W  _______
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



2.4.3: still experiencing APIC-related hangs

2001-03-30 Thread Frank de Lange

Hi'all,

Subject says it all: 2.4.3 (unpatched) is still causing the dreaded
APIC-related hangs on SMP BX systems (Abit BP-6, maybe Gigabyte). I still need
to apply one of Maciej's patches to get rid of these hangs. The source comments
in arch/i386/kernel/apic.c ("If focus CPU is disabled then the hang goes away")
are incorrect, as the hang does not go away by simply disabling focus CPU. The
only way for me to get rid of the hangs is to apply patch-2.4.1-io_apic-46
(which does the LEVEL->EDGE->LEVEL triggered trick to 'free' the IO_APIC). I've
been running with this patch for quite some time now, and have not experienced
any problems with it. Maybe it is time to include it in the main kernel,
perhaps as a configurable option ("BROKEN_IO_APIC")?
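
The LEVEL->EDGE->LEVEL trick mentioned above can be illustrated in plain C. This is only a sketch, not code from patch-2.4.1-io_apic-46: it works on an ordinary 32-bit value instead of a real IO-APIC register, the helper names are hypothetical, and the bit positions (trigger mode at bit 15, mask at bit 16 of the low redirection-entry word) are taken from the Intel 82093AA datasheet rather than from the patch text as archived here.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in the low 32 bits of an IO-APIC redirection entry.
 * Assumption: taken from the Intel 82093AA datasheet, not from the patch. */
#define IOAPIC_TRIGGER_LEVEL (UINT32_C(1) << 15) /* 1 = level, 0 = edge */
#define IOAPIC_MASKED        (UINT32_C(1) << 16) /* 1 = interrupt masked */

/* Step 1: mask the pin and switch it to edge-triggered mode. */
static uint32_t mask_and_edge(uint32_t reg)
{
    return (reg & ~IOAPIC_TRIGGER_LEVEL) | IOAPIC_MASKED;
}

/* Step 2: unmask the pin and switch it back to level-triggered mode. */
static uint32_t unmask_and_level(uint32_t reg)
{
    return (reg & ~IOAPIC_MASKED) | IOAPIC_TRIGGER_LEVEL;
}

/* Hypothetical helper: run both steps on a saved register value. */
static uint32_t level_edge_level(uint32_t reg)
{
    assert(reg & IOAPIC_TRIGGER_LEVEL); /* only meaningful for level IRQs */
    return unmask_and_level(mask_and_edge(reg));
}
```

An unmasked level-triggered entry comes out of `level_edge_level()` unchanged; the point of the trick is the intermediate masked, edge-triggered state, which is presumably what 'frees' the stuck pin. On real hardware each step would be a register write done under a lock, which this sketch leaves out.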

Maciej, did you submit the patch to Linus? It really seems to solve the
(occurrence of the) problems with these boards...

Cheers//Frank
-- 
  W  ___
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: Linux 2.4.2-ac21

2001-03-22 Thread Frank de Lange

Oops...

Linux 2.4.2-ac21 does not like my box, or the other way around:

loading the agpgart module (MGA G400 AGP) -> system hangs
loading the SCSI module (53c875) -> system hangs

In both cases, the magic SysRq sequence does not work, but it is still possible
to ping the box from the outside. Connecting to it (ssh) does not work,
however. I backed out both the SCSI driver patches as well as the agpgart
patches, but this did not fix the symptoms. Looks more like a module-loading
related issue, but I have not found it yet.

All this on an SMP (Abit BP6) box by the way...

The changes which introduced these symptoms have occurred somewhere between -ac7
and -ac21, since -ac7 DID run on the same hardware.

Cheers//Frank
-- 
  W  ___
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

2001-02-19 Thread Frank de Lange

On Sat, Feb 17, 2001 at 06:18:46PM -0800, David wrote:
> > Well, I run glibc-2.2.1 as well, so that might be one of the factors
> > contributing to this. Then again, glibc-2.2.1 with ext2 does not cause any
> > problems whatsoever with mozilla. So it could be that reiserfs + glibc-2.2.1 is
> > a bad combination, question remains which of these two is the culprit (if not
> > both). Since glibc-2.2.2 is out, I will give that a try as well. Not tonight
> > though...

FYI

I'm running glibc-2.2.2 now, and alas... Mozilla still refuses to be compiled,
no change...

Cheers//Frank
-- 
  W  ___________
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

2001-02-18 Thread Frank de Lange

> Minor nit, but I'd rather clear it up now. Which distribution you run
> doesn't matter for debugging. What does matter is that we've got known
> problems with a given compiler, and that compiler goes by a few different
> flavors with the same version number. Since there are known problems, if
> you don't provide the compiler version, I'll ask. If your bug is *really*
> odd, I might ask a few different ways, just to make sure you give the same
> answer every time ;-)

Well, a nit to a nit... In my experience it surely matters which distribution
somebody runs, since that tells a lot about the basic system (libc, probable
compiler, binutils, etc). RH7 is broken in many respects. Since it uses
glibc-2.2 as well, I usually add the notice that I do NOT run RH7 to messages
like these where I mention I use glibc-2.2.x, if only to ward off the usual
'are you running RH7 if yes please upgrade so and so' cycle. Bits and electrons
are much too precious to waste on useless banter like that...

Cheers//Frank
-- 
  W  ___
 ## o o\    / Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

2001-02-17 Thread Frank de Lange

On Sat, Feb 17, 2001 at 05:47:49PM -0800, David wrote:
> I can say "me too" for this.  I thought it was perhaps glibc or binutils 
> tho.  I only have reiserfs systems now so I don't have a basis for 
> comparison.
> 
> However I -can- say that I didn't experience this until I put glibc 
> 2.2.1 on my systems.  I do use an "approved" gcc, stock 2.95.2.
> 
> I wouldn't be so quick to pin it on reiserfs.

Well, I run glibc-2.2.1 as well, so that might be one of the factors
contributing to this. Then again, glibc-2.2.1 with ext2 does not cause any
problems whatsoever with mozilla. So it could be that reiserfs + glibc-2.2.1 is
a bad combination, question remains which of these two is the culprit (if not
both). Since glibc-2.2.2 is out, I will give that a try as well. Not tonight
though...

And no, I'm not running RedHat 7.x for those who might think so (and
automatically blame everything on it).

When did you switch to glibc-2.2.1? Were you running reiserfs before that?

Cheers//Frank

-- 
  W  ___________
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

2001-02-17 Thread Frank de Lange

On Sun, Feb 18, 2001 at 01:57:15AM +0100, Frank de Lange wrote:
> I will retry this with 'all warnings and bells and whistles' turned on in
> reiserfs (on 2.4.1-ac18), and see if anything out of the ordinary is logged. I
> somehow doubt it, since repeated forced reiserfsck's have turned up nothing at
> all...

I just ran the compile again on the described build, same results, no warnings
of any kind, nothing in the debug log facility, nothing on the console...

Reiserfs seems to believe it did the right thing. I'm here to tell you that it
didn't...

Cheers//Frank
-- 
  W  ___
 ## o o\    /     Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

2001-02-17 Thread Frank de Lange

>  At least the patch didn't make it worse. Would anyone care to comment on
>  how the elf-dynstr-gc option changes the file access patterns for the
>  compile?

It does not change the file access patterns, it adds an extra step. A separate
binary (dist/bin/elf-dynstr-gc, a convoluted version of strip) is run over the
final (linked) library/executable to remove some symbol info. The elf-dynstr-gc
program is compiled as part of the mozilla build. There's nothing wrong with
elf-dynstr-gc on the reiserfs filesystem, it is identical to the one on the
ext2 partition. Running the 'reiserfs' version on the ext2 tree works as it
should, running the ext2 version on the reiserfs tree crashes (seems the
program is not very robust, as it does not detect garbled input files). As
said, running objdump on the corrupted (reiserfs compiled) library also
produces errors.
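
The kind of check whose absence makes a tool crash on garbled input, instead of reporting something like objdump's 'invalid string offset', can be sketched as follows. This is not the source of elf-dynstr-gc, which is not shown in this thread; `strtab_lookup` is a hypothetical helper that assumes the string table has already been read into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the string at offset off inside a string table of
 * size bytes, or NULL if the offset is out of range or the string is not
 * NUL-terminated within the table. */
static const char *strtab_lookup(const char *tab, size_t size, size_t off)
{
    assert(tab != NULL);
    if (off >= size)
        return NULL;                    /* offset points past the table */
    if (memchr(tab + off, '\0', size - off) == NULL)
        return NULL;                    /* string runs off the end */
    return tab + off;
}
```

A caller that gets NULL back can print an error and stop, rather than following a bad offset into unrelated memory.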

Cheers//Frank

-- 
  W  ___
 ## o o\    /     Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

2001-02-17 Thread Frank de Lange

> That's not good. Which compiler did you use to compile the kernel? This
> sounds lame, but reiserfs exercises the cpu/mem more than ext2, so we hit
> bad ram more often. If we run out of other things to try, please run a
> memory tester.

I use 'good old' gcc 2.95.2:

gcc -v: gcc version 2.95.2 19991024 (release)

I just tried 2.4.1-ac18, which also gave me the same segfault. When I compare
the corrupted binary (the one compiled on reiserfs) to the working one (compiled
on ext2), I notice that at position 0x1000 in the file, a block of data from
position 0x0e60 is duplicated. It seems to be inserted into the data stream, as
it is followed by data which (in the working version of libsample.so) starts at
0x1000:

(bsdiff (binary sdiff) between both files)

(actually the differences between both files start much earlier, but that seems
to be just all kinds of changed relocation information as a result of the error)

(hope my careful ASCII-formatting makes it through the list and the archives)

THE BAD                     THE GOOD

[deletia, a lot of uninteresting data...]

e60  c4 20 83 c4 f4 8b 06   e60  c4 20 83 c4 f4 8b 06
e68  8b 40 10 ff d0 eb 06   e68  8b 40 10 ff d0 eb 06
e70  bf 0e 00 07 80 89 f8   e70  bf 0e 00 07 80 89 f8
e78  65 e8 5b 5e 5f 89 ec   e78  65 e8 5b 5e 5f 89 ec
e80  c3 8d 76 00 55 89 e5   e80  c3 8d 76 00 55 89 e5
e88  c0 89 ec 5d c3 8d 76   e88  c0 89 ec 5d c3 8d 76
e90  55 89 e5 31 c0 89 ec   e90  55 89 e5 31 c0 89 ec

[deletia, a lot of uninteresting data...]

fd8  00 00 00 00 c0 00 00   fd8  00 00 00 00 c0 00 00
fe0  00 00 00 46 80 a0 c0   fe0  00 00 00 46 80 a0 c0
fe8  68 08 d3 11 91 5f d9   fe8  68 08 d3 11 91 5f d9
ff0  89 d4 8e 3c 40 92 89   ff0  89 d4 8e 3c 40 92 89
ff8  d2 f9 d2 11 bd d6 00   ff8  d2 f9 d2 11 bd d6 00

LOOK HERE: IDENTICAL TO THE     AND THIS IS WHAT IT SHOULD
DATA AT e60                     LOOK LIKE...

0001000  c4 20 83 c4 f4 8b 06 | 0001000  64 65 73 74 86 52 38
0001008  8b 40 10 ff d0 eb 06 | 0001008  c4 cb d2 11 8c ca 00
0001010  bf 0e 00 07 80 89 f8 | 0001010  b0 fc 14 a3 a0 58 f1
0001018  65 e8 5b 5e 5f 89 ec | 0001018  dd ca d2 11 8c ca 00

[deletia, a lot of uninteresting data...]

0001190  89 d4 8e 3c 40 92 89 <
0001198  d2 f9 d2 11 bd d6 00 <

AND HERE THE 'GOOD' DATA STARTS
AGAIN, THIS BLOCK IS IDENTICAL TO
THE ONE AT 0x1000 IN THE 'GOOD' FILE

00011a0  64 65 73 74 86 52 38 <
00011a8  c4 cb d2 11 8c ca 00 <
00011b0  b0 fc 14 a3 a0 58 f1 <
00011b8  dd ca d2 11 8c ca 00 <
00011c0  b0 fc 14 a3 40 a7 58 <
00011c8  dc d5 d2 11 92 fb 00 <

[deletia, a lot of uninteresting data...]

So, it seems a wrong block of data was inserted into the stream at position
0x1000, wreaking havoc on the file structure. Now 0x1000 is kind of a magic
number, isn't it? Almost too good to be true...
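
The comparison above was done by eye on side-by-side hex dumps. A rough sketch of the same two checks in C follows: find where the bad file first departs from the good one, then look for an earlier copy of that block inside the bad file. Both function names are hypothetical, and both files are assumed to be already loaded into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the first offset at which a and b differ, or n if they match. */
static size_t first_difference(const unsigned char *a,
                               const unsigned char *b, size_t n)
{
    size_t i;
    for (i = 0; i < n && a[i] == b[i]; i++)
        ;
    return i;
}

/* Search buf[0..off) for an earlier, non-overlapping copy of the block_len
 * bytes starting at off. Returns its offset, or (size_t)-1 if none exists. */
static size_t earlier_copy(const unsigned char *buf, size_t len,
                           size_t off, size_t block_len)
{
    size_t src;
    assert(block_len > 0 && off <= len && block_len <= len - off);
    for (src = 0; src + block_len <= off; src++) {
        if (memcmp(buf + src, buf + off, block_len) == 0)
            return src;
    }
    return (size_t)-1;
}
```

For the pattern shown in the dump, one would expect `earlier_copy()` on the bad file at offset 0x1000 to return 0x0e60, but as noted above the two files already differ before that point, so `first_difference()` alone would not land on 0x1000.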

I will retry this with 'all warnings and bells and whistles' turned on in
reiserfs (on 2.4.1-ac18), and see if anything out of the ordinary is logged. I
somehow doubt it, since repeated forced reiserfsck's have turned up nothing at
all...

Oh, and both my own and my computer's memory is OK, so this is not a hardware
fault... :-)

By the way, /tmp (where most action is taking place when compiling) is hosted
on a good ext2 filesystem. Just in case you wondered...

And, also of interest, I'm using an SMP box (BP6, 2 non overclocked Celeron
466s)

Cheers//Frank

-- 
  W  ___________
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

2001-02-17 Thread Frank de Lange

Hi'all,

Well, subject says it all... When I try to compile mozilla (CVS version) with
the '--enable-elf-dynstr-gc' option, the compile fails with a segfault:

../../dist/bin/elf-dynstr-gc ../../dist/lib/components/libsample.so
make[2]: *** [install] Segmentation fault (core dumped)

Compiling the same codebase on an ext2 filesystem does not produce this
segfault. When I compare the produced library (libsample.so), there is a
consistent difference between the one compiled on the reiserfs and the ext2
filesystem. Running objdump on the reiserfs-compiled library also produces
errors (some assertion failures, a lot of 'invalid string offset' errors, and
finally a 'Memory exhausted' error), while objdump happily disassembles the
ext2-produced binary.

These problems occur on:

 2.4.1
 2.4.2-pre4
 2.4.2-pre4 with Chris Mason's 'reiserfs fix for null bytes in small files'

So, there's something quite wrong here.

If anyone wants me to try something, do tell...

Cheers//Frank

-- 
  W  ___
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

2001-02-17 Thread Frank de Lange

 That's not good. Which compiler did you use to compile the kernel? This
 sounds lame, but reiserfs exercises the cpu/mem more than ext2, so we hit
 bad ram more often. If we run out of other things to try, please run a
 memory tester.

I use 'good old' gcc 2.95.2:

gcc -v: gcc version 2.95.2 19991024 (release)

I just tried 2.4.1-ac18, which also gave me the same segfault. When I compare
the corrupted binary (the one compile on reiserfs) to the working one (compiled
on ext2), I notice that at position 0x1000 in the file, a block of data from
position 0x0e60 is duplicated. It seems to be inserted into the data stream, as
it is followed by data which (in the working version of libsample.so) starts at
0x1000:

(bsdiff (binary sdiff) between both files)

(actually the differences between both files start much earlier, but that seems
to be just all kinds of changed relocation information as a result of the error)

(hope my careful ASCII-formatting makes it through the list and the archives)

THE BAD                     THE GOOD

deletia, a lot of uninteresting data...

e60  c4 20 83 c4 f4 8b 06   e60  c4 20 83 c4 f4 8b 06 
e68  8b 40 10 ff d0 eb 06   e68  8b 40 10 ff d0 eb 06 
e70  bf 0e 00 07 80 89 f8   e70  bf 0e 00 07 80 89 f8 
e78  65 e8 5b 5e 5f 89 ec   e78  65 e8 5b 5e 5f 89 ec 
e80  c3 8d 76 00 55 89 e5   e80  c3 8d 76 00 55 89 e5 
e88  c0 89 ec 5d c3 8d 76   e88  c0 89 ec 5d c3 8d 76 
e90  55 89 e5 31 c0 89 ec   e90  55 89 e5 31 c0 89 ec 

deletia, a lot of uninteresting data...

fd8  00 00 00 00 c0 00 00   fd8  00 00 00 00 c0 00 00 
fe0  00 00 00 46 80 a0 c0   fe0  00 00 00 46 80 a0 c0 
fe8  68 08 d3 11 91 5f d9   fe8  68 08 d3 11 91 5f d9 
ff0  89 d4 8e 3c 40 92 89   ff0  89 d4 8e 3c 40 92 89 
ff8  d2 f9 d2 11 bd d6 00   ff8  d2 f9 d2 11 bd d6 00 

LOOK HERE: IDENTICAL TO THE     AND THIS IS WHAT IT SHOULD
DATA AT e60                     LOOK LIKE...

0001000  c4 20 83 c4 f4 8b 06 | 0001000  64 65 73 74 86 52 38 
0001008  8b 40 10 ff d0 eb 06 | 0001008  c4 cb d2 11 8c ca 00 
0001010  bf 0e 00 07 80 89 f8 | 0001010  b0 fc 14 a3 a0 58 f1 
0001018  65 e8 5b 5e 5f 89 ec | 0001018  dd ca d2 11 8c ca 00 

deletia, a lot of uninteresting data...

0001190  89 d4 8e 3c 40 92 89 
0001198  d2 f9 d2 11 bd d6 00 

AND HERE THE 'GOOD' DATA STARTS
AGAIN, THIS BLOCK IS IDENTICAL TO
THE ONE AT 0x1000 IN THE 'GOOD' FILE

00011a0  64 65 73 74 86 52 38 
00011a8  c4 cb d2 11 8c ca 00 
00011b0  b0 fc 14 a3 a0 58 f1 
00011b8  dd ca d2 11 8c ca 00 
00011c0  b0 fc 14 a3 40 a7 58 
00011c8  dc d5 d2 11 92 fb 00 

deletia, a lot of uninteresting data...

So, it seems a wrong block of data was inserted into the stream at position
0x1000, wreaking havoc on the file structure. Now 0x1000 is kind of a magic
number, isn't it? Almost too good to be true...
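
The pattern described above (a block inserted at one offset that duplicates
earlier data, after which the expected data resumes) can be checked
mechanically. What follows is only a minimal sketch: the function name
describe_insertion and the synthetic test data are assumptions made for
illustration, not the tool used for the comparison in this thread. Because the
real files already differ before 0x1000 (the changed relocation information),
the offset to inspect is passed in explicitly rather than detected.

```python
import random


def describe_insertion(good: bytes, bad: bytes, offset: int, probe_len: int = 16):
    """Check whether `bad` has a stray block inserted at `offset`.

    Returns (inserted_length, source_offset): how many bytes were inserted
    and where the same bytes occur earlier in `bad` (None if they do not).
    Returns None when the data expected at `offset` is not found later on,
    or when nothing was inserted at all.
    """
    # The bytes the good file has at `offset` should reappear in the bad
    # file right after the inserted block.
    probe = good[offset:offset + probe_len]
    resume = bad.find(probe, offset)
    if resume <= offset:
        return None  # not found (-1) or nothing inserted (== offset)
    inserted = bad[offset:resume]
    # Look for an earlier copy of the inserted bytes; the search stops at
    # `offset`, so only data in front of the insertion point can match.
    source = bad.find(inserted, 0, offset)
    return len(inserted), (source if source != -1 else None)


# Synthetic stand-in for the two libsample.so builds (NOT the real files):
# the block 0xe60..0x1000 is copied in again at 0x1000.
rng = random.Random(0)
good = bytes(rng.getrandbits(8) for _ in range(0x2000))
bad = good[:0x1000] + good[0xe60:0x1000] + good[0x1000:]

result = describe_insertion(good, bad, 0x1000)
print(result)
```

On the synthetic data this reports 0x1a0 inserted bytes whose earlier copy
starts at 0xe60, matching the offsets in the dump above (the good data resumes
at 0x11a0 in the bad file).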

I will retry this with 'all warnings and bells and whistles' turned on in
reiserfs (on 2.4.1-ac18), and see if anything out of the ordinary is logged. I
somehow doubt it, since repeated forced reiserfsck's have turned up nothing at
all...

Oh, and both my own and my computer's memory is OK, so this is not a hardware
fault... :-)

By the way, /tmp (where most action is taking place when compiling) is hosted
on a good ext2 filesystem. Just in case you wondered...

And, also of interest, I'm using an SMP box (BP6, 2 non-overclocked Celeron
466s).

Cheers//Frank




Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

2001-02-17 Thread Frank de Lange

> At least the patch didn't make it worse. Would anyone care to comment on
> how the elf-dynstr-gc option changes the file access patterns for the
> compile?

It does not change the file access patterns, it adds an extra step. A separate
binary (dist/bin/elf-dynstr-gc, a convoluted version of strip) is run over the
final (linked) library/executable to remove some symbol info. The elf-dynstr-gc
program is compiled as part of the mozilla build. There's nothing wrong with
elf-dynstr-gc on the reiserfs filesystem, it is identical to the one on the
ext2 partition. Running the 'reiserfs' version on the ext2 tree works as it
should, running the ext2 version on the reiserfs tree crashes (seems the
program is not very robust, as it does not detect garbled input files). As
said, running objdump on the corrupted (reiserfs compiled) library also
produces errors.

Cheers//Frank




Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

2001-02-17 Thread Frank de Lange

On Sun, Feb 18, 2001 at 01:57:15AM +0100, Frank de Lange wrote:
> I will retry this with 'all warnings and bells and whistles' turned on in
> reiserfs (on 2.4.1-ac18), and see if anything out of the ordinary is logged. I
> somehow doubt it, since repeated forced reiserfsck's have turned up nothing at
> all...

I just ran the compile again on the described build, same results, no warnings
of any kind, nothing in the debug log facility, nothing on the console...

Reiserfs seems to believe it did the right thing. I'm here to tell you that it
didn't...

Cheers//Frank



Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

2001-02-17 Thread Frank de Lange

On Sat, Feb 17, 2001 at 05:47:49PM -0800, David wrote:
> I can say "me too" for this.  I thought it was perhaps glibc or binutils
> tho.  I only have reiserfs systems now so I don't have a basis for
> comparison.
>
> However I -can- say that I didn't experience this until I put glibc
> 2.2.1 on my systems.  I do use an "approved" gcc, stock 2.95.2.
>
> I wouldn't be so quick to pin it on reiserfs.

Well, I run glibc-2.2.1 as well, so that might be one of the factors
contributing to this. Then again, glibc-2.2.1 with ext2 does not cause any
problems whatsoever with mozilla. So it could be that reiserfs + glibc-2.2.1 is
a bad combination; the question remains which of these two is the culprit (if
not both). Since glibc-2.2.2 is out, I will give that a try as well. Not tonight
though...

And no, I'm not running RedHat 7.x for those who might think so (and
automatically blame everything on it).

When did you switch to glibc-2.2.1? Were you running reiserfs before that?

Cheers//Frank




Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups

2001-02-13 Thread Frank de Lange

On Tue, Feb 13, 2001 at 09:13:10PM +0100, Maciej W. Rozycki wrote:
> There is also an additional debugging/statistics counter provided in
> /proc/cpuinfo that counts interrupts which got delivered with its trigger
> mode mismatched.  Check it out to find if you get any misdelivered
> interrupts at all.

I guess you mean the MIS: counter in /proc/interrupts? This is what it says on
my box after running some 330,000 interrupts (at a rate of approx. 900/second)
through the network/usb IRQ:

 cat /proc/interrupts
           CPU0       CPU1
  0:      31693      32749    IO-APIC-edge  timer
  1:       1208       1174    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  3:        113         26    IO-APIC-edge  serial
  4:       4689       4567    IO-APIC-edge  serial
 14:       4440       4545    IO-APIC-edge  ide0
 15:       1911       2132    IO-APIC-edge  ide1
 16:      85021      84227   IO-APIC-level  es1371, mga@PCI:1:0:0
 17:         26         26   IO-APIC-level  sym53c8xx
 18:          0          0   IO-APIC-level  btaudio, bttv
 19:     165467     166254   IO-APIC-level  eth0, eth1, usb-uhci
NMI:      64376      64376
LOC:      64364      64362
ERR:          0
MIS:        647

So, that's about 650 misdelivered interrupts for 330,000 deliveries (the other
interrupts never gave me any trouble, so I guess the misdelivered ones are all
from IRQ 19), or about 0.2%.
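
For reference, the arithmetic behind that figure, as a small Python sketch. The
numbers are copied from the listing above; attributing every MIS event to IRQ
19 is the same guess as in the text, not something the counter itself reports.

```python
# Per-CPU counts for IRQ 19 and the MIS: counter, copied from the listing.
irq19_cpu0, irq19_cpu1 = 165467, 166254
mis = 647

# Total deliveries on IRQ 19 across both CPUs.
delivered = irq19_cpu0 + irq19_cpu1

# Share of misdelivered interrupts, assuming all of them came from IRQ 19.
rate_percent = 100.0 * mis / delivered

print(f"{delivered} deliveries, {mis} misdelivered = {rate_percent:.2f}%")
```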

When I load the network and stream some audio over it, the sound becomes a bit
choppy. The MIS: counter only increases when the network (read: IRQ 19) is
loaded; a single audio stream (approx. 220 int/sec) causes no MISses to occur.

In general, I'd say the stability WITH the patch is good, and timeouts are
within tolerable levels. If I need something better, I'll probably get myself
a better set of network cards...

So, quick conclusion, this seems a reasonable fix...

Cheers//Frank




Re: hard crashes 2.4.0/1 with NE2K stuff

2001-02-05 Thread Frank de Lange

On Mon, Feb 05, 2001 at 07:41:11PM +, Roeland Th. Jansen wrote:
> On Mon, Feb 05, 2001 at 06:26:52PM +, Roeland Th. Jansen wrote:
> > 
> > I'll report further. an Maciej -- thanks for your work !
> 
> with the extra patch in arch/i386/kernel/apic.c:
> 
> #else
> 	/* Disable focus processor (bit==1) */
> 	value |= (1<<9);
> #endif
> 
> used, eth0 (ne2k) doesn't die anymore; no choppy sound either. we're
> currently having over 2.100.000 interrupts without a problem.

Same here (although I just changed #if 1 to #if 0 to disable focus processor
support), the net stays up and the chops are gone. 

Cheers//Frank



Re: hard crashes 2.4.0/1 with NE2K stuff

2001-02-02 Thread Frank de Lange

> 2.4.1. rebuilt here and with a floodping towards my machine causes a
> hard crash where nothing works anymore.

I'm currently running 2.4.1 with Maciej's patch-2.4.0-io_apic-4. Additionally,
I disabled focus_processor in apic.c to get rid of some network delays. Flood
pings both from and to this system do not cause any problems, other than making
the streaming audio sound a bit choppy...

Box is a dual-celeron (466, non-overclocked) BP-6 with two ne2k (Winbond
W89C940 based) cards sharing an interrupt.  

Maybe that works for you as well?

Cheers//Frank




Re: Linux Kernel Mailing List, Archive by Week: Gigabyte 6VXDC7: APIC error on CPU1: 08(08)

2001-01-30 Thread Frank de Lange

Heikki,

Those are the same problems I had with my Abit BP-6 SMP-board. There are a
couple of patches which seem to make the problem disappear. The jury is still
out on whether they really solve the problem or merely hide it, but I
haven't had a crash ever since I patched my box. The most recent patch is the
one from Maciej, you can find it on the list, or in the archives (like this
one: http://boudicca.tux.org/hypermail/linux-kernel/this-week/0469.html - this
link is only valid 'till sunday!)

Unfortunately, the archives often mangle patches, so it is better to get them
directly from the list (or mail Maciej for it...)

Cheers//Frank




Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-13 Thread Frank de Lange

On Sun, Jan 14, 2001 at 12:13:58AM +, Roeland Th. Jansen wrote:
> On Fri, Jan 12, 2001 at 09:03:49PM +0100, Ingo Molnar wrote:
> > well, some time ago i had an ne2k card in an SMP system as well, and found
> > this very problem. Disabling/enabling focus-cpu appeared to make a
> > difference, but later on i made experiments that show that in both cases
> > the hang happens. I spent a good deal of time trying to fix this problem,
> > but failed - so any fresh ideas are more than welcome.
> 
> for the record. my BP6, non OC, apic smp system with ne2k fails within
> 24 hours here too. if I can be of any help. (2.4.0. kernel. no
> vmware or opensound)

You can help yourself by applying Manfred's patch to 8390.c (in preference to
my own patch to the same file). This will solve the hanging-network problem. If
your entire box hangs, that's another story which will probably not be fixed by
that patch. You can find the patch in Manfred's posting to the list from Fri
Jan 12 2001 - 14:04:24 EST.

I've been running a patched driver for more than a day now, under heavy network
load, without problems.

Frank




Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware

2001-01-12 Thread Frank de Lange

On Sat, Jan 13, 2001 at 02:51:54AM +0100, Manfred Spraul wrote:
> Frank de Lange wrote:
> > 
> > It could be that people using those cards are not the ones who tend
> > to go for the (somewhat tricky) BP6 board...
> > 
> 
> I doubt that it's BP6 specific: I have the problem with a Gigabyte BXD
> board and I doubt that Ingo used an BP6. Perhaps 82093AA specific (the
> IO APIC chip used for SMP 440BX board)

It isn't. But I just meant to indicate that the mere fact that I could not find
any problem-report for that combination does not indicate that there ARE no
problems...

> I can't find any spec updates for that chip: either it's the first
> perfect chip Intel ever produced, or ...

:-)

Well, the BX chipset is one of their better attempts I think...

Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 04:56:24PM -0800, Linus Torvalds wrote:
> IDE is not my favourite example of a "known stable driver". Also, in many
> cases IDE is for historical reasons connected to an EDGE io-apic pin (ie
> it's still considered an ISA interrupt). Which probably wouldn't show this
> problem anyway.

They (ide interrupts) are indeed EDGE-triggered on my box. I have not enabled
the HPT366 (ATA66) controller on this board, so I can not tell if that
controller is EDGE-triggered as well.

> Also, IDE doesn't generate all that many interrupts. You can make a
> network driver do a _lot_ more interrupts than just about any disk driver
> by simply sending/receiving a lot of packets. With disks it is very hard
> to get the same kind of irq load - Linux will merge the requests and do at
> least 1kB worth of transfer per interrupt etc. On a ne2k 100Mbps PCI card,
> you can probably _easily_ generate a much higher stream of interrupts.

There's sound... The msnd.c (Turtle Beach MultiSound) driver (and its
derivatives, like msnd_pinnacle) uses disable_irq.  Running esd (esound
daemon), sound can easily generate > 1000 interrupts/second, since esd uses
small dma transfers. This can be seen quite clearly from /proc/interrupts on my
soundserver:

           CPU0
  0:  276867328          XT-PIC  timer
  1:          2          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  3:    7631519          XT-PIC  eth1
  4:    2751419          XT-PIC  serial
  5: 1907346678          XT-PIC  soundblaster
  8:          1          XT-PIC  rtc
  9:   45022986          XT-PIC  eth0
 13:          1          XT-PIC  fpu
 14:    4320643          XT-PIC  ide0
 15:    4409193          XT-PIC  ide1
NMI:          0
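
One way to turn such a listing into rates is to divide each counter by the
uptime implied by the timer line. The Python below is a rough sketch under
stated assumptions: parse_interrupts is a hypothetical helper written for this
example, the sample text is an excerpt of the listing above, and HZ = 100 (the
usual timer tick rate of a 2.4 kernel on i386) is assumed rather than read from
the machine.

```python
def parse_interrupts(text: str) -> dict:
    """Map each IRQ label (e.g. "5" or "NMI") to its count summed over CPUs."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header line: CPU0 [CPU1 ...]
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # The first `ncpus` fields are counters; lines such as ERR: carry
        # fewer, and non-numeric fields (the controller type) are skipped.
        counts = [int(f) for f in fields[:ncpus] if f.isdigit()]
        totals[label.strip()] = sum(counts)
    return totals


# Excerpt of the soundserver listing quoted above.
SAMPLE = """\
           CPU0
  0:  276867328          XT-PIC  timer
  5: 1907346678          XT-PIC  soundblaster
  9:   45022986          XT-PIC  eth0
NMI:          0
"""

HZ = 100  # assumption: timer interrupts per second

totals = parse_interrupts(SAMPLE)
uptime_s = totals["0"] / HZ  # seconds since boot, derived from the timer line
for irq in ("5", "9"):
    rate = totals[irq] / uptime_s
    print(f"IRQ {irq}: {rate:.0f} interrupts/second on average")
```

Note that this yields an average since boot, so it comes out lower than the
rate seen while esd is actually playing.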

OK, this is an ageing P166, and it uses a different driver, etc. I have not
found any reports of hanging sound drivers in Google queries for 'linux msnd
bp6' or 'linux multisound bp6'. Of course, this is no conclusive evidence, far
from it... It could be that people using those cards are not the ones who tend
to go for the (somewhat tricky) BP6 board...

Cheers//Frank




Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 04:36:33PM -0800, Linus Torvalds wrote:
> It may well not be disable_irq() that is buggy. In fact, there's good
> reason to believe that it's a hardware problem.

I am inclined to believe it IS a hardware problem... If disable_irq were buggy,
wouldn't the problem occur more frequently in other irq-heavy areas? A quick
count shows that disable_irq* is used in 84 sourcefiles in the drivers/*
directory. This includes drivers which generate many interrupts in a short
timeframe (like ide).
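
A count like that can be reproduced with a short script. The sketch below is
only one possible way to do it: the command actually used for the figure above
is not given, count_files_using is a hypothetical helper, and the small file
tree it runs on is a made-up stand-in for a kernel source tree.

```python
import os
import re
import tempfile

# Matches disable_irq( as well as variants such as disable_irq_nosync(.
PATTERN = re.compile(rb"\bdisable_irq\w*\s*\(")


def count_files_using(root, pattern=PATTERN):
    """Count .c and .h files under `root` whose contents match `pattern`."""
    hits = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith((".c", ".h")):
                continue
            with open(os.path.join(dirpath, name), "rb") as f:
                if pattern.search(f.read()):
                    hits += 1
    return hits


# Tiny stand-in tree (hypothetical file names), not a real kernel tree.
SAMPLE_FILES = {
    "net/a.c": "disable_irq(dev->irq);\n",
    "sound/b.c": "disable_irq_nosync(irq);\n",
    "char/c.c": "enable_irq(irq);\n",
    "notes.txt": "disable_irq(1);\n",
}

with tempfile.TemporaryDirectory() as root:
    for rel, body in SAMPLE_FILES.items():
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(body)
    count = count_files_using(root)

print(count)
```

Pointing count_files_using at the drivers directory of an unpacked kernel tree
would give the real number for that tree; the result depends on the kernel
version and on which file types are included.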

Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 04:15:37PM -0800, Linus Torvalds wrote:
> On Fri, 12 Jan 2001, Frank de Lange wrote:
> > 
> > Gentlemen, this (the patch to 8390.c) seems to fix the problem.
> 
> The problem with this patch is that anybody with a slow ISA ne2000 clone
> will basically have absolutely _horrible_ interrupt latency because we
> hold the irq lock over some quite expensive operations.
> 
> The spin_lock_irqsave() is absolutely my preferred fix, and if I remember
> correctly this is in fact how some early 2.1.x code fixed the ne2000
> driver when the original irq scalability stuff happened (for some time
> during development we did not have a working "disable_irq()" AT ALL
> because the irq-disabling counters etc logic hadn't been done).

And that's the patch I meant... Manfred's
spin_lock_irqsave/spin_unlock_irqrestore based one, not my
(spin_lock_irq/spin_unlock_irq) based patch. That is also the one I'm running
now.

Frank




Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

> Remind me: what polarity are your io-apic irq's? Level, edge, sideways?
> Anything else that might be relevant?

Well, sideways of course! :-)

Here's a cat /proc/interrupts from the (BP6) box:

           CPU0       CPU1
  0:     104936     105433    IO-APIC-edge  timer
  1:                  4384    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  3:         79         59    IO-APIC-edge  serial
  4:      12743      12850    IO-APIC-edge  serial
 14:       7855       7885    IO-APIC-edge  ide0
 15:       1990       1703    IO-APIC-edge  ide1
 16:          0          0   IO-APIC-level  es1371, mga@PCI:1:0:0
 17:         24         28   IO-APIC-level  sym53c8xx
 18:          0          0   IO-APIC-level  bttv
 19:     460435     460402   IO-APIC-level  eth0, eth1, usb-uhci
NMI:     210303     210303
LOC:     210285     210284
ERR:          0

The interrupt which caused problems was 19 (with both network cards and USB on
it). It shows a high number of interrupts because I've been load-testing the
network. The mere fact that it shows this high number of interrupts shows the
fix works...

As this is a BP6, I'm now supposed to go on about the dead chickens, dedicated
air conditioners, nuclear powersupplies and other magic you're supposed to buy
to get these boards running. Well, nothing of that sort, it is running on a
simple (but high quality) 235W PSU with heatgreased coolers on the CPUs and the
BX chipset. Nothing is overclocked. CPU and chipset temperatures are 24°C and
32°C, respectively.

In short, nothing remarkable. All PCI slots are used, as you can see from my
first posting in this thread (which contains more info on the hardware).

//Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 09:54:31PM +0100, Manfred Spraul wrote:
> I have found one combination that doesn't hang with the unpatched
> 8390.c, but network throughput is down to 1/2. I hope that's due to the
> debugging changes.

Hm, could it be that the fact that network throughput is halved causes the
problem not to appear? Remember, it only appears under HEAVY network load. A
single nfs cp -rd  was not enough to hang my network, I needed to add
at least another cp -rd or some streaming audio or something else...

Cheers//Frank




Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 09:51:36PM +0100, Ingo Molnar wrote:
> great. Back when i had the same problem, flood pinging another host (on
> the local network) was the quickest way to reproduce the hang:
> 
>   ping -f -s 10 otherhost
> 
> this produced an IOAPIC-hang within seconds.

Apart from killing streaming audio and interactive network use, nothing hangs.
As soon as the ping flood is stopped, audio streams on and ssh sessions are
useable again. So, it seems to fix it...

Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 09:37:24PM +0100, Ingo Molnar wrote:
> okay - i just wanted to hear a definitive word from you that this fixes
> your problem, because this is what we'll have to do as a final solution.
> (barring any other solution.)

Now running with this config:

PATCHED 8390.c (using irq_safe spinlocks instead of disable_irq)
PATCHED apic.c (focus cpu ENABLED)
STOCK io_apic.c

No problems under heavy network load.

Gentlemen, this (the patch to 8390.c) seems to fix the problem.

Cheers//Frank




Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 09:34:03PM +0100, Ingo Molnar wrote:
> ? this is x86-only code. There is no hot-pluggable CPU support for Linux
> AFAIK. (But in any case, the code is basically ready for hot-pluggable
> CPUs, just take a few precautions and change cpu_online_mask and a couple
> of other things.)

OK, maybe the Sun example was not the best to give for this code... But if
there are no hot-pluggable x86's around now (I think there are, but can not
recollect who made 'm...) and nobody is complaining, then it is fine with me...
I won't hot-unplug my BP6's CPU's anyway...

Cheers//Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardwarerelated?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 09:31:15PM +0100, Ingo Molnar wrote:
> 
> On Fri, 12 Jan 2001, Frank de Lange wrote:
> 
> > WITH or WITHOUT the changed 8390 driver? I can already give you the
> > results for running WITH the changed driver: it works. I have not yet
> > tried it WITHOUT the changed 8390 driver (so that would be stock 8390,
> > patched apic.c, stock io_apic.c). Please let me know which you want...
> 
> WITH. patched 8390.c, patched apic.c, stock io_apic.c. My very strong
> feeling is that this will be a stable combination, and that this is what
> we want as a final solution.

It is. As I already mentioned in other messages, I already tested with JUST the
patched 8390.c driver, no other patches. It was stable. I then patched apic.c
AND io_apic.c, which did not introduce new instabilities. Unless you think that
reverting back to a stock io_apic.c would cause instabilities (which would be
weird, since I had no instabilities running only a patched 8390.c), I think the
patch to 8390.c DOES remove the symptoms all by itself. No other patches seem
necessary to get a stable box.

But I'll patch the mess again just for kicks :-)

Cheers//Frank

-- 
  W      _______
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 09:19:53PM +0100, Ingo Molnar wrote:
> > In addition, I patched apic.c (focus cpu enabled)
> > In addition, I patched io_apic (TARGET_CPUS 0xff)
> 
> please try it with the focus CPU enabling change (we want to enable that
> feature, i only disabled it due to the stuck-ne2k bug), but with
> TARGET_CPUS set to cpu_online_mask. (this later is needed for certain
> crappy BIOSes.)

WITH or WITHOUT the changed 8390 driver? I can already give you the results for
running WITH the changed driver: it works. I have not yet tried it WITHOUT the
changed 8390 driver (so that would be stock 8390, patched apic.c, stock
io_apic.c). Please let me know which you want...

Frank
-- 
  W  _______
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 09:11:29PM +0100, Manfred Spraul wrote:
> Frank, please clarify:
> you still run without disable_irq_nosync() in 8390.c?

I am running with your patched version of 8390.c (so WITHOUT
disable_irq_nosync()).

In addition, I patched apic.c (focus cpu enabled)
In addition, I patched io_apic (TARGET_CPUS 0xff)

> I have a first idea: we send an EOI to an interrupt that is masked on
> the IO apic, perhaps that causes the problems.

Sounds plausible...

> I'm right now typing a patch.

I'll await yours instead of making my own patch this time... :-)

Cheers//Frank
-- 
  W  ___
 ## o o\    / Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 11:59:25AM -0800, Linus Torvalds wrote:
> > Could this really be the solution?
> 
> I'd like to know _which_ of the two makes a difference (or does it only
> trigger with both of them enabled)? And even then I'm not sure that it is
> "the" solution - both changes to io-apic handling had some reason for
> them. Ingo, what was the focus-cpu thing?

Well, with 'this' (in 'could THIS be') I really meant the move from disable_irq
to the irq_safe spinlocks. I'm currently running with the patched 8390.c
driver, patched io_apic (TARGET_CPUS 0xff) and patched apic.c (focus cpu
enabled), and have had no problems yet... even though I'm running several
simultaneous nfs cp -rd, streaming network audio, scanning with a
USB scanner, etc.

So far, it seems that the patch to 8390.c removed the symptoms. The changes to
apic.c and io_apic.c did not make the network hang come back. 

Cheers//Frank
-- 
  W  ___________
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 08:33:15PM +0100, Manfred Spraul wrote:
> Frank, the 2.4.0 contains 2 band aids that were added for ne2k smp:
> 
> * From Ingo: focus cpu disabled, in arch/i386/kernel/apic.c
> * From myself: TARGET_CPU = cpu_online_mask, was 0xFF.
> 
> Could you disable both bandaids? I disabled them, no problems so far.

I disabled both (I guess you meant the 'define TARGET_CPUS cpu_online' in
io_apic.c?), and reverted my own patch, added your patch... Now running with
the usual heavy network load, no problems so far... Also made USB produce
interrupts (shares irq with network), no problems...
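
To make the TARGET_CPUS band-aid concrete, here is a minimal sketch of the
difference between the two settings. It assumes a flat logical destination
where each bit selects one CPU, and that CPUs 0..n-1 are the ones online. The
names TARGET_CPUS_ALL, online_mask and stray_bits are hypothetical and exist
only for this illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: the "0xFF" setting, i.e. every possible destination bit. */
#define TARGET_CPUS_ALL 0xffu

/* One bit per online CPU, assuming CPUs 0..online_cpus-1 are online. */
static uint8_t online_mask(unsigned int online_cpus)
{
    assert(online_cpus >= 1);
    if (online_cpus >= 8)
        return 0xffu;
    return (uint8_t)((1u << online_cpus) - 1u);
}

/* Destination bits that do not correspond to any online CPU. */
static uint8_t stray_bits(uint8_t dest, unsigned int online_cpus)
{
    return (uint8_t)(dest & ~online_mask(online_cpus));
}
```

On a dual-CPU board such as the BP6, online_mask(2) is 0x03, so a destination
of 0xff carries six bits that name no existing CPU, while the online mask
carries none. Whether those stray bits matter depends on the chipset and BIOS,
which is the point being debated in this thread.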

Could this really be the solution?

Cheers//Frank
-- 
  W  ___
 ## o o\    / Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 08:04:24PM +0100, Manfred Spraul wrote:
> I removed the disable_irq lines from 8390.c, and that fixed the problem:
> no hang within 2 minutes - the test is still running.
> 
> Frank, could you double check it?

I'm currently running my own patched version, which uses
spin_lock_irq/spin_unlock_irq instead of
spin_lock_irqsave/spin_unlock_irqrestore like your patch uses. Looking at
spinlock.h, spin_lock_irq does a local irq disable, which seems to be closer to
the original intent (disable_irq) than spin_lock_irqsave. Anyone want to
comment on this?
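
As a rough illustration of the difference being asked about, here is a
userspace model (not kernel code). Both variants turn local interrupts off
while the lock is held; they differ in what the unlock does. The _irq pair
always re-enables interrupts, while the _irqsave pair restores whatever state
the caller had. The model_* names and the two helper functions are made up for
this sketch, and a single flag stands in for the CPU's interrupt-enable bit.

```c
#include <assert.h>
#include <stdbool.h>

static bool irqs_enabled = true; /* stands in for the CPU's interrupt-enable flag */
static bool lock_held = false;   /* stands in for page_lock */

/* Models spin_lock_irq(): interrupts off, take the lock. */
static void model_lock_irq(void)
{
    irqs_enabled = false;
    assert(!lock_held);
    lock_held = true;
}

/* Models spin_unlock_irq(): drop the lock, interrupts on unconditionally. */
static void model_unlock_irq(void)
{
    lock_held = false;
    irqs_enabled = true;
}

/* Models spin_lock_irqsave(): remember the caller's state, then interrupts off. */
static bool model_lock_irqsave(void)
{
    bool flags = irqs_enabled;
    irqs_enabled = false;
    assert(!lock_held);
    lock_held = true;
    return flags;
}

/* Models spin_unlock_irqrestore(): drop the lock, put the saved state back. */
static void model_unlock_irqrestore(bool flags)
{
    lock_held = false;
    irqs_enabled = flags;
}

/* Interrupt state left behind after a lock/unlock pair of each kind. */
static bool after_irq_pair(bool initially_enabled)
{
    irqs_enabled = initially_enabled;
    model_lock_irq();
    model_unlock_irq();
    return irqs_enabled;
}

static bool after_irqsave_pair(bool initially_enabled)
{
    irqs_enabled = initially_enabled;
    bool flags = model_lock_irqsave();
    model_unlock_irqrestore(flags);
    return irqs_enabled;
}
```

If the code can ever be entered with interrupts already off, the _irq pair
turns them back on behind the caller's back, while the _irqsave pair does not.
Neither variant is the same thing as disable_irq(), which masks a single
interrupt line rather than all interrupts on the local CPU.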

Anyway, still running under load, also got USB (which uses the same irq) to
produce some interrupts by scanning some stuff. No problems so far...

Cheers//Frank

-- 
  W  ___
 ## o o\    /     Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 08:04:24PM +0100, Manfred Spraul wrote:
> Linus wrote:
> > Does this seem to happen mainly with drivers that use "disable_irq()" 
> > and "enable_irq()"? I know the ne drivers do (through the 8390 module), 
> > and some others do too (3c59x). 
> 
> I removed the disable_irq lines from 8390.c, and that fixed the problem:
> no hang within 2 minutes - the test is still running.
> 
> Frank, could you double check it?

Hm, I also sent in a (somewhat different) patch on my own... :-)

Anyway, still running under heavy load...

Cheers//Frank
-- 
  WWWWW  ___
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware

2001-01-12 Thread Frank de Lange

As per Linus' suggestion, I removed the disable_irq/enable_irq statements from
the 8390 core driver, and replaced the spinlocks with irq-safe versions. This
seems to solve the network hangs, as I am currently running a heavy network
load (which would have killed a non-patched driver within seconds). Network
latency seems a bit higher, and there are some hiccups in the streaming audio
(part of the network load, easy indicator of performance...), but no hangs.
Here's the patch:

--- linux/drivers/net/8390.c.org	Fri Jan 12 19:52:38 2001
+++ linux/drivers/net/8390.c	Fri Jan 12 19:54:50 2001
@@ -242,15 +242,15 @@
 
    /* Ugly but a reset can be slow, yet must be protected */
 
-   disable_irq_nosync(dev->irq);
-   spin_lock(&ei_local->page_lock);
+   /* disable_irq_nosync(dev->irq); */
+   spin_lock_irq(&ei_local->page_lock);
 
    /* Try to restart the card.  Perhaps the user has fixed something. */
    ei_reset_8390(dev);
    NS8390_init(dev, 1);
 
-   spin_unlock(&ei_local->page_lock);
-   enable_irq(dev->irq);
+   spin_unlock_irq(&ei_local->page_lock);
+   /* enable_irq(dev->irq); */
    netif_wake_queue(dev);
 }
 
@@ -285,9 +285,9 @@
     *  Slow phase with lock held.
     */
 
-   disable_irq_nosync(dev->irq);
+   /* disable_irq_nosync(dev->irq); */
 
-   spin_lock(&ei_local->page_lock);
+   spin_lock_irq(&ei_local->page_lock);
 
    ei_local->irqlock = 1;
 
@@ -383,8 +383,8 @@
    ei_local->irqlock = 0;
    outb_p(ENISR_ALL, e8390_base + EN0_IMR);
 
-   spin_unlock(&ei_local->page_lock);
-   enable_irq(dev->irq);
+   spin_unlock_irq(&ei_local->page_lock);
+   /* enable_irq(dev->irq); */
 
    dev_kfree_skb (skb);
    ei_local->stat.tx_bytes += send_length;
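
One plausible reading of the higher latency and audio hiccups, sketched as a
userspace model rather than kernel code: the original driver masked only the
card's own interrupt line during the slow phase, while the patched driver
turns off every interrupt on the local CPU. The model_* names, NR_IRQS value
and the IRQ number used in the usage note are assumptions made up for this
illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_IRQS 16

static int  line_disable_depth[NR_IRQS]; /* models disable_irq()/enable_irq() nesting */
static bool local_irqs_enabled = true;   /* models the local CPU's interrupt flag */

static void model_disable_irq(int irq)
{
    assert(irq >= 0 && irq < NR_IRQS);
    line_disable_depth[irq]++;
}

static void model_enable_irq(int irq)
{
    assert(irq >= 0 && irq < NR_IRQS);
    if (line_disable_depth[irq] > 0)
        line_disable_depth[irq]--;
}

/* Could an interrupt on this line reach a handler on the local CPU right now? */
static bool deliverable(int irq)
{
    return local_irqs_enabled && line_disable_depth[irq] == 0;
}

/* Original driver: only the card's line is masked inside the critical section. */
static int deliverable_lines_old_scheme(int nic_irq)
{
    int n = 0;
    model_disable_irq(nic_irq);
    for (int irq = 0; irq < NR_IRQS; irq++)
        n += deliverable(irq) ? 1 : 0;
    model_enable_irq(nic_irq);
    return n;
}

/* Patched driver: spin_lock_irq() turns off all local interrupts. */
static int deliverable_lines_new_scheme(void)
{
    int n = 0;
    local_irqs_enabled = false;
    for (int irq = 0; irq < NR_IRQS; irq++)
        n += deliverable(irq) ? 1 : 0;
    local_irqs_enabled = true;
    return n;
}
```

With a card on, say, IRQ 10, the old scheme leaves 15 of the 16 modelled lines
deliverable during the slow phase and the new scheme leaves none on that CPU.
The model ignores the second CPU, which can still take interrupts but would
spin if its handler needs the same lock.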

-- 
  WWWWW  ___
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 06:51:36PM +0100, Manfred Spraul wrote:
> Frank, I've attached a proposed kick_IOAPIC pin. Could you try it?
> I'm rebooting with that patch right now.

I added the patch, and tried it out. When the network hangs, I am able to
revive it with ALT-SYSRQ-Q. The debug log shows these entries:

Jan 12 19:22:57 behemoth kernel: SysRq: <0> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
Jan 12 19:22:57 behemoth kernel: Before:
Jan 12 19:22:57 behemoth kernel:  00 003 03  011   1   11199
Jan 12 19:22:57 behemoth kernel: After switching to edge:
Jan 12 19:22:57 behemoth kernel:  00 003 03  001   1   11199
Jan 12 19:22:57 behemoth kernel: After switch back:
Jan 12 19:22:57 behemoth kernel:  00 003 03  011   1   11199
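
The column names in that log (Mask, Trig, IRR, Pol, Stat, Dest, Deli, Vect)
match the fields in the low 32 bits of an I/O APIC redirection table entry.
Below is a small decoder, assuming the standard 82093AA bit layout. The struct
and function names are hypothetical, and the value 0xF999 in the usage note is
a made-up example, not a value taken from the log above.

```c
#include <assert.h>
#include <stdint.h>

struct redir_fields {
    unsigned vector, delivery_mode, dest_mode, delivery_status,
             polarity, remote_irr, trigger, mask;
};

/* Split the low word of a redirection entry into its fields. */
static struct redir_fields decode_redir_low(uint32_t low)
{
    struct redir_fields f;
    f.vector          = low & 0xffu;         /* bits 0-7 */
    f.delivery_mode   = (low >> 8)  & 0x7u;  /* bits 8-10 */
    f.dest_mode       = (low >> 11) & 0x1u;  /* 0 = physical, 1 = logical */
    f.delivery_status = (low >> 12) & 0x1u;  /* 1 = send pending */
    f.polarity        = (low >> 13) & 0x1u;  /* 1 = active low */
    f.remote_irr      = (low >> 14) & 0x1u;  /* meaningful for level-triggered only */
    f.trigger         = (low >> 15) & 0x1u;  /* 0 = edge, 1 = level */
    f.mask            = (low >> 16) & 0x1u;  /* 1 = masked */
    return f;
}

/* Return the entry with the trigger-mode bit set to level (1) or edge (0). */
static uint32_t set_trigger(uint32_t low, unsigned level)
{
    assert(level <= 1u);
    return level ? (low | (1u << 15)) : (low & ~(1u << 15));
}
```

For example, decode_redir_low(0xF999) gives vector 0x99, level trigger, remote
IRR set and mask clear. set_trigger(low, 0) followed by set_trigger(low, 1)
only shows the bit manipulation behind the "switching to edge" and "switch
back" steps; what the hardware does to remote IRR in between is not modelled
here.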

-- 
  W  ___
 ## o o\    /     Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange
hemoth kernel: ... APIC ICR2: 0100
Jan 12 18:26:21 behemoth kernel: ... APIC LVTT: 000200ef
Jan 12 18:26:21 behemoth kernel: ... APIC LVTPC: 0001
Jan 12 18:26:21 behemoth kernel: ... APIC LVT0: 00010700
Jan 12 18:26:21 behemoth kernel: ... APIC LVT1: 00010400
Jan 12 18:26:21 behemoth kernel: ... APIC LVTERR: 00fe
Jan 12 18:26:21 behemoth kernel: ... APIC TMICT: a322
Jan 12 18:26:21 behemoth kernel: ... APIC TMCCT: 1803
Jan 12 18:26:21 behemoth kernel: ... APIC TDCR: 0003
Jan 12 18:26:21 behemoth kernel:
Jan 12 18:26:21 behemoth kernel:
Jan 12 18:26:21 behemoth kernel: printing local APIC contents on CPU#0/0:
Jan 12 18:26:21 behemoth kernel: ... APIC ID:   (0)
Jan 12 18:26:21 behemoth kernel: ... APIC VERSION: 00040011
Jan 12 18:26:21 behemoth kernel: ... APIC TASKPRI:  (00)
Jan 12 18:26:21 behemoth kernel: ... APIC ARBPRI: 00e0 (e0)
Jan 12 18:26:21 behemoth kernel: ... APIC PROCPRI: 
Jan 12 18:26:21 behemoth kernel: ... APIC EOI: 
Jan 12 18:26:21 behemoth kernel: ... APIC LDR: 0100
Jan 12 18:26:21 behemoth kernel: ... APIC DFR: 
Jan 12 18:26:21 behemoth kernel: ... APIC SPIV: 03ff
Jan 12 18:26:21 behemoth kernel: ... APIC ISR field:
Jan 12 18:26:21 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 18:26:21 behemoth kernel: 
Jan 12 18:26:21 behemoth last message repeated 7 times
Jan 12 18:26:21 behemoth kernel: ... APIC TMR field:
Jan 12 18:26:21 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 18:26:21 behemoth kernel: 
Jan 12 18:26:21 behemoth last message repeated 3 times
Jan 12 18:26:21 behemoth kernel: 01000100
Jan 12 18:26:21 behemoth kernel: 
Jan 12 18:26:21 behemoth last message repeated 2 times
Jan 12 18:26:21 behemoth kernel: ... APIC IRR field:
Jan 12 18:26:21 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 18:26:21 behemoth kernel: 
Jan 12 18:26:21 behemoth last message repeated 6 times
Jan 12 18:26:21 behemoth kernel: 0001
Jan 12 18:26:21 behemoth kernel: ... APIC ESR: 
Jan 12 18:26:21 behemoth kernel: ... APIC ICR: 000c08fb
Jan 12 18:26:21 behemoth kernel: ... APIC ICR2: 0200
Jan 12 18:26:21 behemoth kernel: ... APIC LVTT: 000200ef
Jan 12 18:26:21 behemoth kernel: ... APIC LVTPC: 0001
Jan 12 18:26:21 behemoth kernel: ... APIC LVT0: 00010700
Jan 12 18:26:21 behemoth kernel: ... APIC LVT1: 0400
Jan 12 18:26:21 behemoth kernel: ... APIC LVTERR: 00fe
Jan 12 18:26:21 behemoth kernel: ... APIC TMICT: a322
Jan 12 18:26:21 behemoth kernel: ... APIC TMCCT: 4e26
Jan 12 18:26:21 behemoth kernel: ... APIC TDCR: 0003

-- 
  W  ___
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange
el: ... APIC ICR: 08fc
Jan 12 16:29:32 behemoth kernel: ... APIC ICR2: 0100
Jan 12 16:29:32 behemoth kernel: ... APIC LVTT: 000200ef
Jan 12 16:29:32 behemoth kernel: ... APIC LVTPC: 0001
Jan 12 16:29:32 behemoth kernel: ... APIC LVT0: 0400
Jan 12 16:29:32 behemoth kernel: ... APIC LVT1: 00010400
Jan 12 16:29:32 behemoth kernel: ... APIC LVTERR: 00fe
Jan 12 16:29:32 behemoth kernel: ... APIC TMICT: a322
Jan 12 16:29:32 behemoth kernel: ... APIC TMCCT: 1686
Jan 12 16:29:32 behemoth kernel: ... APIC TDCR: 0003
Jan 12 16:29:32 behemoth kernel:
Jan 12 16:29:32 behemoth kernel:
Jan 12 16:29:32 behemoth kernel: printing local APIC contents on CPU#0/0:
Jan 12 16:29:32 behemoth kernel: ... APIC ID:   (0)
Jan 12 16:29:32 behemoth kernel: ... APIC VERSION: 00040011
Jan 12 16:29:32 behemoth kernel: ... APIC TASKPRI:  (00)
Jan 12 16:29:32 behemoth kernel: ... APIC ARBPRI: 00f0 (f0)
Jan 12 16:29:32 behemoth kernel: ... APIC PROCPRI: 
Jan 12 16:29:32 behemoth kernel: ... APIC EOI: 
Jan 12 16:29:32 behemoth kernel: ... APIC LDR: 0100
Jan 12 16:29:32 behemoth kernel: ... APIC DFR: 
Jan 12 16:29:32 behemoth kernel: ... APIC SPIV: 03ff
Jan 12 16:29:32 behemoth kernel: ... APIC ISR field:
Jan 12 16:29:32 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 16:29:32 behemoth kernel: 
Jan 12 16:29:32 behemoth last message repeated 7 times
Jan 12 16:29:32 behemoth kernel: ... APIC TMR field:
Jan 12 16:29:32 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 16:29:32 behemoth kernel: 
Jan 12 16:29:32 behemoth last message repeated 3 times
Jan 12 16:29:32 behemoth kernel: 0100
Jan 12 16:29:32 behemoth kernel: 
Jan 12 16:29:32 behemoth last message repeated 2 times
Jan 12 16:29:32 behemoth kernel: ... APIC IRR field:
Jan 12 16:29:32 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 16:29:32 behemoth kernel: 
Jan 12 16:29:32 behemoth last message repeated 6 times
Jan 12 16:29:32 behemoth kernel: 00011000
Jan 12 16:29:32 behemoth kernel: ... APIC ESR: 
Jan 12 16:29:32 behemoth kernel: ... APIC ICR: 000c08fb
Jan 12 16:29:32 behemoth kernel: ... APIC ICR2: 0200
Jan 12 16:29:32 behemoth kernel: ... APIC LVTT: 000200ef
Jan 12 16:29:32 behemoth kernel: ... APIC LVTPC: 0001
Jan 12 16:29:32 behemoth kernel: ... APIC LVT0: 0400
Jan 12 16:29:32 behemoth kernel: ... APIC LVT1: 0400
Jan 12 16:29:32 behemoth kernel: ... APIC LVTERR: 00fe
Jan 12 16:29:32 behemoth kernel: ... APIC TMICT: a322
Jan 12 16:29:32 behemoth kernel: ... APIC TMCCT: 47d7
Jan 12 16:29:32 behemoth kernel: ... APIC TDCR: 0003


-- 
  W  ___
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 10:40:04PM +1100, Andrew Morton wrote:
> Here is a debugging patch.  Could you please apply this,
> rebuild and:
> 
> 1: Type ALT-SYSRQ-A when everything is good
> 2: Type ALT-SYSRQ-A when everything is bad
> 3: send the resulting logs.

OK, here's the results I get...

Before network hang
===

print_PIC()
printing PIC contents
print_IO_APIC()
testing the IO APIC...

 done.
print_all_local_APICs()

... APIC ID:  0100 (1)
... APIC VERSION: 00040011










0100









0001


... APIC ID:   (0)
... APIC VERSION: 00040011










01000100









1000

NOTICE: results differ every time I hit ALT-SYSRQ-A.
The '1' bit at 'row 11, col. 26' stays '1'
no matter how many times I use the magic keys.
The other '1' bits jump around a bit, or
disappear altogether. Also, the sequence
in which the APICs appear in the dump sometimes
differs (this example shows 1 first, then 0,
other times you'd see 0 first, then 1)

After network hang
==

print_PIC()
printing PIC contents
print_IO_APIC()
testing the IO APIC...

 done.
print_all_local_APICs()

... APIC ID:   (0)
... APIC VERSION: 00040011










0100









0001


... APIC ID:  0100 (1)
... APIC VERSION: 00040011










0100









0001

NOTICE: hmmm... see, now that '1' bit at row 11,
col. 26 for APIC 0 which was '1' before
has turned to '0'. It will stay '0' no
matter how many times I hit the magic keys...
It seems to have been replaced by the '1'
bit at row 11, col. 10, since that bit 
stays '1' no matter how much magic I
throw at it...

Hope this helps... If you need more, let me know...

Cheers//Frank
-- 
  W  ___
 ## o o\    / Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]

Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 10:40:04PM +1100, Andrew Morton wrote:
> Frank de Lange wrote:
> > 
> > Quick and dirty conclusion: as soon as the apic comes in to play, things get
> > messy...
> Here is a debugging patch.  Could you please apply this,
> rebuild and:
> 
> 1: Type ALT-SYSRQ-A when everything is good
> 2: Type ALT-SYSRQ-A when everything is bad
> 3: send the resulting logs.

WillCo...

Now rebuilding...


Cheers//Frank

-- 
  W  ___________
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/   \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware

2001-01-12 Thread Frank de Lange

As per Linus' suggestion, I removed the disable_irq/enable_irq statements from
the 8390 core driver, and replace the spinlocks with irq-safe versions. This
seems to solve the network hangs, as I am currently running a heavy network
load (which would have killed a non-patched driver within seconds). Network
latency seems a bit higher, and there are some hiccups in the streaming audio
(part of the network load, easy indicator of performance...), but no hangs.
Here's the patch:

--- linux/drivers/net/8390.c.org	Fri Jan 12 19:52:38 2001
+++ linux/drivers/net/8390.c	Fri Jan 12 19:54:50 2001
@@ -242,15 +242,15 @@
 
 	/* Ugly but a reset can be slow, yet must be protected */
 
-	disable_irq_nosync(dev->irq);
-	spin_lock(&ei_local->page_lock);
+	/* disable_irq_nosync(dev->irq); */
+	spin_lock_irq(&ei_local->page_lock);
 
 	/* Try to restart the card.  Perhaps the user has fixed something. */
 	ei_reset_8390(dev);
 	NS8390_init(dev, 1);
 
-	spin_unlock(&ei_local->page_lock);
-	enable_irq(dev->irq);
+	spin_unlock_irq(&ei_local->page_lock);
+	/* enable_irq(dev->irq); */
 	netif_wake_queue(dev);
 }
 
@@ -285,9 +285,9 @@
 	 *  Slow phase with lock held.
 	 */
 
-	disable_irq_nosync(dev->irq);
+	/* disable_irq_nosync(dev->irq); */
 
-	spin_lock(&ei_local->page_lock);
+	spin_lock_irq(&ei_local->page_lock);
 
 	ei_local->irqlock = 1;
 
@@ -383,8 +383,8 @@
 	ei_local->irqlock = 0;
 	outb_p(ENISR_ALL, e8390_base + EN0_IMR);
 
-	spin_unlock(&ei_local->page_lock);
-	enable_irq(dev->irq);
+	spin_unlock_irq(&ei_local->page_lock);
+	/* enable_irq(dev->irq); */
 
 	dev_kfree_skb (skb);
 	ei_local->stat.tx_bytes += send_length;




Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 08:04:24PM +0100, Manfred Spraul wrote:
> Linus wrote:
> > Does this seem to happen mainly with drivers that use "disable_irq()"
> > and "enable_irq()"? I know the ne drivers do (through the 8390 module),
> > and some others do too (3c59x).
>
> I removed the disable_irq lines from 8390.c, and that fixed the problem:
> no hang within 2 minutes - the test is still running.
>
> Frank, could you double check it?

Hm, I also sent in a (somewhat different) patch on my own... :-)

Anyway, still running under heavy load...

Cheers//Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 08:04:24PM +0100, Manfred Spraul wrote:
> I removed the disable_irq lines from 8390.c, and that fixed the problem:
> no hang within 2 minutes - the test is still running.
>
> Frank, could you double check it?

I'm currently running my own patched version, which uses
spin_lock_irq/spin_unlock_irq instead of
spin_lock_irqsave/spin_unlock_irqrestore like your patch uses. Looking at
spinlock.h, spin_lock_irq does a local irq disable, which seems to be closer to
the original intent (disable_irq) than spin_lock_irqsave. Anyone want to
comment on this?
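
For readers comparing the two variants, here is a toy model in plain Python (not kernel code; the class and function names are invented for illustration). It only captures the one behavioural difference between the pairs: the _irq pair re-enables local interrupts unconditionally on unlock, while the _irqsave/_irqrestore pair puts back whatever state it found on entry. Both act on the local CPU's interrupt flag, unlike disable_irq, which masks a single IRQ line.

```python
class LocalCpu:
    """Toy stand-in for one CPU's interrupt-enable flag (hypothetical model)."""

    def __init__(self, irqs_enabled=True):
        self.irqs_enabled = irqs_enabled


def lock_irq(cpu):
    # Models spin_lock_irq(): disable local interrupts, remember nothing.
    cpu.irqs_enabled = False


def unlock_irq(cpu):
    # Models spin_unlock_irq(): always re-enable local interrupts.
    cpu.irqs_enabled = True


def lock_irqsave(cpu):
    # Models spin_lock_irqsave(): hand back the previous state, then disable.
    flags = cpu.irqs_enabled
    cpu.irqs_enabled = False
    return flags


def unlock_irqrestore(cpu, flags):
    # Models spin_unlock_irqrestore(): restore the state seen on entry.
    cpu.irqs_enabled = flags


def state_after(lock_pair, initially_enabled):
    """Run one lock/unlock cycle and report the final interrupt-enable state."""
    cpu = LocalCpu(initially_enabled)
    if lock_pair == "irq":
        lock_irq(cpu)
        unlock_irq(cpu)
    else:
        flags = lock_irqsave(cpu)
        unlock_irqrestore(cpu, flags)
    return cpu.irqs_enabled
```

In this model the two pairs differ only when the section is entered with interrupts already disabled: the _irq pair leaves them enabled afterwards, the _irqsave pair leaves them disabled.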

Anyway, still running under load, also got USB (which uses the same irq) to
produce some interrupts by scanning some stuff. No problems so far...

Cheers//Frank




Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 08:33:15PM +0100, Manfred Spraul wrote:
> Frank, the 2.4.0 contains 2 band aids that were added for ne2k smp:
>
> * From Ingo: focus cpu disabled, in arch/i386/kernel/apic.c
> * From myself: TARGET_CPU = cpu_online_mask, was 0xFF.
>
> Could you disable both bandaids? I disabled them, no problems so far.

I disabled both (I guess you meant the 'define TARGET_CPUS cpu_online' in
io_apic.c?), and reverted my own patch, added your patch... Now running with
the usual heavy network load, no problems so far... Also made USB produce
interrupts (shares irq with network), no problems...

Could this really be the solution?

Cheers//Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 11:59:25AM -0800, Linus Torvalds wrote:
> > Could this really be the solution?
>
> I'd like to know _which_ of the two makes a difference (or does it only
> trigger with both of them enabled)? And even then I'm not sure that it is
> "the" solution - both changes to io-apic handling had some reason for
> them. Ingo, what was the focus-cpu thing?

Well, with 'this' (in 'could THIS be') I really meant the move from disable_irq
to the irq_safe spinlocks. I'm currently running with the patched 8390.c
driver, patched io_apic (TARGET_CPUS 0xff) and patched apic.c (focus cpu
enabled), and have had no problems yet... even though I'm running several
simultaneous nfs cp -rd big_dir, streaming network audio, scanning with a
USB scanner, etc.

So far, it seems that the patch to 8390.c removed the symptoms. The changes to
apic.c and io_apic.c did not make the network hang come back. 

Cheers//Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 09:11:29PM +0100, Manfred Spraul wrote:
> Frank, please clarify:
> you still run without disable_irq_nosync() in 8390.c?

I am running with your patched version of 8390.c (so WITHOUT
disable_irq_nosync()).

In addition, I patched apic.c (focus cpu enabled)
In addition, I patched io_apic (TARGET_CPUS 0xff)

> I have a first idea: we send an EOI to an interrupt that is masked on
> the IO apic, perhaps that causes the problems.

Sounds plausible...

> I'm right now typing a patch.

I'll await yours instead of making my own patch this time... :-)

Cheers//Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 09:19:53PM +0100, Ingo Molnar wrote:
> > In addition, I patched apic.c (focus cpu enabled)
> > In addition, I patched io_apic (TARGET_CPUS 0xff)
>
> please try it with the focus CPU enabling change (we want to enable that
> feature, i only disabled it due to the stuck-ne2k bug), but with
> TARGET_CPUS set to cpu_online_mask. (this latter is needed for certain
> crappy BIOSes.)

WITH or WITHOUT the changed 8390 driver? I can already give you the results for
running WITH the changed driver: it works. I have not yet tried it WITHOUT the
changed 8390 driver (so that would be stock 8390, patched apic.c, stock
io_apic.c). Please let me know which you want...

Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 09:31:15PM +0100, Ingo Molnar wrote:
 
> On Fri, 12 Jan 2001, Frank de Lange wrote:
>
> > WITH or WITHOUT the changed 8390 driver? I can already give you the
> > results for running WITH the changed driver: it works. I have not yet
> > tried it WITHOUT the changed 8390 driver (so that would be stock 8390,
> > patched apic.c, stock io_apic.c). Please let me know which you want...
>
> WITH. patched 8390.c, patched apic.c, stock io_apic.c. My very strong
> feeling is that this will be a stable combination, and that this is what
> we want as a final solution.

It is. As I already mentioned in other messages, I already tested with JUST the
patched 8390.c driver, no other patches. It was stable. I then patched apic.c
AND io_apic.c, which did not introduce new instabilities. Unless you think that
reverting back to a stock io_apic.c would cause instabilities (which would be
weird, since I had no instabilities running only a patched 8390.c), I think the
patch to 8390.c DOES remove the symptoms all by itself. No other patches seem
necessary to get a stable box.

But I'll patch the mess again just for kicks :-)

Cheers//Frank




Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 09:34:03PM +0100, Ingo Molnar wrote:
> ? this is x86-only code. There is no hot-pluggable CPU support for Linux
> AFAIK. (But in any case, the code is basically ready for hot-pluggable
> CPUs, just take a few precautions and change cpu_online_mask and a couple
> of other things.)

OK, maybe the Sun example was not the best to give for this code... But if
there are no hot-pluggable x86's around now (I think there are, but can not
recollect who made 'em...) and nobody is complaining, then it is fine with me...
I won't hot-unplug my BP6's CPUs anyway...

Cheers//Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 09:37:24PM +0100, Ingo Molnar wrote:
> okay - i just wanted to hear a definitive word from you that this fixes
> your problem, because this is what we'll have to do as a final solution.
> (barring any other solution.)

Now running with this config:

PATCHED 8390.c (using irq_safe spinlocks instead of disable_irq)
PATCHED apic.c (focus cpu ENABLED)
STOCK io_apic.c

No problems under heavy network load.

Gentlemen, this (the patch to 8390.c) seems to fix the problem.

Cheers//Frank




Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 09:51:36PM +0100, Ingo Molnar wrote:
> great. Back when i had the same problem, flood pinging another host (on
> the local network) was the quickest way to reproduce the hang:
>
>   ping -f -s 10 otherhost
>
> this produced an IOAPIC-hang within seconds.

Apart from killing streaming audio and interactive network use, nothing hangs.
As soon as the ping flood is stopped, audio streams on and ssh sessions are
useable again. So, it seems to fix it...

Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 09:54:31PM +0100, Manfred Spraul wrote:
> I have found one combination that doesn't hang with the unpatched
> 8390.c, but network throughput is down to 1/2. I hope that's due to the
> debugging changes.

Hm, could it be that the fact that network throughput is halved causes the
problem not to appear? Remember, it only appears under HEAVY network load. A
single nfs cp -rd big_dir was not enough to hang my network, I needed to add
at least another cp -rd or some streaming audio or something else...

Cheers//Frank




Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-12 Thread Frank de Lange

> Remind me: what polarity are your io-apic irq's? Level, edge, sideways?
> Anything else that might be relevant?

Well, sideways of course! :-)

Here's a cat /proc/interrupts from the (BP6) box:

           CPU0       CPU1
  0:     104936     105433    IO-APIC-edge  timer
  1:                  4384    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  3:         79         59    IO-APIC-edge  serial
  4:      12743      12850    IO-APIC-edge  serial
 14:       7855       7885    IO-APIC-edge  ide0
 15:       1990       1703    IO-APIC-edge  ide1
 16:          0          0   IO-APIC-level  es1371, mga@PCI:1:0:0
 17:         24         28   IO-APIC-level  sym53c8xx
 18:          0          0   IO-APIC-level  bttv
 19:     460435     460402   IO-APIC-level  eth0, eth1, usb-uhci
NMI:     210303     210303
LOC:     210285     210284
ERR:          0

The interrupt which caused problems was 19 (with both network cards and USB on
it). It shows a high number of interrupts because I've been load-testing the
network. The mere fact that it shows this high number of interrupts shows the
fix works...
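
Since the hang only showed up on a line shared by several devices, it can help to pick the shared lines out of a /proc/interrupts dump automatically. The following is a minimal Python sketch, assuming the 2.4-era layout shown above (per-CPU counts, one controller-type column, then a comma-separated device list); newer kernels print extra columns, so the field offsets would need adjusting. The function name and the sample text are made up for illustration.

```python
def shared_irqs(text):
    """Return {irq: [devices]} for numbered lines that list several devices."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())           # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Skip NMI/LOC/ERR rows and rows without a device column.
        if not label.strip().isdigit() or len(fields) <= ncpus + 1:
            continue
        # fields = per-CPU counts, controller type, then the device names
        devices = " ".join(fields[ncpus + 1:]).split(", ")
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared


SAMPLE = """\
           CPU0       CPU1
  0:     104936     105433    IO-APIC-edge  timer
 16:          0          0   IO-APIC-level  es1371, mga@PCI:1:0:0
 19:     460435     460402   IO-APIC-level  eth0, eth1, usb-uhci
NMI:     210303     210303
"""

print(shared_irqs(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/interrupts read from disk instead of SAMPLE.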

As this is a BP6, I'm now supposed to go on about the dead chickens, dedicated
air conditioners, nuclear power supplies and other magic you're supposed to buy
to get these boards running. Well, nothing of that sort, it is running on a
simple (but high quality) 235W PSU with heatgreased coolers on the CPUs and the
BX chipset. Nothing is overclocked. CPU and chipset temperatures are 24°C and
32°C, respectively.

In short, nothing remarkable. All PCI slots are used, as you can see from my
first posting in this thread (which contains more info on the hardware).

//Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 04:36:33PM -0800, Linus Torvalds wrote:
> It may well not be disable_irq() that is buggy. In fact, there's good
> reason to believe that it's a hardware problem.

I am inclined to believe it IS a hardware problem... If disable_irq were buggy,
wouldn't the problem occur more frequently in other irq-heavy areas? A quick
count shows that disable_irq* is used in 84 source files in the drivers/*
directory. This includes drivers which generate many interrupts in a short
timeframe (like ide).
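
How the count was made is not stated; one way to reproduce a count of that kind is sketched below in Python. The function name and the choice to look only at .c and .h files are assumptions, and the regular expression simply looks for calls whose name starts with disable_irq.

```python
import os
import re

# Matches disable_irq(...), disable_irq_nosync(...), and similar names.
PATTERN = re.compile(r"\bdisable_irq\w*\s*\(")


def files_using_disable_irq(root):
    """Return the sorted paths of .c/.h files under root that call disable_irq*."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith((".c", ".h")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as fh:
                if PATTERN.search(fh.read()):
                    hits.append(path)
    return sorted(hits)
```

Calling len(files_using_disable_irq("linux/drivers")) on an unpacked source tree would give the number of matching files; the path here is only an example.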

Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware

2001-01-12 Thread Frank de Lange

On Fri, Jan 12, 2001 at 04:56:24PM -0800, Linus Torvalds wrote:
> IDE is not my favourite example of a "known stable driver". Also, in many
> cases IDE is for historical reasons connected to an EDGE io-apic pin (ie
> it's still considered an ISA interrupt). Which probably wouldn't show this
> problem anyway.

They (ide interrupts) are indeed EDGE-triggered on my box. I have not enabled
the HPT366 (ATA66) controller on this board, so I can not tell if that
controller is EDGE-triggered as well.

> Also, IDE doesn't generate all that many interrupts. You can make a
> network driver do a _lot_ more interrupts than just about any disk driver
> by simply sending/receiving a lot of packets. With disks it is very hard
> to get the same kind of irq load - Linux will merge the requests and do at
> least 1kB worth of transfer per interrupt etc. On a ne2k 100Mbps PCI card,
> you can probably _easily_ generate a much higher stream of interrupts.

There's sound... The msnd.c (Turtle Beach MultiSound) driver (and its
derivatives, like msnd_pinnacle) uses disable_irq.  Running esd (esound
daemon), sound can easily generate 1000 interrupts/second, since esd uses
small dma transfers. This can be seen quite clearly from /proc/interrupts on my
soundserver:

           CPU0
  0:  276867328          XT-PIC  timer
  1:          2          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  3:    7631519          XT-PIC  eth1
  4:    2751419          XT-PIC  serial
  5: 1907346678          XT-PIC  soundblaster
  8:          1          XT-PIC  rtc
  9:   45022986          XT-PIC  eth0
 13:          1          XT-PIC  fpu
 14:    4320643          XT-PIC  ide0
 15:    4409193          XT-PIC  ide1
NMI:          0

OK, this is an ageing P166, and it uses a different driver, etc. I have not
found any problems with hanging sound drivers in a Google query for 'linux msnd
bp6' or 'linux multisound bp6'. Of course, this is no conclusive evidence, far
from it... It could be that people using those cards are not the ones who tend
to go for the (somewhat tricky) BP6 board...

Cheers//Frank




Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware

2001-01-12 Thread Frank de Lange

On Sat, Jan 13, 2001 at 02:51:54AM +0100, Manfred Spraul wrote:
> Frank de Lange wrote:
> >
> > It could be that people using those cards are not the ones who tend
> > to go for the (somewhat tricky) BP6 board...
> >
>
> I doubt that it's BP6 specific: I have the problem with a Gigabyte BXD
> board and I doubt that Ingo used an BP6. Perhaps 82093AA specific (the
> IO APIC chip used for SMP 440BX board)

It isn't. But I just meant to indicate that the mere fact that I could not find
any problem-report for that combination does not indicate that there ARE no
problems...

> I can't find any spec updates for that chip: either it's the first
> perfect chip Intel ever produced, or ...

:-)

Well, the BX chipset is one of their better attempts I think...

Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-11 Thread Frank de Lange

On Thu, Jan 11, 2001 at 02:23:53PM -0500, Jeff Garzik wrote:
> Just out of curiosity, if you boot a Linux 2.4.0 kernel with the
> "noapic" command line option, does behavior improve?

For the curious, here's a summary of some tests I did:

apic, 2 cpu's, no smp affinity -> network hangs under load
apic, maxcpus=1, no smp affinity -> network hangs under load
apic, 2 cpu's, smp affinity for all irq's on CPU1 -> network hangs under load
noapic, 2 cpu's, no smp affinity -> NO HANG, WORKSFORME

Quick and dirty conclusion: as soon as the apic comes into play, things get
messy...

ps. load == 2 simultaneous nfs cp -rd  sessions and streaming
esd audio over the network

Cheers//Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-11 Thread Frank de Lange

On Thu, Jan 11, 2001 at 04:47:00PM -0500, Jeff Garzik wrote:
> Are you judging based on the error message?  The 'netdev watchdog ...'
> message is a generic error message that could have any number of
> causes.  It's just saying, well, what it says :)  The kernel was unable
> to transmit a packet in a certain amount of time.  You might get these
> messages if you unplug a cable suddenly, or if your hardware isn't
> delivering interrupts, or many other things...

No, I'm judging based on the fact that I found reports from people using
NE2K-PCI with several cards as well as tulip-based cards (different driver) on
abit BP6 as well as Gigabyte motherboards, mostly on 2.3.x/2.4.x kernels. I
found some postings with these problems on 2.2.x kernels.

Cheers//Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-11 Thread Frank de Lange

OK, just one last addition to what has nearly become my own thread...

I now am fairly certain that the problem (network stalls on multiprocessor systems) is 
not BP6 or NE2K-PCI specific. I found several postings which relate to similar 
problems on dissimilar hardware. Another interesting one is:

Re: PROBLEM : Networking stops working with kernel 2.4.0-test11 
  (http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg18722.html)

"...I have an almost identical system as you, 2x200MMX motherboard (Gigabyte
   586DX) also Voodoo3 (2000 pci) the same nic Realtek 8029AS, also a bt848
   tv card, also SCSI (Aic-7880 onboard, but not used).

   I have reported it some time ago, and now all I get with
   2.4.0-test11-pre4 and I think a additional patch is  NETDEV WATCHDOG:
   eth0: transmit timed out, and something in the console about lost irq?

   I can't reproduce it with a uniprocessor kernel, and I have a 3c503 card
   which uses the 8390 module, so I suppose that the problem it's not in the
   8390, and it seems to be smp related"


ne2k-pci freezes with APIC error on 2.4.0-testX SMP
  (http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg14468.html)

"...

   When doing massive NFS transfers (2.4 machine as the client) on my SMP
   box (Abit BP6 2x celeronA 533mhz (non-overclocked) 64Mb ram, latest
   apt-get-ed debian woody) my ne2k-pci card (Realtek Semiconductor Co., Ltd.
   RTL-8029(AS) (rev 0)) suddenly stops working. test5 spits that in
   syslog:..."

More to be found when searching the archives. This problem has been around for
a long, long time (probably since the current level of apic-support was added,
somewhere around 2.3.1x?). It has been reported by several people, several
times. I feel like rigging every apic-related piece of code with a zillion
bells and printk's but that would surely only create more mayhem as this whole
thing seems to be timing-related...

Anyone got any ideas on how to tackle this? Anyone who is 'intimate with' the
apic-related code? It'll take me some time to dive into that part, so if there
is anyone who already has taken the plunge, do tell...

Cheers//Frank

[ who is still running apic-less, without problems ]



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-11 Thread Frank de Lange

Hm, the noapic option seems to help, as I'm currently beating the network to
death but it won't die... As the problem is elusive, it is hard to tell, and it
would not surprise me if the net dropped dead the moment this mail went
through, but current indication is that noapic makes the sudden net-death
disappear.

So we're still left with the question 'is this hardware-related, or is it a
software/configuration problem'? Other people seem to have similar problems
with dissimilar hardware (tulip cards instead of Winbond, etc), on 2.2.x as
well as 2.3/4.x. As I do not run Windows (NT or 2K), I can not tell if this
problem also occurs there. And my FreeBSD-box is uniprocessor... So... has
anyone seen anything like this on other 'true' (SMP) OS's? If so, that would
indicate a hardware problem...

Cheers//Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-11 Thread Frank de Lange

Another observation wrt. behaviour with 'noapic'...

When streaming time-critical data over the network (running esound to another
server, etc), sometimes there are hiccups in the stream. These hiccups seem to
be much less frequent, if at all present, when running with 'noapic'. I'm
currently running sound over a heavily loaded ethernet, no hiccups at all...
Weird, since the apic ought to spread the load of handling the interrupts over
all available CPU's.

Whatever is causing this, there seems to be something fishy in the way
interrupts are handled when the apic(s) is/are enabled...

Cheers//Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-11 Thread Frank de Lange

Here's another posting to the list which mentions problems with NE2K and BP6:

http://web.gnu.walfield.org/mail-archive/linux-kernel/2000-August/0132.html

"...In another machine, a dual celeron abit-bp6, recent 2.3.x kernels seem to 
dislike my realtek 8029 NIC. (I know, it's garbage plugged in to 
garbage...) The network card will die randomly, usually when I'm sending 
large amounts of data. When it dies, there are no kernel messages, and 
the interrupt count in /proc/interrupts for the card stop changing. Minor 
(painful) experimentation has shown that if the card is sharing the 
interrupt with anything else (say, ide2), it takes that with it. This 
only happens in "newer" kernels, it's fine in 2.2.16, and in some earlier 
2.3.x kernels. It goes away if I boot with the noapic=1 kernel parameter, 
and seems to be replaced with harmless "spurious 8259A interrupt: IRQ7." 
messages. (I haven't configured any hardware at all to be on IRQ7 - 
though I'm lead to believe IRQ7 has some sort of special purpose) ..."

So I'm not the only one...

Cheers//Frank



Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?

2001-01-11 Thread Frank de Lange

> Do you get any transmit timeout messages in the logs?  If
> so, send them.

In addition to my previous message, here's what I get from the debug log
facility:

Jan 10 22:56:51 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:56:51 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=33.
Jan 10 22:56:52 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:56:52 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=26.
Jan 10 22:56:53 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:56:53 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=30.
Jan 10 22:56:56 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:56:56 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=78.
Jan 10 22:56:56 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:56:56 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=32.
Jan 10 22:56:58 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:56:58 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=89.
Jan 10 22:57:00 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:57:00 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=77.
Jan 10 22:57:03 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 10 22:57:03 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=171.

So yeah, I get timeouts all right...
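
For anyone collecting these reports from several machines, the register values in the driver's message can be pulled out of a syslog with a short script. This is a minimal Python sketch that assumes each message sits on one line in the format shown above; the function name and the two-entry sample are illustrative only. The t value appears to be the number of ticks since the last transmit started, but that reading is an assumption here.

```python
import re

# One "Tx timed out" line as printed by the driver in the log above.
TX_TIMEOUT = re.compile(
    r"(?P<iface>\w+): Tx timed out, lost interrupt\? "
    r"TSR=(?P<tsr>0x[0-9a-fA-F]+), ISR=(?P<isr>0x[0-9a-fA-F]+), t=(?P<t>\d+)\."
)


def tx_timeouts(log_text):
    """Return (iface, TSR, ISR, t) tuples for every Tx-timeout line found."""
    return [
        (m["iface"], int(m["tsr"], 16), int(m["isr"], 16), int(m["t"]))
        for m in TX_TIMEOUT.finditer(log_text)
    ]


LOG = (
    "Jan 10 22:56:51 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out\n"
    "Jan 10 22:56:51 behemoth kernel: eth0: Tx timed out, lost interrupt? "
    "TSR=0x3, ISR=0x3, t=33.\n"
    "Jan 10 22:56:52 behemoth kernel: eth0: Tx timed out, lost interrupt? "
    "TSR=0x3, ISR=0x3, t=26.\n"
)

for entry in tx_timeouts(LOG):
    print(entry)
```

The generic NETDEV WATCHDOG lines are skipped on purpose, since only the driver's own line carries the TSR and ISR values.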

Currently running NOAPIC, pity to see CPU1 receiving no interrupts at all... In the 
same debug log I now just saw this:

Jan 11 17:37:05 behemoth kernel: spurious 8259A interrupt: IRQ7

That's weird, since there's nothing there...:

cat /proc/interrupts 
           CPU0       CPU1
  0:     232967          0          XT-PIC  timer
  1:       6424          0          XT-PIC  keyboard
  2:          0          0          XT-PIC  cascade
  3:        138          0          XT-PIC  serial
  4:      46201          0          XT-PIC  serial
  9:         52          0          XT-PIC  sym53c8xx
 10:     744329          0          XT-PIC  eth0, eth1, usb-uhci
 11:          0          0          XT-PIC  bttv
 12:          0          0          XT-PIC  es1371, mga@PCI:1:0:0
 14:      19778          0          XT-PIC  ide0
 15:       4520          0          XT-PIC  ide1
NMI:          0          0
LOC:     232916     232914
ERR:          1

See? Nothing on 7... This is with NOAPIC (as you can see from the XT-PIC's in
the above dump). BP6 again?

Cheers//Frank


