RE: obsolete code must die

2001-06-13 Thread Rainer Mager
I agree that removing support for any hardware is a bad idea but I question
the idea of putting it all in one monolithic download (tar file). If we're
considering the concern for less developed nations with older hardware,
imagine how you would like to download the whole kernel with an old 2400 bps
modem. Not a fun thought.

Would it make sense to create some sort of 'make config' script that
determines what you want in your kernel and then downloads only those
components? After all, with the constant release of new hardware, isn't a
50MB kernel release not too far away? 100MB?
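
To put numbers on the modem concern, here is a rough back-of-the-envelope sketch. It assumes 1 MB = 10**6 bytes and an ideal 2400 bps line with no compression or protocol overhead; those assumptions are not from the post, only the 2400 bps, 50MB and 100MB figures are.

```python
# Rough transfer-time estimate for a kernel tarball over a dial-up modem.
# Assumptions (not from the original post): 1 MB = 10**6 bytes, the line
# delivers its full nominal rate, and there is no compression or overhead.

def download_hours(size_mb: float, bits_per_second: int = 2400) -> float:
    """Return the ideal transfer time in hours for size_mb megabytes."""
    total_bits = size_mb * 10**6 * 8
    return total_bits / bits_per_second / 3600


if __name__ == "__main__":
    for size in (50, 100):
        print(f"{size} MB at 2400 bps: about {download_hours(size):.1f} hours")
```

Under these assumptions a 50MB release takes roughly two days of continuous connection, and a 100MB release roughly four.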


--Rainer

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On Behalf Of Colonel
> Sent: Thursday, June 14, 2001 10:32 AM
> To: [EMAIL PROTECTED]
> Subject: Re: obsolete code must die
>
>
> In list.kernel, you wrote:
>
> >i think we are all missing the ball here: i am happy when i see driver
> >support for a piece of hardware that i have _NEVER_ heard of and at most
> >_ONE_ person uses it.  why?  it means more stuff works in linux.  we
> >dont need to defend how many people use hardware X.  if you have X, good
> >for you.  if not, you dont care, but at least good for linux as a whole.
>
> Good Point!
>
> >lets stop fanning the flames and let this (Microsoft-using, as Rik
> >pointed out) troll die off.
>
> Agreed, he made the filter already.  But it was good for some laughs,
> checking a few cobwebs and I really didn't see flames.  Plus I got to
> test my new mailing list archive.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Building autofs

2001-02-27 Thread Rainer Mager

Hi all,

I'm trying to use autofs for the first time and am running into some
problems. First, the documentation seems quite weak; that is, I'm not sure
if what I have is what I should have. I managed to find an autofs version 4
pre 9 tarball on the kernel mirrors. This seems to be the latest but is still
a bit old, and the referenced home page doesn't seem any newer. My real
problem, however, is that when I try to build it I get this error:

lookup_program.c:147: `OPEN_MAX' undeclared (first use in this function)

My understanding is that OPEN_MAX is defined in linux/limits.h but I
hesitate to change the code since I would expect this to build out of the
box.
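
One possible background for this error, offered as an assumption rather than a diagnosis: POSIX allows OPEN_MAX to be left undefined in the headers when the limit is not fixed at compile time, and provides a run-time query instead (sysconf(_SC_OPEN_MAX) in C). The sketch below shows the same run-time query from Python; it is illustrative only and is not a patch for autofs.

```python
import os
import resource


def open_file_limit() -> int:
    """Return the per-process open-file limit as reported at run time."""
    try:
        # Same value a C program would get from sysconf(_SC_OPEN_MAX).
        limit = os.sysconf("SC_OPEN_MAX")
    except (ValueError, OSError):
        limit = -1
    if limit <= 0:
        # Fall back to the soft RLIMIT_NOFILE resource limit.
        limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return limit


if __name__ == "__main__":
    print("open file limit:", open_file_limit())
```

Code that queries the limit this way does not depend on whether a particular header happens to define the macro.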


Can someone who is using autofs give me some pointers? Am I on the right
track?

Thanks,

--Rainer


RE: /proc/stat missing disk_io info

2001-02-18 Thread Rainer Mager

Not to be pushy or anything but since I received zero responses to this I
was wondering what else I can do. I'd be happy to patch the problem myself
but I have no idea what the correct value for DK_MAX_MAJOR should be.

Anywho, if anyone has any thoughts I'd appreciate them.

--Rainer


> -----Original Message-----
>   I was wondering why some of my disks don't show up in
> /proc/stat's disk_io
> line. Specifically, my line says:
>
> disk_io: (2,0):(144,144,288,0,0) (3,0):(35,35,140,0,0)
>
> This equates to my floppy and first cdrom. I also have a second cdrom (RW)
> and 2 hard disks. Looking at the code (kstat_read_proc in
> fs/proc/proc_misc.c) it is looping only up to DK_MAX_MAJOR which
> is defined
> as 16 in kernel_stat.h. The problem is that my 2 HDs have a major
> number of
> 22.
>
> I don't know enough to produce a patch, that is, what should
> DK_MAX_MAJOR be
> set to, but I believe the above is the problem.


/proc/stat missing disk_io info

2001-02-14 Thread Rainer Mager

Hi all,

I was wondering why some of my disks don't show up in /proc/stat's disk_io
line. Specifically, my line says:

disk_io: (2,0):(144,144,288,0,0) (3,0):(35,35,140,0,0)

This equates to my floppy and first cdrom. I also have a second cdrom (RW)
and 2 hard disks. Looking at the code (kstat_read_proc in
fs/proc/proc_misc.c) it is looping only up to DK_MAX_MAJOR which is defined
as 16 in kernel_stat.h. The problem is that my 2 HDs have a major number of
22.
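
For readers following along, the disk_io line quoted in this message can be split into per-device counters with a short script. This is only a sketch: the field names are an assumption based on my understanding of the 2.4-era layout, (major,index):(total I/O, read I/O, read sectors, write I/O, write sectors), and are not stated in the post.

```python
import re
from typing import Dict, Tuple

# Field names are an assumption about the 2.4-era /proc/stat layout.
FIELDS = ("total_io", "read_io", "read_sectors", "write_io", "write_sectors")
ENTRY = re.compile(r"\((\d+),(\d+)\):\((\d+),(\d+),(\d+),(\d+),(\d+)\)")


def parse_disk_io(line: str) -> Dict[Tuple[int, int], Dict[str, int]]:
    """Parse a 'disk_io:' line into {(major, index): {field: value}}."""
    stats = {}
    for match in ENTRY.finditer(line):
        major, index, *counters = (int(group) for group in match.groups())
        stats[(major, index)] = dict(zip(FIELDS, counters))
    return stats


if __name__ == "__main__":
    sample = "disk_io: (2,0):(144,144,288,0,0) (3,0):(35,35,140,0,0)"
    for (major, index), counters in parse_disk_io(sample).items():
        print(major, index, counters)
```

Run against the sample line, only majors 2 and 3 appear, which matches the observation that the disks on major 22 are missing.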

I don't know enough to produce a patch, that is, what should DK_MAX_MAJOR be
set to, but I believe the above is the problem.
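
The effect described here can be modelled in a few lines. This is a simplified model and not the kernel code: it assumes statistics live in a table with one slot per major number below the limit, so any device whose major is at or above the limit is never reported. The value 23 used below is only the smallest limit that would include major 22; it is a hypothetical value, not a recommendation for DK_MAX_MAJOR.

```python
# Simplified model, not kernel code: statistics are kept in a table with
# one slot per major number below the limit, so larger majors are dropped.

def reported_majors(active_majors, dk_max_major=16):
    """Return the majors that a loop bounded by dk_max_major would report."""
    table = [0] * dk_max_major
    for major in active_majors:
        if major < dk_max_major:  # majors at or above the limit have no slot
            table[major] += 1
    return [major for major in range(dk_max_major) if table[major]]


if __name__ == "__main__":
    majors = [2, 3, 22]  # floppy, first CD-ROM, hard disks (from the post)
    print(reported_majors(majors))      # limit of 16, as in the post
    print(reported_majors(majors, 23))  # hypothetical larger limit
```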



Thanks,

--Rainer


RE: [patch] smbfs cache rewrite - 2nd try

2001-01-28 Thread Rainer Mager

This is working great for me so far. I've now got my full 1G RAM and samba
seems to be working fine. Woohoo! One more oops dead.

Put it in the official kernel. Put it in the official kernel. Put it in
the err, excuse me. I was chanting again.


--Rainer


> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On Behalf Of Urban Widmark
> Sent: Monday, January 29, 2001 1:23 AM
> To: [EMAIL PROTECTED]
> Cc: Rainer Mager; Scott A. Sibert
> Subject: [patch] smbfs cache rewrite - 2nd try
>
>
>
> Smbfs testers wanted, with or without highmem boxes.


RE: Is this kernel related (signal 11)?

2001-01-24 Thread Rainer Mager

Hi all,

Well, I upgraded my system to glibc 2.2.1 with few problems. Unfortunately,
there are no improvements in my stability problems. X still dies.


So, I ask again, how can I debug this? How can I determine if this is a
kernel problem or not?


Thanks,

--Rainer


RE: Is this kernel related (signal 11)?

2001-01-23 Thread Rainer Mager

As per Russell King's suggestion, I ran memtest86 on my system for about 12
hours last night. I found no memory errors. Note that the tests did not
complete because I had to stop them this morning. I'll continue them tonight.
They got through test 9 of 11.


As per David Ford's suggestion, I am looking into upgrading to glibc 2.2.1.
Can someone please give hints on doing this. I tried to upgrade to 2.2 a few
weeks ago and after the 'make install' and then reboot my system was very
broken and I had to reinstall the RedHat glibc RPM from CD to recover. I
found a howto but it seems pretty old. How do other people do this?


I've also done a strace on X. Now what do I do with this 4 MB log file?
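
One way to make a large strace log manageable is to look only at the system calls immediately before the signal is reported. The sketch below assumes strace marks signal delivery with a line containing `--- SIGSEGV`; the log file name in the usage note is hypothetical.

```python
from collections import deque
from typing import Iterable, List


def lines_before_signal(lines: Iterable[str], signal: str = "SIGSEGV",
                        context: int = 20) -> List[str]:
    """Return up to `context` lines preceding the first signal report,
    followed by the signal line itself; empty list if none is found."""
    recent = deque(maxlen=context)
    marker = f"--- {signal}"
    for line in lines:
        if marker in line:
            return list(recent) + [line]
        recent.append(line)
    return []
```

Usage would be along the lines of `with open("X.strace") as log: print("".join(lines_before_signal(log)))`, which reads the file line by line and never holds more than `context` lines in memory.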


Thanks,

--Rainer




RE: Is this kernel related (signal 11)?

2001-01-23 Thread Rainer Mager

Thanks for all the info, comments below:

First, I ran X in gdb and got the following via 'bt' after X died. This is
my first experience with gdb so if I should do anything in particular,
please tell me.

#0  0x401addeb in __sigsuspend (set=0xb930)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
#1  0x80495a4 in startServer ()
#2  0x804922c in main ()
#3  0x401a79cb in __libc_start_main (main=0x8048ee0 <main>, argc=5,
    argv=0xbacc, init=0x8048a64 <_init>, fini=0x8049a44 <_fini>,
    rtld_fini=0x4000ae60 <_dl_fini>, stack_end=0xbac4)
    at ../sysdeps/generic/libc-start.c:92


> David Ford:
>
> Upgrade -past- 2.2, get 2.2.1.  2.2 causes numerous segfaults,
> notably sendmail
> and apache stop working.

I'm willing. Are there any good how-tos on doing this without killing your
system? The last time I manually upgraded libc was about 5 years ago.


> Russell King:
>
>
> In answer to the original posters question, the first step would be
> to grab a copy of memtest86 (iirc its a program that is run from floppy
> disk) and run that on your system.  That /should/ (and I stress should
> there) detect any RAM problems you have.

I'll try this.



> Barry K. Nathan:
>
>
> Does it always happen when you are moving the mouse over a button or
> windowbar or some other on-screen object like that?

Nope. If anything I'd say it happens during blitting (scrolling, screen
refreshing, etc). Also, I'm not overclocking anything.


RE: Is this kernel related (signal 11)?

2001-01-21 Thread Rainer Mager

> Would this be an SMP IA32 box with glibc 2.2? I have two such boxen
> showing exactly the same behaviour, although I can't reproduce it at will.

Close, it is actually an SMP IA32 box with glibc 2.1.3. But you've now
convinced me to not upgrade glibc yet  ;-)

--Rainer




Is this kernel related (signal 11)?

2001-01-21 Thread Rainer Mager

Hi all,

I brought up this issue last month and had some response but as of yet my
particular problem still exists. In brief, X windows dies with signal 11. I
have done quite a bit of testing and this does not seem to be a hardware
issue. Also, I have never managed to get a signal 11 error when not running
X.
I posted on the XFree86 mailing lists and the consensus there seems to be
that it is likely a hardware or kernel problem. So, my question is, how can
I pinpoint the problem? Is this likely to be a kernel issue?

Recently I have been able to reproduce the problem reliably in a few ways.
First, if I use an app that uses ncurses (like 'make menuconfig' on the
Linux kernel) from within Gnome-terminal then X dies instantly. For now I
have gone to using only xterm.
I can also cause the error from xmms by scrolling the playlist repeatedly.
This will happen within a few seconds but not instantly like above.
I have also seen the error in other cases but none that I am yet able to
reproduce on demand.


PLEASE, any suggestions?


--Rainer



RE: Oops with 4GB memory setting in 2.4.0 stable

2001-01-17 Thread Rainer Mager

> smb_rename suggests mv, but the process is ls ... er? What commands where
> you running on smbfs when it crashed?
>
> Could this be a symbol mismatch? Keith Owens suggested a less manual way
> to get module symbol output. Do you get the same results using that?

Here is a newly parsed oops, this time using the /var/log/ksymoops method
mentioned by Keith Owens. Does this look better?

--Rainer

 oops.parsed

Unable to handle kernel NULL pointer dereference at virtual address 
 printing eip:
c01239a4
*pde = 
Oops: 
CPU:0
EIP:0010:[c01239a4]
EFLAGS: 00010202
eax: 1001   ebx:    ecx: c0256730   edx: 0003f435
esi: c20cde24   edi:    ebp: 0001   esp: ee5e3e30
ds: 0018   es: 0018   ss: 0018
Process ls (pid: 449, stackpage=ee5e3000)
Stack: c20cde24 ee5e3e64 f7e4 0001 c01262f5 c20cde24  0001
   f7e4 c110 fe2f0014 0018 fe2f c20cde24 f88982f8 
   0001 0070 ee5e3ee8 f889e180 ee61a000 ee6ede9c 0010 f8896e69
Call Trace: [c01262f5] [fe2f0014] [fe2f] [f88982f8] [f889e180] 
[f8896e69] [f8896eaa]
   [fe2f] [fe2f] [f889e048] [f889e03c] [f8896f40] [fe2f] 
[f88983b0] [fe2f]
   [fe2f] [f889798b] [fe2f] [c0140c10] [c0140e7c] [c0140f9e] 
[c0140e7c] [c0108f4b]

Code: 8b 07 ff 47 18 89 70 04 89 06 89 7e 04 89 37 89 7e 08 8b 44
Segmentation fault
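
As an aid for reading dumps like the one above, the addresses in the "Call Trace:" section can be pulled out mechanically before looking them up. This is a minimal sketch and not how ksymoops itself works; it assumes addresses appear in square brackets, with or without inner angle brackets, and that the trace ends at the first line that does not start with a bracket.

```python
import re
from typing import List

ADDRESS = re.compile(r"\[<?([0-9a-fA-F]{1,8})>?\]")


def call_trace_addresses(oops_text: str) -> List[int]:
    """Collect bracketed addresses from the 'Call Trace:' section of an oops."""
    addresses = []
    in_trace = False
    for line in oops_text.splitlines():
        if line.startswith("Call Trace:"):
            in_trace = True
        elif in_trace and not line.strip().startswith("["):
            break  # a blank or unbracketed line ends the trace
        if in_trace:
            addresses.extend(int(a, 16) for a in ADDRESS.findall(line))
    return addresses
```

The EIP line is deliberately not included, since it sits before the trace section and is reported separately in the dump.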



RE: Oops with 4GB memory setting in 2.4.0 stable

2001-01-16 Thread Rainer Mager

Hi again,

It looks like some progress is being made, *wonderful*, as to some earlier
questions...


> I'll have a look tonight or so. It works for you on non-bigmem?

Yes. Absolutely no problems on non-bigmem.


> smb_rename suggests mv, but the process is ls ... er? What commands where
> you running on smbfs when it crashed?

It seems that ANY access to the smbfs has this effect. Definitely confirmed
are: ls, tab completion from bash, cat [some file], and usually df.

>
> Could this be a symbol mismatch? Keith Owens suggested a less manual way
> to get module symbol output. Do you get the same results using that?

I'll try to do this and report back.



--Rainer



Oops with 4GB memory setting in 2.4.0 stable

2001-01-15 Thread Rainer Mager

Hi all,

I have a 100% reproducible bug in all of the 2.4.0 kernels including the
latest stable one. The issue is that if I compile the kernel to support 4GB
RAM (I have 1 GB) and then try to access a samba mount I get an oops. This
ALWAYS happens. Usually after this the system is frozen (although the magic
SysRq still works). If the system isn't frozen then any commands that
access the disk will freeze. Fortunately GPM worked and I was able to paste
the oops to a file via telnet.

Attached is my oops.txt and the result sent through ksymoops. The results
don't look particularly useful to me so perhaps I'm doing something wrong.
PLEASE tell me if I should parse this differently. Likewise, if there is
anything else I can do to help debug this, please tell me.

--Rainer


RE: Oops with 4GB memory setting in 2.4.0 stable

2001-01-15 Thread Rainer Mager

I knew that, I was just testing you all.  ;-)

\e hides his head in shame



> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On Behalf Of Marcelo Tosatti
> Sent: Tuesday, January 16, 2001 6:47 AM
> To: Rainer Mager
> Cc: [EMAIL PROTECTED]
> Subject: Re: Oops with 4GB memory setting in 2.4.0 stable
>
> On Tue, 16 Jan 2001, Rainer Mager wrote:
>
> > Attached is my oops.txt and the result sent through ksymoops. The results
> > don't look particularly useful to me so perhaps I'm doing something wrong.
> > PLEASE tell me if I should parse this differently. Likewise, if there is
> > anything else I can do to help debug this, please tell me.
>
> It seems you forgot to attach oops.txt.


 oops.parsed

Unable to handle kernel NULL pointer dereference at virtual address 
 printing eip:
f889e044
*pde = 
Oops: 0002
CPU:1
EIP:0010:[f889e044]
EFLAGS: 00010246
eax:    ebx: d5762800   ecx: 0400   edx: c19665fc
esi: d55be120   edi:    ebp: d5764260   esp: d5505f1c
ds: 0018   es: 0018   ss: 0018
Process ls (pid: 865, stackpage=d5505000)
Stack: d5762800 d55be120 d5764260 d5764260 d55be120  f889d966 d55be120
   d5762800 d5504000 d5764260 fffe fffb d5762800 d5764260 d55be120
    d5764260 ba40 0006 c0140c10 d5764260 d5505fb0 c0140e7c
Call Trace: [f889d966] [c0140c10] [c0140e7c] [c0140f9e] [c0140e7c] 
[c0108f4b]

Code: f3 ab e9 8b 00 00 00 90 8d 74 26 00 8b 44 24 14 c7 00 00 00
Segmentation fault



RE: Oops with 4GB memory setting in 2.4.0 stable

2001-01-15 Thread Rainer Mager

Ok, now we're making progress. I did as you said and have attached (really!)
the new parsed output. Now we have some useful information (I hope). I still
got lots of warnings on symbols (which I have edited out of the parsed file
for the sake of brevity). What's the next step?

--Rainer


> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On Behalf Of Marcelo Tosatti
> Sent: Tuesday, January 16, 2001 7:09 AM
> To: Rainer Mager
> Cc: [EMAIL PROTECTED]
> Subject: RE: Oops with 4GB memory setting in 2.4.0 stable
>
> EIP; f889e044 END_OF_CODE+385bfe34/   =
> Trace; f889d966 END_OF_CODE+385bf756/
> Trace; c0140c10 vfs_readdir+90/ec
> Trace; c0140e7c filldir+0/d8
> Trace; c0140f9e sys_getdents+4a/98
> Trace; c0140e7c filldir+0/d8
>
> It seems the oops is happening in a module's function.
>
> You have to make ksymoops parse the oops output against a System.map which
> has all modules symbols. Load each module by hand with the insmod -m
> option ("insmod -m module.o") and _append_ the outputs to System.map.
>
> After that you can run ksymoops against this new System.map.
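
The reason the module symbols matter can be shown with a small sketch of nearest-symbol lookup, which is roughly what a symbol decoder does with an address. This is not ksymoops itself. It assumes each symbol line has the System.map shape `address type name`; the kernel addresses are back-computed from the offsets in the trace above, and the two module symbols are invented purely for illustration.

```python
import bisect

def parse_symbols(text):
    """Parse 'address type name' lines into a sorted list of (addr, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip headers or anything that is not a symbol line
        try:
            addr = int(parts[0], 16)
        except ValueError:
            continue
        symbols.append((addr, parts[2]))
    symbols.sort()
    return symbols

def resolve(symbols, addr):
    """Return 'name+offset' for the nearest symbol at or below addr."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return f"{name}+{addr - base:x}"

# Kernel symbols: bases derived from the trace (e.g. c0140c10 - 0x90).
KERNEL_MAP = """\
c0140b80 T vfs_readdir
c0140e7c T filldir
c0140f54 T sys_getdents
"""

# Hypothetical module symbols, invented for this example only.
MODULE_MAP = """\
f889d900 t example_mod_readdir
f889e000 t example_mod_fill
"""

kernel_only = parse_symbols(KERNEL_MAP)
merged = parse_symbols(KERNEL_MAP + MODULE_MAP)

# Without module symbols the address lands far past the last kernel symbol.
print(resolve(kernel_only, 0xF889E044))
# With the module symbols appended, the same address gets a useful name.
print(resolve(merged, 0xF889E044))
```

With only the kernel map, the faulting address resolves to a huge offset from the last known symbol, which is the same kind of unhelpful result as the END_OF_CODE lines above.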

 oops.parsed.edit


Signal 11 - revisited

2000-12-17 Thread Rainer Mager

I was wondering if anyone had any new info/suggestions for the Signal 11
problem.

I think I last reported that I had tried 2.4.0test12 with AGPGart and DRM
turned off. This seemed a bit more stable but I did have X crash with
Signal 11 after about 1.5 days.

I'd really appreciate any advice on how to diagnose this.


Thanks,

--Rainer




RE: Signal 11 - the continuing saga

2000-12-13 Thread Rainer Mager

Err, for those of us who aren't up to our elbows in the kernel code, is
there a patch for this? Presumably this will be rolled into 2.4.0test13 but
I'd like to try it out. Also, can someone summarize the fix in English along
with the expected, improved behavior (e.g. Linux will never have a signal 11
again and will never, ever crash ;-)

Finally, as soon as there is a patch, can other people who have seen this
problem test it? My problem is so random that I'd need at least a few days
to gain some confidence this is fixed.


Thanks all.

--Rainer

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On Behalf Of Linus Torvalds
> Sent: Thursday, December 14, 2000 5:19 AM
> To: Mike Galbraith
> Cc: Kernel Mailing List
> Subject: Re: Signal 11 - the continuing saga
>
>
> On Wed, 13 Dec 2000, Linus Torvalds wrote:
> >
> > Hint: "ptep_mkdirty()".
>
> In case you wonder why the bug was so insidious, what this caused was two
> separate problems, both of them able to cause SIGSEGV's.
>
> One: we didn't mark the page table entry dirty like we were supposed to.
>
> Two: by making it writable, we also made the page shared, even if it
> wasn't supposed to be shared (so when the next process wrote to the page,
> if the swap page was shared with somebody else, the changes would show up
> even in the process that _didn't_ write to it).
>
> And "ptep_mkdirty()" is only used by swapoff, so nothing else would show
> this. Which was why it hadn't been immediately obvious that anything was
> broken.
>
>   Linus
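
For those not up to their elbows in the kernel code, the two effects Linus describes can be illustrated with a toy model of page-table entry flags. This is a user-space sketch, not the kernel source: the bit values follow the x86 page-table layout, and the function names `mkdirty_buggy` and `mkdirty_fixed` are made up for the example.

```python
# Toy model of x86 page-table entry flags; not kernel code.
PAGE_PRESENT = 0x001
PAGE_RW = 0x002      # entry is writable
PAGE_DIRTY = 0x040   # page has been written to

def mkdirty_buggy(pte):
    """Models the bug as described: sets the writable bit, not the dirty bit."""
    return pte | PAGE_RW

def mkdirty_fixed(pte):
    """Models the intended behaviour: sets only the dirty bit."""
    return pte | PAGE_DIRTY

def describe(pte):
    """Report the two flags that matter for this bug."""
    return {"writable": bool(pte & PAGE_RW), "dirty": bool(pte & PAGE_DIRTY)}

read_only = PAGE_PRESENT  # a present, read-only, clean entry

print("buggy:", describe(mkdirty_buggy(read_only)))
print("fixed:", describe(mkdirty_fixed(read_only)))
```

In the buggy case the entry ends up writable but still clean, which matches both problems in the quoted message: the dirty state is lost, and a page that should have stayed read-only (and so been copied on write) can be written in place.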




RE: Signal 11 - the continuing saga

2000-12-13 Thread Rainer Mager

Mike et al,

I have no idea what IKD is and I don't know what to do with any results I
might find BUT I'd be happy to do this if it will help. Please pass on the
info with the instructions. Who should I report the results to?



--Rainer

> [mailto:[EMAIL PROTECTED]]On Behalf Of Mike Galbraith
> If you want, I can extract IKD.. which happens to have a trap in place
> for this (because I have a 100% reproducible swap related SIGSEGV that
> I'm trying to figure out).
>
> If you're interested, let me know and I'll extract it (quite large) and
> send it along with instructions on how to do the trap.
>
>   -Mike




RE: Signal 11 - the continuing saga

2000-12-13 Thread Rainer Mager

Give that man a cigar... it was an env var (not LOCALE but LANG). I'd
actually checked this but I didn't think that made a difference in my case.

Thanks Linus, now can you fix the larger signal 11 problem?

--Rainer


> [mailto:[EMAIL PROTECTED]]On Behalf Of Linus Torvalds
> I'd guess that the program has a bug, and depending on the arguments and
> environment (especially the latter will be different), it shows up or
> not. Things like not having a LOCALE set in either case or similar.
>
>   Linus




RE: Signal 11 - the continuing saga

2000-12-12 Thread Rainer Mager

Thanks for the info...

> [mailto:[EMAIL PROTECTED]]On Behalf Of Jeff V. Merkey
> > So, is this related to the larger signal 11 problems?
>
> There's a corruption bug in the page cache somewhere, and it's 100%
> reproducible.  Finding it will be tough

Ok, granted this will be tough but is anyone even actively working on it?
What can I do to help?



> > Anyone know how to do [disable L1 and L2 caches]?
>
> Usually this is performed in the BIOS setup.  You can also disable L1
> with a sequence of instructions that write to the CR0 register on Intel
> and flip a bit, but in doing this you have to execute a WBINVD (write
> back invalidate) instruction to flush out the cache.  BIOS setup is
> probably simpler.  Disabling Level I will make the machine slower
> than molasses, BTW, and if this bug is race related (they always
> are) it won't help much in running it down.

Aha, just as I suspected. My BIOS doesn't appear to support this. You seem
to be saying that doing so won't really contribute anything anyway so I will
hold off for now.
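
For reference, the "flip a bit" step in the quoted reply can be sketched as plain bit arithmetic. On IA-32 processors CR0 bit 30 is CD (cache disable) and bit 29 is NW (not write-through). The sketch below only computes the new register value. Actually writing CR0 and executing WBINVD requires ring 0 and is not shown, and the function name is made up for the example.

```python
CR0_NW = 1 << 29  # Not Write-through
CR0_CD = 1 << 30  # Cache Disable

def cr0_with_cache_disabled(cr0):
    """Return the CR0 value with CD set and NW clear (32-bit register)."""
    return (cr0 | CR0_CD) & ~CR0_NW & 0xFFFFFFFF

# Example starting value, chosen for illustration only.
before = 0x80050033
after = cr0_with_cache_disabled(before)
print(f"{before:08x} -> {after:08x}")
```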



--Rainer




RE: Signal 11 - the continuing saga

2000-12-12 Thread Rainer Mager

Hi again,

Ok, I just upgraded to 2.4.0test12 (although I don't think there was any
work in 12 that directly addresses this signal 11 problem). When compiling
the new kernel I chose to disable AGPGart and DRM as suggested by
[EMAIL PROTECTED] I will report later if this makes any difference.

On another, possibly related note, I'm getting some really weird behavior
with a Java program. The only reason I mention it here is because it dies
with our old friend Signal 11. Anyway, please bear with the description
below.
I have a tiny bash script that launches a Java swing app. If I run my
script from an xterm (or gnome-terminal or whatever) then it starts up fine.
If, however, I try to launch it from my gnome taskbar's menu then it dies
with signal 11 (the Java log is available upon request). This seems to be
100% consistent, since I noticed it yesterday, even across reboots.
Interestingly, the same behavior occurs if I try to run the program from
within JBuilder 4.
So, is this related to the larger signal 11 problems?


What else can I do regarding these issues to help fix it? Would a core dump
help anyone? I'd really like to contribute somehow but I need some
direction.


--Rainer

> From: CMA [mailto:[EMAIL PROTECTED]]
> Did you already try to selectively disable L1 and L2 caches (if
> your box has both) and see what happens?

Anyone know how to do this?




RE: Signal 11

2000-12-11 Thread Rainer Mager

(This message contains a number of related replies.)

> From: Mike Galbraith [mailto:[EMAIL PROTECTED]]
> Is init permanently running after you see a couple of these?

No; after 23 hours of uptime it has used only 6 seconds of CPU time
(according to top).

That reminds me that I should repeat that my signal 11 problem has (so far)
only caused X to die. The OS remains up and stable.


> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> My troublesome box finally seems to be stable.[...]I disabled DRM
> & AGPGart. With them both disabled, I get no problems at all.
> No Sig11's, No Sig4's, No lockups.
>
> This box has a Voodoo3 3000 AGP..

I suppose I can try this too. My box has a Matrox G400. BTW, what is DRM?
Direct Rendering something?


> From: CMA [mailto:[EMAIL PROTECTED]]
> Did you already try to selectively disable L1 and L2 caches (if
> your box has both) and see what happens?

I'll look into this as well. Anyone have any pointers on how to do this? I
have a Tyan Tiger 133 with Award BIOS if this helps/matters.

Even if this setting does make a difference, what does this tell me/us? I
don't consider running the box with disabled cache(s) a viable solution.



Thanks all and keep those suggestions coming.

--Rainer




RE: Signal 11

2000-12-11 Thread Rainer Mager

Well, I just had a Signal 11 even with the patch. What can I do to help
figure this out?


Thanks,

--Rainer

-Original Message-
From: Alan Cox [mailto:[EMAIL PROTECTED]]
Sent: Friday, December 08, 2000 11:07 PM
To: David Woodhouse
Cc: Andi Kleen; Rainer Mager; [EMAIL PROTECTED]; Mark Vojkovich
Subject: Re: Signal 11


> > wrong with it.  I've only seen this under 2.3.x/2.4 SMP kernels.  I
> > would say that this is definitely a kernel problem.
>
> XFree86 3.9 and XFree86 4 were rock solid for a _long_ time on 2.[34]
> kernels - even on my BP6¹. The random crashes started to happen when I
> upgraded my distribution² - and are only seen by people using 2.4. So I
> suspect that it's the combination of glibc and kernel which is triggering
> it.

Have any of the folks seeing it checked if Ben LaHaise's fixes for the page
table updating race help ?

Alan




OOPS when using 4GB memory setting

2000-12-10 Thread Rainer Mager

Hi all,

About 1 month back I reported a problem with getting oopses when running
a kernel compiled with the 4GB memory setting. Since then I've finally
managed to get the ksymoops results. Where should I post them?

To review:

My machine has 1GB RAM. If I build a 2.4.0test11 (or 8, 9, or 10; I haven't
tried earlier) kernel and choose the 1GB memory setting then only 900504 K is
detected (but everything runs stably). If I choose the 4GB memory setting
then the full 1 GB is detected but I get an oops. I can reliably force an
oops by mounting a samba drive and then accessing it (via ls for example).


--Rainer




RE: Signal 11

2000-12-10 Thread Rainer Mager

I just applied the said patch and will report my results. Note that I have
never been able to reliably, on-demand reproduce this so give me a few days
to see what happens.

--Rainer


-Original Message-
From: Alan Cox [mailto:[EMAIL PROTECTED]]
Sent: Friday, December 08, 2000 11:07 PM
To: David Woodhouse
Cc: Andi Kleen; Rainer Mager; [EMAIL PROTECTED]; Mark Vojkovich
Subject: Re: Signal 11


> > wrong with it.  I've only seen this under 2.3.x/2.4 SMP kernels.  I
> > would say that this is definitely a kernel problem.
>
> XFree86 3.9 and XFree86 4 were rock solid for a _long_ time on 2.[34]
> kernels - even on my BP6¹. The random crashes started to happen when I
> upgraded my distribution² - and are only seen by people using 2.4. So I
> suspect that it's the combination of glibc and kernel which is triggering
> it.

Have any of the folks seeing it checked if Ben LaHaise's fixes for the page
table updating race help ?

Alan




RE: Signal 11

2000-12-07 Thread Rainer Mager

Hi all,

Thanks for all the input so far. Regarding this...

> (I'm not sure exactly what cerberos does, do you have a link for it ?).

The official name is "Cerberus Test Control System" aka CTCS. I don't know
the official site but a search for this should reveal something. Anyway it
is a pretty comprehensive test that includes multiple kernel compiles,
memory tests, disk test, etc, etc. Like I said, I ran this for more than 15
hours with no problems.

Well, actually, I did notice that if I run CTCS from within X then it
freezes up after a few minutes. This appears to happen when/because of
extreme swapping.


Aside from the above I've also run repeated kernel compiles (more than 50
times) with 'make -j bzImage' and had no problems; all outputs were
identical.

So given these tests, I'm reasonably confident the core hardware is ok. I
suppose it is possible there's some iffy bits in the G400's VRAM (but
wouldn't that just result in screen artifacts?). I will admit that I haven't
yet tried swapping RAM or any other system components.


Any other ideas?




Signal 11

2000-12-07 Thread Rainer Mager

Hi all,

I've searched around for an answer to this with no real luck yet. If anyone
has some ideas I'd be very grateful.

I recently upgraded to a new machine. It is running RedHat 6.2 Linux (with
an SMP 2.4.0test[8-11] kernel) and has a Matrox G400 in it. X is 4.0.1.
Anyway, about once every 2-3 days X will spontaneously die and the only info
I get back is that it was because of signal 11.

I've heard that signal 11 can be related to bad hardware, most often
memory, but I've done a good bit of testing on this and the system seems ok.
What I did was to run the VA Linux Cerberos(sp?) test for 15 hours+ with no
errors. Actually this only worked when running from the console. When
running from X the machine locked up (although no signal 11).

The only info I've gotten back from the XFree86 mailing lists so far is
that these types of problems are known and widespread with SMP. Can anyone
comment on this? Are there known SMP problems? What is the current
resolution plan?


Thanks,

--Rainer




OOPS when using 4GB memory setting

2000-11-06 Thread Rainer Mager

Hi all,

Please respond directly since I'm not on this mailing list.

I have 2 intertwined problems that my initial web research has failed to
turn up any help for. I recently upgraded machines and the new one has 1GB
RAM. If I build a 2.4.0pre10 (or 8 or 9, I haven't tried earlier) kernel and
choose the 1GB memory setting then only 900504 K is detected (but everything
runs stably). If I choose the 4GB memory setting then the full 1 GB is
detected but I get an oops. I can reliably force an oops by mounting a samba
drive and then accessing it (via ls for example).

So, is this a known issue? Should I do an oops analysis? What can I do to
fix this?

Also 2 items of note. First, the kernel that comes with RedHat 6.2 detects
all of the RAM and is stable. Related to this, although not that important,
I also noticed that with this RedHat kernel, hdparm shows memory access (not
disk) of over 200 MB/s. On my 2.4 kernels this is about 120MB/s. Any ideas
why? Second, it is a dual PIII system so it runs an SMP kernel, if that makes
a difference.


Any help would be greatly appreciated.

--Rainer
