Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread Matthew Bloch

David Newall wrote:
> Pavel Machek wrote:
>> On Sat 2007-12-22 13:42:47, Richard D wrote:
>>> Can't you modify the bootmem allocator to test with memtest patterns
>>> and then use kexec (as Pavel suggested) to test the one where kernel
>>> was sitting earlier?
>>
>> I do not think you need to modify anything in kernel. Just use
>> /dev/mem to test areas that kernel doesn't see, then kexec into place
>> you already tested, and test the rest.
>
> That's still an insufficient test.  One failure mode is writes at one
> location corrupting cells at another.
>
> The idea of wanting to do comprehensive and robust memory testing from
> within the operating system seems dubious at best, to me.


Well, if we're trying to be thorough, either way is flawed: you can't
possibly test pathologically misbehaving memory from code running
inside it; you'd want some kind of non-uniform memory arrangement to
do that properly.  memtest86's value is that it at least *tries* to work
in this environment by dynamically relocating itself, but its memory
testing algorithms aren't the hard bit.  Also, I'm not necessarily
interested in *which* section of which DIMM is faulty; a yes or no
is enough so I can send the faulty ones back to the shop.
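
To illustrate the failure mode David describes (a write at one location
corrupting cells at another), here is a minimal sketch of a simplified
"own address" style check that returns only a yes/no answer. It is not
taken from memtest86 or memtester. The names address_test and AliasedRAM
and the 8-byte word size are assumptions made for this illustration, and
Python is used only to show the algorithm; it would not stress real DRAM
the way a native tester does.

```python
WORD = 8                      # bytes per test word (assumption: 64-bit words)
MASK = (1 << (8 * WORD)) - 1  # all-ones value used to invert the data


def _fill_and_check(buf, indices, invert):
    """Write every word's (optionally inverted) index, then verify them all."""
    # Fill the whole region first, so a stray write has a chance to
    # clobber a word that was already written.
    for i in indices:
        value = (i ^ MASK) if invert else i
        buf[i * WORD:(i + 1) * WORD] = value.to_bytes(WORD, "little")
    # Only then read everything back.
    for i in indices:
        expected = (i ^ MASK) if invert else i
        if int.from_bytes(buf[i * WORD:(i + 1) * WORD], "little") != expected:
            return False
    return True


def address_test(buf):
    """Return True if the region looks good, False on any mismatch."""
    nwords = len(buf) // WORD
    for invert in (False, True):
        # Ascending and descending fills catch aliasing in both directions.
        for order in (range(nwords), range(nwords - 1, -1, -1)):
            if not _fill_and_check(buf, order, invert):
                return False
    return True


class AliasedRAM:
    """Simulated fault: a write to word src_word also lands on dst_word."""

    def __init__(self, size, src_word, dst_word):
        self._cells = bytearray(size)
        self._src = src_word * WORD
        self._dst = dst_word * WORD

    def __len__(self):
        return len(self._cells)

    def __getitem__(self, key):
        return self._cells[key]

    def __setitem__(self, key, value):
        self._cells[key] = value
        if key.start == self._src:
            self._cells[self._dst:self._dst + WORD] = value
```

A healthy bytearray(4096) passes, while AliasedRAM(4096, 3, 10) fails only
on the descending fill, which is why both directions are run.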


I don't agree that adding a network stack to memtest86's bare kernel is 
going to be easier than working out how to get Linux to do the same job, 
with its luxurious programming environment.  I can already automate 
memtest via serial consoles, power cycling, network booting and so on 
but it's ugly.


I will report back in the new year when I've had a chance to play with 
our collection of dodgy hardware.


--
Matthew

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Testing RAM from userspace / question about memmap= arguments

2007-12-20 Thread Matthew Bloch
Jon Masters wrote:
> On Tue, 2007-12-18 at 17:06 +0000, Matthew Bloch wrote:
> 
>> I can see a few potential problems, but since my understanding of the
>> low-level memory mapping is muddy at best, I won't speculate; I'd just
>> appreciate any more expert views on whether this does work, or could be
>> made to work.
> 
> Yo,
> 
> I don't think your testing approach is thorough enough. Clearly (knowing
> your line of business - as a virtual machine provider), you want to do
> pre-production testing as part of your provisioning. I would suggest
> instead of using mlock() from userspace of simply writing a kernel
> module that does this for every page of available memory.

Yes, this is to improve the efficiency of server burn-ins.  I would
consider a kernel module, but I still wouldn't be able to test the
memory in which the kernel is sitting, which is my problem.  I'm not
sure even a kernel module could reliably test the memory in which it is
residing (memtest86+ relocates itself to do this).  Also, I don't see how
userspace testing is any less thorough than doing it in the kernel; I
just need a creative way of accessing every single page of memory.
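
One candidate for reaching pages by physical address from userspace is
the /dev/mem route that Pavel Machek suggests elsewhere in this thread.
The sketch below is only an illustration of that idea: the helper name
map_physical and its path parameter are assumptions, it needs root, and
kernels built with CONFIG_STRICT_DEVMEM may refuse access to RAM through
/dev/mem altogether. Writing to a range the running kernel is using will
crash the machine.

```python
import mmap
import os


def map_physical(offset, length, path="/dev/mem"):
    """Map `length` bytes starting at physical address `offset`.

    `path` defaults to /dev/mem; it is a parameter only so the helper can
    be tried against an ordinary file.
    """
    if offset % mmap.ALLOCATIONGRANULARITY or length <= 0:
        raise ValueError("offset must be page-aligned and length positive")
    fd = os.open(path, os.O_RDWR | os.O_SYNC)
    try:
        return mmap.mmap(fd, length, mmap.MAP_SHARED,
                         mmap.PROT_READ | mmap.PROT_WRITE, offset=offset)
    finally:
        # mmap keeps its own duplicate of the descriptor, so the mapping
        # stays valid after this close.
        os.close(fd)
```

The returned object supports slice reads and writes, so a pattern test
written against a bytearray can be pointed at it unchanged.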

I may do some experiments with the memmap args, some bad RAM and
shuffling it between DIMM sockets when I have the time :)

-- 
Matthew


Testing RAM from userspace / question about memmap= arguments

2007-12-18 Thread Matthew Bloch
Hi - I'm trying to come up with a way of thoroughly testing every byte
of RAM from within Linux on amd64 (so that it can be automated better
than using memtest86+), and came up with an idea which I'm not sure is
supported or practical.

The obvious problem with testing memory from user space is that you
can't mlock all of it, so the best you can do is about three quarters,
and hope that the rest of the memory is okay.
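
For readers unfamiliar with the approach, a rough sketch of what a
user-space tester does: allocate a region, ask the kernel to pin it with
mlock() so it cannot be swapped out, then write and read back patterns.
This is not memtester's code; the function names and the four byte
patterns are assumptions chosen for illustration, and mlock() is reached
through ctypes because that is the simplest way to call it from Python.

```python
import ctypes
import mmap


def lock_region(size):
    """Allocate `size` bytes of anonymous memory and try to mlock() it.

    Returns (mapping, locked). `locked` is False if the kernel refuses,
    for example because of RLIMIT_MEMLOCK or lack of free memory.
    """
    region = mmap.mmap(-1, size)              # anonymous, zero-filled
    libc = ctypes.CDLL(None, use_errno=True)  # symbols from the C library
    addr = ctypes.addressof(ctypes.c_char.from_buffer(region))
    rc = libc.mlock(ctypes.c_void_p(addr), ctypes.c_size_t(size))
    return region, rc == 0


def pattern_test(buf, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Fill the buffer with each byte pattern and read it back.

    Returns True if every pattern reads back intact, False otherwise.
    """
    size = len(buf)
    for pattern in patterns:
        fill = bytes([pattern]) * size
        buf[0:size] = fill
        if buf[0:size] != fill:
            return False
    return True
```

Typical use would be region, locked = lock_region(n) followed by
pattern_test(region), growing n until mlock() starts failing, which is
where the "about three quarters" ceiling above comes from.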

In order to test all of the memory, I'd like to run the user-space
memtester over two boots of the kernel.

Say we have a 1024MB machine: on the first boot I'd not specify any
arguments, and assume the kernel would start at the bottom of physical
memory and work its way up, so that the kernel & working userspace would
live at the bottom, and the rest would be testable from user space.

On the second boot, could I then specify:

memmap=exact [EMAIL PROTECTED] [EMAIL PROTECTED]

i.e. such that the kernel's idea of the usable memory started in the
middle of physical RAM, and that's where it would locate itself?  That
way, on the second boot, the same test in userspace would definitely
grab the previously inaccessible RAM at the start for testing.
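
As a purely illustrative sketch of how such a command line could be
generated for an automated burn-in, the helper below builds arguments in
the memmap=exactmap and memmap=nn[KMG]@ss[KMG] forms described in the
kernel's parameter documentation. The addresses in the post above were
redacted by the list archive, so the values here are hypothetical and
are not the ones originally posted; the function names are likewise
assumptions. Whether a kernel will actually boot with such a map is
exactly the open question being asked.

```python
def memmap_args(usable_start_mb, usable_size_mb):
    """Build memmap= arguments restricting the kernel to one region."""
    return [
        "memmap=exactmap",
        f"memmap={usable_size_mb}M@{usable_start_mb}M",
    ]


def upper_half_cmdline(total_mb):
    """Arguments that would place usable memory in the upper half of RAM."""
    half = total_mb // 2
    return memmap_args(half, total_mb - half)
```

For a hypothetical 1024MB machine, " ".join(upper_half_cmdline(1024))
yields "memmap=exactmap memmap=512M@512M".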

I can see a few potential problems, but since my understanding of the
low-level memory mapping is muddy at best, I won't speculate; I'd just
appreciate any more expert views on whether this does work, or could be
made to work.

Thanks,

-- 
Matthew
