Re: Testing RAM from userspace / question about memmap= arguments

2007-12-26 Thread Maxim Levitsky
On Wednesday 26 December 2007 12:17:56, Arjan van de Ven wrote:
> On Wed, 26 Dec 2007 00:09:57 +0100
> Pavel Machek <[EMAIL PROTECTED]> wrote:
> 
> > On Sat 2007-12-22 12:09:59, Arjan van de Ven wrote:
> > > On Tue, 18 Dec 2007 17:06:24 +
> 
> > > memtest86+ does various magic to basically bypass the caches (by
> > > disabling them ;-)... Doing that in a live kernel situation, and
> > > from userspace to boot.. that's... an issue.
> > 
> > Are you sure? I always assumed that memtest just used patterns bigger
> > than L1/L2 caches...
> 
> that's... not nearly usable or enough. Caches are relatively smart
> about things like use-once and they're huge. 12MB today. You'd need
> patterns bigger than 100MB to get even close to being reasonably
> confident that there's nothing left.
> 
> > ... and IIRC my celeron testing confirmed it, if
> > I disabled L2 cache in BIOS, memtest behaved differently.
> > 
> > Anyway, if you can do iopl(), we may as well let you disable caches,
> > but you are right, that will need a kernel patch.
> 
> and a new syscall of some sort I suspect; "flush all caches" is a ring
> 0 operation (and you probably need to do it in an ipi anyway on all
> cpus)
> 

I think that PAT support will help a lot.
How about opening and mmapping /dev/mem, and setting the uncacheable attribute there?
Actually, that is even possible today with MTRRs.
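
A minimal sketch of the /dev/mem idea, in Python for brevity. Assumptions,
none of which come from this thread: the name check_region is made up;
opening with O_SYNC is only a request for an uncached mapping, which some
x86 kernels honour for /dev/mem and others may not; and kernels built with
CONFIG_STRICT_DEVMEM refuse RAM access through /dev/mem altogether. The path
is a parameter so the same code can be tried on an ordinary file first.

```python
import mmap
import os


def check_region(path, offset, length, pattern=0xAA):
    """Fill a mapped region with one byte pattern; return the mismatch count."""
    if offset % mmap.ALLOCATIONGRANULARITY:
        raise ValueError("offset must be aligned to the mmap granularity")
    # O_SYNC asks for an uncached mapping on /dev/mem; it is not a guarantee.
    fd = os.open(path, os.O_RDWR | os.O_SYNC)
    try:
        with mmap.mmap(fd, length, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE,
                       offset=offset) as region:
            region[:] = bytes([pattern]) * length      # write pass
            return sum(1 for b in region[:] if b != pattern)  # read-back pass
    finally:
        os.close(fd)
```

Run as root against a physical range the kernel has been told not to use,
e.g. check_region("/dev/mem", 0x20000000, 1 << 20), where the address is
purely illustrative.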

Regards,
Maxim Levitsky
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Testing RAM from userspace / question about memmap= arguments

2007-12-26 Thread Arjan van de Ven
On Wed, 26 Dec 2007 00:09:57 +0100
Pavel Machek <[EMAIL PROTECTED]> wrote:

> On Sat 2007-12-22 12:09:59, Arjan van de Ven wrote:
> > On Tue, 18 Dec 2007 17:06:24 +

> > memtest86+ does various magic to basically bypass the caches (by
> > disabling them ;-)... Doing that in a live kernel situation, and
> > from userspace to boot.. that's... an issue.
> 
> Are you sure? I always assumed that memtest just used patterns bigger
> than L1/L2 caches...

that's... not nearly usable or enough. Caches are relatively smart
about things like use-once and they're huge. 12MB today. You'd need
patterns bigger than 100MB to get even close to being reasonably
confident that there's nothing left.

> ... and IIRC my celeron testing confirmed it, if
> I disabled L2 cache in BIOS, memtest behaved differently.
> 
> Anyway, if you can do iopl(), we may as well let you disable caches,
> but you are right, that will need a kernel patch.

and a new syscall of some sort I suspect; "flush all caches" is a ring
0 operation (and you probably need to do it in an ipi anyway on all
cpus)

-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


Re: Testing RAM from userspace / question about memmap= arguments

2007-12-25 Thread Pavel Machek
On Sat 2007-12-22 12:09:59, Arjan van de Ven wrote:
> On Tue, 18 Dec 2007 17:06:24 +
> Matthew Bloch <[EMAIL PROTECTED]> wrote:
> 
> > Hi - I'm trying to come up with a way of thoroughly testing every byte
> > of RAM from within Linux on amd64 (so that it can be automated better
> > than using memtest86+), and came up with an idea which I'm not sure is
> > supported or practical.
> > 
> > The obvious problem with testing memory from user space is that you
> > can't mlock all of it, so the best you can do is about three quarters,
> > and hope that the rest of the memory is okay.
> 
> well... to be honest the more obvious problem will be that you won't be 
> testing the RAM, you'll be testing the CPU's cache.. over and over again.
> 
> memtest86+ does various magic to basically bypass the caches (by disabling 
> them ;-)...
> Doing that in a live kernel situation, and from userspace to boot.. 
> that's... an issue.

Are you sure? I always assumed that memtest just used patterns bigger
than L1/L2 caches... and IIRC my Celeron testing confirmed it: if
I disabled the L2 cache in the BIOS, memtest behaved differently.

Anyway, if you can do iopl(), we may as well let you disable caches,
but you are right, that will need a kernel patch.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: Testing RAM from userspace / question about memmap= arguments

2007-12-23 Thread Pavel Machek
On Sun 2007-12-23 18:05:59, David Newall wrote:
> Pavel Machek wrote:
>> On Sun 2007-12-23 07:06:58, David Newall wrote:
>>   
>>> It's kind of hard to run anything over SSH if it has to be run before 
>>> userspace is up.  But the kernel can collect results from a modified 
>>> memtest, after it chains back.
>>> 
>>
>> memtest can be run from userspace, that's the point.
>>   
>
> I'm not sure I believe that.  You need to tinker with hardware tables 
> before you know what physical RAM is being used.  Sequential virtual

No, I can just use /dev/mem. (After passing mem=XXX exactmap to kernel
so that I know what I may play with).
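
For the "know what I may play with" part, the boot arguments can be
generated rather than typed by hand. A small sketch, assuming the documented
x86 syntax memmap=exactmap followed by memmap=nn[KMG]@ss[KMG] for each
region the kernel may use. The helper name and the example layout are
hypothetical, and a real machine will likely need its low memory and
firmware regions listed as well.

```python
def memmap_args(regions):
    """Format (start, size) byte ranges as memmap= kernel parameters."""
    def fmt(n):
        # Use the largest K/M/G suffix that divides the value exactly.
        for suffix, unit in (("G", 1 << 30), ("M", 1 << 20), ("K", 1 << 10)):
            if n and n % unit == 0:
                return f"{n // unit}{suffix}"
        return str(n)

    usable = [f"memmap={fmt(size)}@{fmt(start)}" for start, size in regions]
    return " ".join(["memmap=exactmap"] + usable)


# Hypothetical layout: kernel keeps the low 640K and the upper 512M of 1G.
print(memmap_args([(0, 640 << 10), (512 << 20, 512 << 20)]))
```

Everything not listed is then invisible to the kernel and free to be poked
at through /dev/mem.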

>> Yes, that's what CPU microcode update is for. And I want to test my
>> RAM with up-to-date microcode.
>>   
>
> Don't microcode updates fix CPU bugs?  That's not fixing faulty RAM.

The L1/L2 cache is part of the memory subsystem.

> I suppose a CPU retains microcode updates, once loaded, until power-down or 
> some hard reboot that you surely can avoid.  If it does happen that
> you

If the CPU retains microcode after reset, then you are right. I'm not
sure.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread David Newall

Pavel Machek wrote:
> On Sun 2007-12-23 07:06:58, David Newall wrote:
>> It's kind of hard to run anything over SSH if it has to be run before
>> userspace is up.  But the kernel can collect results from a modified
>> memtest, after it chains back.
>
> memtest can be run from userspace, that's the point.

I'm not sure I believe that.  You need to tinker with hardware tables
before you know what physical RAM is being used.  Sequential virtual
pages might be mapped to sequential physical RAM, but it might also be
mapped pseudo-randomly, or even page-reverse-sequential!  How can you do
a basic walking bit test when you could be accessing pages in random order?
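
For reference, a walking-ones pass looks roughly like the sketch below
(Python, hypothetical function name). It only exercises whatever buffer it
is handed: on a virtual mapping it says nothing about which physical cells
or address lines were touched, which is exactly the objection above, and
caches may satisfy the reads.

```python
def walking_ones(buf):
    """Walk a single set bit through every byte; return the failures found.

    Each failure is a tuple (index, expected, actual).
    """
    failures = []
    for bit in range(8):
        pattern = 1 << bit
        for i in range(len(buf)):      # write pass
            buf[i] = pattern
        for i in range(len(buf)):      # read-back pass
            if buf[i] != pattern:
                failures.append((i, pattern, buf[i]))
    return failures


# A healthy in-process buffer reports no failures.
print(walking_ones(bytearray(4096)))
```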

>>> 1) if linux fixes some problem with PCI quirk or microcode
>>> upload, memtest will not see the fix
>>
>> What are you saying?  Linux is going to fix faulty RAM?
>
> Yes, that's what CPU microcode update is for. And I want to test my
> RAM with up-to-date microcode.

Don't microcode updates fix CPU bugs?  That's not fixing faulty RAM.  If
base microcode is so faulty as to make RAM access unreliable, the CPU
probably won't even POST, let alone boot the kernel and start a whole
bunch of userspace stuff, before it can get around to checking to see if
there is new microcode for that CPU and download it.

I suppose a CPU retains microcode updates, once loaded, until power-down
or some hard reboot that you surely can avoid.  If it does happen that
you have an update that works around something unrelated to the CPU, for
example maybe interaction with a bridge, then you can update the CPU
before running memtest.  Once loaded it's there until power down.

>> These are not RAM faults. The very last thing you want is evidence that
>> you've got a faulty piece of RAM when the fault is actually a hard disk
>> glitch!
>
> No, it may be power supply leading to RAM problems. Yes, I want to
> detect that.

I'm sure you don't mean that.  I'm sure you don't want a faulty power
supply to look like faulty RAM.  No amount of replacing pieces of memory
is going to solve a faulty power supply.  At worst you'll hit on a
combination of pieces that pass the test ... and then the system will
fail, mysteriously, in production.  I'm certain you don't want that.

Anyhow, good luck with your idea.  I think it's crazy, and that you're
doomed to failure.  Doomed! I tell you.



Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread Pavel Machek
On Sun 2007-12-23 07:06:58, David Newall wrote:
> Pavel Machek wrote:
>> memtest has following problems:
>>
>>  0) it is kind of hard to run memtest over ssh
>>   
>
> It's kind of hard to run anything over SSH if it has to be run before 
> userspace is up.  But the kernel can collect results from a modified 
> memtest, after it chains back.

memtest can be run from userspace, that's the point.

>>  1) if linux fixes some problem with PCI quirk or microcode
>>  upload, memtest will not see the fix
>>   
>
> What are you saying?  Linux is going to fix faulty RAM?

Yes, that's what CPU microcode update is for. And I want to test my
RAM with up-to-date microcode.

>>  2) if memory only fails while something else happens (DMA to
>>  other piece of memory? Hard disk load glitching power
>>  supply?), memtest will not see the problem.
>
> These are not RAM faults.  The very last thing you want is evidence that 
> you've got a faulty piece of RAM when the fault is actually a hard disk 
> glitch!

No, it may be power supply leading to RAM problems. Yes, I want to
detect that.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread Matthew Bloch

David Newall wrote:
> Pavel Machek wrote:
>> On Sat 2007-12-22 13:42:47, Richard D wrote:
>>> Can't you modify the bootmem allocator to test with memtest patterns and then
>>> use kexec (as Pavel suggested) to test the one where kernel was sitting
>>> earlier?
>>
>> I do not think you need to modify anything in kernel. Just use
>> /dev/mem to test areas that kernel doesn't see, then kexec into place
>> you already tested, and test the rest.
>
> That's still an insufficient test.  One failure mode is writes at one
> location corrupting cells at another.
>
> The idea of wanting to do comprehensive and robust memory testing from
> within the operating system seems dubious at best, to me.


Well if we're trying to be thorough, either way is flawed - you can't 
possibly test pathologically-misbehaving memory from code running from 
inside of it, you'd want some kind of non-uniform memory arrangement to 
do that properly.  memtest86's value is that it at least *tries* to work 
in this environment by dynamically relocating itself, but its memory 
testing algorithms aren't the hard bit.  Also I'm not necessarily 
interested in *which* section of which DIMM is faulty, just a yes or no 
is enough so I can send the faulty ones back to the shop.


I don't agree that adding a network stack to memtest86's bare kernel is 
going to be easier than working out how to get Linux to do the same job, 
with its luxurious programming environment.  I can already automate 
memtest via serial consoles, power cycling, network booting and so on 
but it's ugly.


I will report back in the new year when I've had a chance to play with 
our collection of dodgy hardware.


--
Matthew



Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread David Newall

Pavel Machek wrote:
> memtest has the following problems:
>
> 0) it is kind of hard to run memtest over ssh

It's kind of hard to run anything over SSH if it has to be run before
userspace is up.  But the kernel can collect results from a modified
memtest, after it chains back.

> 1) if linux fixes some problem with PCI quirk or microcode
> upload, memtest will not see the fix

What are you saying?  Linux is going to fix faulty RAM?  The point with
testing RAM is you *want* to see it fail; you don't want Linux to fix it.

> 2) if memory only fails while something else happens (DMA to
> other piece of memory? Hard disk load glitching power
> supply?), memtest will not see the problem.

These are not RAM faults.  The very last thing you want is evidence that
you've got a faulty piece of RAM when the fault is actually a hard disk
glitch!



Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread Arjan van de Ven
On Tue, 18 Dec 2007 17:06:24 +
Matthew Bloch <[EMAIL PROTECTED]> wrote:

> Hi - I'm trying to come up with a way of thoroughly testing every byte
> of RAM from within Linux on amd64 (so that it can be automated better
> than using memtest86+), and came up with an idea which I'm not sure is
> supported or practical.
> 
> The obvious problem with testing memory from user space is that you
> can't mlock all of it, so the best you can do is about three quarters,
> and hope that the rest of the memory is okay.

well... to be honest the more obvious problem will be that you won't be
testing the RAM, you'll be testing the CPU's cache.. over and over again.

memtest86+ does various magic to basically bypass the caches (by
disabling them ;-)... Doing that in a live kernel situation, and
from userspace to boot.. that's... an issue.


Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread Pavel Machek
On Sun 2007-12-23 02:36:14, David Newall wrote:
> Pavel Machek wrote:
>> On Sat 2007-12-22 13:42:47, Richard D wrote:
>>   
>>> Can't you modify the bootmem allocator to test with memtest patterns and then
>>> use kexec (as Pavel suggested) to test the one where kernel was sitting
>>> earlier?
>>
>> I do not think you need to modify anything in kernel. Just use
>> /dev/mem to test areas that kernel doesn't see, then kexec into place
>> you already tested, and test the rest.
>
> That's still an insufficient test.  One failure mode is writes at one 
> location corrupting cells at another.
>
> The idea of wanting to do comprehensive and robust memory testing from 
> within the operating system seems dubious at best, to me.  If there is 
> something wrong with memtest86, doing the tests from within Linux is not 
> the answer.  The answer is to fix memtest86.  If the problem is that you want 
> automation, e.g. switching a server from production to memory test mode at 
> midnight and back again at 6am, the answer is still to "fix" memtest86.  
> Writing something that grabs some physical RAM from Linux's control, tests 
> it, and then moves the kernel itself so that it can test the rest, is 
> adding a whole extra layer of complexity to an already challenging (I 
> assume, based on errors that dedicated software-based testers miss) 
> problem.

Well, we have kexec. We already have a way for the kernel to relocate itself.

> Give up on this misguided idea and build on the best tools that are already 
> available.

Yes, the idea is "interesting". I do not think it quite qualifies as
"misguided".

memtest has the following problems:

0) it is kind of hard to run memtest over ssh

1) if linux fixes some problem with PCI quirk or microcode
upload, memtest will not see the fix

2) if memory only fails while something else happens (DMA to
other piece of memory? Hard disk load glitching power
supply?), memtest will not see the problem.

(Of course, memtest-under-linux has some problems too. Like "if it
freezes, was it bad memory or kernel problem").
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread Pavel Machek
On Sat 2007-12-22 21:00:11, Richard D wrote:
> I was thinking that by the time userspace is ready, the memory that can be
> tested will be less.

Which does not matter when you can test the rest using second
(kexec-ed) kernel, right?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread David Newall

Pavel Machek wrote:
> On Sat 2007-12-22 13:42:47, Richard D wrote:
>> Can't you modify the bootmem allocator to test with memtest patterns and then
>> use kexec (as Pavel suggested) to test the one where kernel was sitting
>> earlier?
>
> I do not think you need to modify anything in kernel. Just use
> /dev/mem to test areas that kernel doesn't see, then kexec into place
> you already tested, and test the rest.

That's still an insufficient test.  One failure mode is writes at one
location corrupting cells at another.

The idea of wanting to do comprehensive and robust memory testing from
within the operating system seems dubious at best, to me.  If there is
something wrong with memtest86, doing the tests from within Linux is not
the answer.  The answer is to fix memtest86.  If the problem is that you
want automation, e.g. switching a server from production to memory test
mode at midnight and back again at 6am, the answer is still to "fix"
memtest86.  Writing something that grabs some physical RAM from Linux's
control, tests it, and then moves the kernel itself so that it can test
the rest, is adding a whole extra layer of complexity to an already
challenging (I assume, based on errors that dedicated software-based
testers miss) problem.

Give up on this misguided idea and build on the best tools that are
already available.



RE: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread Richard D
I was thinking that by the time userspace is ready, the memory that can be
tested will be less. 

-----Original Message-----
From: Pavel Machek [mailto:[EMAIL PROTECTED] 
Sent: Saturday, December 22, 2007 7:16 PM
To: Richard D
Cc: 'Matthew Bloch'; linux-kernel@vger.kernel.org
Subject: Re: Testing RAM from userspace / question about memmap= arguments

On Sat 2007-12-22 13:42:47, Richard D wrote:
> Can't you modify the bootmem allocator to test with memtest patterns and then
> use kexec (as Pavel suggested) to test the one where kernel was sitting
> earlier?


I do not think you need to modify anything in kernel. Just use
/dev/mem to test areas that kernel doesn't see, then kexec into place
you already tested, and test the rest.
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures)
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread Pavel Machek
On Sat 2007-12-22 13:42:47, Richard D wrote:
> Can't you modify the bootmem allocator to test with memtest patterns and then
> use kexec (as Pavel suggested) to test the one where kernel was sitting
> earlier?


I do not think you need to modify anything in kernel. Just use
/dev/mem to test areas that kernel doesn't see, then kexec into place
you already tested, and test the rest.
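
The two-pass scheme can be written down as a plan. A sketch only: the
function name and the single split point are assumptions, and it ignores
holes, firmware regions and the low memory a kernel needs in order to boot.

```python
def two_phase_plan(total, split):
    """Return [(kernel_range, test_range), ...] for a two-boot memory test.

    Ranges are half-open (start, end) byte offsets into physical memory.
    """
    if not 0 < split < total:
        raise ValueError("split must fall strictly inside the memory range")
    low, high = (0, split), (split, total)
    # Pass 1: kernel lives low, test high. Pass 2 (after kexec): swapped.
    return [(low, high), (high, low)]


MB = 1 << 20
for kernel, tested in two_phase_plan(1024 * MB, 512 * MB):
    print("kernel in", kernel, "testing", tested)
```

Between the two passes every byte is in the tested range exactly once.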
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


RE: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread Richard D
Can't you modify the bootmem allocator to test with memtest patterns and then
use kexec (as Pavel suggested) to test the one where kernel was sitting
earlier?

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Pavel Machek
Sent: Friday, December 21, 2007 6:28 PM
To: Matthew Bloch
Cc: linux-kernel@vger.kernel.org
Subject: Re: Testing RAM from userspace / question about memmap= arguments

On Tue 2007-12-18 17:06:24, Matthew Bloch wrote:
> Hi - I'm trying to come up with a way of thoroughly testing every byte
> of RAM from within Linux on amd64 (so that it can be automated better
> than using memtest86+), and came up with an idea which I'm not sure is
> supported or practical.
> 
> The obvious problem with testing memory from user space is that you
> can't mlock all of it, so the best you can do is about three quarters,
> and hope that the rest of the memory is okay.
> 
> In order to test all of the memory, I'd like to run the user-space
> memtester over two boots of the kernel.
> 
> Say we have a 1024MB machine, the first boot I'd not specify any
> arguments and assume the kernel would start at the bottom of physical
> memory and work its way up, so that the kernel & working userspace would
> live at the bottom, and the rest would be testable from userspace.
> 
> On the second boot, could I then specify:
> 
> memmap=exact [EMAIL PROTECTED] [EMAIL PROTECTED]

Actually, with kexec, you can probably do without a reboot.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures)
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


RE: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread Richard D
Cant you, modify bootmem allocator to test with memtest patterns and then
use kexec (as Pavel suggested) to test the one where kernel was sitting
earlier?

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Pavel Machek
Sent: Friday, December 21, 2007 6:28 PM
To: Matthew Bloch
Cc: linux-kernel@vger.kernel.org
Subject: Re: Testing RAM from userspace / question about memmap= arguments

On Tue 2007-12-18 17:06:24, Matthew Bloch wrote:
> Hi - I'm trying to come up with a way of thoroughly testing every byte
> of RAM from within Linux on amd64 (so that it can be automated better
> than using memtest86+), and came up with an idea which I'm not sure is
> supported or practical.
>
> The obvious problem with testing memory from user space is that you
> can't mlock all of it, so the best you can do is about three quarters,
> and hope that the rest of the memory is okay.
>
> In order to test all of the memory, I'd like to run the user-space
> memtester over two boots of the kernel.
>
> Say we have a 1024MB machine, the first boot I'd not specify any
> arguments and assume the kernel would start at the bottom of physical
> memory and work its way up, so that the kernel & working userspace would
> live at the bottom, and the rest would be testable from space.
>
> On the second boot, could I then specify:
>
> memmap=exact [EMAIL PROTECTED] [EMAIL PROTECTED]

Actually, with kexec, you can probably do it without a reboot.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures)
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread Pavel Machek
On Sat 2007-12-22 13:42:47, Richard D wrote:
> Can't you modify the bootmem allocator to test with memtest patterns and then
> use kexec (as Pavel suggested) to test the area where the kernel was sitting
> earlier?


I do not think you need to modify anything in kernel. Just use
/dev/mem to test areas that kernel doesn't see, then kexec into place
you already tested, and test the rest.
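
A minimal sketch of the /dev/mem approach, in Python for brevity. The helper
name check_physical_window and its parameters are illustrative, not from the
thread. It assumes root access, a kernel that permits mapping RAM through
/dev/mem (newer kernels built with CONFIG_STRICT_DEVMEM do not), and a
physical range the running kernel is not using, since the write is
destructive. It does nothing about CPU caches, so on real hardware the
read-back may be served from cache rather than from the DIMM.

```python
import mmap
import os


def check_physical_window(path, phys_offset, length, pattern=0x55):
    """Map `length` bytes of `path` starting at `phys_offset`, fill the
    window with `pattern`, read it back, and return the offsets (relative
    to the window start) whose contents differ from the pattern.

    `path` would be "/dev/mem" on a real machine; any regular file works
    as a stand-in for dry runs.
    """
    if phys_offset % mmap.ALLOCATIONGRANULARITY:
        raise ValueError("offset must be aligned to the mmap granularity")
    if length <= 0:
        raise ValueError("length must be positive")

    fd = os.open(path, os.O_RDWR)
    try:
        window = mmap.mmap(
            fd,
            length,
            flags=mmap.MAP_SHARED,
            prot=mmap.PROT_READ | mmap.PROT_WRITE,
            offset=phys_offset,
        )
        try:
            window[:] = bytes([pattern]) * length  # write phase
            readback = window[:]                   # read phase
        finally:
            window.close()
    finally:
        os.close(fd)

    return [i for i, value in enumerate(readback) if value != pattern]
```

On a machine booted with part of RAM hidden from the kernel, a call might
look like check_physical_window("/dev/mem", 0x20000000, 1 << 20), where the
offset is a hypothetical start of the hidden region.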
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


RE: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread Richard D
I was thinking that by the time userspace is ready, the memory that can be
tested will be less. 

-----Original Message-----
From: Pavel Machek [mailto:[EMAIL PROTECTED] 
Sent: Saturday, December 22, 2007 7:16 PM
To: Richard D
Cc: 'Matthew Bloch'; linux-kernel@vger.kernel.org
Subject: Re: Testing RAM from userspace / question about memmap= arguments

On Sat 2007-12-22 13:42:47, Richard D wrote:
> Can't you modify the bootmem allocator to test with memtest patterns and then
> use kexec (as Pavel suggested) to test the area where the kernel was sitting
> earlier?


I do not think you need to modify anything in kernel. Just use
/dev/mem to test areas that kernel doesn't see, then kexec into place
you already tested, and test the rest.
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures)
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread David Newall

Pavel Machek wrote:
> On Sat 2007-12-22 13:42:47, Richard D wrote:
>> Can't you modify the bootmem allocator to test with memtest patterns and then
>> use kexec (as Pavel suggested) to test the area where the kernel was sitting
>> earlier?
>
> I do not think you need to modify anything in kernel. Just use
> /dev/mem to test areas that kernel doesn't see, then kexec into place
> you already tested, and test the rest.

That's still an insufficient test.  One failure mode is writes at one 
location corrupting cells at another.
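
To make that failure mode concrete, here is a small Python sketch. The class
CoupledMemory is a made-up fault model (a non-zero write to one address sets
a bit at another), and find_coupling_faults is a naive O(n^2) check that is
only practical on tiny buffers. Real testers use much cheaper pattern
sequences; this is meant only to show why testing a region in isolation
cannot see a disturbance that originates outside it.

```python
class CoupledMemory:
    """Hypothetical fault model: writing a non-zero value to `aggressor`
    also sets the low bit of the cell at `victim`."""

    def __init__(self, size, aggressor=None, victim=None):
        self.cells = bytearray(size)
        self.aggressor = aggressor
        self.victim = victim

    def __len__(self):
        return len(self.cells)

    def __getitem__(self, index):
        return self.cells[index]

    def __setitem__(self, index, value):
        self.cells[index] = value
        if index == self.aggressor and value:
            self.cells[self.victim] |= 0x01  # simulated disturbance


def find_coupling_faults(mem, background=0x00, stress=0xFF):
    """For each address: reset every cell to `background`, write `stress`
    to that one address, then verify that every other cell still holds
    `background`. Returns a list of (written_address, corrupted_address)."""
    faults = []
    size = len(mem)
    for written in range(size):
        for addr in range(size):
            mem[addr] = background
        mem[written] = stress
        for addr in range(size):
            if addr != written and mem[addr] != background:
                faults.append((written, addr))
    return faults
```

If the written address and the corrupted address fall in different halves of
RAM, a test that only ever touches one half at a time has no write that
triggers the fault while the other half is being checked.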


The idea of wanting to do comprehensive and robust memory testing from 
within the operating system seems dubious at best, to me.  If there is 
something wrong with memtest86, doing the tests from within Linux is not 
the answer.  The answer is to fix memtest86.  If the problem is that you 
need automation, e.g. switching a server from production to memory test mode 
at midnight and back again at 6am, the answer is still to fix 
memtest86.  Writing something that grabs some physical RAM from Linux's 
control, tests it, and then moves the kernel itself so that it can test 
the rest, is adding a whole extra layer of complexity to an already 
challenging (I assume, based on errors that dedicated software-based 
testers miss) problem.


Give up on this misguided idea and build on the best tools that are 
already available.



Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread Pavel Machek
On Sat 2007-12-22 21:00:11, Richard D wrote:
> I was thinking that by the time userspace is ready, the memory that can be
> tested will be less.

Which does not matter when you can test the rest using second
(kexec-ed) kernel, right?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread Pavel Machek
On Sun 2007-12-23 02:36:14, David Newall wrote:
> Pavel Machek wrote:
>> On Sat 2007-12-22 13:42:47, Richard D wrote:
>>> Can't you modify the bootmem allocator to test with memtest patterns and then
>>> use kexec (as Pavel suggested) to test the area where the kernel was sitting
>>> earlier?
>>
>> I do not think you need to modify anything in kernel. Just use
>> /dev/mem to test areas that kernel doesn't see, then kexec into place
>> you already tested, and test the rest.
>
> That's still an insufficient test.  One failure mode is writes at one
> location corrupting cells at another.
>
> The idea of wanting to do comprehensive and robust memory testing from
> within the operating system seems dubious at best, to me.  If there is
> something wrong with memtest86, doing the tests from within Linux is not
> the answer.  The answer is to fix memtest86.  If the problem is that you
> need automation, e.g. switching a server from production to memory test mode at
> midnight and back again at 6am, the answer is still to fix memtest86.
> Writing something that grabs some physical RAM from Linux's control, tests
> it, and then moves the kernel itself so that it can test the rest, is
> adding a whole extra layer of complexity to an already challenging (I
> assume, based on errors that dedicated software-based testers miss)
> problem.

Well, we have kexec. We already have a way for the kernel to relocate itself.

> Give up on this misguided idea and build on the best tools that are already
> available.

Yes, the idea is interesting. I do not think it quite deserves the
"misguided" label.

memtest has the following problems:

0) it is kind of hard to run memtest over ssh

1) if Linux fixes some problem with a PCI quirk or microcode
upload, memtest will not see the fix

2) if memory only fails while something else happens (DMA to
another piece of memory? Hard disk load glitching the power
supply?), memtest will not see the problem.

(Of course, memtest-under-linux has some problems too. For example, if it
freezes, was it bad memory or a kernel problem?)
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread Arjan van de Ven
On Tue, 18 Dec 2007 17:06:24 +
Matthew Bloch [EMAIL PROTECTED] wrote:

> Hi - I'm trying to come up with a way of thoroughly testing every byte
> of RAM from within Linux on amd64 (so that it can be automated better
> than using memtest86+), and came up with an idea which I'm not sure is
> supported or practical.
>
> The obvious problem with testing memory from user space is that you
> can't mlock all of it, so the best you can do is about three quarters,
> and hope that the rest of the memory is okay.

well... to be honest the more obvious problem will be that you won't be testing 
the RAM, you'll be testing the CPU's cache.. over and over again.

memtest86+ does various magic to basically bypass the caches (by disabling them
;-)...
Doing that in a live kernel situation, and from userspace to boot..
that's... an issue.


Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread David Newall

Pavel Machek wrote:
> memtest has following problems:
>
> 0) it is kind of hard to run memtest over ssh

It's kind of hard to run anything over SSH if it has to be run before 
userspace is up.  But the kernel can collect results from a modified 
memtest, after it chains back.



> 1) if linux fixes some problem with PCI quirk or microcode
> upload, memtest will not see the fix

What are you saying?  Linux is going to fix faulty RAM?  The point with 
testing RAM is you *want* to see it fail; you don't want Linux to fix it.



> 2) if memory only fails while something else happens (DMA to
> other piece of memory? Hard disk load glitching power
> supply?), memtest will not see the problem.

These are not RAM faults.  The very last thing you want is evidence that 
you've got a faulty piece of RAM when the fault is actually a hard disk 
glitch!



Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread Matthew Bloch

David Newall wrote:
> Pavel Machek wrote:
>> On Sat 2007-12-22 13:42:47, Richard D wrote:
>>> Can't you modify the bootmem allocator to test with memtest patterns and then
>>> use kexec (as Pavel suggested) to test the area where the kernel was sitting
>>> earlier?
>>
>> I do not think you need to modify anything in kernel. Just use
>> /dev/mem to test areas that kernel doesn't see, then kexec into place
>> you already tested, and test the rest.
>
> That's still an insufficient test.  One failure mode is writes at one
> location corrupting cells at another.
>
> The idea of wanting to do comprehensive and robust memory testing from
> within the operating system seems dubious at best, to me.

Well if we're trying to be thorough, either way is flawed - you can't 
possibly test pathologically-misbehaving memory from code running from 
inside of it, you'd want some kind of non-uniform memory arrangement to 
do that properly.  memtest86's value is that it at least *tries* to work 
in this environment by dynamically relocating itself, but its memory 
testing algorithms aren't the hard bit.  Also I'm not necessarily 
interested in *which* section of which DIMM is faulty, just a yes or no 
is enough so I can send the faulty ones back to the shop.


I don't agree that adding a network stack to memtest86's bare kernel is 
going to be easier than working out how to get Linux to do the same job, 
with its luxurious programming environment.  I can already automate 
memtest via serial consoles, power cycling, network booting and so on 
but it's ugly.


I will report back in the new year when I've had a chance to play with 
our collection of dodgy hardware.


--
Matthew



Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread Pavel Machek
On Sun 2007-12-23 07:06:58, David Newall wrote:
> Pavel Machek wrote:
>> memtest has following problems:
>>
>> 0) it is kind of hard to run memtest over ssh
>
> It's kind of hard to run anything over SSH if it has to be run before
> userspace is up.  But the kernel can collect results from a modified
> memtest, after it chains back.

memtest can be run from userspace, that's the point.

>> 1) if linux fixes some problem with PCI quirk or microcode
>> upload, memtest will not see the fix
>
> What are you saying?  Linux is going to fix faulty RAM?

Yes, that's what CPU microcode update is for. And I want to test my
RAM with up-to-date microcode.

>> 2) if memory only fails while something else happens (DMA to
>> other piece of memory? Hard disk load glitching power
>> supply?), memtest will not see the problem.
>
> These are not RAM faults.  The very last thing you want is evidence that
> you've got a faulty piece of RAM when the fault is actually a hard disk
> glitch!

No, it may be power supply leading to RAM problems. Yes, I want to
detect that.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: Testing RAM from userspace / question about memmap= arguments

2007-12-22 Thread David Newall

Pavel Machek wrote:
> On Sun 2007-12-23 07:06:58, David Newall wrote:
>> It's kind of hard to run anything over SSH if it has to be run before
>> userspace is up.  But the kernel can collect results from a modified
>> memtest, after it chains back.
>
> memtest can be run from userspace, that's the point.

I'm not sure I believe that.  You need to tinker with hardware tables 
before you know what physical RAM is being used.  Sequential virtual 
pages might be mapped to sequential physical RAM, but it might also be 
mapped pseudo-randomly, or even page-reverse-sequential!  How can you do 
a basic walking bit test when you could be accessing pages in random order?
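
For reference, kernels from 2.6.25 onward (later than this thread) expose the
virtual-to-physical mapping of a process through /proc/<pid>/pagemap, one
64-bit entry per virtual page. The Python sketch below decodes such entries
so a tester could at least learn which physical frame backs each page. The
function names are illustrative; the bit layout used (bit 63 = page present,
bits 0-54 = page frame number) follows the kernel's pagemap documentation.
Recent kernels report the frame number only to privileged processes, so
unprivileged callers should expect zeros.

```python
import struct

PFN_MASK = (1 << 55) - 1   # bits 0-54 hold the page frame number
PRESENT_BIT = 1 << 63      # bit 63 is set when the page is in RAM


def decode_pagemap_entry(raw):
    """Return the physical frame number stored in one 8-byte pagemap
    entry, or None if the entry is short or the page is not present."""
    if len(raw) < 8:
        return None
    (entry,) = struct.unpack("=Q", raw)
    if not entry & PRESENT_BIT:
        return None
    return entry & PFN_MASK


def physical_frames(virt_addr, n_pages, page_size=4096,
                    pagemap_path="/proc/self/pagemap"):
    """Look up the frame numbers backing `n_pages` consecutive virtual
    pages starting at `virt_addr`."""
    frames = []
    with open(pagemap_path, "rb") as pagemap:
        pagemap.seek((virt_addr // page_size) * 8)
        for _ in range(n_pages):
            frames.append(decode_pagemap_entry(pagemap.read(8)))
    return frames
```

Knowing the frame numbers does not make the pages physically contiguous; it
only lets the tester sort or report results by physical address.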



>>> 1) if linux fixes some problem with PCI quirk or microcode
>>> upload, memtest will not see the fix
>>
>> What are you saying?  Linux is going to fix faulty RAM?
>
> Yes, that's what CPU microcode update is for. And I want to test my
> RAM with up-to-date microcode.

Don't microcode updates fix CPU bugs?  That's not fixing faulty RAM.  If 
base microcode is so faulty as to make RAM access unreliable, the CPU 
probably won't even POST, let alone boot the kernel and start a whole 
bunch of userspace stuff, before it can get around to checking to see if 
there is new microcode for that CPU and download it.


I suppose a CPU retains microcode updates, once loaded, until power-down 
or some hard reboot that you surely can avoid.  If it does happen that 
you have an update that works around something unrelated to the CPU, for 
example maybe interaction with a bridge, then you can update the CPU 
before running memtest.  Once loaded it's there until power down.



>> These are not RAM faults. The very last thing you want is evidence that
>> you've got a faulty piece of RAM when the fault is actually a hard disk
>> glitch!
>
> No, it may be power supply leading to RAM problems. Yes, I want to
> detect that.

I'm sure you don't mean that.  I'm sure you don't want a faulty power 
supply to look like faulty RAM.  No amount of replacing pieces of memory 
is going to solve a faulty power supply.  At worst you'll hit on a 
combination of pieces that pass the test ... and then the system will 
fail, mysteriously, in production.  I'm certain you don't want that.


Anyhow, good luck with your idea.  I think it's crazy, and that you're 
doomed to failure.  Doomed! I tell you.



Re: Testing RAM from userspace / question about memmap= arguments

2007-12-21 Thread Pavel Machek
On Tue 2007-12-18 17:06:24, Matthew Bloch wrote:
> Hi - I'm trying to come up with a way of thoroughly testing every byte
> of RAM from within Linux on amd64 (so that it can be automated better
> than using memtest86+), and came up with an idea which I'm not sure is
> supported or practical.
> 
> The obvious problem with testing memory from user space is that you
> can't mlock all of it, so the best you can do is about three quarters,
> and hope that the rest of the memory is okay.
> 
> In order to test all of the memory, I'd like to run the user-space
> memtester over two boots of the kernel.
> 
> Say we have a 1024MB machine, the first boot I'd not specify any
> arguments and assume the kernel would start at the bottom of physical
> memory and work its way up, so that the kernel & working userspace would
> live at the bottom, and the rest would be testable from space.
> 
> On the second boot, could I then specify:
> 
> memmap=exact [EMAIL PROTECTED] [EMAIL PROTECTED]

Actually, with kexec, you can probably do it without a reboot.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Testing RAM from userspace / question about memmap= arguments

2007-12-20 Thread Siva Prasad

Hi Matthew,

I worked on something similar. For a customer's product that
goes to the defense and security markets, we had to support the most thorough
memory test possible. We implemented a pre-test that checks the memory
with walking 1's and 0's just before the Linux kernel starts allocating
serious memory for its own use. That way, coverage was almost 99%. Once
Linux boots, we run a very thorough test using various algorithms; however,
as you said, memory coverage is a little lower when we test the system
after Linux boots up completely.
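
As an illustration of the walking 1's and 0's idea (not the code from the
product described above), here is a short Python sketch that runs on any
byte-addressable buffer such as a bytearray. The function names are made up
for this example. A walking-ones pass writes a single set bit that moves
through every bit position; the walking-zeros pass is its complement.

```python
def walking_patterns(width=8):
    """Return the walking-ones patterns followed by the walking-zeros
    patterns for a cell that is `width` bits wide."""
    mask = (1 << width) - 1
    ones = [1 << bit for bit in range(width)]
    zeros = [mask ^ pattern for pattern in ones]
    return ones + zeros


def walking_bit_test(mem):
    """Write each pattern to every byte of `mem`, read it back, and
    return a list of (address, expected, actual) mismatches."""
    failures = []
    for pattern in walking_patterns(8):
        for addr in range(len(mem)):       # write pass
            mem[addr] = pattern
        for addr in range(len(mem)):       # verify pass
            actual = mem[addr]
            if actual != pattern:
                failures.append((addr, pattern, actual))
    return failures
```

Running it on healthy storage, for example walking_bit_test(bytearray(4096)),
returns an empty list; a stuck data bit shows up once for every pattern that
tries to drive that bit to the opposite value.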


memtest86+ started out as a very good alternative, until our customer's customer
started complaining about memory issues. Then we had no choice but to
take this route and implement it ourselves from scratch.


If you want 100% coverage, it may not be possible unless you do it in 
BIOS early on. If you take the route of implementing some simple memory 
test in Linux kernel before it starts allocating memory, you get very 
good % of coverage. Good Luck.


- Siva


Date: Thu, 20 Dec 2007 14:17:10 +
From: Matthew Bloch <[EMAIL PROTECTED]>
Subject: Re: Testing RAM from userspace / question about memmap= arguments
To: linux-kernel@vger.kernel.org
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=ISO-8859-1

Jon Masters wrote:
> On Tue, 2007-12-18 at 17:06 +, Matthew Bloch wrote:
>
>> I can see a few potential problems, but since my understanding of the
>> low-level memory mapping is muddy at best, I won't speculate; I'd just
>> appreciate any more expert views on whether this does work, or could be
>> made to work.
>
> Yo,
>
> I don't think your testing approach is thorough enough. Clearly (knowing
> your line of business - as a virtual machine provider), you want to do
> pre-production testing as part of your provisioning. I would suggest
> instead of using mlock() from userspace of simply writing a kernel
> module that does this for every page of available memory.

Yes this is to improve the efficiency of server burn-ins.  I would
consider a kernel module, but I still wouldn't be able to test the
memory in which the kernel is sitting, which is my problem.  I'm not
sure even a kernel module could reliably test the memory in which it is
residing (memtest86+ relocates itself to do this).  Also I don't see how
userspace testing is any less thorough than doing it in the kernel; I
just need a creative way of accessing every single page of memory.

I may do some experiments with the memmap args, some bad RAM and
shuffling it between DIMM sockets when I have the time :)

--
Matthew


Re: Testing RAM from userspace / question about memmap= arguments

2007-12-20 Thread Matthew Bloch
Jon Masters wrote:
> On Tue, 2007-12-18 at 17:06 +, Matthew Bloch wrote:
> 
>> I can see a few potential problems, but since my understanding of the
>> low-level memory mapping is muddy at best, I won't speculate; I'd just
>> appreciate any more expert views on whether this does work, or could be
>> made to work.
> 
> Yo,
> 
> I don't think your testing approach is thorough enough. Clearly (knowing
> your line of business - as a virtual machine provider), you want to do
> pre-production testing as part of your provisioning. I would suggest
> instead of using mlock() from userspace of simply writing a kernel
> module that does this for every page of available memory.

Yes this is to improve the efficiency of server burn-ins.  I would
consider a kernel module, but I still wouldn't be able to test the
memory in which the kernel is sitting, which is my problem.  I'm not
sure even a kernel module could reliably test the memory in which it is
residing (memtest86+ relocates itself to do this).  Also I don't see how
userspace testing is any less thorough than doing it in the kernel; I
just need a creative way of accessing every single page of memory.

I may do some experiments with the memmap args, some bad RAM and
shuffling it between DIMM sockets when I have the time :)

-- 
Matthew



Re: Testing RAM from userspace / question about memmap= arguments

2007-12-20 Thread Jon Masters
On Tue, 2007-12-18 at 17:06 +, Matthew Bloch wrote:

> I can see a few potential problems, but since my understanding of the
> low-level memory mapping is muddy at best, I won't speculate; I'd just
> appreciate any more expert views on whether this does work, or could be
> made to work.

Yo,

I don't think your testing approach is thorough enough. Clearly (knowing
your line of business - as a virtual machine provider), you want to do
pre-production testing as part of your provisioning. I would suggest
instead of using mlock() from userspace of simply writing a kernel
module that does this for every page of available memory.

You could script it via a minimal userland, containing only busybox,
some form of SSH implementation, whatever.

Jon.

P.S. With the above, you could also know which pages were faulty, and
consequently play with some of the bad RAM patches to exclude faulty
pages from the virtual machines running on a given host... ;-)




Testing RAM from userspace / question about memmap= arguments

2007-12-18 Thread Matthew Bloch
Hi - I'm trying to come up with a way of thoroughly testing every byte
of RAM from within Linux on amd64 (so that it can be automated better
than using memtest86+), and came up with an idea which I'm not sure is
supported or practical.

The obvious problem with testing memory from user space is that you
can't mlock all of it, so the best you can do is about three quarters,
and hope that the rest of the memory is okay.

In order to test all of the memory, I'd like to run the user-space
memtester over two boots of the kernel.

Say we have a 1024MB machine: on the first boot I'd not specify any
arguments and assume the kernel would start at the bottom of physical
memory and work its way up, so that the kernel & working userspace would
live at the bottom, and the rest would be testable from user space.
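
One way to check that assumption rather than rely on it is to read
/proc/iomem, which lists the physical ranges marked "System RAM" and, nested
inside them, where the kernel image sits. A rough C sketch follows; the
function names are hypothetical, and on recent kernels the addresses read as
zero unless you are root.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse one /proc/iomem line such as "  01000000-01a00000 : Kernel code".
 * Returns 1 and fills start, end (inclusive) and name on success, else 0. */
static int parse_iomem_line(const char *line, uint64_t *start, uint64_t *end,
                            char *name, size_t name_len)
{
    int consumed = 0;

    assert(name_len > 0);
    /* %n is only reached if the " : " separator matched. */
    if (sscanf(line, " %" SCNx64 "-%" SCNx64 " : %n", start, end, &consumed) != 2
        || consumed == 0)
        return 0;
    snprintf(name, name_len, "%s", line + consumed);
    name[strcspn(name, "\n")] = '\0';   /* strip the trailing newline */
    return 1;
}

/* Print the RAM ranges and the kernel's own ranges to `out`.
 * Returns the number of lines printed, or -1 if /proc/iomem is unreadable. */
static int print_ram_layout(FILE *out)
{
    char line[256], name[128];
    uint64_t start, end;
    int printed = 0;
    FILE *f = fopen("/proc/iomem", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (!parse_iomem_line(line, &start, &end, name, sizeof name))
            continue;
        if (strcmp(name, "System RAM") == 0 || strncmp(name, "Kernel ", 7) == 0) {
            fprintf(out, "%016" PRIx64 "-%016" PRIx64 "  %s\n", start, end, name);
            printed++;
        }
    }
    fclose(f);
    return printed;
}
```

Running print_ram_layout(stdout) on each of the two boots would show whether
the kernel's ranges actually moved.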

On the second boot, could I then specify:

memmap=exact [EMAIL PROTECTED] [EMAIL PROTECTED]

i.e. such that the kernel's idea of the usable memory started in the
middle of physical RAM, and that's where it would locate itself?  That
way, on the second boot, the same test in userspace would definitely
grab the previously inaccessible RAM at the start for testing.

I can see a few potential problems, but since my understanding of the
low-level memory mapping is muddy at best, I won't speculate; I'd just
appreciate any more expert views on whether this does work, or could be
made to work.

Thanks,

-- 
Matthew


