Re: Testing RAM from userspace / question about memmap= arguments
David Newall wrote:
> Pavel Machek wrote:
>> On Sat 2007-12-22 13:42:47, Richard D wrote:
>>> Can't you modify the bootmem allocator to test with memtest patterns,
>>> and then use kexec (as Pavel suggested) to test the region where the
>>> kernel was sitting earlier?
>>
>> I do not think you need to modify anything in the kernel. Just use
>> /dev/mem to test the areas the kernel doesn't see, then kexec into a
>> place you already tested, and test the rest.
>
> That's still an insufficient test. One failure mode is writes at one
> location corrupting cells at another. The idea of wanting to do
> comprehensive and robust memory testing from within the operating
> system seems dubious at best, to me.

Well, if we're trying to be thorough, either way is flawed - you can't possibly test pathologically misbehaving memory from code running inside it; you'd want some kind of non-uniform memory arrangement to do that properly. memtest86's value is that it at least *tries* to work in this environment by dynamically relocating itself, but its memory testing algorithms aren't the hard bit. Also, I'm not necessarily interested in *which* section of which DIMM is faulty; a yes or no is enough so I can send the faulty ones back to the shop.

I don't agree that adding a network stack to memtest86's bare kernel is going to be easier than working out how to get Linux to do the same job, with its luxurious programming environment. I can already automate memtest via serial consoles, power cycling, network booting and so on, but it's ugly. I will report back in the new year when I've had a chance to play with our collection of dodgy hardware.

--
Matthew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Testing RAM from userspace / question about memmap= arguments
Jon Masters wrote:
> On Tue, 2007-12-18 at 17:06 +0000, Matthew Bloch wrote:
>
>> I can see a few potential problems, but since my understanding of the
>> low-level memory mapping is muddy at best, I won't speculate; I'd just
>> appreciate any more expert views on whether this does work, or could be
>> made to work.
>
> Yo,
>
> I don't think your testing approach is thorough enough. Clearly (knowing
> your line of business - as a virtual machine provider), you want to do
> pre-production testing as part of your provisioning. I would suggest,
> instead of using mlock() from userspace, simply writing a kernel
> module that does this for every page of available memory.

Yes, this is to improve the efficiency of server burn-ins. I would consider a kernel module, but I still wouldn't be able to test the memory in which the kernel is sitting, which is my problem. I'm not sure even a kernel module could reliably test the memory in which it is residing (memtest86+ relocates itself to do this). Also, I don't see how userspace testing is any less thorough than doing it in the kernel; I just need a creative way of accessing every single page of memory.

I may do some experiments with the memmap args, some bad RAM and shuffling it between DIMM sockets when I have the time :)

--
Matthew
Testing RAM from userspace / question about memmap= arguments
Hi -

I'm trying to come up with a way of thoroughly testing every byte of RAM from within Linux on amd64 (so that it can be automated better than using memtest86+), and came up with an idea which I'm not sure is supported or practical.

The obvious problem with testing memory from user space is that you can't mlock all of it, so the best you can do is about three quarters, and hope that the rest of the memory is okay. In order to test all of the memory, I'd like to run the user-space memtester over two boots of the kernel. Say we have a 1024MB machine: on the first boot I'd not specify any arguments and assume the kernel would start at the bottom of physical memory and work its way up, so that the kernel & working userspace would live at the bottom, and the rest would be testable from userspace. On the second boot, could I then specify:

memmap=exact [EMAIL PROTECTED] [EMAIL PROTECTED]

i.e. such that the kernel's idea of the usable memory started in the middle of physical RAM, and that's where it would locate itself? That way, on the second boot, the same test in userspace would definitely grab the previously inaccessible RAM at the start for testing.

I can see a few potential problems, but since my understanding of the low-level memory mapping is muddy at best, I won't speculate; I'd just appreciate any more expert views on whether this does work, or could be made to work.

Thanks,

--
Matthew