Bug#652119: Bad pagetable 000f
Matthew Wakeling wrote:

> Thanks. I tried GhostBSD, which didn't have par2 included and
> couldn't run executables from my hard drive, and Knoppix which was
> an i386 kernel (albeit very recent 3.0.something), so also couldn't
> run my amd64 par2 from my hard drive. A rescue environment may be
> the way forward, but how do I get a recent kernel and par2 onto it?
> I have to admit I am not well-versed in setting that up.

To test a recent kernel, there should be no need for a rescue
environment. Just installing initramfs-tools, linux-base, and a
kernel image from sid should work fine. (Everything else can stay at
the version from squeeze.)

Does GhostBSD include a C compiler? If so, it should be possible to
build par2 from source to use there.

> I think I have spotted another condition for the bug to be
> triggered. I ran par2 with a different configuration, and it worked.
> I think par2 does interleaved word by word access to n memory areas,
> where n is configurable. When it crashed, n was 1400, and when it
> worked n was 140. I also noticed the program worked much faster when
> n was smaller. Now, the number of TLB entries on my processor is
> 1024, so it is quite possible that every single word access causes a
> TLB miss.

Thanks for these updates. I suppose I should also mention

 - http://linux-mm.org/, the memory management subsystem wiki
 - linux...@kvack.org, the mailing list

since they are more likely to be able to say what makes sense and
what doesn't in explaining your symptoms, even when working with an
old kernel. Please cc either me or this bug log if writing to them
so we can track it.

Thanks for your patience and sorry for the lack of progress.

Jonathan

-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#652119: Bad pagetable 000f
On Sun, 8 Jan 2012, Jonathan Nieder wrote:
>> Could you point me in the direction of such a livecd please?
>
> It looks like no one is making an official Debian livecd with kFreeBSD
> any more (alas). But it should be possible to grab par2 and its
> dependencies and run them in the debian-installer[1] rescue
> environment, for example. Alternatively, there seem to be some other
> non-Linux live environments, such as [2].

Thanks. I tried GhostBSD, which didn't have par2 included and
couldn't run executables from my hard drive, and Knoppix which was
an i386 kernel (albeit very recent 3.0.something), so also couldn't
run my amd64 par2 from my hard drive. A rescue environment may be
the way forward, but how do I get a recent kernel and par2 onto it?
I have to admit I am not well-versed in setting that up.

I think I have spotted another condition for the bug to be
triggered. I ran par2 with a different configuration, and it worked.
I think par2 does interleaved word by word access to n memory areas,
where n is configurable. When it crashed, n was 1400, and when it
worked n was 140. I also noticed the program worked much faster when
n was smaller. Now, the number of TLB entries on my processor is
1024, so it is quite possible that every single word access causes a
TLB miss.

Matthew

-- 
"Interwoven alignment preambles are not allowed."
If you have been so devious as to get this message, you will understand
it, and you deserve no sympathy. -- Knuth, in the TeXbook
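
The TLB reasoning in the message above can be checked with a toy model. The sketch below is only an illustration, not a measurement of the real hardware: it assumes a fully associative TLB with strict LRU replacement, one page per memory area, and the 1024-entry figure quoted in the message. The function name tlb_miss_rate is hypothetical. Real TLBs are multi-level and set-associative, so actual miss rates will differ.

```python
from collections import OrderedDict


def tlb_miss_rate(num_areas, tlb_entries=1024, passes=10):
    """Fraction of accesses that miss in a toy LRU TLB.

    Assumptions (not taken from par2 or any real CPU): each memory
    area occupies exactly one page, areas are visited round-robin,
    and the TLB is fully associative with strict LRU replacement.
    """
    tlb = OrderedDict()  # page number -> None, least recently used first
    misses = 0
    accesses = 0
    for _ in range(passes):
        for page in range(num_areas):
            accesses += 1
            if page in tlb:
                tlb.move_to_end(page)  # hit: mark as most recently used
            else:
                misses += 1
                if len(tlb) >= tlb_entries:
                    tlb.popitem(last=False)  # evict least recently used
                tlb[page] = None
    return misses / accesses


if __name__ == "__main__":
    for n in (140, 1400):
        print(n, tlb_miss_rate(n))
```

Under these assumptions, 140 areas fit in the TLB and only the first pass misses, while 1400 areas exceed the 1024 entries and the round-robin order evicts every page just before it is needed again, so every access misses. That is consistent with the observation that the run with n = 140 was much faster, but it says nothing about why the kernel reports a bad pagetable.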
Bug#652119: Bad pagetable 000f
Hi Matthew,

Matthew Wakeling wrote:
> On Thu, 22 Dec 2011, Jonathan Nieder wrote:

>> Other tests that would be useful might include
>> (1) running memtest86+ and (2) trying the same workload using a livecd
>> with some other kernel, like the kernel of FreeBSD, to see if this is
>> likely to be a hardware bug or a kernel bug.
>
> Just ran a 10 hour memtest86+, no problems found.
>
> Could you point me in the direction of such a livecd please?

Sorry for the slow response. It looks like no one is making an
official Debian livecd with kFreeBSD any more (alas). But it should
be possible to grab par2 and its dependencies and run them in the
debian-installer[1] rescue environment, for example. Alternatively,
there seem to be some other non-Linux live environments, such as [2].

Thanks,
Jonathan

[1] http://www.debian.org/CD/
    http://www.debian.org/devel/debian-installer/
[2] http://people.freebsd.org/~mm/mfsbsd/
Bug#652119: Bad pagetable 000f
On Thu, 22 Dec 2011, Jonathan Nieder wrote:
>> I'll have to physically attend the machine to do this, which won't happen
>> until January. Even then, testing will involve crashing my machine a few
>> times, so it won't be the first thing I do.
>
> No problem; we can wait. Other tests that would be useful might include
> (1) running memtest86+ and (2) trying the same workload using a livecd
> with some other kernel, like the kernel of FreeBSD, to see if this is
> likely to be a hardware bug or a kernel bug.

Just ran a 10 hour memtest86+, no problems found.

Could you point me in the direction of such a livecd please?

Matthew

-- 
Reality is that which, when you stop believing in it, doesn't go away.
-- Philip K. Dick
Bug#652119: Bad pagetable 000f
Matthew Wakeling wrote:
> On Thu, 22 Dec 2011, Jonathan Nieder wrote:

>> - was this a regression? (I.e., do you know of any older kernel
>> versions without this bug?)
>
> I have seen this happen before on an older kernel. Not sure exactly which
> one - maybe 2.6.26?

Thanks.

[...]
>> If this is reproducible with newish kernels, we can get help from
>> upstream. If it isn't, we can try to find what change fixed it and
>> try applying the same fix to squeeze.
>
> Sure. How out of date is the squeeze kernel anyway?

The 2.6.32.y series stabilized for about a year and a couple of
months before squeeze was released. (v2.6.33 was released on
24 February 2010.) Since then, the 2.6.32.y kernel has received lots
of fixes, so in that sense it is up to date. Upstream developers
prefer to debug something closer to the codebase they are working on
day-to-day.

[...]
> I'll have to physically attend the machine to do this, which won't happen
> until January. Even then, testing will involve crashing my machine a few
> times, so it won't be the first thing I do.

No problem; we can wait. Other tests that would be useful might
include (1) running memtest86+ and (2) trying the same workload using
a livecd with some other kernel, like the kernel of FreeBSD, to see
if this is likely to be a hardware bug or a kernel bug.

Jonathan
Bug#652119: Bad pagetable 000f
On Thu, 22 Dec 2011, Jonathan Nieder wrote:
> Can you reproduce this on demand?

Yes. It seems to take about two hours to fail. Thinking about it,
par2 was accessing about 1400 independent areas of memory on a loop,
so it would be causing cache thrash and TLB thrash. I'm thinking it
might almost be worth having a look at the par2 program to see if it
could improve its memory access pattern. But as it stands, it is
probably a pretty good TLB management stress test.

> - was this a regression? (I.e., do you know of any older kernel
> versions without this bug?)

I have seen this happen before on an older kernel. Not sure exactly
which one - maybe 2.6.26?

> - can you reproduce it with a recent kernel from sid or experimental?
> (The only packages from outside squeeze you should need in order to
> test this aside from the kernel image itself are linux-base and
> initramfs-tools.)

I'll have to physically attend the machine to do this, which won't
happen until January. Even then, testing will involve crashing my
machine a few times, so it won't be the first thing I do.

> If this is reproducible with newish kernels, we can get help from
> upstream. If it isn't, we can try to find what change fixed it and
> try applying the same fix to squeeze.

Sure. How out of date is the squeeze kernel anyway?

Matthew

-- 
People who love sausages, respect the law, and work with IT standards
shouldn't watch any of them being made. -- Peter Gutmann
Bug#652119: Bad pagetable 000f
Hey Matthew,

Matthew Wakeling wrote:

> Running the par2 program causes a bad pagetable fault which has
> killed the process and killed the machine on two different
> occasions. The machine is completely stable running other programs.
>
> The problem occurs when running par2 to generate 13.5GB of recovery
> data for 50GB of data in eleven equal size files, a task that should
> take about 10 hours on my system. The task seems to cause a crash
> after about two hours.

Can you reproduce this on demand? If so, some questions:

 - was this a regression? (I.e., do you know of any older kernel
   versions without this bug?)

 - can you reproduce it with a recent kernel from sid or experimental?
   (The only packages from outside squeeze you should need in order to
   test this aside from the kernel image itself are linux-base and
   initramfs-tools.)

If this is reproducible with newish kernels, we can get help from
upstream. If it isn't, we can try to find what change fixed it and
try applying the same fix to squeeze.

Thanks and hope that helps,
Jonathan