Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM II

> Which in turn enables the iommu_merge functionality in gart_map_sg().
>
> 	for_each_sg(sg, s, nents, i) {

Hmm, another thought. Maybe this code just has trouble with the new linked SG lists and it's not really a SB600 problem?

I did a quick test on two ATI machines with older chipset and iommu=force,merge and it didn't show a problem though.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

On Tue, 20 Nov 2007, Andi Kleen wrote:
> > This requires probably working 64bit DMA, which is not possible with the SB600 controller.
>
> It should not, no. The remapping is done into the GART which is < 4GB and that is the address the SB600 sees.

Hmm, I just checked the boot logs of the failing 4GB kernel:

BIOS-e820: 0001 - 00012000 (usable)
...
CPU 0: aperture @ c00 size 32 MB
Aperture too small (32 MB)
No AGP bridge found
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ c00
Memory: 4055984k/4718592k available (2146k kernel code, 136780k reserved, 1273k data, 296k init)

4718592k * 1024 == 0x120000000

So now we have addresses > 4G and I suspect that this is somehow related to the problem.

When mem=3500M is given on the kernel command line, we do not use this address space.

Also, is the aperture size of 32MB somehow related to this?

	tglx
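For anyone who wants to double-check the arithmetic in the message above, here is a minimal user-space sketch (not kernel code). The helper names `kib_to_bytes` and `bytes_above_4g` are made up for illustration; the only inputs taken from the thread are the 4718592k figure from the "Memory:" line and the 4G boundary.

```c
#include <stdint.h>

/* 4 GiB boundary: the first address a 32-bit-only DMA engine cannot reach. */
#define FOUR_GIB 0x100000000ULL

/* Convert a size reported in KiB (the "k" in the boot log) to bytes. */
static uint64_t kib_to_bytes(uint64_t kib)
{
	return kib * 1024ULL;
}

/*
 * Given the end of the physical address range (in bytes), return how many
 * bytes of that range lie at or above the 4 GiB boundary. Hypothetical
 * helper: it assumes the range is contiguous up to end_bytes.
 */
static uint64_t bytes_above_4g(uint64_t end_bytes)
{
	return end_bytes > FOUR_GIB ? end_bytes - FOUR_GIB : 0;
}
```

With the value from the log, `kib_to_bytes(4718592)` gives 0x120000000, so 512 MB of address space sits above 4G. A limit of 3500 MB stays entirely below the boundary, which fits the observation that mem=3500M avoids the problem.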
Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

On Tuesday 20 November 2007 19:29:56 Thomas Gleixner wrote:
> On Tue, 20 Nov 2007, Andi Kleen wrote:
> > > This requires probably working 64bit DMA, which is not possible with the SB600 controller.
> >
> > It should not, no. The remapping is done into the GART which is < 4GB and that is the address the SB600 sees.
>
> Hmm, I just checked the boot logs of the failing 4GB kernel:
>
> BIOS-e820: 0001 - 00012000 (usable)
> ...
> CPU 0: aperture @ c00 size 32 MB
> Aperture too small (32 MB)
> No AGP bridge found
> Your BIOS doesn't leave a aperture memory hole
> Please enable the IOMMU option in the BIOS setup
> This costs you 64 MB of RAM
> Mapping aperture over 65536 KB of RAM @ c00

The aperture is mapped at c00 and c00 + 64MB < 4GB

> Memory: 4055984k/4718592k available (2146k kernel code, 136780k reserved, 1273k data, 296k init)
>
> 4718592k * 1024 == 0x120000000
>
> So now we have addresses > 4G and I suspect that this is somehow related to the problem.

Yes of course -- without 4GB the PCI-GART would not be used at all (unless you force it) and then no merging.

> Also, is the aperture size of 32MB somehow related to this?

This just means the BIOS didn't initialize it properly (a lot of BIOSes don't do that anymore these days because they assume it's an AGP-only feature) -- that is why the kernel allocated its own over memory.

I think we really have to find out which request freezes it. Can you perhaps just apply this patch and post the output?
Index: linux-2.6.24-rc1-hack/arch/x86/kernel/pci-gart_64.c
===
--- linux-2.6.24-rc1-hack.orig/arch/x86/kernel/pci-gart_64.c
+++ linux-2.6.24-rc1-hack/arch/x86/kernel/pci-gart_64.c
@@ -385,13 +385,19 @@ static int gart_map_sg(struct device *de
 	unsigned long pages = 0;
 	int need = 0, nextneed;
 	struct scatterlist *s, *ps, *start_sg, *sgmap;
-
+
 	if (nents == 0)
 		return 0;
 
 	if (!dev)
 		dev = fallback_dev;
 
+	if (*dev->dma_mask <= 0x) {
+		for_each_sg(sg, s, nents, i) {
+			printk("%d: map %lx len %u dir %d\n", i, sg_phys(s), s->length, dir);
+		}
+	}
+
 	out = 0;
 	start = 0;
 	start_sg = sgmap = sg;

Tejun can probably figure out from that output where it comes from in libata :)

-Andi
Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

Andi Kleen wrote:
> > The AHCI code falls back to 32bit DMA in that case. Which in turn causes the problem seen by Srihari. There is not much printk sticking necessary, the code is simply not handling this.
>
> What code is not handling what? IOMMU merging should be always safe. If it is not the driver should not submit things in a single SG list.

Yeap, a sg merged by IOMMU should be safe. It's just another contiguous memory area from the POV of the controller anyway. I wonder what went wrong here. What exactly has changed with the iommu_merge patch?

-- 
tejun
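To illustrate the point about a merged SG list looking like one contiguous area to the controller, here is a toy user-space model. It is a sketch under stated assumptions, not the gart_map_sg() implementation: `struct seg`, `merge_into_aperture`, `bus_to_phys` and `TOY_PAGE_SIZE` are hypothetical names, every segment is assumed page-aligned and a whole number of pages long, and the remapping table is a plain array.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL

/* One scattered piece of a request in physical memory (toy model). */
struct seg {
	uint64_t phys;	/* physical start, assumed page-aligned */
	uint64_t len;	/* length in bytes, assumed a multiple of the page size */
};

/*
 * Map every page of every segment into consecutive slots of a toy
 * remapping table. table[i] receives the physical address backing
 * aperture page i. Returns the total mapped length in bytes, or 0 if
 * a segment breaks the alignment assumption or the table is too small
 * (the table may be partially filled in that case).
 */
static uint64_t merge_into_aperture(const struct seg *segs, size_t nsegs,
				    uint64_t *table, size_t table_pages)
{
	size_t slot = 0;

	for (size_t i = 0; i < nsegs; i++) {
		if (segs[i].phys % TOY_PAGE_SIZE || segs[i].len % TOY_PAGE_SIZE)
			return 0;
		for (uint64_t off = 0; off < segs[i].len; off += TOY_PAGE_SIZE) {
			if (slot == table_pages)
				return 0;
			table[slot++] = segs[i].phys + off;
		}
	}
	return (uint64_t)slot * TOY_PAGE_SIZE;
}

/* Translate a bus address inside the aperture back to a physical address. */
static uint64_t bus_to_phys(uint64_t aperture_base, const uint64_t *table,
			    uint64_t bus)
{
	uint64_t off = bus - aperture_base;

	return table[off / TOY_PAGE_SIZE] + off % TOY_PAGE_SIZE;
}
```

In this model the device would be handed a single (aperture_base, total length) pair below 4G, while the pages behind it can sit anywhere, including above 4G. Whether the real merge path hands the SB600 something it cannot digest is exactly the open question in this thread.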
Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

[Sorry to reply to my own email thread]

Srihari Vijayaraghavan [EMAIL PROTECTED] wrote:
...
> No problems. Here's the log of unworking kernel with IOMMU turned on. Basically it goes on resetting the SATA ports throwing many errors (none are present in 2.6.23 or on 2.6.24-rc with mem=3500M) for many minutes at which point I do a power reset :-(.
>
> Also the log of the working kernel with IOMMU but with mem=3500M is also attached for the record. It's basically the same above kernel just with the added parameter.

Gentlemen,

This changeset has introduced a regression in 2.6.24-rc, such that my machine boots no more:

http://www.kernel.org/hg/linux-2.6/rev/ddf8804136fb

changeset:   72064:ddf8804136fb
user:        Andi Kleen [EMAIL PROTECTED]
date:        Fri Oct 19 20:35:03 2007 +0200
files:       arch/x86/kernel/pci-dma_64.c
description:
x86: enable iommu_merge by default

[ tglx: arch/x86 adaptation ]

Signed-off-by: Andi Kleen [EMAIL PROTECTED]
Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
Signed-off-by: Thomas Gleixner [EMAIL PROTECTED]

committer: Thomas Gleixner [EMAIL PROTECTED]

diff -r 8c8683cbdc05 -r ddf8804136fb arch/x86/kernel/pci-dma_64.c
--- a/arch/x86/kernel/pci-dma_64.c	Fri Oct 19 20:35:03 2007 +0200
+++ b/arch/x86/kernel/pci-dma_64.c	Fri Oct 19 20:35:03 2007 +0200
@@ -11,7 +11,7 @@
 #include <asm/iommu.h>
 #include <asm/calgary.h>
 
-int iommu_merge __read_mostly = 0;
+int iommu_merge __read_mostly = 1;
 EXPORT_SYMBOL(iommu_merge);
 
 dma_addr_t bad_dma_address __read_mostly;

As a work-around, I can get it to boot with mem=3500M, but then it's ugly ;-) I lose some valuable memory I have.

Here's my email thread on linux-ide capturing the good and bad kernel behaviour for reference:
http://marc.info/?t=11945621325r=1w=2

Thanks
Hari

PS: Here's hoping for a kernel mem= parameter free bootable 2.6.24 ;-).

Feel safe with award winning spam protection on Yahoo!7 Mail.
www.yahoo.com.au/mail
Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

On Wednesday 14 November 2007 12:55, Srihari Vijayaraghavan wrote:
> [Sorry to reply to my own email thread]
>
> Srihari Vijayaraghavan [EMAIL PROTECTED] wrote:
> ...
> > No problems. Here's the log of unworking kernel with IOMMU turned on. Basically it goes on resetting the SATA ports throwing many errors (none are present in 2.6.23 or on 2.6.24-rc with mem=3500M) for many minutes at which point I do a power reset :-(.
> >
> > Also the log of the working kernel with IOMMU but with mem=3500M is also attached for the record. It's basically the same above kernel just with the added parameter.
>
> Gentlemen,
>
> This changeset has introduced a regression in 2.6.24-rc, such that my machine boots no more:

Hmm, you got an AHCI controller that does not do 64bit DMA masks? Or do you have CONFIG_IOMMU_DEBUG enabled?

Anyways, not being able to deal with merged SG lists must be some driver or hardware bug. I would stick some printks into gart_map_sg() and try to find out where the failing DMA is initiated and then split it into multiple IO submissions at the caller level.

-Andi
Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

> The AHCI code falls back to 32bit DMA in that case. Which in turn causes the problem seen by Srihari. There is not much printk sticking necessary, the code is simply not handling this.

What code is not handling what? IOMMU merging should be always safe. If it is not the driver should not submit things in a single SG list.

> So the main option right now seems to revert the iommu_merge patch.

I don't think that is the correct fix.

-Andi
Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

On Wed, 14 Nov 2007, Andi Kleen wrote:
> On Wednesday 14 November 2007 12:55, Srihari Vijayaraghavan wrote:
> > [Sorry to reply to my own email thread]
> >
> > Srihari Vijayaraghavan [EMAIL PROTECTED] wrote:
> > ...
> > > No problems. Here's the log of unworking kernel with IOMMU turned on. Basically it goes on resetting the SATA ports throwing many errors (none are present in 2.6.23 or on 2.6.24-rc with mem=3500M) for many minutes at which point I do a power reset :-(.
> > >
> > > Also the log of the working kernel with IOMMU but with mem=3500M is also attached for the record. It's basically the same above kernel just with the added parameter.
> >
> > Gentlemen,
> >
> > This changeset has introduced a regression in 2.6.24-rc, such that my machine boots no more:
>
> Hmm, you got an AHCI controller that does not do 64bit DMA masks? Or do you have CONFIG_IOMMU_DEBUG enabled?
>
> Anyways, not being able to deal with merged SG lists must be some driver or hardware bug. I would stick some printks into gart_map_sg() and try to find out where the failing DMA is initiated and then split it into multiple IO submissions at the caller level.

64bit DMA on SB600 was disabled in May/07 due to a chip bug:

http://www.mail-archive.com/linux-ide@vger.kernel.org/msg06694.html

The AHCI code falls back to 32bit DMA in that case. Which in turn causes the problem seen by Srihari. There is not much printk sticking necessary, the code is simply not handling this.

So the main option right now seems to revert the iommu_merge patch.

Thanks,

	tglx
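As a rough illustration of the fallback described above, the following user-space sketch models the decision. It is not the AHCI driver code: `pick_dma_mask`, `needs_remap` and the quirk flag are hypothetical names, and the mask constants are the usual all-ones 64-bit and 32-bit values, assumed here rather than quoted from the thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define MASK_64BIT 0xffffffffffffffffULL
#define MASK_32BIT 0x00000000ffffffffULL

/*
 * Choose the DMA mask: fall back to 32 bits when the controller does not
 * advertise 64-bit DMA or when 64-bit DMA is known to be broken.
 */
static uint64_t pick_dma_mask(bool hw_claims_64bit, bool broken_64bit_quirk)
{
	if (hw_claims_64bit && !broken_64bit_quirk)
		return MASK_64BIT;
	return MASK_32BIT;
}

/*
 * A buffer has to go through the IOMMU (or be bounced) if any of its bytes
 * lies above the device's mask. Written to avoid overflow in phys + len.
 */
static bool needs_remap(uint64_t phys, uint64_t len, uint64_t mask)
{
	if (len == 0)
		return false;
	return phys > mask || len - 1 > mask - phys;
}
```

With the 32-bit mask, any page above 4G needs remapping, which is where the GART and the newly enabled merging come into play on a 4GB machine. With mem=3500M no such page exists, so that path is never exercised.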
Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

Tejun Heo [EMAIL PROTECTED] wrote:
[...]
> Hmmm.. weird. The workaround is still there. Please post boot log.

OK, that's good to hear.

Alas, after the Fedora 7 to 8 upgrade, I'm no longer able to compile a kernel (some uhci-hcd module not found for the initrd). And I was too quick to overwrite the problematic kernel.

Anyway, once I get the kernel compiled, I'll post the boot log.

Sorry for the trouble.

Thanks
Hari
2.6.24-rc SB600 AHCI no go on >=4GB of RAM

(Same symptoms/behaviour as before:
http://marc.info/?l=linux-idem=117949823328798w=2
http://marc.info/?t=11781097043r=1w=2)

With mem=3500M all is well, otherwise it goes on resetting the ports in a loop not booting :-(

Thanks
Hari
Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

Srihari Vijayaraghavan wrote:
> (Same symptoms/behaviour as before:
> http://marc.info/?l=linux-idem=117949823328798w=2
> http://marc.info/?t=11781097043r=1w=2)
>
> With mem=3500M all is well, otherwise it goes on resetting the ports in a loop not booting :-(

Hmmm.. weird. The workaround is still there. Please post boot log.

Thanks.

-- 
tejun