Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM II

2007-11-20 Thread Andi Kleen

> Which in turn enables the iommu_merge functionality in gart_map_sg().
>
>   for_each_sg(sg, s, nents, i) {

Hmm, another thought. Maybe this code just has trouble with the new 
linked SG lists and it's not really a SB600 problem?

I did a quick test on two ATI machines with older chipset and iommu=force,merge
and it didn't show a problem though.

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

2007-11-20 Thread Thomas Gleixner
On Tue, 20 Nov 2007, Andi Kleen wrote:

>
> > This requires probably working 64bit DMA, which is not possible with
> > the SB600 controller.
>
> It should not no. The remapping is done into the GART which is <4GB
> and that is the address the SB600 sees.

Hmm, I just checked the boot logs of the failing 4GB kernel:

BIOS-e820: 0001 - 00012000 (usable)
...
CPU 0: aperture @ c00 size 32 MB
Aperture too small (32 MB)
No AGP bridge found
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ c00
Memory: 4055984k/4718592k available (2146k kernel code, 136780k reserved, 1273k 
data, 296k init)

4718592k * 1024 == 0x120000000

So now we have addresses > 4G and I suspect that this is somehow
related to the problem.

When mem=3500M is given on the kernel command line, we do not use this
address space.
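
The arithmetic behind that boot-log figure can be checked in a few lines of C. This is a minimal userspace sketch, not kernel code; the helper names kib_to_bytes(), above_4g() and check_boot_log_figure() are made up for illustration. It only confirms that 4718592k corresponds to byte address 0x120000000, which lies above the 4G boundary, while a mem=3500M limit stays below it.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a size given in KiB (the "k" in the boot log's "Memory:" line)
 * to bytes. Hypothetical helper, for illustration only. */
uint64_t kib_to_bytes(uint64_t kib)
{
	return kib * 1024ULL;
}

/* Return 1 if the address does not fit in 32 bits, i.e. lies above 4G. */
int above_4g(uint64_t addr)
{
	return addr > 0xffffffffULL;
}

/* Self-check using the figures quoted in this thread. */
void check_boot_log_figure(void)
{
	/* 4718592k is the total from the "Memory:" line. */
	assert(kib_to_bytes(4718592ULL) == 0x120000000ULL);
	assert(above_4g(kib_to_bytes(4718592ULL)));

	/* mem=3500M: 3500 MiB expressed in KiB stays below 4G. */
	assert(!above_4g(kib_to_bytes(3500ULL * 1024ULL)));
}
```

Calling check_boot_log_figure() from a small main() is enough to run the check; it aborts if any of the figures do not match.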

Also is the aperture size of 32MB somehow related to this ?

 tglx


Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

2007-11-20 Thread Andi Kleen
On Tuesday 20 November 2007 19:29:56 Thomas Gleixner wrote:
> On Tue, 20 Nov 2007, Andi Kleen wrote:
>
> >
> > > This requires probably working 64bit DMA, which is not possible with
> > > the SB600 controller.
> >
> > It should not no. The remapping is done into the GART which is <4GB
> > and that is the address the SB600 sees.
>
> Hmm, I just checked the boot logs of the failing 4GB kernel:
>
> BIOS-e820: 0001 - 00012000 (usable)
> ...
> CPU 0: aperture @ c00 size 32 MB
> Aperture too small (32 MB)
> No AGP bridge found
> Your BIOS doesn't leave a aperture memory hole
> Please enable the IOMMU option in the BIOS setup
> This costs you 64 MB of RAM
> Mapping aperture over 65536 KB of RAM @ c00


The aperture is mapped at c00 and c00 + 64MB < 4GB


> Memory: 4055984k/4718592k available (2146k kernel code, 136780k reserved,
> 1273k data, 296k init)
>
> 4718592k * 1024 == 0x120000000
>
> So now we have addresses > 4G and I suspect that this is somehow
> related to the problem.

Yes of course -- without memory above 4GB the PCI-GART would not be used
at all (unless you force it) and then no merging.

>
> Also is the aperture size of 32MB somehow related to this ?

This just means the BIOS didn't initialize it properly (a lot of
BIOSes don't do that anymore these days because they assume it's an
AGP-only feature) -- that is why the kernel allocated its own over
memory.

I think we really have to find out which request freezes it.
Can you perhaps just apply this patch and post the output?

Index: linux-2.6.24-rc1-hack/arch/x86/kernel/pci-gart_64.c
===================================================================
--- linux-2.6.24-rc1-hack.orig/arch/x86/kernel/pci-gart_64.c
+++ linux-2.6.24-rc1-hack/arch/x86/kernel/pci-gart_64.c
@@ -385,13 +385,19 @@ static int gart_map_sg(struct device *de
 	unsigned long pages = 0;
 	int need = 0, nextneed;
 	struct scatterlist *s, *ps, *start_sg, *sgmap;
-
+	
 	if (nents == 0)
 		return 0;
 
 	if (!dev)
 		dev = fallback_dev;
 
+	if (*dev->dma_mask <= 0xffffffff) {
+		for_each_sg(sg, s, nents, i) {
+			printk("%d: map %lx len %u dir %d\n", i, sg_phys(s), s->length, dir);
+		}
+	}
+
 	out = 0;
 	start = 0;
 	start_sg = sgmap = sg;



Tejun can probably figure out from that output where it comes
from in libata :)

-Andi


Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

2007-11-19 Thread Tejun Heo
Andi Kleen wrote:
>>
>> The AHCI code falls back to 32bit DMA in that case. Which in turn
>> causes the problem seen by Srihari. There is not much printk sticking
>> necessary, the code is simply not handling this.
>
> What code is not handling what?
>
> IOMMU merging should be always safe. If it is not the driver should
> not submit things in a single SG list.

Yeap, a sg merged by IOMMU should be safe.  It's just another contiguous
memory area from the POV of the controller anyway.  I wonder what went
wrong here.  What has exactly changed with iommu_merge patch?
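
As a rough illustration of why a merged list is "just another contiguous memory area" to the controller, here is a toy userspace model. It is not the kernel's gart_map_sg(); the names toy_sg, toy_seg and toy_merge, the page size and the flat pte[] table are all assumptions made for this sketch. Scattered physical pages are assigned to consecutive slots of a remapping table, so the device is handed a single bus address and length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL

/* Hypothetical, simplified scatter-gather entry (not struct scatterlist). */
struct toy_sg {
	uint64_t phys;	/* physical address of the chunk, page aligned */
	uint32_t len;	/* length in bytes, a multiple of the page size */
};

/* The single segment the device sees after remapping. */
struct toy_seg {
	uint64_t bus;	/* bus address inside the toy aperture */
	uint64_t len;	/* total length in bytes */
};

/*
 * Assign every page of every entry to consecutive slots of a toy aperture
 * starting at aperture_base. pte[] records which physical page backs each
 * slot. Returns the number of slots used, or 0 if the input is not page
 * aligned or does not fit into npte slots.
 */
size_t toy_merge(const struct toy_sg *sg, size_t nents,
		 uint64_t aperture_base, uint64_t *pte, size_t npte,
		 struct toy_seg *out)
{
	size_t slot = 0;

	assert(sg != NULL && pte != NULL && out != NULL);

	for (size_t i = 0; i < nents; i++) {
		if (sg[i].phys % TOY_PAGE_SIZE || sg[i].len % TOY_PAGE_SIZE)
			return 0;
		for (uint64_t off = 0; off < sg[i].len; off += TOY_PAGE_SIZE) {
			if (slot == npte)
				return 0;
			pte[slot++] = sg[i].phys + off;
		}
	}

	/* From the device's side: one start address, one length. */
	out->bus = aperture_base;
	out->len = slot * TOY_PAGE_SIZE;
	return slot;
}
```

In this model two chunks that sit far apart above 4G end up as one segment at a low bus address, which is why a controller limited to 32bit DMA should not be able to tell the difference.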

-- 
tejun


Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

2007-11-14 Thread Srihari Vijayaraghavan
[Sorry to reply to my own email thread]

Srihari Vijayaraghavan [EMAIL PROTECTED] wrote:
...
> No problems. Here's the log of unworking kernel with IOMMU turned on.
> Basically it goes on resetting the SATA ports throwing many errors (none are
> present in 2.6.23 or on 2.6.24-rc with mem=3500M) for many minutes at which
> point I do a power reset :-(.
>
> Also the log of the working kernel with IOMMU but with mem=3500M is also
> attached for the record. It's basically the same above kernel just with the
> added parameter.

Gentlemen,

This changeset has introduced a regression in 2.6.24-rc, such that my machine
boots no more:
http://www.kernel.org/hg/linux-2.6/rev/ddf8804136fb
changeset:   72064:ddf8804136fb
user:        Andi Kleen [EMAIL PROTECTED]
date:        Fri Oct 19 20:35:03 2007 +0200
files:       arch/x86/kernel/pci-dma_64.c
description:
x86: enable iommu_merge by default

[ tglx: arch/x86 adaptation ]

Signed-off-by: Andi Kleen [EMAIL PROTECTED]
Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
Signed-off-by: Thomas Gleixner [EMAIL PROTECTED]

committer: Thomas Gleixner [EMAIL PROTECTED]


diff -r 8c8683cbdc05 -r ddf8804136fb arch/x86/kernel/pci-dma_64.c
--- a/arch/x86/kernel/pci-dma_64.c  Fri Oct 19 20:35:03 2007 +0200
+++ b/arch/x86/kernel/pci-dma_64.c  Fri Oct 19 20:35:03 2007 +0200
@@ -11,7 +11,7 @@
 #include <asm/iommu.h>
 #include <asm/calgary.h>

-int iommu_merge __read_mostly = 0;
+int iommu_merge __read_mostly = 1;
 EXPORT_SYMBOL(iommu_merge);

 dma_addr_t bad_dma_address __read_mostly;

As a work-around, I can get it to boot with mem=3500M, but then it's ugly ;-)
I lose some valuable memory I have.

Here's my email thread on linux-ide capturing the good & bad kernel behaviour
for reference:
http://marc.info/?t=11945621325&r=1&w=2

Thanks

Hari

PS: Here's hoping for a kernel mem= parameter free bootable 2.6.24 ;-).





Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

2007-11-14 Thread Andi Kleen
On Wednesday 14 November 2007 12:55, Srihari Vijayaraghavan wrote:
> [Sorry to reply to my own email thread]
>
> Srihari Vijayaraghavan [EMAIL PROTECTED] wrote:
> ...
>
> > No problems. Here's the log of unworking kernel with IOMMU turned on.
> > Basically it goes on resetting the SATA ports throwing many errors (none
> > are present in 2.6.23 or on 2.6.24-rc with mem=3500M) for many minutes at
> > which point I do a power reset :-(.
> >
> > Also the log of the working kernel with IOMMU but with mem=3500M is also
> > attached for the record. It's basically the same above kernel just with
> > the added parameter.
>
> Gentlemen,
>
> This changeset has introduced a regression in 2.6.24-rc, such that my
> machine boots no more:

Hmm, you got an AHCI controller that does not do 64bit DMA masks?
Or do you have CONFIG_IOMMU_DEBUG enabled?

Anyways, not being able to deal with merged SG lists must be some
driver or hardware bug. I would stick some printks into gart_map_sg()
and try to find out where the failing DMA is initiated and then
split it into multiple IO submissions at the caller level.

-Andi


Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

2007-11-14 Thread Andi Kleen
 
> The AHCI code falls back to 32bit DMA in that case. Which in turn
> causes the problem seen by Srihari. There is not much printk sticking
> necessary, the code is simply not handling this.

What code is not handling what? 

IOMMU merging should be always safe. If it is not the driver should
not submit things in a single SG list.

> So the main option
> right now seems to revert the iommu_merge patch.

I don't think that is the correct fix.

-Andi
 


Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

2007-11-14 Thread Thomas Gleixner
On Wed, 14 Nov 2007, Andi Kleen wrote:
> On Wednesday 14 November 2007 12:55, Srihari Vijayaraghavan wrote:
> > [Sorry to reply to my own email thread]
> >
> > Srihari Vijayaraghavan [EMAIL PROTECTED] wrote:
> > ...
> >
> > > No problems. Here's the log of unworking kernel with IOMMU turned on.
> > > Basically it goes on resetting the SATA ports throwing many errors (none
> > > are present in 2.6.23 or on 2.6.24-rc with mem=3500M) for many minutes at
> > > which point I do a power reset :-(.
> > >
> > > Also the log of the working kernel with IOMMU but with mem=3500M is also
> > > attached for the record. It's basically the same above kernel just with
> > > the added parameter.
> >
> > Gentlemen,
> >
> > This changeset has introduced a regression in 2.6.24-rc, such that my
> > machine boots no more:
>
> Hmm, you got an AHCI controller that does not do 64bit DMA masks?
> Or do you have CONFIG_IOMMU_DEBUG enabled?
>
> Anyways, not being able to deal with merged SG lists must be some
> driver or hardware bug. I would stick some printks into gart_map_sg()
> and try to find out where the failing DMA is initiated and then
> split it into multiple IO submissions at the caller level.

64bit DMA on SB600 was disabled in May/07 due to a chip bug:
http://www.mail-archive.com/linux-ide@vger.kernel.org/msg06694.html

The AHCI code falls back to 32bit DMA in that case. Which in turn
causes the problem seen by Srihari. There is not much printk sticking
necessary, the code is simply not handling this. So the main option
right now seems to revert the iommu_merge patch.
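
To make the 32bit fallback concrete: with a 32bit DMA mask, any buffer whose last byte lies above 0xffffffff cannot be addressed by the device directly and has to go through a remapping such as the GART. The following is a minimal userspace sketch of that reachability test, not the kernel's DMA API; toy_dma_reachable() and the TOY_DMA_MASK_* constants are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit and 64-bit DMA masks written out as plain constants. */
#define TOY_DMA_MASK_32 0xffffffffULL
#define TOY_DMA_MASK_64 0xffffffffffffffffULL

/*
 * Return 1 if the buffer [addr, addr + len) is directly reachable by a
 * device with the given DMA mask, 0 if it would need remapping first.
 */
int toy_dma_reachable(uint64_t addr, uint64_t len, uint64_t mask)
{
	assert(len > 0);

	/* Guard against wrap-around before computing the last byte. */
	if (addr > UINT64_MAX - (len - 1))
		return 0;

	return (addr + len - 1) <= mask;
}
```

With these definitions a page at 0x110000000 is reachable under the 64bit mask but not under the 32bit one, which matches the situation described here: the same request that used to be passed straight through now depends on the remapping path.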

Thanks,

tglx
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

2007-11-12 Thread Srihari Vijayaraghavan
 Tejun Heo [EMAIL PROTECTED] wrote:
[...]
> Hmmm.. weird.  The workaround is still there.  Please post boot log.

OK, that's good to hear. Alas, after the Fedora 7 to 8 upgrade, I'm no longer
able to compile a kernel (some uhci-hcd module not found for the initrd). And
I was too quick to overwrite the problematic kernel.

Anyway, once I get the kernel compiled, I'll post the boot log. Sorry for the
trouble.

Thanks

Hari





2.6.24-rc SB600 AHCI no go on >=4GB of RAM

2007-11-08 Thread Srihari Vijayaraghavan
(Same symptoms/behaviour as before:
http://marc.info/?l=linux-ide&m=117949823328798&w=2 &
http://marc.info/?t=11781097043&r=1&w=2)

With mem=3500M all is well, otherwise it goes on resetting the ports in a loop &
not booting :-(

Thanks

Hari







Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

2007-11-08 Thread Tejun Heo
Srihari Vijayaraghavan wrote:
> (Same symptoms/behaviour as before:
> http://marc.info/?l=linux-ide&m=117949823328798&w=2 &
> http://marc.info/?t=11781097043&r=1&w=2)
>
> With mem=3500M all is well, otherwise it goes on resetting the ports in a loop
> & not booting :-(

Hmmm.. weird.  The workaround is still there.  Please post boot log.

Thanks.

-- 
tejun