Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-03-12 Thread Jeff Garzik

Andi Kleen wrote:
> The latest evidence points to a DMA mapping management problem
> in Linux. Apparently in some cases sata_nv does DMA on an already freed and then
> reused mapping.

Any data or additional info on that?  Did you discover this by tracking 
the DMA API software routines, or something lower level (like a bus 
analyzer)?


libata handles all the DMA allocation and mapping and cleanup for 
sata_nv, so any software problem would affect the whole of libata.


But it's possible that the nForce SATA chip has DMA padding needs that 
are different from those provided by libata-core (grep for "pad"), which 
could create a situation where the hardware continues DMA'ing past the 
end of the DMA area.


Jeff




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-03-12 Thread Andi Kleen

> Andi, have you had a look at this? I'm a bit surprised at the lack of 
> reaction to this find..


FYI the problem is still being analysed behind the scenes. Chip's patch didn't
fix it in all cases unfortunately -- it just changed the timing enough to make
it happen less often. The latest evidence points to a DMA mapping management
problem in Linux. Apparently in some cases sata_nv does DMA on an already freed
and then reused mapping.

-Andi


Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-03-04 Thread Robert Hancock

Chip Coldwell wrote:
> On Wed, 17 Jan 2007, Andi Kleen wrote:
>
> > On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote:
> > > On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote:
> > > > I agree,... it seems drastic, but this is the only really secure
> > > > solution.
> > >
> > > I'd like to hear from Andi how he feels about this?  It seems like a
> > > somewhat drastic solution in some ways given a lot of hardware doesn't
> > > seem to be affected (or maybe in those cases it's just really hard to
> > > hit, I don't know).
> >
> > AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
> > although there were similar problems on VIA in the past too.
> > Unless a good workaround comes around soon I'll probably default
> > to iommu=soft on Nvidia.
>
> We (Sun, AMD, Nvidia and Red Hat) have been testing a patch that seems
> to solve the problem.  AMD and Nvidia analyzed an HDT trace that
> seemed to indicate that CPU updates of the GATT were still in cache
> when a subsequent table walk caused by a device load used a stale GATT
> PTE.  That analysis inspired this patch, submitted to this list as an
> RFC.  It is not obvious (to me, at least) why this problem has only
> shown up on Nvidia SATA controllers.
>
> We are continuing to investigate.
>
> diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c
> index 030eb37..1dd461a 100644
> --- a/arch/x86_64/kernel/pci-gart.c
> +++ b/arch/x86_64/kernel/pci-gart.c
> @@ -69,6 +69,8 @@ static u32 gart_unmapped_entry;
>  #define AGPEXTERN
>  #endif
>  
> +#define GATT_CLFLUSH(i) asm volatile ("clflush (%0)" :: "r" (iommu_gatt_base + (i)))
> +
>  /* backdoor interface to AGP driver */
>  AGPEXTERN int agp_memory_reserved;
>  AGPEXTERN __u32 *agp_gatt_table;
> @@ -221,6 +223,7 @@ static dma_addr_t dma_map_area(struct device *dev, dma_addr_t phys_mem,
>  	for (i = 0; i < npages; i++) {
>  		iommu_gatt_base[iommu_page + i] = GPTE_ENCODE(phys_mem);
>  		SET_LEAK(iommu_page + i);
> +		GATT_CLFLUSH(iommu_page + i);
>  		phys_mem += PAGE_SIZE;
>  	}
>  	return iommu_bus_base + iommu_page*PAGE_SIZE + (phys_mem & ~PAGE_MASK);
> @@ -348,6 +351,7 @@ static int __dma_map_cont(struct scatterlist *sg, int start, int stopat,
>  		while (pages--) {
>  			iommu_gatt_base[iommu_page] = GPTE_ENCODE(addr);
>  			SET_LEAK(iommu_page);
> +			GATT_CLFLUSH(iommu_page);
>  			addr += PAGE_SIZE;
>  			iommu_page++;
>  		}




Andi, have you had a look at this? I'm a bit surprised at the lack of 
reaction to this find..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-02-21 Thread Chip Coldwell
On Wed, 17 Jan 2007, Andi Kleen wrote:

> On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote:
> > On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote:
> > > I agree,... it seems drastic, but this is the only really secure
> > > solution.
> >
> > I'd like to hear from Andi how he feels about this?  It seems like a
> > somewhat drastic solution in some ways given a lot of hardware doesn't
> > seem to be affected (or maybe in those cases it's just really hard to
> > hit, I don't know).
> 
> AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
> although there were similar problems on VIA in the past too.
> Unless a good workaround comes around soon I'll probably default
> to iommu=soft on Nvidia.

We (Sun, AMD, Nvidia and Red Hat) have been testing a patch that seems
to solve the problem.  AMD and Nvidia analyzed an HDT trace that
seemed to indicate that CPU updates of the GATT were still in cache
when a subsequent table walk caused by a device load used a stale GATT
PTE.  That analysis inspired this patch, submitted to this list as an
RFC.  It is not obvious (to me, at least) why this problem has only
shown up on Nvidia SATA controllers.

We are continuing to investigate.

diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c
index 030eb37..1dd461a 100644
--- a/arch/x86_64/kernel/pci-gart.c
+++ b/arch/x86_64/kernel/pci-gart.c
@@ -69,6 +69,8 @@ static u32 gart_unmapped_entry;
 #define AGPEXTERN
 #endif
 
+#define GATT_CLFLUSH(i) asm volatile ("clflush (%0)" :: "r" (iommu_gatt_base + (i)))
+
 /* backdoor interface to AGP driver */
 AGPEXTERN int agp_memory_reserved;
 AGPEXTERN __u32 *agp_gatt_table;
@@ -221,6 +223,7 @@ static dma_addr_t dma_map_area(struct device *dev, dma_addr_t phys_mem,
 	for (i = 0; i < npages; i++) {
 		iommu_gatt_base[iommu_page + i] = GPTE_ENCODE(phys_mem);
 		SET_LEAK(iommu_page + i);
+		GATT_CLFLUSH(iommu_page + i);
 		phys_mem += PAGE_SIZE;
 	}
 	return iommu_bus_base + iommu_page*PAGE_SIZE + (phys_mem & ~PAGE_MASK);
@@ -348,6 +351,7 @@ static int __dma_map_cont(struct scatterlist *sg, int start, int stopat,
 		while (pages--) {
 			iommu_gatt_base[iommu_page] = GPTE_ENCODE(addr);
 			SET_LEAK(iommu_page);
+			GATT_CLFLUSH(iommu_page);
 			addr += PAGE_SIZE;
 			iommu_page++;
 		}


Chip

-- 
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-18 Thread Andi Kleen
On Thursday 18 January 2007 22:00, Erik Andersen wrote:

> I just tried again and while using iommu=soft does avoid the
> corruption problem, as with previous kernels with 2.6.20-rc5
> using iommu=soft still makes my pcHDTV HD5500 DVB cards not work.

This must be some separate bug and needs to be fixed anyways.

-Andi


Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-18 Thread Andi Kleen
On Friday 19 January 2007 08:57, Chip Coldwell wrote:

> But it still might be a reasonable thing to do to test the theory that
> the problem is cache coherency across the graphics aperture, even if
> it isn't a long-term solution for the problem.

I suspect it would disturb timing so badly that it might hide the original
problem. If that is true then adding udelays might hide it too. 

Ok, I guess you could test with a UP kernel. There change_page_attr
should be much cheaper because it doesn't need to IPI other CPUs. Also use
a 2.6.20-rc* kernel that uses CLFLUSH in there, not WBINVD, which is also
very costly.

Anyways I guess we can just wait for what the hardware people figure out.

-Andi


Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-18 Thread Chip Coldwell

On Thu, 18 Jan 2007, Andi Kleen wrote:


> The Northbridge guarantees coherency over the aperture, but
> only if the caching attributes match.

That's interesting.  Makes sense, I suppose.

> You would need to change_page_attr() every kernel address that is mapped into
> the IOMMU to use an uncached aperture. AGP does this, but the frequency of
> mapping for the IOMMU is much higher and it would be prohibitively costly
> unfortunately.

But it still might be a reasonable thing to do to test the theory that
the problem is cache coherency across the graphics aperture, even if
it isn't a long-term solution for the problem.

> In the past we saw corruptions from such conflicts, so this is more
> than just theory. I suspect you traded a more easy to trigger
> corruption with a more subtle one.


Yup.  That was the inspiration for the script.

Chip

--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-18 Thread Chris Wedgwood
On Thu, Jan 18, 2007 at 10:29:14AM +0100, joachim wrote:

> Not only has it only been on Nvidia chipsets but we have only seen
> reports on the Nvidia CK804 SATA controller.

People have reported problems with other controllers.  I have one here
I can test given a day or so.

I don't think it's SATA related; it just happens to show up well
there.  For networking you would probably end up with the odd corrupted
packet and just drop it, so it might not be noticeable.



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-18 Thread Chris Wedgwood
On Thu, Jan 18, 2007 at 04:00:28AM -0700, Erik Andersen wrote:

> I just tried again and while using iommu=soft does avoid the
> corruption problem, as with previous kernels with 2.6.20-rc5 using
> iommu=soft still makes my pcHDTV HD5500 DVB cards not work.

i would file a separate bug about that; presumably it won't work on
intel based machines either if the driver has dma api bugs



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-18 Thread Christoph Anton Mitterer
Erik Andersen wrote:
> I just tried again and while using iommu=soft does avoid the
> corruption problem, as with previous kernels with 2.6.20-rc5
> using iommu=soft still makes my pcHDTV HD5500 DVB cards not work.
> I still have to disable memhole and lose 1 GB.  :-(

Please add this to the bugreport
(http://bugzilla.kernel.org/show_bug.cgi?id=7768)

Chris.



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-18 Thread Christoph Anton Mitterer
joachim wrote:
> Not only has it only been on Nvidia chipsets but we have only seen
> reports on the Nvidia CK804 SATA controller.  Please write in or add
> yourself to the bugzilla entry [1] and tell us which hardware you have
> if you get 4kB pagesize corruption and it goes away with "iommu=soft".
How do I find out if I get a 4kB pagesize corruption (or is this the
same as "our corruption")?

Chris.

btw: Should we only post the controller, or other hardware details, too?
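
A minimal sketch of one way to check this, assuming you have a known-good copy of a file and a second copy that went through the suspect I/O path: list which 4 KiB pages differ and how many bytes differ in each. The helper name diff_pages and the file names in the usage line are made up for illustration; cmp -l, awk and sort are the standard tools.

```shell
#!/bin/bash
# diff_pages FILE_A FILE_B [PAGE_SIZE]
# Print one line per differing page: "<page index> <differing byte count>".
# Hypothetical helper, not something posted in the thread.
diff_pages() {
    local a=$1 b=$2 page_size=${3:-4096}
    # cmp -l prints "<1-based offset> <byte in A> <byte in B>" for every
    # differing byte; group those offsets by page and count them.
    cmp -l "$a" "$b" 2>/dev/null | awk -v ps="$page_size" '
        { count[int(($1 - 1) / ps)]++ }
        END { for (p in count) printf "%d %d\n", p, count[p] }
    ' | sort -n
}
```

Usage would look like `diff_pages good.img suspect.img`. A count equal to the page size means the whole page differs; a count of a few bytes per page points at partial corruption instead.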



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-18 Thread Erik Andersen
On Wed Jan 17, 2007 at 08:29:53AM +1100, Andi Kleen wrote:
> AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
> although there were similar problems on VIA in the past too.
> Unless a good workaround comes around soon I'll probably default
> to iommu=soft on Nvidia.

I just tried again and while using iommu=soft does avoid the
corruption problem, as with previous kernels with 2.6.20-rc5
using iommu=soft still makes my pcHDTV HD5500 DVB cards not work.
I still have to disable memhole and lose 1 GB.  :-(

 -Erik

--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--


Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-18 Thread joachim
Andi Kleen <[EMAIL PROTECTED]> wrote on 22:29 16/01/2007 +0100 :
> AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
> although there were similar problems on VIA in the past too.
> Unless a good workaround comes around soon I'll probably default
> to iommu=soft on Nvidia.
> 
> -Andi

Not only has it only been on Nvidia chipsets but we have only seen
reports on the Nvidia CK804 SATA controller.  Please write in or add
yourself to the bugzilla entry [1] and tell us which hardware you have
if you get 4kB pagesize corruption and it goes away with "iommu=soft".

thanks
-joachim

[1] http://bugzilla.kernel.org/show_bug.cgi?id=7768


Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-17 Thread Andi Kleen

> We've just verified that configuring the graphics aperture to be
> write-combining instead of write-back using an MTRR also solves the
> problem.  It appears to be a cache incoherency issue in the graphics
> aperture.

Interesting. 

Unfortunately it is also not correct. It was intentional to
mark the IOMMU half of the aperture write-back, as opposed
to uncached like the AGP half. Otherwise you get illegal cache attribute
conflicts with the memory that is being remapped, which can also cause
corruption.

The Northbridge guarantees coherency over the aperture, but 
only if the caching attributes match. 

You would need to change_page_attr() every kernel address that is mapped into
the IOMMU to use an uncached aperture. AGP does this, but the frequency of
mapping for the IOMMU is much higher and it would be prohibitively costly
unfortunately.

In the past we saw corruptions from such conflicts, so this is more
than just theory. I suspect you traded an easier-to-trigger corruption for
a more subtle one.

-Andi



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-17 Thread Chip Coldwell

On Wed, 17 Jan 2007, Chip Coldwell wrote:


> On Wed, 17 Jan 2007, Andi Kleen wrote:
>
> > On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote:
> > > On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote:
> > > > I agree,... it seems drastic, but this is the only really secure
> > > > solution.
> > >
> > > I'd like to hear from Andi how he feels about this?  It seems like a
> > > somewhat drastic solution in some ways given a lot of hardware doesn't
> > > seem to be affected (or maybe in those cases it's just really hard to
> > > hit, I don't know).
> >
> > AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
> > although there were similar problems on VIA in the past too.
> > Unless a good workaround comes around soon I'll probably default
> > to iommu=soft on Nvidia.
>
> We've just verified that configuring the graphics aperture to be
> write-combining instead of write-back using an MTRR also solves the
> problem.  It appears to be a cache incoherency issue in the graphics
> aperture.


I take it back.  Further testing has revealed that this does not solve
the problem.

Chip

--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-17 Thread Chip Coldwell

On Wed, 17 Jan 2007, Andi Kleen wrote:


> On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote:
> > On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote:
> > > I agree,... it seems drastic, but this is the only really secure
> > > solution.
> >
> > I'd like to hear from Andi how he feels about this?  It seems like a
> > somewhat drastic solution in some ways given a lot of hardware doesn't
> > seem to be affected (or maybe in those cases it's just really hard to
> > hit, I don't know).
>
> AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
> although there were similar problems on VIA in the past too.
> Unless a good workaround comes around soon I'll probably default
> to iommu=soft on Nvidia.
>
> -Andi


We've just verified that configuring the graphics aperture to be
write-combining instead of write-back using an MTRR also solves the
problem.  It appears to be a cache incoherency issue in the graphics
aperture.

This script does the trick:

[ -- cut here -- ]
#!/bin/bash

# Read the northbridge offset 0x90 to get the size of the aperture
size=0x`lspci -xxx -s 0:18.3 | awk '/^90:/ { print $2 }'`

# bit 0 indicates the aperture is enabled, bits 1 - 3 indicate the size
if [ $((size & 1)) -eq 0 ] ; then
    echo "GART disabled; exiting"
    exit 0
fi

shft=$(((size >> 1) & 7))
# the size field selects 32 MiB (0x2000000 bytes) shifted left by its value
size=$((0x2000000 << shft))

# Read the northbridge offset 0x94 to get the base address of the aperture
base=0x`lspci -xxx -s 0:18.3 | awk '/^90:/ { print $6 }'`
base=$((base << 25))
basehex=`printf 0x%08x $base`

printf "IOMMU aperture found at base=0x%08x size=0x%08x (%d KiB)\n" $base $size $((size/1024))

if grep -q $basehex /proc/mtrr ; then
    echo "MTRR already configured for IOMMU aperture; exiting"
    exit 0
fi

echo "Configuring write-combining MTRR for IOMMU aperture"
printf "base=0x%08x size=0x%08x type=write-combining\n" $base $size >/proc/mtrr

exit 0
[ -- cut here-- ]

Chip

--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426
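
As a worked example of the register arithmetic in the script, here is a small sketch that decodes two literal byte values instead of calling lspci. It assumes the layout the script relies on: bit 0 of the byte at offset 0x90 is the enable bit, bits 1-3 are the size field, and the byte at offset 0x94 holds base address bits starting at bit 25. The 32 MiB unit for the size field is an assumption taken from the processor documentation, not something stated in the thread, and decode_gart_aperture is a hypothetical helper name.

```shell
#!/bin/bash
# decode_gart_aperture CTL_BYTE BASE_BYTE
#   CTL_BYTE  - low byte of the aperture control register (offset 0x90)
#   BASE_BYTE - low byte of the aperture base register (offset 0x94)
# Prints "disabled", or "base=0x... size=0x..." with both values in bytes.
decode_gart_aperture() {
    local ctl=$(( $1 )) base=$(( $2 ))
    if (( (ctl & 1) == 0 )); then
        echo "disabled"
        return 0
    fi
    local order=$(( (ctl >> 1) & 7 ))
    # Assumption: aperture size is 32 MiB shifted left by the size field.
    local size=$(( (32 * 1024 * 1024) << order ))
    printf 'base=0x%08x size=0x%08x\n' $(( base << 25 )) "$size"
}
```

For instance, `decode_gart_aperture 0x03 0x40` describes an enabled aperture with size field 1, which comes out as a 64 MiB aperture at the 2 GiB mark.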



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Christoph Anton Mitterer
Andi Kleen wrote:
> AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
> although there were similar problems on VIA in the past too.
> Unless a good workaround comes around soon I'll probably default
> to iommu=soft on Nvidia.
I've just read the posts about AMD's and NVIDIA's efforts to find the
issue,... but in the meantime this would be the best solution.

And if "we" ever find a true solution,.. we could still deactivate the
iommu=soft setting.


Best wishes,
Chris.



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Christoph Anton Mitterer
Chris Wedgwood wrote:
> I'd like to hear from Andi how he feels about this?  It seems like a
> somewhat drastic solution in some ways given a lot of hardware doesn't
> seem to be affected (or maybe in those cases it's just really hard to
> hit, I don't know).
>   
Yes this might be true,.. those who have reported working systems might
just have a configuration where the error happens even more rarely or where
some other event(s) work around it.

>> Well we can hope that Nvidia will find out more (though I'm not too
>> optimistic).
>> 
> Ideally someone from AMD needs to look into this, if some mainboards
> really never see this problem, then why is that?  Is there errata that
> some BIOS/mainboard vendors are dealing with that others are not?
>   
Some time ago I asked here in a post if some of you could try to
contact AMD and/or Nvidia,.. as no one did,... I wrote to them again (to
all forums and email addresses I knew). (You can see the text here
http://www.nvnews.net/vbulletin/showthread.php?t=82909).
Now Nvidia replied and it seems (thanks to Mr. Friedman) that they're
actually trying to investigate the issue...

I received one reply from AMD (actually in German, which is strange as I
wrote to their US support)... where they told me they had forwarded
my mail to their Linux engineers... but no reply since then.

Perhaps some of you have some "contacts" and can use them...



RE: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Allen Martin
> I'd like to hear from Andi how he feels about this?  It seems like a
> somewhat drastic solution in some ways given a lot of hardware doesn't
> seem to be affected (or maybe in those cases it's just really hard to
> hit, I don't know).
> 
> > Well we can hope that Nvidia will find out more (though I'm not too
> > optimistic).
> 
> Ideally someone from AMD needs to look into this, if some mainboards
> really never see this problem, then why is that?  Is there errata that
> some BIOS/mainboard vendors are dealing with that others are not?

NVIDIA and AMD are investigating this issue; we don't know what the
problem is yet.


Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Andi Kleen
On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote:
> On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote:
> > I agree,... it seems drastic, but this is the only really secure
> > solution.
>
> I'd like to hear from Andi how he feels about this?  It seems like a
> somewhat drastic solution in some ways given a lot of hardware doesn't
> seem to be affected (or maybe in those cases it's just really hard to
> hit, I don't know).

AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
although there were similar problems on VIA in the past too.
Unless a good workaround comes around soon I'll probably default
to iommu=soft on Nvidia.

-Andi
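
For anyone applying the workaround by hand in the meantime, a minimal sketch of how one might confirm that a booted kernel was actually given iommu=soft. It assumes the option is passed on the kernel command line, possibly inside a comma-separated list such as iommu=soft,noagp; the helper name cmdline_has_iommu_soft is made up for illustration.

```shell
#!/bin/bash
# cmdline_has_iommu_soft "KERNEL COMMAND LINE"
# Return success (0) if an iommu= option on the given command line
# includes "soft". Hypothetical helper, not part of any kernel tooling.
cmdline_has_iommu_soft() {
    local -a words opts
    local word opt
    read -r -a words <<< "$1"
    for word in "${words[@]}"; do
        [[ $word == iommu=* ]] || continue
        # Split the option value on commas and look for "soft".
        IFS=, read -r -a opts <<< "${word#iommu=}"
        for opt in "${opts[@]}"; do
            [[ $opt == soft ]] && return 0
        done
    done
    return 1
}
```

On a running system this could be called as `cmdline_has_iommu_soft "$(cat /proc/cmdline)" && echo "software IOMMU requested"`.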


Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Chris Wedgwood
On Tue, Jan 16, 2007 at 09:31:31PM +0100, Krzysztof Halasa wrote:

> Do you (someone) have (maintain) a list of affected systems,
> including motherboard type and possibly version, BIOS version and
> CPU type? A similar list of unaffected systems with 4GB+ RAM could
> be useful, too.

All I know is that some systems hit this and some don't seem to.  Why
is not clear.

> I'm afraid with default iommu=soft it will be a mystery forever.

Right, but given windows doesn't use the iommu at all and that a lot
of newer hardware/drivers doesn't need it it might be the safest
option since it clearly has been causing corruption for a number of
people for well over a year now.



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Chris Wedgwood
On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote:

> I agree,... it seems drastic, but this is the only really secure
> solution.

I'd like to hear from Andi how he feels about this?  It seems like a
somewhat drastic solution in some ways given a lot of hardware doesn't
seem to be affected (or maybe in those cases it's just really hard to
hit, I don't know).

> Well we can hope that Nvidia will find out more (though I'm not too
> optimistic).

Ideally someone from AMD needs to look into this, if some mainboards
really never see this problem, then why is that?  Is there errata that
some BIOS/mainboard vendors are dealing with that others are not?

> But we should not forget about the issue, just because SATA is no
> longer affected.

Right.


Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Krzysztof Halasa
Chris Wedgwood <[EMAIL PROTECTED]> writes:

> right now i'm thinking if we can't figure out which cpu/bios
> combinations are safe we might almost be better off doing iommu=soft
> for *all* k8 stuff except for those that are whitelisted; though this
> seems extremely drastic

Do you (someone) have (maintain) a list of affected systems,
including motherboard type and possibly version, BIOS version and
CPU type? A similar list of unaffected systems with 4GB+ RAM could
be useful, too.

I'm afraid with default iommu=soft it will be a mystery forever.
-- 
Krzysztof Halasa


Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Christoph Anton Mitterer
Arkadiusz Miskiewicz wrote:
> FYI it seems that I was also hit by this bug with qlogic fc card + adaptec 
> taro raid controller on Thunder K8SRE S2891 mainboard with nvidia chipset on 
> it.
>
> http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/b8bdbde9721f7d35/45701994c95fe2cf?lnk=st&q=arkadiusz+fibre&rnum=8#45701994c95fe2cf
>   
I'm aware of your old thread and at least I considered your postings
from it :-)

Anyway, thanks for your information. =)

Chris.




Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Arkadiusz Miskiewicz
On Tuesday 16 January 2007 19:01, Chris Wedgwood wrote:
> On Tue, Jan 16, 2007 at 08:26:05AM -0600, Robert Hancock wrote:
> > >If one use iommu=soft the sata_nv will continue to use the new code
> > >for the ADMA, right?
> >
> > Right, that shouldn't affect it.
>
> right now i'm thinking if we can't figure out which cpu/bios
> combinations are safe we might almost be better off doing iommu=soft
> for *all* k8 stuff except for those that are whitelisted; though this
> seems extremely drastic
>
> it's not clear if this only affects nvidia based chipsets, the nature
> of the corruption makes me think it's not an iommu software bug (we
> see a few bytes not entire pages corrupted, it's not even clear if
> it's entire cachelines trashed) --- perhaps other vendors have more
> recent bios errata or maybe it's just that nvidia has sold a lot of
> these so they are more visible? (i'm assuming at this point it might
> be some kind of cpu errata that some bioses deal with because some
> mainboards don't ever seem to see this whilst others do)

FYI it seems that I was also hit by this bug with qlogic fc card + adaptec 
taro raid controller on Thunder K8SRE S2891 mainboard with nvidia chipset on 
it.

http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/b8bdbde9721f7d35/45701994c95fe2cf?lnk=st&q=arkadiusz+fibre&rnum=8#45701994c95fe2cf


-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Christoph Anton Mitterer
Chris Wedgwood wrote:
> right now i'm thinking if we can't figure out which cpu/bios
> combinations are safe we might almost be better off doing iommu=soft
> for *all* k8 stuff except for those that are whitelisted; though this
> seems extremely drastic
>   
I agree,... it seems drastic, but this is the only really secure solution.
But it seems that none of the responsible developers has read our thread or
the bug report and given an opinion on the issue.

> it's not clear if this only affects nvidia based chipsets, the nature
> of the corruption makes me think it's not an iommu software bug (we
> see a few bytes not entire pages corrupted, it's not even clear if
> it's entire cachelines trashed) --- perhaps other vendors have more
> recent bios errata or maybe it's just that nvidia has sold a lot of
> these so they are more visible? (i'm assuming at this point it might
> be some kind of cpu errata that some bioses deal with because some
> mainboards don't ever seem to see this whilst others do)
>   
Well we can hope that Nvidia will find out more (though I'm not too
optimistic).


> in some ways the problem is worse with recent kernels --- because the
> ethernet and sata can address over 4GB and don't use the iommu anymore
> the problem is going to be *much* harder to hit, but still here
> lurking to cause problems for people.
Yes I agree,.. this is a dangerous situation...
But we should not forget about the issue, just because SATA is no
longer affected.

Chris.



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Chris Wedgwood
On Tue, Jan 16, 2007 at 08:26:05AM -0600, Robert Hancock wrote:

> >If one use iommu=soft the sata_nv will continue to use the new code
> >for the ADMA, right?
>
> Right, that shouldn't affect it.

right now i'm thinking if we can't figure out which cpu/bios
combinations are safe we might almost be better off doing iommu=soft
for *all* k8 stuff except for those that are whitelisted; though this
seems extremely drastic

it's not clear if this only affects nvidia based chipsets, the nature
of the corruption makes me think it's not an iommu software bug (we
see a few bytes not entire pages corrupted, it's not even clear if
it's entire cachelines trashed) --- perhaps other vendors have more
recent bios errata or maybe it's just that nvidia has sold a lot of
these so they are more visible? (i'm assuming at this point it might
be some kind of cpu errata that some bioses deal with because some
mainboards don't ever seem to see this whilst others do)

in some ways the problem is worse with recent kernels --- because the
ethernet and sata can address over 4GB and don't use the iommu anymore
the problem is going to be *much* harder to hit, but still here
lurking to cause problems for people.  with ethernet you'll probably
end up getting the odd trashed tcp frame and dropping it, so those
will go mostly unnoticed, so this is why sata seems to be the easier
way to show it