Re: [openib-general] [RFC/BUG] DMA vs. CQ race

2007-02-27 Thread Roland Dreier
On our cell blade + PCI-e Mellanox. I don't see anything in arch/powerpc that looks like dma_alloc_coherent() will do anything other than allocate some memory and map it with DMA_BIDIRECTIONAL. So how does this altix fix help in your situation? Am I misreading the Cell IOMMU code?

Re: [openib-general] [RFC/BUG] DMA vs. CQ race

2007-02-27 Thread Shirley Ma
Roland Dreier [EMAIL PROTECTED] wrote on 02/27/2007 01:40:36 PM:
> Shirley, can you clarify why doing dma_alloc_coherent() in the kernel helps on your Cell blade? It really seems that dma_alloc_coherent() just allocates some memory and then does dma_map(DMA_BIDIRECTIONAL), which would be

Re: [openib-general] [RFC/BUG] DMA vs. CQ race

2007-02-26 Thread Roland Dreier
> That would be great. We hit a similar problem in our cluster test -- data corruption because of this race.
On what platform?
 - R.
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To

Re: [openib-general] [RFC/BUG] DMA vs. CQ race

2007-02-26 Thread Shirley Ma
> Hmm, OK. Then I will do my best to make sure we get a fix for this into 2.6.22.
That would be great. We hit a similar problem in our cluster test -- data corruption because of this race.
Thanks
Shirley Ma

Re: [openib-general] [RFC/BUG] DMA vs. CQ race

2007-02-26 Thread Shirley Ma
Roland Dreier [EMAIL PROTECTED] wrote on 02/26/2007 02:09:48 PM:
> > That would be great. We hit a similar problem in our cluster test -- data corruption because of this race.
> On what platform?
> - R.
On our cell blade + PCI-e Mellanox.
Thanks
Shirley

Re: [openib-general] [RFC/BUG] DMA vs. CQ race

2007-02-26 Thread Roland Dreier
> On our cell blade + PCI-e Mellanox.
I don't see anything in arch/powerpc that looks like dma_alloc_coherent() will do anything other than allocate some memory and map it with DMA_BIDIRECTIONAL. So how does this altix fix help in your situation? Am I misreading the Cell IOMMU code?
 - R.

Re: [openib-general] [RFC/BUG] DMA vs. CQ race

2007-02-22 Thread Roland Dreier
> A first-cut at a patch was sent out, some very reasonable objections were raised, and the thread fizzled out.
Sorry, I meant to respond again, but I never got around to it. The biggest concern with the earlier patch seemed to be backward compatibility. There was a stab at addressing

Re: [openib-general] [RFC/BUG] DMA vs. CQ race

2007-02-22 Thread akepner
On Thu, Feb 22, 2007 at 10:34:16AM -0800, Roland Dreier wrote:
> I actually have a vague plan for a somewhat cleaner way to get this fix. For a variety of reasons, I am planning on changing the way the kernel handles memory registration so that low-level drivers have more control over what

Re: [openib-general] [RFC/BUG] DMA vs. CQ race

2007-02-22 Thread Roland Dreier
> We found this accidentally, running a normal MPI job, on a normally sized machine (i.e., tens, not hundreds of processors). It appears to be more easily produced than we'd expected, and we consider it to be a severe problem.
Hmm, OK. Then I will do my best to make sure we get a fix

[openib-general] [RFC/BUG] DMA vs. CQ race

2007-02-21 Thread akepner
In: http://openib.org/pipermail/openib-general/2006-December/030251.html I described a potential race between DMA and CQ updates on Altix systems. At that time the bug hadn't been observed, but was expected to be possible on large NUMA systems. A first-cut at a patch was sent out, some very