Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-26 Thread Joerg Roedel
On Wed, Mar 25, 2015 at 12:36:34PM -0400, Mark Hounschell wrote:
 BTW, so far the first 2 patches are working well. I was going to
 wait until the end of the day to report but so far I have been
 unable to produce the problems I was seeing. And I am in the middle
 of some driver work so lots of unloading/loading going on.

Great, thanks. Please let me know when you have test results for the
other patches too.


Joerg

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-26 Thread Joerg Roedel
Hi Mark,

On Thu, Mar 26, 2015 at 10:58:15AM -0400, Mark Hounschell wrote:
 Sorry but CMA was still badly broken. I have a patch below that works.

In which way is it broken? What happens when you try to allocate memory
with dma_alloc_coherent?

 I've tested it with small (no CMA) and large (with CMA) dma's using
 dma_alloc_coherent. The patch below is just the git diff from your
 cloned tree piped to a file then copied into this email. If you require
 an official patch I can send one. Just let me know.

The main differences I can spot are that you change the order (first
CMA, then buddy) and you manually align the input size. I can see the
reason for the later, but why does CMA need to be tried first?

 Also, in my opinion, this CMA thing is clearly a BUG not a feature
 request. The AMD iommu clearly breaks CMA. I feel what ever fix
 you are happy with should be back ported to stable.

It is not a BUG, the interface definition for dma_alloc_coherent does
not specify that it can allocate infinite amounts of memory. So this
patch does not qualify for stable.


Joerg

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-25 Thread Joerg Roedel
On Tue, Mar 24, 2015 at 07:57:49AM -0400, Mark Hounschell wrote:
 I'll be happy to test it.

Okay, here you go:

git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git amd-iommu

This branch has two patches on-top of v4.0-rc5 which should fix the
issue you described here. Please let me know if it works for you.


Joerg

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-25 Thread Joerg Roedel
Hi again,

On Wed, Mar 25, 2015 at 02:59:37PM +0100, Joerg Roedel wrote:
 On Tue, Mar 24, 2015 at 07:57:49AM -0400, Mark Hounschell wrote:
  I'll be happy to test it.
 
 Okay, here you go:
 
   git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git amd-iommu

I just added more patches to implement the use of the contiguous memory
allocator for the dma_alloc_coherent path. Since you asked for it, can
you test this as well?


Thanks,

Joerg

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-23 Thread Joerg Roedel
Hi Mark,

On Tue, Mar 03, 2015 at 02:36:19PM -0500, Mark Hounschell wrote:
 It looks like this problem is NOT a bug with the SCSI aic7xxx driver
 after all. I can duplicate this BUG very easily with other hardware.
 Simply removing a driver module  (whether it its self, has actually
 used any of the DMA API or not) that is sitting on the same pci bus
 as a card that is actually using DMA will cause this. And that card
 that is in use and using DMA will no longer function. It looks and
 feels like unloading a module causes the IOMMU to improperly unmap
 valid mappings.

You are right, I looked into the code and found the problem. I'll post a
fix for testing this week.


Joerg

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: hpsa driver bug crack kernel down!

2014-04-10 Thread Joerg Roedel
[+ David, VT-d maintainer ]

Jiang, David, can you please have a look into this issue?

Thanks,

Joerg

On Wed, Apr 09, 2014 at 11:32:37PM -0700, Davidlohr Bueso wrote:
 On Wed, 2014-04-09 at 22:03 -0600, Bjorn Helgaas wrote:
  [+cc Joerg, iommu list]
  
  On Wed, Apr 9, 2014 at 6:19 PM, Davidlohr Bueso davidl...@hp.com wrote:
   On Wed, 2014-04-09 at 16:50 -0700, James Bottomley wrote:
   On Wed, 2014-04-09 at 16:40 -0700, Davidlohr Bueso wrote:
On Wed, 2014-04-09 at 16:10 -0700, James Bottomley wrote:
 On Wed, 2014-04-09 at 16:08 -0700, James Bottomley wrote:
  [+linux-scsi]
  On Wed, 2014-04-09 at 15:49 -0700, Davidlohr Bueso wrote:
   On Wed, 2014-04-09 at 10:39 +0800, Baoquan He wrote:
Hi,
   
The kernel is 3.14.0+ which is pulled just now.
  
   Cc'ing more people.
  
   While the hpsa driver appears to be involved in some way, I'm 
   sure if
   this is a related issue, but as of today's pull I'm getting 
   another
   problem that causes my DL980 not to come up.
  
   *Massive* amounts of:
  
   DMAR:[fault reason 02] Present bit in context entry is clear
   dmar: DRHD: handling fault status reg 602
   dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 
   7f61e000
  
   Then:
  
   hpsa :03:00.0: Controller lockup detected: 0x
   ...
   Workqueue: events hpsa_monitor_ctlr_worker [hpsa]
   ...
  
   Screenshot of the actual LOCKUP:
   http://stgolabs.net/hpsa-hard-lockup-3.14+.png
  
   While I haven't bisected, things worked fine until at least 
   until commit
   39de65aa2c3e (April 2nd).
  
   Any ideas?
 
  Well, it's either a DMA remapping issue or a hpsa one.  Your 
  assertion
  that everything worked fine until 39de65aa2c3e would tend to 
  vindicate
  hpsa,
   
Hmm here you mean DMA, right?
  
   No, it vindicates the hpsa changes ... they don't seem to be causing
   problems until something goes wrong with dma remapping.
  
 because all the hpsa changes went in before that under
 Missing crucial info:

 commit 1a0b6abaea78f73d9bc0a2f6df2d9e4c917cade1

  Merge: 3e75c6d b2bff6c
  Author: Linus Torvalds torva...@linux-foundation.org
  Date:   Tue Apr 1 18:49:04 2014 -0700
 
  Merge tag 'scsi-misc' of
  git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
 
  can you revalidate that this commit works OK just to make sure?
   
Ok so I don't see those DMA messages and system starts just fine. I'm
thinking perhaps something broke after the IO mmu stuff in commit
3f583bc21977a608908b83d03ee2250426a5695c... could this be indirectly
causing the CPU stalls and just blame hpsa in the path as a side 
effect?
   
/me goes out to try the commit.
  
   That's my guess.  The DMAR messages are DMA remapping issues caused in
   the IOMMU.  If I had to guess, I'd say the DMAR fault message is
   indicating the IOMMU is calling for a mapping address before it can
   satisfy the driver read request, which is causing the hang apparently in
   the hpsa driver.
  
   I've added linux-pci to the cc; I think they deal with iommu issues on
   x86.
  
   So that merge commit appears to be the culprit, I see both the DMA
   messages and the lockup blaming hpsa...
  
  My understanding so far (please correct me if I'm wrong):
  
  39de65aa2c3e OK (Merge branch 'i2c/for-next')
  1a0b6abaea78 OK (Merge tag 'scsi-misc')
  3f583bc21977 BAD (Merge tag 'iommu-updates-v3.15')
 
 Yes, specifically (finally done bisecting):
 
 commit 2e45528930388658603ea24d49cf52867b928d3e
 Author: Jiang Liu jiang@linux.intel.com
 Date:   Wed Feb 19 14:07:36 2014 +0800
 
 iommu/vt-d: Unify the way to process DMAR device scope array
 
 Now we have a PCI bus notification based mechanism to update DMAR
 device scope array, we could extend the mechanism to support boot
 time initialization too, which will help to unify and simplify
 the implementation.
 
 Signed-off-by: Jiang Liu jiang@linux.intel.com
 Signed-off-by: Joerg Roedel j...@8bytes.org
 

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] SCSI: amd_iommu dma_boundary overflow

2013-02-21 Thread Joerg Roedel
Hi Eddie,

 On Tue, 2013-02-19 at 18:30 -0800, Eddie Wai wrote:
  The code seems correct as it make sense to impose the same hardware
  segment boundary limit on both the blk queue and the DMA code.  It would
  be an easy alternative to simply prevent the shost-dma_boundary from
  being set to DMA_BIT_MASK(64), but it seems more correct to fix the
  amd_iommu code itself to detect and handle this max 64-bit mask condition.

Thanks for tracking this problem down. It turns out that this code does
not only exist in the AMD IOMMU driver but also in other ones (Calgary
and GART at least, havn't checked all).

  --- a/drivers/iommu/amd_iommu.c
  +++ b/drivers/iommu/amd_iommu.c
  @@ -1526,11 +1526,14 @@ static unsigned long dma_ops_area_alloc(struct 
  device *dev,
  unsigned long boundary_size;
  unsigned long address = -1;
  unsigned long limit;
  +   unsigned long mask;
   
  next_bit = PAGE_SHIFT;
   
  -   boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
  -   PAGE_SIZE)  PAGE_SHIFT;

Given that there is a BUG_ON() in the iommu-helpers which checks for
!is_power_of_2(boundary_size) I think we can simplify the this macro and
avoid the overflow in a more clever way:

boundary_size = (dma_get_seg_boundary(dev)  PAGE_SHIFT) + 1;

This should work because dma_get_seg_boundary(dev) really needs to be a
bitmask which becomes a power_of_2 on incrementing.


Regards,

Joerg


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html