Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-26 Thread Joerg Roedel
On Wed, Mar 25, 2015 at 12:36:34PM -0400, Mark Hounschell wrote:
 BTW, so far the first 2 patches are working well. I was going to
 wait until the end of the day to report but so far I have been
 unable to produce the problems I was seeing. And I am in the middle
 of some driver work so lots of unloading/loading going on.

Great, thanks. Please let me know when you have test results for the
other patches too.


Joerg

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-26 Thread Mark Hounschell

On 03/26/2015 11:52 AM, Joerg Roedel wrote:

Hi Mark,

On Thu, Mar 26, 2015 at 10:58:15AM -0400, Mark Hounschell wrote:

Sorry but CMA was still badly broken. I have a patch below that works.


In which way is it broken? What happens when you try to allocate memory
with dma_alloc_coherent?



I got Oops on both the alloc and free and I had to hit the reset button.


I've tested it with small (no CMA) and large (with CMA) dma's using
dma_alloc_coherent. The patch below is just the git diff from your
cloned tree piped to a file then copied into this email. If you require
an official patch I can send one. Just let me know.


The main differences I can spot are that you change the order (first
CMA, then buddy) and you manually align the input size. I can see the
reason for the later, but why does CMA need to be tried first?



I believe that is how dma_alloc_coherent works. If  CMA is present it 
uses it. Even for small allocations. If not present then it does the 
best it can. That patch was derived by looking at the Intel iommu driver 
works properly. Maybe you should take a look at that.



Also, in my opinion, this CMA thing is clearly a BUG not a feature
request. The AMD iommu clearly breaks CMA. I feel what ever fix
you are happy with should be back ported to stable.


It is not a BUG, the interface definition for dma_alloc_coherent does
not specify that it can allocate infinite amounts of memory. So this
patch does not qualify for stable.




I don't care what the the interface definition for dma_alloc_coherent 
says. If it does not describe it's use with CMA, it is outdated. Maybe 
the interface definition for dma_alloc_coherent was done before CMA 
was implemented.


Maybe you should be looking at CMA Documentation.  Unfortunately there 
does not yet seem to be any in the Documentation dir. Probably because 
CMA is supposed to be transparent to the DMA API. You know, one should 
not need to details about it, etc...


It is a BUG. To access CMA you use dma_alloc_coherent. You have 
completely highjacked that function.  You have broken CMA. PERIOD. That 
is a BUG.


Regards
Mark
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-26 Thread Mark Hounschell
Hi Joerg,

On 03/26/2015 08:45 AM, Joerg Roedel wrote:
 On Wed, Mar 25, 2015 at 12:36:34PM -0400, Mark Hounschell wrote:
 BTW, so far the first 2 patches are working well. I was going to
 wait until the end of the day to report but so far I have been
 unable to produce the problems I was seeing. And I am in the middle
 of some driver work so lots of unloading/loading going on.
 
 Great, thanks. Please let me know when you have test results for the
 other patches too.
 
 
   Joerg
 
Sorry but CMA was still badly broken. I have a patch below that works.
I've tested it with small (no CMA) and large (with CMA) dma's using
dma_alloc_coherent. The patch below is just the git diff from your
cloned tree piped to a file then copied into this email. If you require
an official patch I can send one. Just let me know.

Also, in my opinion, this CMA thing is clearly a BUG not a feature
request. The AMD iommu clearly breaks CMA. I feel what ever fix
you are happy with should be back ported to stable.

Regards
Mark 

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index a0197d0..5ea4fed 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2917,28 +2917,36 @@ static void *alloc_coherent(struct device *dev, size_t 
size,
u64 dma_mask = dev-coherent_dma_mask;
struct protection_domain *domain;
unsigned long flags;
-   struct page *page;
+   struct page *page = 0;
+   int order;
+   unsigned int count;
+
+   size = PAGE_ALIGN(size);
+   order = get_order(size);
+   count = size  PAGE_SHIFT;
 
INC_STATS_COUNTER(cnt_alloc_coherent);
 
domain = get_domain(dev);
if (PTR_ERR(domain) == -EINVAL) {
-   page = alloc_pages(flag, get_order(size));
+   page = alloc_pages(flag, order);
*dma_addr = page_to_phys(page);
return page_address(page);
} else if (IS_ERR(domain))
return NULL;
 
-   dma_mask  = dev-coherent_dma_mask;
flag = ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
+   flag |= __GFP_ZERO;
 
-   page = alloc_pages(flag, get_order(size));
+   if (flag  __GFP_WAIT) {
+   page = dma_alloc_from_contiguous(dev, count, order);
+   if (page  page_to_phys(page) + size  dma_mask) {
+   dma_release_from_contiguous(dev, page, count);
+   page = NULL;
+   }
+   }
if (!page) {
-   if (!(flag  __GFP_WAIT))
-   return NULL;
-
-   page = dma_alloc_from_contiguous(dev, size  PAGE_SHIFT,
-get_order(size));
+   page = alloc_pages(flag, order);
if (!page)
return NULL;
}
@@ -2951,7 +2959,7 @@ static void *alloc_coherent(struct device *dev, size_t 
size,
*dma_addr = __map_single(dev, domain-priv, page_to_phys(page),
 size, DMA_BIDIRECTIONAL, true, dma_mask);
 
-   if (*dma_addr == DMA_ERROR_CODE) {
+   if (!dma_addr || (*dma_addr == DMA_ERROR_CODE)) {
spin_unlock_irqrestore(domain-lock, flags);
goto out_free;
}
@@ -2965,7 +2973,7 @@ static void *alloc_coherent(struct device *dev, size_t 
size,
 out_free:
 
if (!dma_release_from_contiguous(dev, page, size  PAGE_SHIFT))
-   __free_pages(page, get_order(size));
+   __free_pages(page, order);
 
return NULL;
 }
@@ -2979,6 +2987,11 @@ static void free_coherent(struct device *dev, size_t 
size,
 {
unsigned long flags;
struct protection_domain *domain;
+   int order;
+   struct page *page = virt_to_page(virt_addr);
+
+   size = PAGE_ALIGN(size);
+   order = get_order(size);
 
INC_STATS_COUNTER(cnt_free_coherent);
 
@@ -2995,7 +3008,9 @@ static void free_coherent(struct device *dev, size_t size,
spin_unlock_irqrestore(domain-lock, flags);
 
 free_mem:
-   free_pages((unsigned long)virt_addr, get_order(size));
+   if (!dma_release_from_contiguous(dev, page, size  PAGE_SHIFT))
+   __free_pages(page, order);
+
 }
 
 /*

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-26 Thread Joerg Roedel
Hi Mark,

On Thu, Mar 26, 2015 at 10:58:15AM -0400, Mark Hounschell wrote:
 Sorry but CMA was still badly broken. I have a patch below that works.

In which way is it broken? What happens when you try to allocate memory
with dma_alloc_coherent?

 I've tested it with small (no CMA) and large (with CMA) dma's using
 dma_alloc_coherent. The patch below is just the git diff from your
 cloned tree piped to a file then copied into this email. If you require
 an official patch I can send one. Just let me know.

The main differences I can spot are that you change the order (first
CMA, then buddy) and you manually align the input size. I can see the
reason for the later, but why does CMA need to be tried first?

 Also, in my opinion, this CMA thing is clearly a BUG not a feature
 request. The AMD iommu clearly breaks CMA. I feel what ever fix
 you are happy with should be back ported to stable.

It is not a BUG, the interface definition for dma_alloc_coherent does
not specify that it can allocate infinite amounts of memory. So this
patch does not qualify for stable.


Joerg

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-25 Thread Joerg Roedel
On Tue, Mar 24, 2015 at 07:57:49AM -0400, Mark Hounschell wrote:
 I'll be happy to test it.

Okay, here you go:

git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git amd-iommu

This branch has two patches on-top of v4.0-rc5 which should fix the
issue you described here. Please let me know if it works for you.


Joerg

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-25 Thread Joerg Roedel
Hi again,

On Wed, Mar 25, 2015 at 02:59:37PM +0100, Joerg Roedel wrote:
 On Tue, Mar 24, 2015 at 07:57:49AM -0400, Mark Hounschell wrote:
  I'll be happy to test it.
 
 Okay, here you go:
 
   git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git amd-iommu

I just added more patches to implement the use of the contiguous memory
allocator for the dma_alloc_coherent path. Since you asked for it, can
you test this as well?


Thanks,

Joerg

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-25 Thread Mark Hounschell

On 03/25/2015 11:13 AM, Joerg Roedel wrote:

Hi again,

On Wed, Mar 25, 2015 at 02:59:37PM +0100, Joerg Roedel wrote:

On Tue, Mar 24, 2015 at 07:57:49AM -0400, Mark Hounschell wrote:

I'll be happy to test it.


Okay, here you go:

git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git amd-iommu


I just added more patches to implement the use of the contiguous memory
allocator for the dma_alloc_coherent path. Since you asked for it, can
you test this as well?




Sure. No problem. If I update my tree from the above, will I get it.

BTW, so far the first 2 patches are working well. I was going to wait 
until the end of the day to report but so far I have been unable to 
produce the problems I was seeing. And I am in the middle of some driver 
work so lots of unloading/loading going on.


Regards
Mark

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-24 Thread Mark Hounschell

On 03/23/2015 11:03 AM, Joerg Roedel wrote:

Hi Mark,

On Tue, Mar 03, 2015 at 02:36:19PM -0500, Mark Hounschell wrote:

It looks like this problem is NOT a bug with the SCSI aic7xxx driver
after all. I can duplicate this BUG very easily with other hardware.
Simply removing a driver module  (whether it its self, has actually
used any of the DMA API or not) that is sitting on the same pci bus
as a card that is actually using DMA will cause this. And that card
that is in use and using DMA will no longer function. It looks and
feels like unloading a module causes the IOMMU to improperly unmap
valid mappings.


You are right, I looked into the code and found the problem. I'll post a
fix for testing this week.



I'll be happy to test it.

Regards
Mark

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-23 Thread Joerg Roedel
Hi Mark,

On Tue, Mar 03, 2015 at 02:36:19PM -0500, Mark Hounschell wrote:
 It looks like this problem is NOT a bug with the SCSI aic7xxx driver
 after all. I can duplicate this BUG very easily with other hardware.
 Simply removing a driver module  (whether it its self, has actually
 used any of the DMA API or not) that is sitting on the same pci bus
 as a card that is actually using DMA will cause this. And that card
 that is in use and using DMA will no longer function. It looks and
 feels like unloading a module causes the IOMMU to improperly unmap
 valid mappings.

You are right, I looked into the code and found the problem. I'll post a
fix for testing this week.


Joerg

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-03 Thread Mark Hounschell

Hi Joerg,

It looks like this problem is NOT a bug with the SCSI aic7xxx driver 
after all. I can duplicate this BUG very easily with other hardware. 
Simply removing a driver module  (whether it its self, has actually used 
any of the DMA API or not) that is sitting on the same pci bus as a card 
that is actually using DMA will cause this. And that card that is in use 
and using DMA will no longer function. It looks and feels like 
unloading a module causes the IOMMU to improperly unmap valid mappings.


Regards
Mark

 Kernel 3.18.7 (x86_64) with an enabled IOMMU. This happens 
immediately after

unloading the aic7xxx driver. In fact this happens unloading any module 
associated
with any pci card on the same bus as the Adaptec AHA-2930CU card. If I blacklist
the aic7xxx driver the problem never shows.  If I disable the IOMMU, the problem
never shows.

This was brought up over at io...@lists.linux-foundation.org in this thread.
http://lists.linuxfoundation.org/pipermail/iommu/2015-February/012274.html
and they think it is not an iommu problem, but a possibly a problem with the
aic7xxx driver.  Complete dmesg attached.

[  106.355752] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x00073980 flags=0x0070]
[  106.358732] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x000739a0 flags=0x0070]
[  106.361870] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x000739b0 flags=0x0070]
[  112.352950] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x00074d00 flags=0x0070]
[  112.356057] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x00074d20 flags=0x0070]
[  112.359142] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x00074d30 flags=0x0070]
[  126.588896] AMD-Vi: Using protection domain 1 for device :10:05.0
[  126.591992] [ cut here ]
[  126.595107] WARNING: CPU: 4 PID: 0 at drivers/iommu/amd_iommu.c:2637 
dma_ops_domain_unmap.part.13+0x65/0x70()
[  126.598408] Modules linked in: iscsi_ibft iscsi_boot_sysfs af_packet kvm 
crc32_pclmul crc32c_intel aesni_intel snd_hda_codec_realtek 
snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec aes_x86_64 
drm snd_hwdep snd_pcm glue_helper lrw aic79xx gf128mul ablk_helper cryptd 
snd_timer 3c59x scsi_transport_spi snd r8169 xhci_pci xhci_hcd serio_raw pcspkr 
mii tpm_infineon tpm_tis fam15h_power k10temp i2c_piix4 processor thermal_sys 
shpchp tpm soundcore 8250_fintek dm_mod sr_mod cdrom ata_generic mxm_wmi 
pata_atiixp ohci_pci wmi button sg autofs4 [last unloaded: aic7xxx]
[  126.613992] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.18.7-lcrs #1
[  126.618146] Hardware name: Gigabyte Technology Co., Ltd. To be filled by 
O.E.M./990FXA-UD5, BIOS FB 01/23/2013
[  126.622456]  0009 88044fd03cb8 807bde4d 

[  126.626815]   88044fd03cf8 8024cf7c 
88044fd03cf8
[  126.631172]  88043cdbc380  0600 
000702c0
[  126.635504] Call Trace:
[  126.639703]  IRQ  [807bde4d] dump_stack+0x4e/0x71
[  126.644003]  [8024cf7c] warn_slowpath_common+0x7c/0xa0
[  126.648335]  [8024d045] warn_slowpath_null+0x15/0x20
[  126.652632]  [806a47e5] dma_ops_domain_unmap.part.13+0x65/0x70
[  126.656919]  [806a675b] __unmap_single.isra.16+0x9b/0x100
[  126.661242]  [806a7198] unmap_page+0x48/0x70
[  126.665572]  [a0157373] boomerang_rx+0x333/0x600 [3c59x]
[  126.669968]  [a015784a] boomerang_interrupt+0x16a/0x4f0 [3c59x]
[  126.674400]  [8029600e] handle_irq_event_percpu+0x3e/0x1e0
[  126.678860]  [802961ec] handle_irq_event+0x3c/0x60
[  126.683298]  [80298bfe] handle_fasteoi_irq+0x7e/0x130
[  126.687732]  [8020543d] handle_irq+0x1d/0x30
[  126.692164]  [80204cee] do_IRQ+0x4e/0xf0
[  126.696577]  [807c596a] common_interrupt+0x6a/0x6a
[  126.700873]  EOI  [802a6e23] ? hrtimer_start+0x13/0x20
[  126.705107]  [8020cad7] ? default_idle+0x17/0x100
[  126.709386]  [8020d4ca] arch_cpu_idle+0xa/0x10
[  126.713685]  [80281dda] cpu_startup_entry+0x34a/0x380
[  126.718010]  [80281a91] ? cpu_startup_entry+0x1/0x380
[  126.722361]  [80233857] start_secondary+0x157/0x180
[  126.726731] ---[ end trace ace20602a5302afb ]---
[  128.557546] [ cut here ]
[  128.560501] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 
address=0x000766c0 flags=0x0020]
[  128.560502] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 
address=0x00076700 flags=0x0020]
[  128.560503] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 
address=0x00076740 flags=0x0020]
[  128.560504] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0