Re: Bad DMA from Marvell 9230

2014-05-30 Thread Roger Heflin
Pretty much any SMART commands. I was running something that collected all
of the SMART stats once per hour per disk, and this made it crash about
once per week. Pushing the disks hard appeared to make it
even more likely to crash under the SMART commands. Removing the commands
took things up to 2-3 months between crashes.

I suspect that if you just ran a simple smartctl --all /dev/sdX
a few times a minute on a board that had the issue, it would almost certainly
crash in less than a day. I did not figure out myself that the SMART commands were
crashing it; someone else's post indicated that they had determined
that, so I tracked down what I had issuing SMART commands, removed it, and
things got much better.
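
A rough sketch of such a reproducer, for anyone who wants to try it. It assumes the tool meant above is smartctl from smartmontools; the helper names run_once() and hammer() are made up for illustration, and nothing here has been confirmed to trigger the lockup on any particular board.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run one command; return its exit status, or -1 if it could not be run. */
static int run_once(char *const argv[])
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* exec failed (shell-style convention) */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/*
 * Run the command `count` times, sleeping `delay_s` seconds between runs.
 * Returns how many runs could not be executed at all. Status 127 is
 * treated as "exec failed"; a tool that legitimately exits with 127
 * would be miscounted.
 */
static int hammer(char *const argv[], int count, unsigned int delay_s)
{
	int failed = 0;

	assert(argv != NULL && argv[0] != NULL);
	for (int i = 0; i < count; i++) {
		int rc = run_once(argv);

		if (rc < 0 || rc == 127)
			failed++;
		printf("run %d: exit status %d\n", i + 1, rc);
		fflush(stdout);	/* keep the buffer empty before the next fork */
		if (delay_s && i + 1 < count)
			sleep(delay_s);
	}
	return failed;
}
```

Calling hammer() from main() with the argument vector { "smartctl", "--all", "/dev/sdX", NULL }, a count of 4320 and a delay of 20 seconds gives three runs a minute for about a day. smartctl uses its exit status as a bit mask, so a non-zero status by itself does not mean the run failed.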

For finding good vendors, I know others on the md-raid list have given
up on cheap and found decent but more expensive controllers.

I would expect LSI and Adaptec to care enough about their names to
make a decent quality product. There appears to be a 4-port (1x 8087,
pt-JBOD/non-RAID) Adaptec that may be some variant of Marvell and is
about $130 US on Newegg; given it is Adaptec, they may have made the
Marvell actually work. There are a number of 8-port non-RAID cards up
around $250-$300 that would probably work great if you wanted to pay
that much; these cards have 2x 8087 ports and need an 8087->4 SATA
cable. Given how nice it is to have a machine that just mostly works
without messing around with it, I would probably pay the extra for
stability.

Last time I looked at the 2-port PCIe x1 cards, I found significant
indications of instability, enough to expect that I would have to put
several hours (or more) of testing/crashing/RMA pain in to figure out
which worked. I went so far as crossing out any motherboards
with non-AMD/non-Intel SATA ports, as I have been burned before by
large motherboard vendors doing a bad job of integrating others' (possibly bad)
SATA ports. It is a sad state, but it has also been this way for a
long time.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Bad DMA from Marvell 9230

2014-05-30 Thread Benjamin Herrenschmidt
On Fri, 2014-05-30 at 09:13 -0500, Roger Heflin wrote:
> Do enough smartcmds and the entire board (all 4 ports) locked up and
> required a reboot, I quit doing smartcmds and stability went way up,
> but it was still not 100% stable.

Any chance you can give me an example of "enough smartcmds" ? IE a
script or something that reliably breaks it for you ? I'd like to try on
my 9235.

Cheers,
Ben.




Re: Bad DMA from Marvell 9230

2014-05-30 Thread Benjamin Herrenschmidt
On Fri, 2014-05-30 at 09:58 -0400, Jérôme Carretero wrote:
> Weird (I hadn't seen that you reported the 9235 working...), I have
> IOMMU problems with a 9235...
> 
> What system are you running it on (when you say "power box", is it a
> beefy x86 computer or literally a PowerPC)?
> For me, AMD 990FX chipset, latest master linux.
> My board works fine* on another non-IOMMU system.

A PowerPC POWER8 prototype machine :-)

Cheers,
Ben.




Re: Bad DMA from Marvell 9230

2014-05-30 Thread Jérôme Carretero
On Fri, 30 May 2014 09:13:43 -0500
Roger Heflin  wrote:

> I had a 9230...
> [...]
> Supplier support "claimed" it to be a Linux AHCI bug as the "claim"
> that their board correctly supports AHCI, even though all other AHCI
> boards work right in this exact same use case in the exact same
> machine.

Does somebody know about another supplier that provides equivalent
SATA adapters that behave well, are robust and support FIS switching,
and don't come with proprietary drivers/utilities but rather *support*
their linux driver?

I'd bite the bullet and get a better, more expensive device, but it
doesn't seem to come with appropriate software support either.

There are some RAID adapters that don't expose the disks if we're
not creating RAID volumes... with an ugly CLI, and where we don't know
what's written where on the disk in case we are to create one volume
per disk, and do software RAID later.
Not tempted to use that.

Or waste PCIe slots and use more el-cheapo ASMedia 1061 PCIe-1x
devices... do these work well?

-- 
Jérôme


Re: Bad DMA from Marvell 9230

2014-05-30 Thread Roger Heflin
I had a 9230. On older kernels it worked "ok" so long as you did not
do any SMART commands; I removed it and went to something that works.
Marvell appears to be hit and miss, with some cards/chips working
right and some not...

Do enough smartcmds and the entire board (all 4 ports) locked up and
required a reboot, I quit doing smartcmds and stability went way up,
but it was still not 100% stable.

Supplier support "claimed" it to be a Linux AHCI bug as they "claim"
that their board correctly supports AHCI, even though all other AHCI
boards work right in this exact same use case in the exact same
machine.

On Fri, May 30, 2014 at 8:58 AM, Jérôme Carretero  wrote:
> On Fri, 30 May 2014 20:37:58 +1000
> Benjamin Herrenschmidt  wrote:
>
>> We've switched to a 9235 instead which seems to work fine.
>
> Weird (I hadn't seen that you reported the 9235 working...), I have
> IOMMU problems with a 9235...
>
> What system are you running it on (when you say "power box", is it a
> beefy x86 computer or literally a PowerPC)?
> For me, AMD 990FX chipset, latest master linux.
> My board works fine* on another non-IOMMU system.
>
> --
> Jérôme
>
> * with issues with port multipliers
>
> Link to Benjamin's first message: https://lkml.org/lkml/2014/3/27/43


Re: Bad DMA from Marvell 9230

2014-05-30 Thread Jérôme Carretero
On Fri, 30 May 2014 20:37:58 +1000
Benjamin Herrenschmidt  wrote:

> We've switched to a 9235 instead which seems to work fine.

Weird (I hadn't seen that you reported the 9235 working...), I have
IOMMU problems with a 9235...

What system are you running it on (when you say "power box", is it a
beefy x86 computer or literally a PowerPC)?
For me, AMD 990FX chipset, latest master linux.
My board works fine* on another non-IOMMU system.

-- 
Jérôme

* with issues with port multipliers

Link to Benjamin's first message: https://lkml.org/lkml/2014/3/27/43


Re: Bad DMA from Marvell 9230

2014-05-30 Thread Benjamin Herrenschmidt
On Fri, 2014-05-30 at 03:06 -0400, Jérôme Carretero wrote:
> On Thu, 27 Mar 2014 17:57:37 +1100
> Benjamin Herrenschmidt  wrote:
> 
> > I've been trying a 9230 on a power box here (a 9235 on the same
> > machine works fine) and it blows up with an IOMMU violation early
> > during init.
> 
> Hi,
> 
> That's https://bugzilla.kernel.org/show_bug.cgi?id=42679
> if you haven't already found it.

Somewhat... It's not the phantom function, the error I capture in my
IOMMU shows that it's trying to read from address 0 which is unmapped
but with the right initiator.

This device is a pile of crap. We've talked to Marvell support channel,
sent driver traces etc... but they didn't admit anything.

We've switched to a 9235 instead which seems to work fine.

Cheers,
Ben.




Re: Bad DMA from Marvell 9230

2014-05-30 Thread Jérôme Carretero
On Thu, 27 Mar 2014 17:57:37 +1100
Benjamin Herrenschmidt  wrote:

> I've been trying a 9230 on a power box here (a 9235 on the same
> machine works fine) and it blows up with an IOMMU violation early
> during init.

Hi,

That's https://bugzilla.kernel.org/show_bug.cgi?id=42679
if you haven't already found it.

-- 
Jérôme


Re: Bad DMA from Marvell 9230

2014-04-04 Thread Robert Hancock

On 27/03/14 09:19 AM, Tejun Heo wrote:
> On Thu, Mar 27, 2014 at 05:57:37PM +1100, Benjamin Herrenschmidt wrote:
>> I've contacted Marvell, but I was wondering if anybody here had already
>> experienced something similar or has an idea of what else the chip
>> might be doing wrong so we can try to find a workaround ?
>
> No idea.  First time to hear such problem. :(

There are other Marvell controllers that do DMA requests from the wrong 
PCI function ID and cause IOMMU issues, so it seems like testing on such 
systems (or just validating the DMA transactions done by the controller 
by some other means) isn't something that Marvell likes to do. 
Presumably reading from address 0 is normally fine without an IOMMU, so 
this problem wouldn't be noticed otherwise.
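
To make the two failure modes concrete (a request from the wrong PCI function ID versus a request to an unmapped address such as 0), here is a toy model of the check an IOMMU applies to each DMA request. It is only an illustration: the struct, enum and function names are invented, and real IOMMUs use per-device page tables rather than a flat list of windows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: one mapped DMA window, owned by one requester ID (bus/dev/fn). */
struct toy_mapping {
	uint16_t requester_id;
	uint64_t base;
	uint64_t len;
};

enum dma_result {
	DMA_OK,
	DMA_FAULT_UNKNOWN_REQUESTER,	/* e.g. wrong PCI function ID */
	DMA_FAULT_UNMAPPED,		/* right requester, address not mapped */
};

/* Decide whether a DMA access from `rid` to `addr` would be allowed. */
static enum dma_result toy_iommu_check(const struct toy_mapping *maps,
				       size_t n, uint16_t rid, uint64_t addr)
{
	bool seen = false;

	assert(maps != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (maps[i].requester_id != rid)
			continue;
		seen = true;
		if (addr >= maps[i].base && addr - maps[i].base < maps[i].len)
			return DMA_OK;
	}
	return seen ? DMA_FAULT_UNMAPPED : DMA_FAULT_UNKNOWN_REQUESTER;
}
```

In this model, a read of address 0 from the correct requester ID comes back as DMA_FAULT_UNMAPPED as long as no window covers address 0. Without an IOMMU there is no such check, and the same read simply goes to physical address 0.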



Re: Bad DMA from Marvell 9230

2014-03-27 Thread Tejun Heo
On Thu, Mar 27, 2014 at 05:57:37PM +1100, Benjamin Herrenschmidt wrote:
> I've contacted Marvell, but I was wondering if anybody here had already
> experienced something similar or has an idea of what else the chip
> might be doing wrong so we can try to find a workaround ?

No idea.  First time to hear such problem. :(

-- 
tejun


Bad DMA from Marvell 9230

2014-03-27 Thread Benjamin Herrenschmidt
Hi Folks !

Does that ring any bell?

I've been trying a 9230 on a power box here (a 9235 on the same machine
works fine) and it blows up with an IOMMU violation early during init.

From what I can tell the scenario is:

 - We still haven't issued any command per se; all our DMA command
buffers etc. are all 0's at the point of the error.

 - The core libata calls the AHCI driver's ahci_hardreset() for each
port in a separate thread. They all call sata_link_hardreset().

 - This in turn calls sata_link_resume(), which writes to the SCR_CONTROL
register as follows:

	scontrol = (scontrol & 0x0f0) | 0x300;
	if ((rc = sata_scr_write(link, SCR_CONTROL, scontrol)))
	{
		printk(" -> sata_link_resume FAIL 2\n");
		return rc;
	}

	/*
	 * Some PHYs react badly if SStatus is pounded
	 * immediately after resuming.  Delay 200ms before
	 * debouncing.
	 */
	ata_msleep(link->ap, 200);

I get the interrupt from the IOMMU about 2ms after the write to
SCR_CONTROL.
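
For readers not familiar with the register, the SCR_CONTROL write above can be decoded with a small standalone sketch. The field layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8) is the SControl layout from the SATA specification as I understand it; the macro and function names are invented for illustration and are not libata's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* SControl fields (names here are illustrative, not libata's). */
#define SCR_DET(v)	((v) & 0xf)		/* bits 3:0, detection/reset control */
#define SCR_SPD(v)	(((v) >> 4) & 0xf)	/* bits 7:4, speed limit */
#define SCR_IPM(v)	(((v) >> 8) & 0xf)	/* bits 11:8, power management limits */

/* Same expression as in the sata_link_resume() excerpt. */
static uint32_t resume_scontrol(uint32_t scontrol)
{
	return (scontrol & 0x0f0) | 0x300;
}

/* Print a value together with its three decoded fields. */
static void describe_scontrol(const char *label, uint32_t v)
{
	assert(label != NULL);
	printf("%s: 0x%03x DET=%u SPD=%u IPM=%u\n", label, (unsigned)v,
	       (unsigned)SCR_DET(v), (unsigned)SCR_SPD(v),
	       (unsigned)SCR_IPM(v));
}
```

Starting from 0x301 (DET=1, which requests the link reset), resume_scontrol() yields 0x300: the mask keeps only the speed limit, DET drops back to 0 so the PHY is released from reset, and IPM=3 disables transitions to the Partial and Slumber power states. So the fault shows up about 2ms after the link is allowed to come up, not after any command is issued.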

Now, pending misinterpretation of some bits on my side, it looks like
the bad DMA is a DMA *read* from address 0 (which we never map,
typically to catch driver bugs).

I went through a few theories with this one but so far none held. I
don't think it's a D2H FIS issue since the DMA pointers for that appear
to be set up properly, the memory mapped, etc...

I thought the chip might incorrectly/inadvertently try to (pre)fetch a
command. At that point all 32 command slots are all 0's, so if it
ignored the size it might try to fetch from command address 0.

So I added a loop to fill all 32 slots with a valid command address
in ahci_hardreset():

+	for (i = 0; i < 32; i++)
+		ahci_fill_cmd_slot(pp, i, 0);
 	rc = sata_link_hardreset(link, timing, deadline, &online,
 				 ahci_check_ready);

But that had basically no effect.
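
The prefetch theory can be illustrated with a userspace model of the command list. This is a sketch under assumptions: the 32-byte slot layout follows the AHCI command header as I understand it (field names loosely follow the driver's struct ahci_cmd_hdr, endianness handling omitted), while fill_all_slots(), count_null_slots() and the stride parameter are hypothetical stand-ins for what the ahci_fill_cmd_slot() loop achieves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define AHCI_MAX_CMDS 32

/* One command-list slot (command header), 32 bytes. */
struct cmd_hdr {
	uint32_t opts;		/* flags, FIS length, PRD entry count */
	uint32_t status;	/* bytes transferred */
	uint32_t tbl_addr;	/* command table address, low 32 bits */
	uint32_t tbl_addr_hi;	/* command table address, high 32 bits */
	uint32_t reserved[4];
};

/* Bus address the controller would fetch the command table from. */
static uint64_t slot_table_addr(const struct cmd_hdr *h)
{
	assert(h != NULL);
	return ((uint64_t)h->tbl_addr_hi << 32) | h->tbl_addr;
}

/* Hypothetical helper: give every slot its own mapped command table. */
static void fill_all_slots(struct cmd_hdr *list, uint64_t base, uint64_t stride)
{
	assert(list != NULL);
	for (int i = 0; i < AHCI_MAX_CMDS; i++) {
		uint64_t dma = base + (uint64_t)i * stride;

		list[i].opts = 0;
		list[i].status = 0;
		list[i].tbl_addr = (uint32_t)(dma & 0xffffffffu);
		list[i].tbl_addr_hi = (uint32_t)(dma >> 32);
	}
}

/* Count slots that would make a stray fetch read from address 0. */
static int count_null_slots(const struct cmd_hdr *list)
{
	int n = 0;

	assert(list != NULL);
	for (int i = 0; i < AHCI_MAX_CMDS; i++)
		if (slot_table_addr(&list[i]) == 0)
			n++;
	return n;
}
```

With a zeroed list, count_null_slots() returns 32, which is the state the theory depends on; after fill_all_slots() it returns 0 for any non-zero base. Since filling the slots changed nothing on the real hardware, the read from address 0 presumably does not come from a command table pointer.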

I've contacted Marvell, but I was wondering if anybody here had already
experienced something similar or has an idea of what else the chip
might be doing wrong so we can try to find a workaround ?

Cheers,
Ben.



