Re: Serious AMD-Vi(?) issue
On Wed, May 15, 2024 at 02:40:31PM +0100, Kelly Choi wrote:
> As explained previously, we are happy to help resolve issues and provide advice where necessary. However, to do this, our developers need the relevant information to provide accurate resolutions. Given that our developers have repeatedly voiced their concerns, and are debugging this out of interest, please help us by providing all the necessary information.
>
> Until we have this information, it will be very difficult to help you further. Should anything change, we would be glad to assist you.

Usually private submission of logs (PGP) is acceptable.

Note, I am not claiming Xen's `dmesg` contains truly concerning information. The issue is there is enough data for problematic information to unintentionally leak in. Alternatively, no piece would be individually concerning, yet taken together they may leak information. Hopefully neither ACPI table addresses nor table order is affected by the motherboard serial number, yet those could readily leak information.

So far this is acting like a major bug. The paucity of reports is likely due to few people using RAID1 with flash (most people were relying on flash's greater reliability even before the first large studies came out).

--
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
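For reference, the private-submission path described above can be as simple as encrypting the log to a developer's public key; a minimal sketch, assuming the reviewer's key has already been imported (the key ID below is a placeholder, not any particular developer's key):

    xl dmesg > xen-dmesg.txt
    gpg --armor --encrypt --recipient 0x12345678 xen-dmesg.txt
    # then mail the resulting xen-dmesg.txt.asc off-list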
Re: Serious AMD-Vi(?) issue
Hello Elliott,

Most of our developers are based in the EU timezone; however, we are a worldwide community. The Xen Project is an open source community that everyone contributes to, and we do not divide how we provide help based on location.

As explained previously, we are happy to help resolve issues and provide advice where necessary. However, to do this, our developers need the relevant information to provide accurate resolutions. Given that our developers have repeatedly voiced their concerns, and are debugging this out of interest, please help us by providing all the necessary information.

Until we have this information, it will be very difficult to help you further. Should anything change, we would be glad to assist you.

Many thanks,
Kelly Choi

Community Manager
Xen Project

On Tue, May 14, 2024 at 9:51 PM Elliott Mitchell wrote:
> On Tue, May 14, 2024 at 10:22:51AM +0200, Jan Beulich wrote:
> > On 13.05.2024 22:11, Elliott Mitchell wrote:
> > > On Mon, May 13, 2024 at 10:44:59AM +0200, Roger Pau Monné wrote:
> > >> Why do you mask the device SBDF in the above snippet? I would really like to understand what's so privacy relevant in a PCI SBDF number.
> > >
> > > I doubt it reveals much. Simply seems unlikely to help debugging and therefore I prefer to mask it.
> >
> > SBDF in one place may be matchable against a memory address in another place. _Any_ hiding of information is hindering analysis. Please can you finally accept that it needs to be the person doing the analysis to judge what is or is not relevant to them?
>
> Not going to happen as I'd accepted this long ago. The usual approach is all developers have PGP keys (needed for security issues anyway) and you don't require all logs to be public.
>
> I've noticed the core of the Xen project appears centered in the EU. Yet you're not catering to data privacy at all? Or is this a service exclusively provided to people who prove they're EU citizens?
>
> --
> (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
> \BS (| ehem+sig...@m5p.com PGP 87145445 |) /
> \_CS\ | _ -O #include O- _ | / _/
> 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Re: Serious AMD-Vi(?) issue
On Tue, May 14, 2024 at 10:22:51AM +0200, Jan Beulich wrote: > On 13.05.2024 22:11, Elliott Mitchell wrote: > > On Mon, May 13, 2024 at 10:44:59AM +0200, Roger Pau Monné wrote: > >> Why do you mask the device SBDF in the above snippet? I would really > >> like to understand what's so privacy relevant in a PCI SBDF number. > > > > I doubt it reveals much. Simply seems unlikely to help debugging and > > therefore I prefer to mask it. > > SBDF in one place may be matchable against a memory address in another > place. _Any_ hiding of information is hindering analysis. Please can > you finally accept that it needs to be the person doing the analysis > to judge what is or is not relevant to them? Not going to happen as I'd accepted this long ago. The usual approach is all developers have PGP keys (needed for security issues anyway) and you don't require all logs to be public. I've noticed the core of the Xen project appears centered in the EU. Yet you're not catering to data privacy at all? Or is this a service exclusively provided to people who prove they're EU citizens? -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Re: Serious AMD-Vi(?) issue
On 13.05.2024 22:11, Elliott Mitchell wrote: > On Mon, May 13, 2024 at 10:44:59AM +0200, Roger Pau Monné wrote: >> Why do you mask the device SBDF in the above snippet? I would really >> like to understand what's so privacy relevant in a PCI SBDF number. > > I doubt it reveals much. Simply seems unlikely to help debugging and > therefore I prefer to mask it. SBDF in one place may be matchable against a memory address in another place. _Any_ hiding of information is hindering analysis. Please can you finally accept that it needs to be the person doing the analysis to judge what is or is not relevant to them? Jan
Re: Serious AMD-Vi(?) issue
On 13.05.2024 10:44, Roger Pau Monné wrote: > On Fri, May 10, 2024 at 09:09:54PM -0700, Elliott Mitchell wrote: >> On Thu, Apr 18, 2024 at 09:33:31PM -0700, Elliott Mitchell wrote: >>> >>> I suspect this is a case of there is some step which is missing from >>> Xen's IOMMU handling. Perhaps something which Linux does during an early >>> DMA setup stage, but the current Xen implementation does lazily? >>> Alternatively some flag setting or missing step? >>> >>> I should be able to do another test approach in a few weeks, but I would >>> love if something could be found sooner. >> >> Turned out to be disturbingly easy to get the first entry when it >> happened. Didn't even need `dbench`, it simply showed once the OS was >> fully loaded. I did get some additional data points. >> >> Appears this requires an AMD IOMMUv2. A test system with known >> functioning AMD IOMMUv1 didn't display the issue at all. >> >> (XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr fffdf800 flags 0x8 >> I > > I would expect the address field to contain more information about the > fault, but I'm not finding any information on the AMD-Vi specification > apart from that it contains the DVA, which makes no sense when the > fault is caused by an interrupt. Isn't the address above in the "magic" HT range (and hence still meaningful as an address)? Jan
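For readers following along: the "magic" range Jan refers to is the HyperTransport reserved region at the top of the 40-bit address space. Going by my reading of the HT documentation (so treat the exact bounds as assumptions to verify), FD_0000_0000-FF_FFFF_FFFF is reserved for special cycles, with FD_F800_0000-FD_F8FF_FFFF carrying interrupt/EOI messages, which would fit a fault flagged I still denoting a meaningful address. A quick check in C:

    #include <stdbool.h>
    #include <stdint.h>

    /* HyperTransport reserved region and its interrupt/EOI window.
     * Bounds are my reading of the HT documentation -- verify before use. */
    static bool in_ht_special(uint64_t addr)
    {
        return addr >= 0xFD00000000ULL && addr <= 0xFFFFFFFFFFULL;
    }

    static bool in_ht_interrupt(uint64_t addr)
    {
        return addr >= 0xFDF8000000ULL && addr <= 0xFDF8FFFFFFULL;
    }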
Re: Serious AMD-Vi(?) issue
On Mon, May 13, 2024 at 10:44:59AM +0200, Roger Pau Monné wrote:
> On Fri, May 10, 2024 at 09:09:54PM -0700, Elliott Mitchell wrote:
> > On Thu, Apr 18, 2024 at 09:33:31PM -0700, Elliott Mitchell wrote:
> > >
> > > I suspect this is a case of there is some step which is missing from Xen's IOMMU handling. Perhaps something which Linux does during an early DMA setup stage, but the current Xen implementation does lazily? Alternatively some flag setting or missing step?
> > >
> > > I should be able to do another test approach in a few weeks, but I would love if something could be found sooner.
> >
> > Turned out to be disturbingly easy to get the first entry when it happened. Didn't even need `dbench`, it simply showed once the OS was fully loaded. I did get some additional data points.
> >
> > Appears this requires an AMD IOMMUv2. A test system with known functioning AMD IOMMUv1 didn't display the issue at all.
> >
> > (XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr fffdf800 flags 0x8 I
>
> I would expect the address field to contain more information about the fault, but I'm not finding any information on the AMD-Vi specification apart from that it contains the DVA, which makes no sense when the fault is caused by an interrupt.
>
> > (XEN) :bb:dd.f root @ 83b5f5 (3 levels) dfn=fffdf8000
> > (XEN) L3[1f7] = 0 np
>
> Attempting to print the page table walk for an Interrupt remapping fault is useless, we should likely avoid that when the I flag is set.
>
> > I find it surprising this required "iommu=debug" to get this level of detail. This amount of output seems more appropriate for "verbose".
>
> "verbose" should also print this information.

Mostly I've noticed Xen's dmesg seems a bit sparse at default settings. Confirming the IOMMU was recognized and operational had been a challenge. On the flip side this does mean less potentially sensitive data gets in.

> > I strongly prefer to provide snippets. There is a fair bit of output, I'm unsure which portion is most pertinent.
>
> I've already voiced my concern that I think what you are doing is not fair. We are debugging this out of interest, and hence you refusing to provide all information just hampers our ability to debug, and makes us spend more time than required just thinking what snippets we need to ask for.
>
> I will ask again, what's there in the Xen or the Linux dmesgs that you are so worried about leaking? Please provide a specific example.

I cannot point to specific data in Xen's dmesg which is known to be sensitive. On the flip side, all the addresses could readily function as a subliminal channel. Might only be kernels from certain vendors, but hardware serial numbers frequently make it into Linux's dmesg. All the data coming from ACPI tables could readily hide something. Worse, data which seems harmless now might later turn out to reveal things.

The usual approach is everyone has PGP keys and logs are kept private on request.

> Why do you mask the device SBDF in the above snippet? I would really like to understand what's so privacy relevant in a PCI SBDF number.

I doubt it reveals much. Simply seems unlikely to help debugging and therefore I prefer to mask it.

One more Xen dmesg line:

(XEN) AMD-Vi: Setup I/O page table: device id = 0xbbdd, type = 0x1, root table = 0xADDRADDR, domain = 0, paging mode = 3

> Does booting with `iommu=no-intremap` lead to any issues being reported?

I'll try that next time I restart the system. Another viable approach.

I imagine one or more of the Xen developers have computers with AMD processors. I could send a pair of SATA devices which are known to exhibit the behavior to someone.

The known reproductions have featured ASUS motherboards. I doubt this is a requirement, but if one of the main developers has such a system that is a better target. I also note these are plugged into motherboard SATA ports. It is possible add-on card SATA ports might not exhibit the behavior.

Then you may discover not much log data is being provided simply due to not much log data being generated.

--
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
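For anyone wanting to try Roger's `iommu=no-intremap` suggestion: it is a Xen command-line option, so on a Debian-style install with GRUB the change would look roughly like the following (the variable name assumes Debian's Xen GRUB integration; other distributions may differ):

    # /etc/default/grub
    GRUB_CMDLINE_XEN_DEFAULT="iommu=debug,no-intremap"

    # then regenerate the configuration and reboot:
    update-grub && reboot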
Re: Serious AMD-Vi(?) issue
On Fri, May 10, 2024 at 09:09:54PM -0700, Elliott Mitchell wrote:
> On Thu, Apr 18, 2024 at 09:33:31PM -0700, Elliott Mitchell wrote:
> >
> > I suspect this is a case of there is some step which is missing from Xen's IOMMU handling. Perhaps something which Linux does during an early DMA setup stage, but the current Xen implementation does lazily? Alternatively some flag setting or missing step?
> >
> > I should be able to do another test approach in a few weeks, but I would love if something could be found sooner.
>
> Turned out to be disturbingly easy to get the first entry when it happened. Didn't even need `dbench`, it simply showed once the OS was fully loaded. I did get some additional data points.
>
> Appears this requires an AMD IOMMUv2. A test system with known functioning AMD IOMMUv1 didn't display the issue at all.
>
> (XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr fffdf800 flags 0x8 I

I would expect the address field to contain more information about the fault, but I'm not finding any information on the AMD-Vi specification apart from that it contains the DVA, which makes no sense when the fault is caused by an interrupt.

> (XEN) :bb:dd.f root @ 83b5f5 (3 levels) dfn=fffdf8000
> (XEN) L3[1f7] = 0 np

Attempting to print the page table walk for an Interrupt remapping fault is useless, we should likely avoid that when the I flag is set.

> I find it surprising this required "iommu=debug" to get this level of detail. This amount of output seems more appropriate for "verbose".

"verbose" should also print this information.

> I strongly prefer to provide snippets. There is a fair bit of output, I'm unsure which portion is most pertinent.

I've already voiced my concern that I think what you are doing is not fair. We are debugging this out of interest, and hence you refusing to provide all information just hampers our ability to debug, and makes us spend more time than required just thinking what snippets we need to ask for.

I will ask again, what's there in the Xen or the Linux dmesgs that you are so worried about leaking? Please provide a specific example.

Why do you mask the device SBDF in the above snippet? I would really like to understand what's so privacy relevant in a PCI SBDF number.

Does booting with `iommu=no-intremap` lead to any issues being reported?

Regards, Roger.
Re: Serious AMD-Vi(?) issue
On Thu, Apr 18, 2024 at 09:33:31PM -0700, Elliott Mitchell wrote:
>
> I suspect this is a case of there is some step which is missing from Xen's IOMMU handling. Perhaps something which Linux does during an early DMA setup stage, but the current Xen implementation does lazily? Alternatively some flag setting or missing step?
>
> I should be able to do another test approach in a few weeks, but I would love if something could be found sooner.

Turned out to be disturbingly easy to get the first entry when it happened. Didn't even need `dbench`, it simply showed once the OS was fully loaded. I did get some additional data points.

Appears this requires an AMD IOMMUv2. A test system with known functioning AMD IOMMUv1 didn't display the issue at all.

(XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr fffdf800 flags 0x8 I
(XEN) :bb:dd.f root @ 83b5f5 (3 levels) dfn=fffdf8000
(XEN) L3[1f7] = 0 np

I find it surprising this required "iommu=debug" to get this level of detail. This amount of output seems more appropriate for "verbose".

I strongly prefer to provide snippets. There is a fair bit of output, I'm unsure which portion is most pertinent.

--
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
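To help interpret the flags field above: going by the IO_PAGE_FAULT event-log format in the AMD IOMMU specification (the bit positions below are my reading of that format and of Xen's parser, so double-check them against the spec), flags 0x8 sets only the I bit, i.e. the faulting transaction was an interrupt request rather than a memory read/write, which also explains why the page-table walk printed above is meaningless. A throwaway decoder:

    #include <stdio.h>

    int main(void)
    {
        /* IO_PAGE_FAULT event flag names, lowest bit first; positions are
         * an assumption based on the AMD IOMMU spec's event-log format. */
        static const char *const names[] = {
            "GN", "NX", "US", "I", "PR", "RW", "PE", "RZ", "TR",
        };
        unsigned int flags = 0x8;   /* value from the log line above */
        unsigned int i;

        printf("flags %#x:", flags);
        for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
            if (flags & (1u << i))
                printf(" %s", names[i]);
        printf("\n");               /* prints: flags 0x8: I */
        return 0;
    }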
Re: Serious AMD-Vi(?) issue
On Thu, Apr 18, 2024 at 09:09:51AM +0200, Jan Beulich wrote: > On 18.04.2024 08:45, Elliott Mitchell wrote: > > On Wed, Apr 17, 2024 at 02:40:09PM +0200, Jan Beulich wrote: > >> On 11.04.2024 04:41, Elliott Mitchell wrote: > >>> On Thu, Mar 28, 2024 at 07:25:02AM +0100, Jan Beulich wrote: > On 27.03.2024 18:27, Elliott Mitchell wrote: > > On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote: > >> On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote: > >>> > >>> In fact when running into trouble, the usual course of action would > >>> be to > >>> increase verbosity in both hypervisor and kernel, just to make sure no > >>> potentially relevant message is missed. > >> > >> More/better information might have been obtained if I'd been engaged > >> earlier. > > > > This is still true, things are in full mitigation mode and I'll be > > quite unhappy to go back with experiments at this point. > > Well, it very likely won't work without further experimenting by someone > able to observe the bad behavior. Recall we're on xen-devel here; it is > kind of expected that without clear (and practical) repro instructions > experimenting as well as info collection will remain with the reporter. > >>> > >>> After looking at the situation and considering the issues, I /may/ be > >>> able to setup for doing more testing. I guess I should confirm, which of > >>> those criteria do you think currently provided information fails at? > >>> > >>> AMD-IOMMU + Linux MD RAID1 + dual Samsung SATA (or various NVMe) + > >>> dbench; seems a pretty specific setup. > >> > >> Indeed. If that's the only way to observe the issue, it suggests to me > >> that it'll need to be mainly you to do further testing, and perhaps even > >> debugging. Which isn't to say we're not available to help, but from all > >> I have gathered so far we're pretty much in the dark even as to which > >> component(s) may be to blame. As can still be seen at the top in reply > >> context, some suggestions were given as to obtaining possible further > >> information (or confirming the absence thereof). > > > > There may be other ways which haven't yet been found. > > > > I've been left with the suspicion AMD was to some degree sponsoring > > work to ensure Xen works on their hardware. Given the severity of this > > problem I would kind of expect them not want to gain a reputation for > > having data loss issues. Assuming a suitable pair of devices weren't > > already on-hand, I would kind of expect this to be well within their > > budget. > > You've got to talk to AMD then. Plus I assume it's clear to you that > even if the (presumably) necessary hardware was available, it still > would require respective setup, leaving open whether the issue then > could indeed be reproduced. I had a vain hope your links to AMD would allow you to say "we've got a major problem in need of addressing ASAP". I suspect it will reproduce readily. The sparsity of reports is likely due to few people using RAID1 for flash. Yet even though the initial surveys suggest flash has a rather lower initial failure rate, they're still pointing to rather non-zero failures in the first 5 years. > >> I'd also like to come back to the vague theory you did voice, in that > >> you're suspecting flushes to take too long. I continue to have trouble > >> with this, and I would therefore like to ask that you put this down in > >> more technical terms, making connections to actual actions taken by > >> software / hardware. > > > > I'm trying to figure out a pattern. 
> > > > Nominally all the devices are roughly on par (only a very cheap flash > > device will be unable to overwhelm SATA's bandwidth). Yet why did the > > Crucial SATA device /seem/ not to have the issue? Why did a Crucial NVMe > > device demonstrate the issue. > > > > My guess is the flash controllers Samsung uses may be able to start > > executing commands faster than the ones Crucial uses. Meanwhile NVMe > > is lower overhead and latency than SATA (SATA's overhead isn't an issue > > for actual disks). Perhaps the IOMMU is still flushing its TLB, or > > hasn't loaded the new tables. > > Which would be an IOMMU issue then, that software at best may be able to > work around. Yet even if uses of RAID1 with flash are uncommon or rare, I would expect this to have already manifested on Linux without Xen. In turn this would suggest Linux likely already has some sort of workaround. I suspect this is a case of there is some step which is missing from Xen's IOMMU handling. Perhaps something which Linux does during an early DMA setup stage, but the current Xen implementation does lazily? Alternatively some flag setting or missing step? I should be able to do another test approach in a few weeks, but I would love if something could be found sooner. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (|
Re: Serious AMD-Vi(?) issue
On 18.04.2024 08:45, Elliott Mitchell wrote: > On Wed, Apr 17, 2024 at 02:40:09PM +0200, Jan Beulich wrote: >> On 11.04.2024 04:41, Elliott Mitchell wrote: >>> On Thu, Mar 28, 2024 at 07:25:02AM +0100, Jan Beulich wrote: On 27.03.2024 18:27, Elliott Mitchell wrote: > On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote: >> On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote: >>> >>> In fact when running into trouble, the usual course of action would be >>> to >>> increase verbosity in both hypervisor and kernel, just to make sure no >>> potentially relevant message is missed. >> >> More/better information might have been obtained if I'd been engaged >> earlier. > > This is still true, things are in full mitigation mode and I'll be > quite unhappy to go back with experiments at this point. Well, it very likely won't work without further experimenting by someone able to observe the bad behavior. Recall we're on xen-devel here; it is kind of expected that without clear (and practical) repro instructions experimenting as well as info collection will remain with the reporter. >>> >>> After looking at the situation and considering the issues, I /may/ be >>> able to setup for doing more testing. I guess I should confirm, which of >>> those criteria do you think currently provided information fails at? >>> >>> AMD-IOMMU + Linux MD RAID1 + dual Samsung SATA (or various NVMe) + >>> dbench; seems a pretty specific setup. >> >> Indeed. If that's the only way to observe the issue, it suggests to me >> that it'll need to be mainly you to do further testing, and perhaps even >> debugging. Which isn't to say we're not available to help, but from all >> I have gathered so far we're pretty much in the dark even as to which >> component(s) may be to blame. As can still be seen at the top in reply >> context, some suggestions were given as to obtaining possible further >> information (or confirming the absence thereof). > > There may be other ways which haven't yet been found. > > I've been left with the suspicion AMD was to some degree sponsoring > work to ensure Xen works on their hardware. Given the severity of this > problem I would kind of expect them not want to gain a reputation for > having data loss issues. Assuming a suitable pair of devices weren't > already on-hand, I would kind of expect this to be well within their > budget. You've got to talk to AMD then. Plus I assume it's clear to you that even if the (presumably) necessary hardware was available, it still would require respective setup, leaving open whether the issue then could indeed be reproduced. >> I'd also like to come back to the vague theory you did voice, in that >> you're suspecting flushes to take too long. I continue to have trouble >> with this, and I would therefore like to ask that you put this down in >> more technical terms, making connections to actual actions taken by >> software / hardware. > > I'm trying to figure out a pattern. > > Nominally all the devices are roughly on par (only a very cheap flash > device will be unable to overwhelm SATA's bandwidth). Yet why did the > Crucial SATA device /seem/ not to have the issue? Why did a Crucial NVMe > device demonstrate the issue. > > My guess is the flash controllers Samsung uses may be able to start > executing commands faster than the ones Crucial uses. Meanwhile NVMe > is lower overhead and latency than SATA (SATA's overhead isn't an issue > for actual disks). Perhaps the IOMMU is still flushing its TLB, or > hasn't loaded the new tables. 
Which would be an IOMMU issue then, that software at best may be able to work around. Jan > I suspect when the MD-RAID1 issues block requests to a pair of devices, > it likely sends the block to one device and then reuses most/all of the > structures for the second device. As a result the second request would > likely get a command to the device rather faster than the first request. > > Perhaps look into what structures the MD-RAID1 subsystem reuses are. > Then see whether doing early setup of those structures triggers the > issue? > > (okay I'm deep into speculation here, but this seems the simplest > explanation for what could be occuring) > >
Re: Serious AMD-Vi(?) issue
On Wed, Apr 17, 2024 at 02:40:09PM +0200, Jan Beulich wrote:
> On 11.04.2024 04:41, Elliott Mitchell wrote:
> > On Thu, Mar 28, 2024 at 07:25:02AM +0100, Jan Beulich wrote:
> >> On 27.03.2024 18:27, Elliott Mitchell wrote:
> >>> On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote:
> >>>> On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote:
> >>>>>
> >>>>> In fact when running into trouble, the usual course of action would be to increase verbosity in both hypervisor and kernel, just to make sure no potentially relevant message is missed.
> >>>>
> >>>> More/better information might have been obtained if I'd been engaged earlier.
> >>>
> >>> This is still true, things are in full mitigation mode and I'll be quite unhappy to go back with experiments at this point.
> >>
> >> Well, it very likely won't work without further experimenting by someone able to observe the bad behavior. Recall we're on xen-devel here; it is kind of expected that without clear (and practical) repro instructions experimenting as well as info collection will remain with the reporter.
> >
> > After looking at the situation and considering the issues, I /may/ be able to setup for doing more testing. I guess I should confirm: which of those criteria do you think the currently provided information fails at?
> >
> > AMD-IOMMU + Linux MD RAID1 + dual Samsung SATA (or various NVMe) + dbench; seems a pretty specific setup.
>
> Indeed. If that's the only way to observe the issue, it suggests to me that it'll need to be mainly you to do further testing, and perhaps even debugging. Which isn't to say we're not available to help, but from all I have gathered so far we're pretty much in the dark even as to which component(s) may be to blame. As can still be seen at the top in reply context, some suggestions were given as to obtaining possible further information (or confirming the absence thereof).

There may be other ways which haven't yet been found.

I've been left with the suspicion AMD was to some degree sponsoring work to ensure Xen works on their hardware. Given the severity of this problem I would kind of expect them not to want to gain a reputation for having data loss issues. Assuming a suitable pair of devices weren't already on-hand, I would kind of expect this to be well within their budget.

> I'd also like to come back to the vague theory you did voice, in that you're suspecting flushes to take too long. I continue to have trouble with this, and I would therefore like to ask that you put this down in more technical terms, making connections to actual actions taken by software / hardware.

I'm trying to figure out a pattern.

Nominally all the devices are roughly on par (only a very cheap flash device will be unable to overwhelm SATA's bandwidth). Yet why did the Crucial SATA device /seem/ not to have the issue? Why did a Crucial NVMe device demonstrate the issue?

My guess is the flash controllers Samsung uses may be able to start executing commands faster than the ones Crucial uses. Meanwhile NVMe is lower overhead and latency than SATA (SATA's overhead isn't an issue for actual disks). Perhaps the IOMMU is still flushing its TLB, or hasn't loaded the new tables.

I suspect when the MD-RAID1 issues block requests to a pair of devices, it likely sends the block to one device and then reuses most/all of the structures for the second device. As a result the second request would likely get a command to the device rather faster than the first request.

Perhaps look into which structures the MD-RAID1 subsystem reuses. Then see whether doing early setup of those structures triggers the issue?

(okay I'm deep into speculation here, but this seems the simplest explanation for what could be occurring)

--
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Re: Serious AMD-Vi(?) issue
On 11.04.2024 04:41, Elliott Mitchell wrote: > On Thu, Mar 28, 2024 at 07:25:02AM +0100, Jan Beulich wrote: >> On 27.03.2024 18:27, Elliott Mitchell wrote: >>> On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote: On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote: > > In fact when running into trouble, the usual course of action would be to > increase verbosity in both hypervisor and kernel, just to make sure no > potentially relevant message is missed. More/better information might have been obtained if I'd been engaged earlier. >>> >>> This is still true, things are in full mitigation mode and I'll be >>> quite unhappy to go back with experiments at this point. >> >> Well, it very likely won't work without further experimenting by someone >> able to observe the bad behavior. Recall we're on xen-devel here; it is >> kind of expected that without clear (and practical) repro instructions >> experimenting as well as info collection will remain with the reporter. > > After looking at the situation and considering the issues, I /may/ be > able to setup for doing more testing. I guess I should confirm, which of > those criteria do you think currently provided information fails at? > > AMD-IOMMU + Linux MD RAID1 + dual Samsung SATA (or various NVMe) + > dbench; seems a pretty specific setup. Indeed. If that's the only way to observe the issue, it suggests to me that it'll need to be mainly you to do further testing, and perhaps even debugging. Which isn't to say we're not available to help, but from all I have gathered so far we're pretty much in the dark even as to which component(s) may be to blame. As can still be seen at the top in reply context, some suggestions were given as to obtaining possible further information (or confirming the absence thereof). I'd also like to come back to the vague theory you did voice, in that you're suspecting flushes to take too long. I continue to have trouble with this, and I would therefore like to ask that you put this down in more technical terms, making connections to actual actions taken by software / hardware. Jan > I could see this being criticised as impractical if /new/ devices were > required, but the confirmed flash devices are several generations old. > Difficulty is cheaper candidate devices are being recycled for their > precious metal content, rather than resold as used. > >
Re: Serious AMD-Vi(?) issue
On Thu, Mar 28, 2024 at 07:25:02AM +0100, Jan Beulich wrote:
> On 27.03.2024 18:27, Elliott Mitchell wrote:
> > On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote:
> >> On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote:
> >>>
> >>> In fact when running into trouble, the usual course of action would be to increase verbosity in both hypervisor and kernel, just to make sure no potentially relevant message is missed.
> >>
> >> More/better information might have been obtained if I'd been engaged earlier.
> >
> > This is still true, things are in full mitigation mode and I'll be quite unhappy to go back with experiments at this point.
>
> Well, it very likely won't work without further experimenting by someone able to observe the bad behavior. Recall we're on xen-devel here; it is kind of expected that without clear (and practical) repro instructions experimenting as well as info collection will remain with the reporter.

After looking at the situation and considering the issues, I /may/ be able to setup for doing more testing. I guess I should confirm: which of those criteria do you think the currently provided information fails at?

AMD-IOMMU + Linux MD RAID1 + dual Samsung SATA (or various NVMe) + dbench; seems a pretty specific setup.

I could see this being criticised as impractical if /new/ devices were required, but the confirmed flash devices are several generations old. Difficulty is cheaper candidate devices are being recycled for their precious metal content, rather than resold as used.

--
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
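Spelled out, the recipe above amounts to roughly the following; a sketch of the reported setup (device names are examples), not a verified minimal reproducer:

    # dom0 on an AMD system with AMD-Vi active; two flash devices
    # (Samsung SATA on on-chipset ports, or NVMe):
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
    mkfs.ext4 /dev/md0
    mount /dev/md0 /mnt
    dbench -D /mnt 8    # then watch `xl dmesg` for AMD-Vi: IO_PAGE_FAULT lines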
Re: Serious AMD-Vi(?) issue
On Thu, Mar 28, 2024 at 08:22:31AM -0700, Elliott Mitchell wrote:
> On Thu, Mar 28, 2024 at 07:25:02AM +0100, Jan Beulich wrote:
> > On 27.03.2024 18:27, Elliott Mitchell wrote:
> > > On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote:
> > >> On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote:
> > >>>
> > >>> In fact when running into trouble, the usual course of action would be to increase verbosity in both hypervisor and kernel, just to make sure no potentially relevant message is missed.
> > >>
> > >> More/better information might have been obtained if I'd been engaged earlier.
> > >
> > > This is still true, things are in full mitigation mode and I'll be quite unhappy to go back with experiments at this point.
> >
> > Well, it very likely won't work without further experimenting by someone able to observe the bad behavior. Recall we're on xen-devel here; it is kind of expected that without clear (and practical) repro instructions experimenting as well as info collection will remain with the reporter.
>
> The first reporter: https://bugs.debian.org/988477 gave pretty specific details about their setups.
>
> While the exact border isn't very well defined, that seems enough to give a pretty good start. We don't know whether all Samsung SATA devices are affected, but most of the recent ones (<5 years old) are. This requires a pair of devices in software RAID1. Likely reproduces better with AMD AM4/AM5 processors, but almost certainly needs a fully operational IOMMU.
>
> (ASUS motherboards tend to have well setup IOMMUs)
>
> I would be surprised if you don't have all of the hardware on-hand. Only issue would be finding an appropriate pair of SATA devices, since those tend to remain in service. I would look for older devices which were removed from service due to being too small (128GB 840 PRO from the first report), or were pulled from service due to having had too many writes.

Come to think of it, one more possible ingredient to this. Similar to the first report, when the problem occurred, the SATA device was plugged into an on-chipset SATA port, not the extra controller this motherboard has. I don't know whether the performance difference of an off-main-chip controller would influence this, but it might.

--
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Re: Serious AMD-Vi(?) issue
On Thu, Mar 28, 2024 at 07:25:02AM +0100, Jan Beulich wrote:
> On 27.03.2024 18:27, Elliott Mitchell wrote:
> > On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote:
> >> On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote:
> >>> On 22.03.2024 20:22, Elliott Mitchell wrote:
> >>>> On Fri, Mar 22, 2024 at 04:41:45PM +, Kelly Choi wrote:
> >>>>>
> >>>>> I can see you've recently engaged with our community with some issues you'd like help with.
> >>>>> We love the fact you are participating in our project, however, our developers aren't able to help if you do not provide the specific details.
> >>>>
> >>>> Please point to specific details which have been omitted. Fairly little data has been provided as fairly little data is available. The primary observation is large numbers of:
> >>>>
> >>>> (XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr ff???000 flags 0x8 I
> >>>>
> >>>> Lines in Xen's ring buffer.
> >>>
> >>> Yet this is (part of) the problem: By providing only the messages that appear relevant to you, you imply that you know that no other message is in any way relevant. That's judgement you'd better leave to people actually trying to investigate. Unless of course you were proposing an actual code change, with suitable justification.
> >>
> >> Honestly, I forgot about the very small number of messages from the SATA subsystem. The question of whether the current mitigation actions are effective right now was a bigger issue. As such monitoring `xl dmesg` was a priority to looking at SATA messages which failed to reliably indicate status.
> >>
> >> I *thought* I would be able to retrieve those via other slow means, but a different and possibly overlapping issue has shown up. Unfortunately this means those are no longer retrievable. :-(
> >
> > With some persistence I was able to retrieve them. There are other pieces of software with worse UIs than Xen.
> >
> >>> In fact when running into trouble, the usual course of action would be to increase verbosity in both hypervisor and kernel, just to make sure no potentially relevant message is missed.
> >>
> >> More/better information might have been obtained if I'd been engaged earlier.
> >
> > This is still true, things are in full mitigation mode and I'll be quite unhappy to go back with experiments at this point.
>
> Well, it very likely won't work without further experimenting by someone able to observe the bad behavior. Recall we're on xen-devel here; it is kind of expected that without clear (and practical) repro instructions experimenting as well as info collection will remain with the reporter.

The first reporter: https://bugs.debian.org/988477 gave pretty specific details about their setups.

While the exact border isn't very well defined, that seems enough to give a pretty good start. We don't know whether all Samsung SATA devices are affected, but most of the recent ones (<5 years old) are. This requires a pair of devices in software RAID1. Likely reproduces better with AMD AM4/AM5 processors, but almost certainly needs a fully operational IOMMU.

(ASUS motherboards tend to have well setup IOMMUs)

I would be surprised if you don't have all of the hardware on-hand. Only issue would be finding an appropriate pair of SATA devices, since those tend to remain in service. I would look for older devices which were removed from service due to being too small (128GB 840 PRO from the first report), or were pulled from service due to having had too many writes.

> > I now see why I left those out. The messages from the SATA subsystem were from a kernel which a bad patch had leaked into an LTS branch. Looks like the SATA subsystem was significantly broken and I'm unsure whether any useful information could be retrieved. Notably there is quite a bit of noise from SATA devices not affected by this issue.
> >
> > Some of the messages /might/ be useful, but the amount of noise is quite high. Do messages from a broken kernel interest you?
>
> If this was a less vague (in terms of possible root causes) issue, I'd probably have answered "yes". But in the case here I'm afraid such might further confuse things rather than clarifying them.

Okay.

--
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Re: Serious AMD-Vi(?) issue
On 27.03.2024 18:27, Elliott Mitchell wrote: > On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote: >> On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote: >>> On 22.03.2024 20:22, Elliott Mitchell wrote: On Fri, Mar 22, 2024 at 04:41:45PM +, Kelly Choi wrote: > > I can see you've recently engaged with our community with some issues > you'd > like help with. > We love the fact you are participating in our project, however, our > developers aren't able to help if you do not provide the specific details. Please point to specific details which have been omitted. Fairly little data has been provided as fairly little data is available. The primary observation is large numbers of: (XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr ff???000 flags 0x8 I Lines in Xen's ring buffer. >>> >>> Yet this is (part of) the problem: By providing only the messages that >>> appear >>> relevant to you, you imply that you know that no other message is in any way >>> relevant. That's judgement you'd better leave to people actually trying to >>> investigate. Unless of course you were proposing an actual code change, with >>> suitable justification. >> >> Honestly, I forgot about the very small number of messages from the SATA >> subsystem. The question of whether the current mitigation actions are >> effective right now was a bigger issue. As such monitoring `xl dmesg` >> was a priority to looking at SATA messages which failed to reliably >> indicate status. >> >> I *thought* I would be able to retrieve those via other slow means, but a >> different and possibly overlapping issue has shown up. Unfortunately >> this means those are no longer retrievable. :-( > > With some persistence I was able to retrieve them. There are other > pieces of software with worse UIs than Xen. > >>> In fact when running into trouble, the usual course of action would be to >>> increase verbosity in both hypervisor and kernel, just to make sure no >>> potentially relevant message is missed. >> >> More/better information might have been obtained if I'd been engaged >> earlier. > > This is still true, things are in full mitigation mode and I'll be > quite unhappy to go back with experiments at this point. Well, it very likely won't work without further experimenting by someone able to observe the bad behavior. Recall we're on xen-devel here; it is kind of expected that without clear (and practical) repro instructions experimenting as well as info collection will remain with the reporter. > I now see why I left those out. The messages from the SATA subsystem > were from a kernel which a bad patch had leaked into a LTS branch. Looks > like the SATA subsystem was significantly broken and I'm unsure whether > any useful information could be retrieved. Notably there is quite a bit > of noise from SATA devices not effected by this issue. > > Some of the messages /might/ be useful, but the amount of noise is quite > high. Do messages from a broken kernel interest you? If this was a less vague (in terms of possible root causes) issue, I'd probably have answered "yes". But in the case here I'm afraid such might further confuse things rather than clarifying them. Jan
Re: Serious AMD-Vi(?) issue
On Mon, Mar 25, 2024 at 02:43:44PM -0700, Elliott Mitchell wrote:
> On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote:
> > On 22.03.2024 20:22, Elliott Mitchell wrote:
> > > On Fri, Mar 22, 2024 at 04:41:45PM +, Kelly Choi wrote:
> > >>
> > >> I can see you've recently engaged with our community with some issues you'd like help with.
> > >> We love the fact you are participating in our project, however, our developers aren't able to help if you do not provide the specific details.
> > >
> > > Please point to specific details which have been omitted. Fairly little data has been provided as fairly little data is available. The primary observation is large numbers of:
> > >
> > > (XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr ff???000 flags 0x8 I
> > >
> > > Lines in Xen's ring buffer.
> >
> > Yet this is (part of) the problem: By providing only the messages that appear relevant to you, you imply that you know that no other message is in any way relevant. That's judgement you'd better leave to people actually trying to investigate. Unless of course you were proposing an actual code change, with suitable justification.
>
> Honestly, I forgot about the very small number of messages from the SATA subsystem. The question of whether the current mitigation actions are effective right now was a bigger issue. As such monitoring `xl dmesg` was a priority to looking at SATA messages which failed to reliably indicate status.
>
> I *thought* I would be able to retrieve those via other slow means, but a different and possibly overlapping issue has shown up. Unfortunately this means those are no longer retrievable. :-(

With some persistence I was able to retrieve them. There are other pieces of software with worse UIs than Xen.

> > In fact when running into trouble, the usual course of action would be to increase verbosity in both hypervisor and kernel, just to make sure no potentially relevant message is missed.
>
> More/better information might have been obtained if I'd been engaged earlier.

This is still true, things are in full mitigation mode and I'll be quite unhappy to go back with experiments at this point.

I now see why I left those out. The messages from the SATA subsystem were from a kernel which a bad patch had leaked into an LTS branch. Looks like the SATA subsystem was significantly broken and I'm unsure whether any useful information could be retrieved. Notably there is quite a bit of noise from SATA devices not affected by this issue.

Some of the messages /might/ be useful, but the amount of noise is quite high. Do messages from a broken kernel interest you?

--
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Re: Serious AMD-Vi(?) issue
On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote:
> On 22.03.2024 20:22, Elliott Mitchell wrote:
> > On Fri, Mar 22, 2024 at 04:41:45PM +, Kelly Choi wrote:
> >>
> >> I can see you've recently engaged with our community with some issues you'd like help with.
> >> We love the fact you are participating in our project, however, our developers aren't able to help if you do not provide the specific details.
> >
> > Please point to specific details which have been omitted. Fairly little data has been provided as fairly little data is available. The primary observation is large numbers of:
> >
> > (XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr ff???000 flags 0x8 I
> >
> > Lines in Xen's ring buffer.
>
> Yet this is (part of) the problem: By providing only the messages that appear relevant to you, you imply that you know that no other message is in any way relevant. That's judgement you'd better leave to people actually trying to investigate. Unless of course you were proposing an actual code change, with suitable justification.

Honestly, I forgot about the very small number of messages from the SATA subsystem. The question of whether the current mitigation actions are effective right now was a bigger issue. As such monitoring `xl dmesg` was a priority to looking at SATA messages which failed to reliably indicate status.

I *thought* I would be able to retrieve those via other slow means, but a different and possibly overlapping issue has shown up. Unfortunately this means those are no longer retrievable. :-(

> In fact when running into trouble, the usual course of action would be to increase verbosity in both hypervisor and kernel, just to make sure no potentially relevant message is missed.

More/better information might have been obtained if I'd been engaged earlier.

> > The most overt sign was telling the Linux kernel to scan for inconsistencies and the kernel finding some. The domain didn't otherwise appear to notice trouble.
> >
> > This is from memory, it would take some time to discover whether any messages were missed. Present mitigation action is inhibiting the messages, but the trouble is certainly still lurking.
>
> Iirc you were considering whether any of this might be a timing issue. Yet beyond voicing that suspicion, you didn't provide any technical details as to why you think so. Such technical details would include taking into account how IOMMU mappings and associated IOMMU TLB flushing are carried out. Right now, to me at least, your speculation in this regard fails basic sanity checking. Therefore the scenario that you're thinking of would need better describing, imo.

True. Mostly I'm analyzing the known information and considering what the patterns suggest.

Presently I'm aware of two reports (Imre Szőllősi and mine). Both of these feature AMD processor machines. Could be people with AMD processors are less trustful of flash storage, or it could be an AMD-only IOMMU issue. Ideally someone would test and confirm there is no issue with Linux software RAID1 on flash on an Intel machine.

Both reports feature two flash storage devices being run through Linux MD RAID1. Could be the MD RAID1 subsystem is abusing the DMA interface in some fashion. While Imre Szőllősi reported this not occurring with a single device, the report does not explicitly state whether that was a degenerate RAID1 versus non-RAID. I'm unaware of any testing with 3x devices in RAID1.

Both reports feature Samsung SATA flash devices. My case also includes a Crucial NVMe device. My case also features a Crucial SATA flash device for which the problem did NOT occur.

So the question becomes, why did the problem not occur for this Crucial SATA device? According to the specifications, the Crucial SATA device is roughly on par with the Samsung SATA devices in terms of read/write speeds. The NVMe device's specifications are massively better. What comes to mind is the Crucial SATA device might have higher latency before executing commands. Specifications don't mention command execution latency, so it isn't possible to know whether this is the issue.

Yes, latency/timing is speculation. It does seem a good fit for the pattern though.

This could be a Linux MD RAID1 bug or a Xen bug. Unfortunately data loss is a very serious type of bug, so I'm highly reluctant to let go of mitigations without hope for progress.

--
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Re: Serious AMD-Vi(?) issue
On 22.03.2024 20:22, Elliott Mitchell wrote: > On Fri, Mar 22, 2024 at 04:41:45PM +, Kelly Choi wrote: >> >> I can see you've recently engaged with our community with some issues you'd >> like help with. >> We love the fact you are participating in our project, however, our >> developers aren't able to help if you do not provide the specific details. > > Please point to specific details which have been omitted. Fairly little > data has been provided as fairly little data is available. The primary > observation is large numbers of: > > (XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr ff???000 flags 0x8 I > > Lines in Xen's ring buffer. Yet this is (part of) the problem: By providing only the messages that appear relevant to you, you imply that you know that no other message is in any way relevant. That's judgement you'd better leave to people actually trying to investigate. Unless of course you were proposing an actual code change, with suitable justification. In fact when running into trouble, the usual course of action would be to increase verbosity in both hypervisor and kernel, just to make sure no potentially relevant message is missed. > I recall spotting 3 messages from Linux's > SATA driver (which weren't saved due to other causes being suspected), > which would likely be associated with hundreds or thousands of the above > log messages. I never observed any messages from the NVMe subsystem > during that phase. > > The most overt sign was telling the Linux kernel to scan for > inconsistencies and the kernel finding some. The domain didn't otherwise > appear to notice trouble. > > This is from memory, it would take some time to discover whether any > messages were missed. Present mitigation action is inhibiting the > messages, but the trouble is certainly still lurking. Iirc you were considering whether any of this might be a timing issue. Yet beyond voicing that suspicion, you didn't provide any technical details as to why you think so. Such technical details would include taking into account how IOMMU mappings and associated IOMMU TLB flushing are carried out. Right now, to me at least, your speculation in this regard fails basic sanity checking. Therefore the scenario that you're thinking of would need better describing, imo. Jan
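For concreteness, the verbosity increase Jan describes would normally mean boot options along these lines (option names taken from the Xen and Linux command-line documentation; adjust to taste):

    # Xen command line:
    loglvl=all guest_loglvl=all iommu=verbose,debug

    # dom0 Linux command line:
    loglevel=8 ignore_loglevel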
Re: Serious AMD-Vi(?) issue
On Fri, Mar 22, 2024 at 04:41:45PM +, Kelly Choi wrote: > > I can see you've recently engaged with our community with some issues you'd > like help with. > We love the fact you are participating in our project, however, our > developers aren't able to help if you do not provide the specific details. Please point to specific details which have been omitted. Fairly little data has been provided as fairly little data is available. The primary observation is large numbers of: (XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr ff???000 flags 0x8 I Lines in Xen's ring buffer. I recall spotting 3 messages from Linux's SATA driver (which weren't saved due to other causes being suspected), which would likely be associated with hundreds or thousands of the above log messages. I never observed any messages from the NVMe subsystem during that phase. The most overt sign was telling the Linux kernel to scan for inconsistencies and the kernel finding some. The domain didn't otherwise appear to notice trouble. This is from memory, it would take some time to discover whether any messages were missed. Present mitigation action is inhibiting the messages, but the trouble is certainly still lurking. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
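The inconsistency scan mentioned above is presumably MD's consistency checker, which is driven through sysfs roughly as follows (paths per the kernel's md documentation; substitute the actual array name):

    echo check > /sys/block/md0/md/sync_action
    # once the check has finished:
    cat /sys/block/md0/md/mismatch_cnt   # non-zero means the mirrors disagree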
Re: Serious AMD-Vi(?) issue
Hi Elliott,

I hope you're well. I'm Kelly, the community manager at the Xen
Project.

I can see you've recently engaged with our community with some issues
you'd like help with. We love the fact you are participating in our
project, however, our developers aren't able to help if you do not
provide the specific details.

As an open-source project, our developers are committed to helping and
contributing as much as possible. We welcome you to continue
participating, however, please refrain from requesting help without
providing the necessary details, as this takes up a lot of our
community's time to analyze what is possible and what assistance you
might need.

I'd recommend providing logs or specific information so the community
can help you further.

If you'd like to chat more, let me know.

Many thanks,
Kelly Choi

Community Manager
Xen Project

On Mon, Mar 18, 2024 at 7:42 PM Elliott Mitchell wrote:
> I sent a ping on this about 2 weeks ago. Since the plan is to move
> x86 IOMMU under general x86, the other x86 maintainers should be
> aware of this:
>
> On Mon, Feb 12, 2024 at 03:23:00PM -0800, Elliott Mitchell wrote:
> > On Thu, Jan 25, 2024 at 12:24:53PM -0800, Elliott Mitchell wrote:
> > > Apparently this was first noticed with 4.14, but more recently
> > > I've been able to reproduce the issue:
> > >
> > > https://bugs.debian.org/988477
> > >
> > > The original observation features MD-RAID1 using a pair of
> > > Samsung SATA-attached flash devices. The main line shows up in
> > > `xl dmesg`:
> > >
> > > (XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr ff???000 flags 0x8 I
> > >
> > > Where the device points at the SATA controller. I've ended up
> > > reproducing this with some noticeable differences.
> > >
> > > A major goal of RAID is to have different devices fail at
> > > different times. Hence my initial run had a Samsung device plus
> > > a device from another reputable flash manufacturer.
> > >
> > > I initially noticed this due to messages in domain 0's dmesg
> > > about errors from the SATA device. It wasn't until rather later
> > > that I noticed the IOMMU warnings in Xen's dmesg (perhaps
> > > post-domain 0 messages should be duplicated into domain 0's
> > > dmesg?).
> > >
> > > All of the failures consistently pointed at the Samsung device.
> > > Due to the expectation it would fail first (lower quality
> > > offering with lesser guarantees), I proceeded to replace it with
> > > an NVMe device.
> > >
> > > With some monitoring I discovered the NVMe device was now
> > > triggering IOMMU errors, though not nearly as many as the
> > > Samsung SATA device did. As such, AMD-Vi plus MD-RAID1 appears
> > > to be exposing some sort of IOMMU issue with Xen.
> > >
> > >
> > > All I can do is offer speculation about the underlying cause.
> > > There does seem to be a pattern of higher-performance flash
> > > storage devices being more severely affected.
> > >
> > > I was speculating about the issue being the MD-RAID1 driver
> > > abusing Linux's DMA infrastructure in some fashion.
> > >
> > > Upon further consideration, I'm wondering if this is perhaps a
> > > latency issue. I imagine there is some sort of flush after the
> > > IOMMU tables are modified. Perhaps the Samsung SATA (and all
> > > NVMe) devices were trying to execute commands before the reload
> > > of the IOMMU tables was complete.
> >
> > Ping!
> >
> > The recipe seems to be Linux MD RAID1, plus Samsung SATA or any
> > NVMe.
> >
> > To make it explicit: when I tried Crucial SATA + Samsung SATA, the
> > IOMMU errors matched the Samsung SATA (a number of times the SATA
> > driver complained).
> >
> > As stated, I'm speculating that lower-latency devices are starting
> > to execute commands before the IOMMU tables have finished
> > reloading. When this was originally implemented, fast flash
> > devices were rare.
>
> Both reproductions of this issue I'm aware of were on systems with
> AMD processors. I'm doubtful that distrust of flash storage hardware
> is unique to owners of AMD systems. As a result, while this /could/
> also affect Intel systems, the lack of reports /suggests/ otherwise.
>
> I've noticed two things when glancing at the original report. LVM is
> not in use there, so that doesn't seem to affect the problem. The
> Phenom II which the original reporter found not to have the issue
> might have lacked proper BIOS support, hence the IOMMU not being
> functional.
>
> This being a latency issue is *speculation*, but would explain the
> pattern of devices being affected.
>
> This is rather serious as it can lead to data loss (phew! glad I
> just barely dodged this outcome).
>
>
> -- 
> (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
> \BS (| ehem+sig...@m5p.com PGP 87145445 |) /
> \_CS\ | _ -O #include O- _ | / _/
> 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
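As for verifying whether the IOMMU was actually functional on a given
box (the Phenom II question above), Xen's boot messages are one place
to look. A rough sketch, assuming typical AMD-Vi initialization lines
in the ring buffer:

    # AMD-Vi initialization messages appear in Xen's ring buffer at
    # boot; their absence suggests the IOMMU was disabled or lacked
    # BIOS/ACPI support.
    xl dmesg | grep -i 'AMD-Vi'

    # The ACPI IVRS table, which advertises the AMD IOMMU to software,
    # can also be checked for from dom0.
    ls /sys/firmware/acpi/tables/ | grep -i ivrs

A missing IVRS table would be consistent with the BIOS never exposing
the IOMMU, which would explain a machine not reproducing the issue.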
Re: Serious AMD-Vi(?) issue
I sent a ping on this about 2 weeks ago. Since the plan is to move x86
IOMMU under general x86, the other x86 maintainers should be aware of
this:

On Mon, Feb 12, 2024 at 03:23:00PM -0800, Elliott Mitchell wrote:
> On Thu, Jan 25, 2024 at 12:24:53PM -0800, Elliott Mitchell wrote:
> > Apparently this was first noticed with 4.14, but more recently
> > I've been able to reproduce the issue:
> >
> > https://bugs.debian.org/988477
> >
> > The original observation features MD-RAID1 using a pair of Samsung
> > SATA-attached flash devices. The main line shows up in `xl dmesg`:
> >
> > (XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr ff???000 flags 0x8 I
> >
> > Where the device points at the SATA controller. I've ended up
> > reproducing this with some noticeable differences.
> >
> > A major goal of RAID is to have different devices fail at
> > different times. Hence my initial run had a Samsung device plus a
> > device from another reputable flash manufacturer.
> >
> > I initially noticed this due to messages in domain 0's dmesg about
> > errors from the SATA device. It wasn't until rather later that I
> > noticed the IOMMU warnings in Xen's dmesg (perhaps post-domain 0
> > messages should be duplicated into domain 0's dmesg?).
> >
> > All of the failures consistently pointed at the Samsung device.
> > Due to the expectation it would fail first (lower quality offering
> > with lesser guarantees), I proceeded to replace it with an NVMe
> > device.
> >
> > With some monitoring I discovered the NVMe device was now
> > triggering IOMMU errors, though not nearly as many as the Samsung
> > SATA device did. As such, AMD-Vi plus MD-RAID1 appears to be
> > exposing some sort of IOMMU issue with Xen.
> >
> >
> > All I can do is offer speculation about the underlying cause.
> > There does seem to be a pattern of higher-performance flash
> > storage devices being more severely affected.
> >
> > I was speculating about the issue being the MD-RAID1 driver
> > abusing Linux's DMA infrastructure in some fashion.
> >
> > Upon further consideration, I'm wondering if this is perhaps a
> > latency issue. I imagine there is some sort of flush after the
> > IOMMU tables are modified. Perhaps the Samsung SATA (and all NVMe)
> > devices were trying to execute commands before the reload of the
> > IOMMU tables was complete.
>
> Ping!
>
> The recipe seems to be Linux MD RAID1, plus Samsung SATA or any
> NVMe.
>
> To make it explicit: when I tried Crucial SATA + Samsung SATA, the
> IOMMU errors matched the Samsung SATA (a number of times the SATA
> driver complained).
>
> As stated, I'm speculating that lower-latency devices are starting
> to execute commands before the IOMMU tables have finished reloading.
> When this was originally implemented, fast flash devices were rare.

Both reproductions of this issue I'm aware of were on systems with AMD
processors. I'm doubtful that distrust of flash storage hardware is
unique to owners of AMD systems. As a result, while this /could/ also
affect Intel systems, the lack of reports /suggests/ otherwise.

I've noticed two things when glancing at the original report. LVM is
not in use there, so that doesn't seem to affect the problem. The
Phenom II which the original reporter found not to have the issue
might have lacked proper BIOS support, hence the IOMMU not being
functional.

This being a latency issue is *speculation*, but would explain the
pattern of devices being affected.

This is rather serious as it can lead to data loss (phew! glad I just
barely dodged this outcome).
-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
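One way to quantify the per-device pattern described above is to tally
the faulting SBDFs straight out of Xen's ring buffer. A rough sketch,
assuming the message format quoted in this thread (the awk field
position may need adjusting for other Xen versions):

    # Count AMD-Vi page faults per device. In
    #   (XEN) AMD-Vi: IO_PAGE_FAULT: <sbdf> d0 addr ... flags ...
    # the SBDF is the 4th whitespace-separated field, so per-device
    # fault counts fall out of a simple sort | uniq -c.
    xl dmesg | grep 'AMD-Vi: IO_PAGE_FAULT' \
        | awk '{ print $4 }' | sort | uniq -c | sort -rn

Run on a system exhibiting the problem, this would show whether the
SATA controller or the NVMe device accounts for the bulk of the
faults.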
Re: Serious AMD-Vi issue
On Mon, Feb 12, 2024 at 03:23:00PM -0800, Elliott Mitchell wrote:
> On Thu, Jan 25, 2024 at 12:24:53PM -0800, Elliott Mitchell wrote:
> > Apparently this was first noticed with 4.14, but more recently
> > I've been able to reproduce the issue:
> >
> > https://bugs.debian.org/988477
> >
> > The original observation features MD-RAID1 using a pair of Samsung
> > SATA-attached flash devices. The main line shows up in `xl dmesg`:
> >
> > (XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr ff???000 flags 0x8 I
> >
> > Where the device points at the SATA controller. I've ended up
> > reproducing this with some noticeable differences.
> >
> > A major goal of RAID is to have different devices fail at
> > different times. Hence my initial run had a Samsung device plus a
> > device from another reputable flash manufacturer.
> >
> > I initially noticed this due to messages in domain 0's dmesg about
> > errors from the SATA device. It wasn't until rather later that I
> > noticed the IOMMU warnings in Xen's dmesg (perhaps post-domain 0
> > messages should be duplicated into domain 0's dmesg?).
> >
> > All of the failures consistently pointed at the Samsung device.
> > Due to the expectation it would fail first (lower quality offering
> > with lesser guarantees), I proceeded to replace it with an NVMe
> > device.
> >
> > With some monitoring I discovered the NVMe device was now
> > triggering IOMMU errors, though not nearly as many as the Samsung
> > SATA device did. As such, AMD-Vi plus MD-RAID1 appears to be
> > exposing some sort of IOMMU issue with Xen.
> >
> >
> > All I can do is offer speculation about the underlying cause.
> > There does seem to be a pattern of higher-performance flash
> > storage devices being more severely affected.
> >
> > I was speculating about the issue being the MD-RAID1 driver
> > abusing Linux's DMA infrastructure in some fashion.
> >
> > Upon further consideration, I'm wondering if this is perhaps a
> > latency issue. I imagine there is some sort of flush after the
> > IOMMU tables are modified. Perhaps the Samsung SATA (and all NVMe)
> > devices were trying to execute commands before the reload of the
> > IOMMU tables was complete.
>
> Ping!
>
> The recipe seems to be Linux MD RAID1, plus Samsung SATA or any
> NVMe.
>
> To make it explicit: when I tried Crucial SATA + Samsung SATA, the
> IOMMU errors matched the Samsung SATA (a number of times the SATA
> driver complained).
>
> As stated, I'm speculating that lower-latency devices are starting
> to execute commands before the IOMMU tables have finished reloading.
> When this was originally implemented, fast flash devices were rare.

I guess I'm lucky I ended up with some slightly higher-latency
hardware.

This is a very serious issue as data loss can occur. AMD needs to fund
their Xen engineers more, otherwise AMD hardware may soon no longer be
viable with Xen.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Re: Serious AMD-Vi issue
On Thu, Jan 25, 2024 at 12:24:53PM -0800, Elliott Mitchell wrote:
> Apparently this was first noticed with 4.14, but more recently I've
> been able to reproduce the issue:
>
> https://bugs.debian.org/988477
>
> The original observation features MD-RAID1 using a pair of Samsung
> SATA-attached flash devices. The main line shows up in `xl dmesg`:
>
> (XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr ff???000 flags 0x8 I
>
> Where the device points at the SATA controller. I've ended up
> reproducing this with some noticeable differences.
>
> A major goal of RAID is to have different devices fail at different
> times. Hence my initial run had a Samsung device plus a device from
> another reputable flash manufacturer.
>
> I initially noticed this due to messages in domain 0's dmesg about
> errors from the SATA device. It wasn't until rather later that I
> noticed the IOMMU warnings in Xen's dmesg (perhaps post-domain 0
> messages should be duplicated into domain 0's dmesg?).
>
> All of the failures consistently pointed at the Samsung device. Due
> to the expectation it would fail first (lower quality offering with
> lesser guarantees), I proceeded to replace it with an NVMe device.
>
> With some monitoring I discovered the NVMe device was now triggering
> IOMMU errors, though not nearly as many as the Samsung SATA device
> did. As such, AMD-Vi plus MD-RAID1 appears to be exposing some sort
> of IOMMU issue with Xen.
>
>
> All I can do is offer speculation about the underlying cause. There
> does seem to be a pattern of higher-performance flash storage
> devices being more severely affected.
>
> I was speculating about the issue being the MD-RAID1 driver abusing
> Linux's DMA infrastructure in some fashion.
>
> Upon further consideration, I'm wondering if this is perhaps a
> latency issue. I imagine there is some sort of flush after the IOMMU
> tables are modified. Perhaps the Samsung SATA (and all NVMe) devices
> were trying to execute commands before the reload of the IOMMU
> tables was complete.

Ping!

The recipe seems to be Linux MD RAID1, plus Samsung SATA or any NVMe.

To make it explicit: when I tried Crucial SATA + Samsung SATA, the
IOMMU errors matched the Samsung SATA (a number of times the SATA
driver complained).

As stated, I'm speculating that lower-latency devices are starting to
execute commands before the IOMMU tables have finished reloading. When
this was originally implemented, fast flash devices were rare.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Serious AMD-Vi issue
Apparently this was first noticed with 4.14, but more recently I've
been able to reproduce the issue:

https://bugs.debian.org/988477

The original observation features MD-RAID1 using a pair of Samsung
SATA-attached flash devices. The main line shows up in `xl dmesg`:

(XEN) AMD-Vi: IO_PAGE_FAULT: :bb:dd.f d0 addr ff???000 flags 0x8 I

Where the device points at the SATA controller. I've ended up
reproducing this with some noticeable differences.

A major goal of RAID is to have different devices fail at different
times. Hence my initial run had a Samsung device plus a device from
another reputable flash manufacturer.

I initially noticed this due to messages in domain 0's dmesg about
errors from the SATA device. It wasn't until rather later that I
noticed the IOMMU warnings in Xen's dmesg (perhaps post-domain 0
messages should be duplicated into domain 0's dmesg?).

All of the failures consistently pointed at the Samsung device. Due to
the expectation it would fail first (lower quality offering with
lesser guarantees), I proceeded to replace it with an NVMe device.

With some monitoring I discovered the NVMe device was now triggering
IOMMU errors, though not nearly as many as the Samsung SATA device
did. As such, AMD-Vi plus MD-RAID1 appears to be exposing some sort of
IOMMU issue with Xen.


All I can do is offer speculation about the underlying cause. There
does seem to be a pattern of higher-performance flash storage devices
being more severely affected.

I was speculating about the issue being the MD-RAID1 driver abusing
Linux's DMA infrastructure in some fashion.

Upon further consideration, I'm wondering if this is perhaps a latency
issue. I imagine there is some sort of flush after the IOMMU tables
are modified. Perhaps the Samsung SATA (and all NVMe) devices were
trying to execute commands before the reload of the IOMMU tables was
complete.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| ehem+sig...@m5p.com PGP 87145445 |) /
\_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
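The reported configuration boils down to: Xen dom0 on an AMD system
with the IOMMU enabled, plus a Linux MD-RAID1 mirror with at least one
fast flash member. A minimal reproduction sketch; the device names are
hypothetical and the commands are destructive, so adjust before
running:

    # Hypothetical members: one SATA flash partition, one NVMe
    # partition. mdadm --create destroys existing data on both.
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        /dev/sda1 /dev/nvme0n1p1

    # Generate sustained write traffic through the mirror.
    mkfs.ext4 /dev/md0
    mount /dev/md0 /mnt
    dd if=/dev/zero of=/mnt/fill bs=1M count=8192 conv=fsync

    # Then watch Xen's ring buffer for the faults described above.
    xl dmesg | grep 'AMD-Vi: IO_PAGE_FAULT'

Whether the faults appear presumably depends on the latency
characteristics of the particular devices, per the speculation in this
thread.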