hdm: Bail on endpoint init fail

Dan Williams Fri, 13 May 2022 08:03:31 -0700

On Fri, May 13, 2022 at 5:09 AM Jonathan Cameron
<[email protected]> wrote:
>
> On Thu, 12 May 2022 10:27:38 -0700
> Luis Chamberlain <[email protected]> wrote:
>
> > On Thu, May 12, 2022 at 08:50:14AM -0700, Ben Widawsky wrote:
> > > On 22-04-18 16:37:12, Adam Manzanares wrote:
> > > > On Wed, Apr 13, 2022 at 02:31:42PM -0700, Dan Williams wrote:
> > > > > On Wed, Apr 13, 2022 at 11:38 AM Ben Widawsky 
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Endpoint decoder enumeration is the only way in which we can 
> > > > > > determine
> > > > > > Device Physical Address (DPA) -> Host Physical Address (HPA) 
> > > > > > mappings.
> > > > > > Information is obtained only when the register state can be read
> > > > > > sequentially. If when enumerating the decoders a failure occurs, all
> > > > > > other decoders must also fail since the decoders can no longer be
> > > > > > accurately managed (unless it's the last decoder in which case it 
> > > > > > can
> > > > > > still work).
> > > > >
> > > > > I think this should be expanded to fail if any decoder fails to
> > > > > allocate anywhere in the topology otherwise it leaves a mess for
> > > > > future address translation code to work through cases where decoder
> > > > > information is missing.
> > > > >
> > > > > The current approach is based around the current expectation that
> > > > > nothing is enumerating pre-existing regions, and nothing is performing
> > > > > address translation.
> > > >
> > > > Does the qemu support currently allow testing of this patch? If so, it 
> > > > would
> > > > be good to reference qemu configurations. Any other alternatives would 
> > > > be
> > > > welcome as well.
> > > >
> > > > +Luis on cc.
> > > >
> > >
> > > No. This type of error injection would be cool to have, but I'm not sure 
> > > of a
> > > good way to support that in a scalable way. Maybe Jonathan has some ideas?
> >
> > In case it helps on the Linux front the least intrusive way is to use
> > ALLOW_ERROR_INJECTION(). It's what I hope we'll slowly strive for on
> > the block layer and filesystems slowly. That incurs one macro call per error
> > routine you want to allow error injection on.
> >
> > Then you use debugfs to dynamically enable / disable the error
> > injection / rate etc.
> >
> > So I think this begs the question, what error injection mechanisms
> > exist for qemu and would new functionality be welcomed?
>
> So what paths can actually cause this to fail? Looking at the upstream
> code in init_hdm_decoder() looks like there are only a few things that
> are checked.
>
> base or size being all fs or interleave ways not being a value the
> kernel understands.
>
> For all fs, I'm not sure how we'd get that value?
>
> For interleave ways:
> Our current verification of writes to these registers in QEMU is very
> limited I think you can currently push in an invalid value. We are only
> masking writes, not checking for mid range values that don't exist.
> However, that's something I'll be looking to restrict soon as we add
> more input verification so I wouldn't rely on it.
>
> I'm not aware of anything general affecting QEMU devices emulation.
> I've hacked cases in as temporary tests but not sure
> we'd want to carry something specific for this one.


This is another motivation for cxl_test. QEMU is meant to faithfully
emulate the hardware, not unit test drivers.

Re: [RFC PATCH 02/15] cxl/core/hdm: Bail on endpoint init fail

Reply via email to