Jonathan Cameron wrote:
> On Wed, 22 Jun 2022 22:40:48 -0700
> Dan Williams <[email protected]> wrote:
> 
> > Jonathan Cameron wrote:
> > > ....
> > >   
> > > > > Hi Ben,
> > > > >
> > > > > I finally got around to actually trying this out on top of Dan's
> > > > > recent fix set (I rebased it from the cxl/preview branch on kernel.org).
> > > > >
> > > > > I'm not having much luck actually bringing up a region.
> > > > >
> > > > > The patch set refers to configuring the endpoint decoders, but all their
> > > > > sysfs attributes are read-only.  Am I missing a dependency somewhere, or
> > > > > is the intent that this series is only part of the solution?
> > > > >
> > > > > I'm confused!    
> > > > 
> > > > There's a new series that's being reviewed internally before going to 
> > > > the list:
> > > > 
> > > > https://gitlab.com/bwidawsk/linux/-/tree/cxl_region-redux3
> > > > 
> > > > Given the proximity to the merge window opening and the need to get
> > > > the "mem_enabled" series staged, I asked Ben to hold it back from the
> > > > list for now.
> > > > 
> > > > There are some changes I am folding into it, but I hope to send it out
> > > > in the next few days after "mem_enabled" is finalized.  
> > > 
> > > Hi Dan,
> > > 
> > > > I switched from an earlier version of the region code over to a rebase of
> > > > the tree.
> > > Two issues below you may already have fixed.
> > > 
> > > The second is a carry over from an earlier set so I haven't tested
> > > without it but looks like it's still valid.
> > > 
> > > Anyhow, thought it might save some cycles to preempt you sending
> > > out the series if these issues are still present.
> > > 
> > > Minimal testing so far on these with 2 hb, 2 rp, 4 directly connected
> > > devices, but once you post I'll test more extensively.  I've not
> > > really thought about the below much, so might not be best way to fix.
> > > 
> > > Found a bug in QEMU code as well (missing write masks for the
> > > target list registers) - will post fix for that shortly.  
> > 
> > Hi Jonathan,
> > 
> > Tomorrow I'll post the tranche to the list, but wanted to let you and
> > others watching know that the 'preview' branch [1] now has the proposed
> > initial region support. Once the bots give the thumbs up I'll send it
> > along.
> > 
> > To date I've only tested it with cxl_test and an internal test vehicle.
> > The cxl_test script I used to set up and tear down a x8 interleave across
> > x2 host bridges and x4 switches is:
> 
> Thanks.  Trivial feedback from a very quick play (busy day).
> 
> Bit odd that regionX/size is write-once - I get an error even if
> writing the same value to it twice.

Ah true, that should just silently succeed.
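
As a rough user-space sketch of the behaviour I mean (not the driver code:
the struct, the function name and the error codes below are all made up for
illustration), a repeated write of the same size would return success and
only a conflicting value would be rejected:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a region object; not the kernel's region struct. */
struct region_sketch {
	uint64_t size;	/* 0 means "not configured yet" */
};

/*
 * Write-once store that tolerates a repeated write of the same value.
 * Returns 0 on success, -EINVAL for a zero size, and -EBUSY when a
 * different size has already been set (error codes are assumptions).
 */
static int region_sketch_set_size(struct region_sketch *r, uint64_t size)
{
	if (size == 0)
		return -EINVAL;
	if (r->size == size)
		return 0;	/* same value again: silently succeed */
	if (r->size != 0)
		return -EBUSY;	/* a different size is already set */
	r->size = size;
	return 0;
}
```

The equality check comes before the "already set" check so that the second
write of an identical value never reaches the error path.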

> Also not debugged yet, but I just got a null pointer dereference on
> 
> echo decoder3.0 > target0
> 
> Beyond a stack trace pointing at store_targetN and the dereference being
> of 0x00008, no idea yet.

The compiler unfortunately does a good job inlining the entirety of all the
leaf functions beneath store_targetN() so I have found myself needing to
sprinkle "noinline" to get better back traces.
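
For anyone wanting to do the same, a minimal stand-alone sketch of the idea
follows. The helper names and the struct are hypothetical and are not the
functions under store_targetN(). In kernel code the annotation is spelled
"noinline"; plain GCC and Clang accept the attribute form used here:

```c
#include <stddef.h>

/* Hypothetical type and helpers, for illustration only. */
struct decoder_sketch {
	long flags;
	int id;
};

/*
 * Without the attribute an optimizing compiler may fold this leaf helper
 * into its caller, so a fault inside it gets reported against the caller.
 * Marking it noinline keeps it as a separate frame in the back trace.
 */
__attribute__((noinline))
static int decoder_sketch_id(const struct decoder_sketch *d)
{
	return d ? d->id : -1;
}

/* Caller that would otherwise absorb the helper above when optimizing. */
__attribute__((noinline))
static int attach_target_sketch(const struct decoder_sketch *d, int pos)
{
	int id = decoder_sketch_id(d);

	return id < 0 ? -1 : id * 10 + pos;
}
```

The annotations are only a debugging aid and can be dropped again once the
faulting leaf function has been identified.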

> 
> I was testing with a slightly modified version of a nasty script
> I was using to test with Ben's code previously.  Might well be
> doing something wrong but obviously need to fix that crash anyway!

Most definitely.

> Will move to your nicer script below at some point, as I've been lazy
> enough that I'm still hand editing a few lines depending on the numbers on
> a particular run.
> 
> Should have some time tomorrow to debug, but definitely 'here be
> dragons' at the moment.

Yes. Even before this posting I had shaken out a few crash scenarios just from
moving from my old QEMU baseline to "jic123/cxl-rework-draft-2" which did
things like collide PCI MMIO with cxl_test fake CXL ranges. By the way, is
there a "latest" tag I should be following to stay in sync with what you are
running for QEMU+CXL?  If only to reproduce the same crash scenarios.
