Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-18 Thread Mattias Nissler
On Thu, Feb 15, 2024 at 4:29 PM Jonathan Cameron <
jonathan.came...@huawei.com> wrote:

> On Thu, 8 Feb 2024 14:50:59 +
> Jonathan Cameron  wrote:
>
> > On Wed, 7 Feb 2024 17:34:15 +
> > Jonathan Cameron  wrote:
> >
> > > On Fri, 2 Feb 2024 16:56:18 +
> > > Peter Maydell  wrote:
> > >
> > > > On Fri, 2 Feb 2024 at 16:50, Gregory Price <
> gregory.pr...@memverge.com> wrote:
> > > > >
> > > > > On Fri, Feb 02, 2024 at 04:33:20PM +, Peter Maydell wrote:
>
> > > > > > Here we are trying to take an interrupt. This isn't related to the
> > > > > > other can_do_io stuff, it's happening because do_ld_mmio_beN assumes
> > > > > > it's called with the BQL not held, but in fact there are some
> > > > > > situations where we call into the memory subsystem and we do
> > > > > > already have the BQL.
> > > >
> > > > > It's bugs all the way down as usual!
> > > > > https://xkcd.com/1416/
> > > > >
> > > > > I'll dig in a little next week to see if there's an easy fix. We can see
> > > > > the return address is already 0 going into mmu_translate, so it does
> > > > > look unrelated to the patch I threw out - but probably still has to do
> > > > > with things being on IO.
> > > >
> > > > Yes, the low level memory accessors only need to take the BQL if the thing
> > > > being accessed is an MMIO device. Probably what is wanted is for those
> > > > functions to do "take the lock if we don't already have it", something
> > > > like hw/core/cpu-common.c:cpu_reset_interrupt() does.
> >
> > Got back to x86 testing and indeed not taking the lock in that one path
> > does get things running (with all Gregory's earlier hacks + DMA limits as
> > described below).  Guess it's time to roll some cleaned up patches and
> > see how much everyone screams :)
> >
>
> 3 series sent out:
> (all also on gitlab.com/jic23/qemu cxl-2024-02-15 though I updated patch
> descriptions a little after pushing that out)
>
> Main set of fixes (x86 'works' under my light testing after this one)
>
> https://lore.kernel.org/qemu-devel/20240215150133.2088-1-jonathan.came...@huawei.com/
>
> ARM FEAT_HAFDBS (access and dirty bit updating in PTW) workaround for
> missing atomic CAS
>
> https://lore.kernel.org/qemu-devel/20240215151804.2426-1-jonathan.came...@huawei.com/T/#t
>
> DMA / virtio fix:
>
> https://lore.kernel.org/qemu-devel/20240215142817.1904-1-jonathan.came...@huawei.com/
>
> Last thing I need to do is propose a suitable flag to make
> Mattias' bounce buffering size parameter apply to "memory" address space.


For background, I actually had a global bounce buffer size parameter apply
to all address spaces in an earlier version of my series. After discussion
on the list, we settled on an address-space specific parameter so it can be
configured per device. I haven't looked into where the memory accesses in
your context originate from - can they be attributed to a specific entity
to house the parameter?
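
For illustration of the per-device approach, a hypothetical sketch - not taken
from the actual series; MyDevState, the property name and the realize hook are
invented, and only the AddressSpace bounce-buffer field discussed in this thread
is assumed to exist (class_init wiring of the property array is omitted):

static Property my_dev_properties[] = {
    /* Hypothetical per-device knob; 4096 is an arbitrary default. */
    DEFINE_PROP_SIZE("x-max-bounce-buffer-size", MyDevState,
                     max_bounce_buffer_size, 4096),
    DEFINE_PROP_END_OF_LIST(),
};

static void my_dev_realize(DeviceState *dev, Error **errp)
{
    MyDevState *s = MY_DEV(dev);

    /* Plumb the property into whichever AddressSpace the device does DMA
     * through, so the limit is attributable to that one device. */
    s->dma_as.max_bounce_buffer_size = s->max_bounce_buffer_size;
}

The open question above is whether the accesses in Jonathan's case can be pinned
to such a device-owned AddressSpace at all, or only to the global "memory" one.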


> Currently
> I'm carrying this (I've no idea how much is needed but it's somewhere
> between 4k and 1G):
>
> diff --git a/system/physmem.c b/system/physmem.c
> index 43b37942cf..49b961c7a5 100644
> --- a/system/physmem.c
> +++ b/system/physmem.c
> @@ -2557,6 +2557,7 @@ static void memory_map_init(void)
>      memory_region_init(system_memory, NULL, "system", UINT64_MAX);
>      address_space_init(&address_space_memory, system_memory, "memory");
>
> +    address_space_memory.max_bounce_buffer_size = 1024 * 1024 * 1024;
>      system_io = g_malloc(sizeof(*system_io));
>      memory_region_init_io(system_io, NULL, &unassigned_io_ops, NULL, "io",
>                            65536);
>
> Please take a look. These are all in areas of QEMU I'm not particularly
> confident about, so I'm relying on nice people giving feedback even more
> than normal!
>
> Thanks to all those who helped with debugging and suggestions.
>
> Thanks,
>
> Jonathan
>
> > Jonathan
> >
> >
> > > >
> > > > -- PMM
> > >
> > > Still a work in progress but I thought I'd give an update on some of
> the fun...
> > >
> > > I have a set of somewhat dubious workarounds that sort of do the job
> (where
> > > the aim is to be able to safely run any workload on top of any valid
> > > emulated CXL device setup).
> > >
> > > To recap, the issue is that for CXL memory interleaving we need to have
> > > fine-grained routing to each device (16k Max Gran).  That was fine whilst
> > > pretty much all the testing was DAX based so software wasn't running out
> > > of it.  Now the kernel is rather more aggressive in defaulting any volatile
> > > CXL memory it finds to being normal memory (in some configs anyway), so
> > > people started hitting problems. Given one of the most important functions
> > > of the emulation is to check data ends up in the right backing stores, I'm
> > > not keen to drop that feature unless we absolutely have to.
> > >
> > > 1) For the simple case of no interleave I have working code that just
> > >shoves the MemoryRegion in directly and all works fine.  That was
> 

Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-15 Thread Jonathan Cameron via
On Thu, 8 Feb 2024 14:50:59 +
Jonathan Cameron  wrote:

> On Wed, 7 Feb 2024 17:34:15 +
> Jonathan Cameron  wrote:
> 
> > On Fri, 2 Feb 2024 16:56:18 +
> > Peter Maydell  wrote:
> >   
> > > On Fri, 2 Feb 2024 at 16:50, Gregory Price  
> > > wrote:
> > > >
> > > > On Fri, Feb 02, 2024 at 04:33:20PM +, Peter Maydell wrote:  
> > > > > Here we are trying to take an interrupt. This isn't related to the
> > > > > other can_do_io stuff, it's happening because do_ld_mmio_beN assumes
> > > > > it's called with the BQL not held, but in fact there are some
> > > > > situations where we call into the memory subsystem and we do
> > > > > already have the BQL.  
> > > 
> > > > It's bugs all the way down as usual!
> > > > https://xkcd.com/1416/
> > > >
> > > > I'll dig in a little next week to see if there's an easy fix. We can see
> > > > the return address is already 0 going into mmu_translate, so it does
> > > > look unrelated to the patch I threw out - but probably still has to do
> > > > with things being on IO.  
> > > 
> > > Yes, the low level memory accessors only need to take the BQL if the thing
> > > being accessed is an MMIO device. Probably what is wanted is for those
> > > functions to do "take the lock if we don't already have it", something
> > > like hw/core/cpu-common.c:cpu_reset_interrupt() does.  
> 
> Got back to x86 testing and indeed not taking the lock in that one path
> does get things running (with all Gregory's earlier hacks + DMA limits as
> described below).  Guess it's time to roll some cleaned up patches and
> see how much everyone screams :)
> 

3 series sent out:
(all also on gitlab.com/jic23/qemu cxl-2024-02-15 though I updated patch
descriptions a little after pushing that out)

Main set of fixes (x86 'works' under my light testing after this one)
https://lore.kernel.org/qemu-devel/20240215150133.2088-1-jonathan.came...@huawei.com/

ARM FEAT_HAFDBS (access and dirty bit updating in PTW) workaround for missing
atomic CAS
https://lore.kernel.org/qemu-devel/20240215151804.2426-1-jonathan.came...@huawei.com/T/#t

DMA / virtio fix:
https://lore.kernel.org/qemu-devel/20240215142817.1904-1-jonathan.came...@huawei.com/

Last thing I need to do is propose a suitable flag to make 
Mattias' bounce buffering size parameter apply to "memory" address space.  
Currently
I'm carrying this (I've no idea how much is needed but it's somewhere between 4k
and 1G):

diff --git a/system/physmem.c b/system/physmem.c
index 43b37942cf..49b961c7a5 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2557,6 +2557,7 @@ static void memory_map_init(void)
     memory_region_init(system_memory, NULL, "system", UINT64_MAX);
     address_space_init(&address_space_memory, system_memory, "memory");
 
+    address_space_memory.max_bounce_buffer_size = 1024 * 1024 * 1024;
     system_io = g_malloc(sizeof(*system_io));
     memory_region_init_io(system_io, NULL, &unassigned_io_ops, NULL, "io",
                           65536);

Please take a look. These are all in areas of QEMU I'm not particularly
confident about, so I'm relying on nice people giving feedback even more than
normal!

Thanks to all those who helped with debugging and suggestions.

Thanks,

Jonathan

> Jonathan
> 
> 
> > > 
> > > -- PMM
> > 
> > Still a work in progress but I thought I'd give an update on some of the 
> > fun...
> > 
> > I have a set of somewhat dubious workarounds that sort of do the job (where
> > the aim is to be able to safely run any workload on top of any valid
> > emulated CXL device setup).
> > 
> > To recap, the issue is that for CXL memory interleaving we need to have
> > fine-grained routing to each device (16k Max Gran).  That was fine whilst
> > pretty much all the testing was DAX based so software wasn't running out
> > of it.  Now the kernel is rather more aggressive in defaulting any volatile
> > CXL memory it finds to being normal memory (in some configs anyway), so people
> > started hitting problems. Given one of the most important functions of the
> > emulation is to check data ends up in the right backing stores, I'm not
> > keen to drop that feature unless we absolutely have to.
> > 
> > 1) For the simple case of no interleave I have working code that just
> >shoves the MemoryRegion in directly and all works fine.  That was always
> >on the todo list for virtualization cases anyway where we pretend the
> >underlying devices aren't interleaved and frig the reported perf numbers
> >to present aggregate performance etc.  I'll tidy this up and post it.
> >We may want a config parameter to 'reject' address decoder programming
> >that would result in interleave - it's not remotely spec compliant, but
> >meh, it will make it easier to understand.  For virt case we'll probably
> >present locked down decoders (as if a FW has set them up) but for 
> > emulation
> >that limits usefulness too much.
> >
> > 2) Unfortunately, for the interleaved case we can't just add a lot of 

Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-15 Thread Jonathan Cameron via
On Thu, 1 Feb 2024 16:00:56 +
Peter Maydell  wrote:

> On Thu, 1 Feb 2024 at 15:17, Alex Bennée  wrote:
> >
> > Peter Maydell  writes:  
> > > So, that looks like:
> > >  * we call cpu_tb_exec(), which executes some generated code
> > >  * that generated code calls the lookup_tb_ptr helper to see
> > >if we have a generated TB already for the address we're going
> > >to execute next
> > >  * lookup_tb_ptr probes the TLB to see if we know the host RAM
> > >address for the guest address
> > >  * this results in a TLB walk for an instruction fetch
> > >  * the page table descriptor load is to IO memory
> > >  * io_prepare assumes it needs to do a TLB recompile, because
> > >can_do_io is clear
> > >
> > > I am not surprised that the corner case of "the guest put its
> > > page tables in an MMIO device" has not yet come up :-)
> > >
> > > I'm really not sure how the icount handling should interact
> > > with that...  
> >
> > Its not just icount - we need to handle it for all modes now. That said
> > seeing as we are at the end of a block shouldn't can_do_io be set?  
> 
> The lookup_tb_ptr helper gets called from tcg_gen_goto_tb(),
> which happens earlier than the tb_stop callback (it can
> happen in the trans function for branch etc insns, for
> example).
> 
> I think it should be OK to clear can_do_io at the start
> of the lookup_tb_ptr helper, something like:
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 977576ca143..7818537f318 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -396,6 +396,15 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
>  uint64_t cs_base;
>  uint32_t flags, cflags;
> 
> +/*
> + * By definition we've just finished a TB, so I/O is OK.
> + * Avoid the possibility of calling cpu_io_recompile() if
> + * a page table walk triggered by tb_lookup() calling
> + * probe_access_internal() happens to touch an MMIO device.
> + * The next TB, if we chain to it, will clear the flag again.
> + */
> +cpu->neg.can_do_io = true;
> +
>  cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> 
>  cflags = curr_cflags(cpu);
> 
> -- PMM

Hi Peter,

I've included this in the series I just sent out:
https://lore.kernel.org/qemu-devel/20240215150133.2088-1-jonathan.came...@huawei.com/T/#t

Could you add your Signed-off-by if you are happy doing so?

Jonathan



Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-08 Thread Jonathan Cameron via
On Wed, 7 Feb 2024 17:34:15 +
Jonathan Cameron  wrote:

> On Fri, 2 Feb 2024 16:56:18 +
> Peter Maydell  wrote:
> 
> > On Fri, 2 Feb 2024 at 16:50, Gregory Price  
> > wrote:  
> > >
> > > On Fri, Feb 02, 2024 at 04:33:20PM +, Peter Maydell wrote:
> > > > Here we are trying to take an interrupt. This isn't related to the
> > > > other can_do_io stuff, it's happening because do_ld_mmio_beN assumes
> > > > it's called with the BQL not held, but in fact there are some
> > > > situations where we call into the memory subsystem and we do
> > > > already have the BQL.
> >   
> > > It's bugs all the way down as usual!
> > > https://xkcd.com/1416/
> > >
> > > I'll dig in a little next week to see if there's an easy fix. We can see
> > > the return address is already 0 going into mmu_translate, so it does
> > > look unrelated to the patch I threw out - but probably still has to do
> > > with things being on IO.
> > 
> > Yes, the low level memory accessors only need to take the BQL if the thing
> > being accessed is an MMIO device. Probably what is wanted is for those
> > functions to do "take the lock if we don't already have it", something
> > like hw/core/cpu-common.c:cpu_reset_interrupt() does.

Got back to x86 testing and indeed not taking the lock in that one path
does get things running (with all Gregory's earlier hacks + DMA limits as
described below).  Guess it's time to roll some cleaned up patches and
see how much everyone screams :)

Jonathan


> > 
> > -- PMM  
> 
> Still a work in progress but I thought I'd give an update on some of the 
> fun...
> 
> I have a set of somewhat dubious workarounds that sort of do the job (where
> the aim is to be able to safely run any workload on top of any valid
> emulated CXL device setup).
> 
> To recap, the issue is that for CXL memory interleaving we need to have
> fine-grained routing to each device (16k Max Gran).  That was fine whilst
> pretty much all the testing was DAX based so software wasn't running out
> of it.  Now the kernel is rather more aggressive in defaulting any volatile
> CXL memory it finds to being normal memory (in some configs anyway), so people
> started hitting problems. Given one of the most important functions of the
> emulation is to check data ends up in the right backing stores, I'm not
> keen to drop that feature unless we absolutely have to.
> 
> 1) For the simple case of no interleave I have working code that just
>shoves the MemoryRegion in directly and all works fine.  That was always
>on the todo list for virtualization cases anyway where we pretend the
>underlying devices aren't interleaved and frig the reported perf numbers
>to present aggregate performance etc.  I'll tidy this up and post it.
>We may want a config parameter to 'reject' address decoder programming
>that would result in interleave - it's not remotely spec compliant, but
>meh, it will make it easier to understand.  For virt case we'll probably
>present locked down decoders (as if a FW has set them up) but for emulation
>that limits usefulness too much.
>
> 2) Unfortunately, for the interleaved case we can't just add a lot of memory
>regions because even at highest granularity (16k) and minimum size
>512MiB it takes forever to eventually run into an assert in
>phys_section_add with the comment:
>"The physical section number is ORed with a page-aligned
> pointer to produce the iotlb entries.  Thus it should
> never overflow into the page-aligned value."
> That sounds hard to 'fix' though I've not looked into it.
> 
> So back to plan (A) papering over the cracks with TCG.
> 
> I've focused on arm64 which seems a bit easier than x86 (and is arguably
> part of my day job)
> 
> Challenges
> 1) The atomic updates of accessed and dirty bits in
>arm_casq_ptw() fail because we don't have a proper address to do them
>on.  However, there is precedent for non-atomic updates in there
>already (used when the host system doesn't support big enough cas)
>I think we can do something similar under the bql for this case.
>Not 100% sure I'm writing to the correct address but a simple frig
>superficially appears to work.
> 2) Emulated devices try to do DMA to buffers in the CXL emulated interleave
>memory (virtio_blk for example).  Can't do that because there is no
>actual translation available - just read and write functions.
> 
>So should be easy to avoid as we know how to handle DMA limitations.
>Just set the max dma address width to 40 bits (so below the CXL Fixed Memory
>Windows) and rely on Linux to bounce buffer with swiotlb. For a while
>I couldn't work out why changing IORT to provide this didn't work and
>I saw errors for virtio-pci-blk. So digging ensued.
>Virtio devices by default (sort of) bypass the DMA API in Linux - see
>vring_use_dma_api(). That is reasonable from the translation
>point of view, but not the DMA 

Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-07 Thread Jonathan Cameron via
On Fri, 2 Feb 2024 16:56:18 +
Peter Maydell  wrote:

> On Fri, 2 Feb 2024 at 16:50, Gregory Price  wrote:
> >
> > On Fri, Feb 02, 2024 at 04:33:20PM +, Peter Maydell wrote:  
> > > Here we are trying to take an interrupt. This isn't related to the
> > > other can_do_io stuff, it's happening because do_ld_mmio_beN assumes
> > > it's called with the BQL not held, but in fact there are some
> > > situations where we call into the memory subsystem and we do
> > > already have the BQL.  
> 
> > It's bugs all the way down as usual!
> > https://xkcd.com/1416/
> >
> > I'll dig in a little next week to see if there's an easy fix. We can see
> > the return address is already 0 going into mmu_translate, so it does
> > look unrelated to the patch I threw out - but probably still has to do
> > with things being on IO.  
> 
> Yes, the low level memory accessors only need to take the BQL if the thing
> being accessed is an MMIO device. Probably what is wanted is for those
> functions to do "take the lock if we don't already have it", something
> like hw/core/cpu-common.c:cpu_reset_interrupt() does.
> 
> -- PMM

Still a work in progress but I thought I'd give an update on some of the fun...

I have a set of somewhat dubious workarounds that sort of do the job (where
the aim is to be able to safely run any workload on top of any valid
emulated CXL device setup).

To recap, the issue is that for CXL memory interleaving we need to have
fine-grained routing to each device (16k Max Gran).  That was fine whilst
pretty much all the testing was DAX based so software wasn't running out
of it.  Now the kernel is rather more aggressive in defaulting any volatile
CXL memory it finds to being normal memory (in some configs anyway), so people
started hitting problems. Given one of the most important functions of the
emulation is to check data ends up in the right backing stores, I'm not
keen to drop that feature unless we absolutely have to.

1) For the simple case of no interleave I have working code that just
   shoves the MemoryRegion in directly and all works fine.  That was always
   on the todo list for virtualization cases anyway where we pretend the
   underlying devices aren't interleaved and frig the reported perf numbers
   to present aggregate performance etc.  I'll tidy this up and post it.
   We may want a config parameter to 'reject' address decoder programming
   that would result in interleave - it's not remotely spec compliant, but
   meh, it will make it easier to understand.  For virt case we'll probably
   present locked down decoders (as if a FW has set them up) but for emulation
   that limits usefulness too much.
   
2) Unfortunately, for the interleaved case we can't just add a lot of memory
   regions because even at highest granularity (16k) and minimum size
   512MiB it takes forever to eventually run into an assert in
   phys_section_add with the comment:
   "The physical section number is ORed with a page-aligned
pointer to produce the iotlb entries.  Thus it should
never overflow into the page-aligned value."
That sounds hard to 'fix' though I've not looked into it.
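
For a sense of scale on that assert (a back-of-the-envelope sketch, assuming it
is the usual sections_nb < TARGET_PAGE_SIZE check in phys_section_add()): the
iotlb packs the section index into the sub-page bits of a page-aligned pointer,
so with 4 KiB target pages there is room for only ~4096 distinct sections.
Splitting a single 512 MiB interleaved window at 16 KiB granularity already
needs

    512 MiB / 16 KiB = 32768 subsections,

roughly an order of magnitude over the limit before counting anything else
mapped into the address space - hence why this looks hard to "fix" without
changing how the iotlb encodes sections.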

So back to plan (A) papering over the cracks with TCG.

I've focused on arm64 which seems a bit easier than x86 (and is arguably
part of my day job)

Challenges
1) The atomic updates of accessed and dirty bits in
   arm_casq_ptw() fail because we don't have a proper address to do them
   on.  However, there is precedent for non-atomic updates in there
   already (used when the host system doesn't support big enough cas)
   I think we can do something similar under the bql for this case.
   Not 100% sure I'm writing to the correct address but a simple frig
   superficially appears to work.
2) Emulated devices try to do DMA to buffers in the CXL emulated interleave
   memory (virtio_blk for example).  Can't do that because there is no
   actual translation available - just read and write functions.

   So should be easy to avoid as we know how to handle DMA limitations.
   Just set the max dma address width to 40 bits (so below the CXL Fixed Memory
   Windows) and rely on Linux to bounce buffer with swiotlb. For a while
   I couldn't work out why changing IORT to provide this didn't work and
   I saw errors for virtio-pci-blk. So digging ensued.
   Virtio devices by default (sort of) bypass the DMA API in Linux - see
   vring_use_dma_api(), sketched below. That is reasonable from the translation
   point of view, but not the DMA limits (and resulting need to use bounce
   buffers).  Maybe could put a sanity check in linux on no iommu +
   a DMA restriction to below 64 bits but I'm not 100% sure we wouldn't
   break other platforms.
   Alternatively just use emulated real device and all seems fine
   - I've tested with nvme.

3) I need to fix the kernel handling for CXL CDAT table originated
   NUMA nodes on ARM64. For now I have a hack in place so I can make
   sure I hit the memory I intend to when testing. I suspect we need
   some 
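
As referenced in challenge 2 above, a rough sketch of the Linux-side check that
makes virtio bypass the DMA API (paraphrased from vring_use_dma_api() in
drivers/virtio/virtio_ring.c; details vary by kernel version, so treat this as
illustrative rather than authoritative):

/* Without VIRTIO_F_ACCESS_PLATFORM the device is assumed to access guest
 * physical memory directly, so the vring code hands out raw GPAs and never
 * goes through dma_map_*() - which also means it never honours DMA address
 * limits or bounces through swiotlb, matching the behaviour described above. */
static bool vring_use_dma_api(const struct virtio_device *vdev)
{
    if (!virtio_has_dma_quirk(vdev))
        return true;            /* VIRTIO_F_ACCESS_PLATFORM: use the DMA API */

    /* Otherwise we are left to guess; Xen guests always need the DMA API. */
    if (xen_domain())
        return true;

    return false;               /* legacy behaviour: bypass the DMA API */
}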

Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-02 Thread Peter Maydell
On Fri, 2 Feb 2024 at 16:50, Gregory Price  wrote:
>
> On Fri, Feb 02, 2024 at 04:33:20PM +, Peter Maydell wrote:
> > Here we are trying to take an interrupt. This isn't related to the
> > other can_do_io stuff, it's happening because do_ld_mmio_beN assumes
> > it's called with the BQL not held, but in fact there are some
> > situations where we call into the memory subsystem and we do
> > already have the BQL.

> It's bugs all the way down as usual!
> https://xkcd.com/1416/
>
> I'll dig in a little next week to see if there's an easy fix. We can see
> the return address is already 0 going into mmu_translate, so it does
> look unrelated to the patch I threw out - but probably still has to do
> with things being on IO.

Yes, the low level memory accessors only need to take the BQL if the thing
being accessed is an MMIO device. Probably what is wanted is for those
functions to do "take the lock if we don't already have it", something
like hw/core/cpu-common.c:cpu_reset_interrupt() does.

-- PMM
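
For reference, a minimal sketch of the "take the lock only if we don't already
hold it" idiom Peter points at, modelled on hw/core/cpu-common.c:
cpu_reset_interrupt(). The wrapper function here is hypothetical (the real fix
belongs inside the cputlb MMIO paths); on trees predating the BQL rename the
helpers are spelled qemu_mutex_iothread_locked()/qemu_mutex_lock_iothread():

static uint64_t mmio_read_with_bql(MemoryRegion *mr, hwaddr offset, unsigned size)
{
    uint64_t val = 0;
    /* Only take the BQL if the caller doesn't already hold it, so paths like
     * the interrupt delivery seen in the backtrace don't trip the assertion. */
    bool need_lock = !bql_locked();

    if (need_lock) {
        bql_lock();
    }
    memory_region_dispatch_read(mr, offset, &val, size_memop(size),
                                MEMTXATTRS_UNSPECIFIED);
    if (need_lock) {
        bql_unlock();
    }
    return val;
}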



Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-02 Thread Gregory Price
On Fri, Feb 02, 2024 at 04:33:20PM +, Peter Maydell wrote:
> On Fri, 2 Feb 2024 at 16:26, Jonathan Cameron
>  wrote:
> > #7  0x55ab1929 in bql_lock_impl (file=0x56049122 
> > "../../accel/tcg/cputlb.c", line=2033) at ../../system/cpus.c:524
> > #8  bql_lock_impl (file=file@entry=0x56049122 
> > "../../accel/tcg/cputlb.c", line=line@entry=2033) at ../../system/cpus.c:520
> > #9  0x55c9f7d6 in do_ld_mmio_beN (cpu=0x578e0cb0, 
> > full=0x7ffe88012950, ret_be=ret_be@entry=0, addr=19595792376, 
> > size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at 
> > ../../accel/tcg/cputlb.c:2033
> > #10 0x55ca0fbd in do_ld_8 (cpu=cpu@entry=0x578e0cb0, 
> > p=p@entry=0x74efd1d0, mmu_idx=, 
> > type=type@entry=MMU_DATA_LOAD, memop=, ra=ra@entry=0) at 
> > ../../accel/tcg/cputlb.c:2356
> > #11 0x55ca341f in do_ld8_mmu (cpu=cpu@entry=0x578e0cb0, 
> > addr=addr@entry=19595792376, oi=oi@entry=52, ra=0, ra@entry=52, 
> > access_type=access_type@entry=MMU_DATA_LOAD) at 
> > ../../accel/tcg/cputlb.c:2439
> > #12 0x55ca5f59 in cpu_ldq_mmu (ra=52, oi=52, addr=19595792376, 
> > env=0x578e3470) at ../../accel/tcg/ldst_common.c.inc:169
> > #13 cpu_ldq_le_mmuidx_ra (env=0x578e3470, addr=19595792376, 
> > mmu_idx=, ra=ra@entry=0) at 
> > ../../accel/tcg/ldst_common.c.inc:301
> > #14 0x55b4b5fc in ptw_ldq (ra=0, in=0x74efd320) at 
> > ../../target/i386/tcg/sysemu/excp_helper.c:98
> > #15 ptw_ldq (ra=0, in=0x74efd320) at 
> > ../../target/i386/tcg/sysemu/excp_helper.c:93
> > #16 mmu_translate (env=env@entry=0x578e3470, in=0x74efd3e0, 
> > out=0x74efd3b0, err=err@entry=0x74efd3c0, ra=ra@entry=0) at 
> > ../../target/i386/tcg/sysemu/excp_helper.c:174
> > #17 0x55b4c4b3 in get_physical_address (ra=0, err=0x74efd3c0, 
> > out=0x74efd3b0, mmu_idx=0, access_type=MMU_DATA_LOAD, 
> > addr=18446741874686299840, env=0x578e3470) at 
> > ../../target/i386/tcg/sysemu/excp_helper.c:580
> > #18 x86_cpu_tlb_fill (cs=0x578e0cb0, addr=18446741874686299840, 
> > size=, access_type=MMU_DATA_LOAD, mmu_idx=0, 
> > probe=, retaddr=0) at 
> > ../../target/i386/tcg/sysemu/excp_helper.c:606
> > #19 0x55ca0ee9 in tlb_fill (retaddr=0, mmu_idx=0, 
> > access_type=MMU_DATA_LOAD, size=, addr=18446741874686299840, 
> > cpu=0x74efd540) at ../../accel/tcg/cputlb.c:1315
> > #20 mmu_lookup1 (cpu=cpu@entry=0x578e0cb0, 
> > data=data@entry=0x74efd540, mmu_idx=0, 
> > access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at 
> > ../../accel/tcg/cputlb.c:1713
> 
> Here we are trying to take an interrupt. This isn't related to the
> other can_do_io stuff, it's happening because do_ld_mmio_beN assumes
> it's called with the BQL not held, but in fact there are some
> situations where we call into the memory subsystem and we do
> already have the BQL.
> 
> -- PMM

It's bugs all the way down as usual!
https://xkcd.com/1416/

I'll dig in a little next week to see if there's an easy fix. We can see
the return address is already 0 going into mmu_translate, so it does
look unrelated to the patch I threw out - but probably still has to do
with things being on IO.

~Gregory



Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-02 Thread Peter Maydell
On Fri, 2 Feb 2024 at 16:26, Jonathan Cameron
 wrote:
> New exciting trace...
> Thread 5 "qemu-system-x86" received signal SIGABRT, Aborted.
> [Switching to Thread 0x74efe6c0 (LWP 16503)]
> __pthread_kill_implementation (no_tid=0, signo=6, threadid=) 
> at ./nptl/pthread_kill.c:44
> Download failed: Invalid argument.  Continuing without source file 
> ./nptl/./nptl/pthread_kill.c.
> 44  ./nptl/pthread_kill.c: No such file or directory.
> (gdb) bt
> #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid= out>) at ./nptl/pthread_kill.c:44
> #1  __pthread_kill_internal (signo=6, threadid=) at 
> ./nptl/pthread_kill.c:78
> #2  __GI___pthread_kill (threadid=, signo=signo@entry=6) at 
> ./nptl/pthread_kill.c:89
> #3  0x777c43b6 in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/posix/raise.c:26
> #4  0x777aa87c in __GI_abort () at ./stdlib/abort.c:79
> #5  0x77b2ed1e in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #6  0x77b9622e in g_assertion_message_expr () at 
> /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #7  0x55ab1929 in bql_lock_impl (file=0x56049122 
> "../../accel/tcg/cputlb.c", line=2033) at ../../system/cpus.c:524
> #8  bql_lock_impl (file=file@entry=0x56049122 "../../accel/tcg/cputlb.c", 
> line=line@entry=2033) at ../../system/cpus.c:520
> #9  0x55c9f7d6 in do_ld_mmio_beN (cpu=0x578e0cb0, 
> full=0x7ffe88012950, ret_be=ret_be@entry=0, addr=19595792376, 
> size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at 
> ../../accel/tcg/cputlb.c:2033
> #10 0x55ca0fbd in do_ld_8 (cpu=cpu@entry=0x578e0cb0, 
> p=p@entry=0x74efd1d0, mmu_idx=, 
> type=type@entry=MMU_DATA_LOAD, memop=, ra=ra@entry=0) at 
> ../../accel/tcg/cputlb.c:2356
> #11 0x55ca341f in do_ld8_mmu (cpu=cpu@entry=0x578e0cb0, 
> addr=addr@entry=19595792376, oi=oi@entry=52, ra=0, ra@entry=52, 
> access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2439
> #12 0x55ca5f59 in cpu_ldq_mmu (ra=52, oi=52, addr=19595792376, 
> env=0x578e3470) at ../../accel/tcg/ldst_common.c.inc:169
> #13 cpu_ldq_le_mmuidx_ra (env=0x578e3470, addr=19595792376, 
> mmu_idx=, ra=ra@entry=0) at 
> ../../accel/tcg/ldst_common.c.inc:301
> #14 0x55b4b5fc in ptw_ldq (ra=0, in=0x74efd320) at 
> ../../target/i386/tcg/sysemu/excp_helper.c:98
> #15 ptw_ldq (ra=0, in=0x74efd320) at 
> ../../target/i386/tcg/sysemu/excp_helper.c:93
> #16 mmu_translate (env=env@entry=0x578e3470, in=0x74efd3e0, 
> out=0x74efd3b0, err=err@entry=0x74efd3c0, ra=ra@entry=0) at 
> ../../target/i386/tcg/sysemu/excp_helper.c:174
> #17 0x55b4c4b3 in get_physical_address (ra=0, err=0x74efd3c0, 
> out=0x74efd3b0, mmu_idx=0, access_type=MMU_DATA_LOAD, 
> addr=18446741874686299840, env=0x578e3470) at 
> ../../target/i386/tcg/sysemu/excp_helper.c:580
> #18 x86_cpu_tlb_fill (cs=0x578e0cb0, addr=18446741874686299840, 
> size=, access_type=MMU_DATA_LOAD, mmu_idx=0, probe= out>, retaddr=0) at ../../target/i386/tcg/sysemu/excp_helper.c:606
> #19 0x55ca0ee9 in tlb_fill (retaddr=0, mmu_idx=0, 
> access_type=MMU_DATA_LOAD, size=, addr=18446741874686299840, 
> cpu=0x74efd540) at ../../accel/tcg/cputlb.c:1315
> #20 mmu_lookup1 (cpu=cpu@entry=0x578e0cb0, 
> data=data@entry=0x74efd540, mmu_idx=0, 
> access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at 
> ../../accel/tcg/cputlb.c:1713
> #21 0x55ca2c61 in mmu_lookup (cpu=cpu@entry=0x578e0cb0, 
> addr=addr@entry=18446741874686299840, oi=oi@entry=32, ra=ra@entry=0, 
> type=type@entry=MMU_DATA_LOAD, l=l@entry=0x74efd540) at 
> ../../accel/tcg/cputlb.c:1803
> #22 0x55ca3165 in do_ld4_mmu (cpu=cpu@entry=0x578e0cb0, 
> addr=addr@entry=18446741874686299840, oi=oi@entry=32, ra=ra@entry=0, 
> access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2416
> #23 0x55ca5ef9 in cpu_ldl_mmu (ra=0, oi=32, 
> addr=18446741874686299840, env=0x578e3470) at 
> ../../accel/tcg/ldst_common.c.inc:158
> #24 cpu_ldl_le_mmuidx_ra (env=env@entry=0x578e3470, 
> addr=addr@entry=18446741874686299840, mmu_idx=, ra=ra@entry=0) 
> at ../../accel/tcg/ldst_common.c.inc:294
> #25 0x55bb6cdd in do_interrupt64 (is_hw=1, 
> next_eip=18446744072399775809, error_code=0, is_int=0, intno=236, 
> env=0x578e3470) at ../../target/i386/tcg/seg_helper.c:889
> #26 do_interrupt_all (cpu=cpu@entry=0x578e0cb0, intno=236, 
> is_int=is_int@entry=0, error_code=error_code@entry=0, 
> next_eip=next_eip@entry=0, is_hw=is_hw@entry=1) at 
> ../../target/i386/tcg/seg_helper.c:1130
> #27 0x55bb87da in do_interrupt_x86_hardirq 
> (env=env@entry=0x578e3470, intno=, is_hw=is_hw@entry=1) at 
> ../../target/i386/tcg/seg_helper.c:1162
> #28 0x55b5039c in x86_cpu_exec_interrupt (cs=0x578e0cb0, 
> interrupt_request=) at 
> ../../target/i386/tcg/sysemu/seg_helper.c:197
> #29 0x55c94480 in cpu_handle_interrupt (last_tb=, 
> 

Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-02 Thread Jonathan Cameron via
On Thu, 1 Feb 2024 13:56:09 -0500
Gregory Price  wrote:

> On Thu, Feb 01, 2024 at 06:04:26PM +, Peter Maydell wrote:
> > On Thu, 1 Feb 2024 at 17:25, Alex Bennée  wrote:  
> > >
> > > Jonathan Cameron  writes:  
> > > >> > #21 0x55ca3e5d in do_st8_mmu (cpu=0x578e0cb0, 
> > > >> > addr=23937, val=18386491784638059520, oi=6, ra=140736029817822) at 
> > > >> > ../../accel/tcg/cputlb.c:2853
> > > >> > #22 0x7fffa9107c63 in code_gen_buffer ()  
> > > >>
> > > >> No thats different - we are actually writing to the MMIO region here.
> > > >> But the fact we hit cpu_abort because we can't find the TB we are
> > > >> executing is a little problematic.
> > > >>
> > > >> Does ra properly point to the code buffer here?  
> > > >
> > > > Err.  How would I tell?  
> > >
> > > (gdb) p/x 140736029817822
> > > $1 = 0x7fffa9107bde
> > >
> > > seems off because code_gen_buffer starts at 0x7fffa9107c63  
> > 
> > The code_gen_buffer doesn't *start* at 0x7fffa9107c63 --
> > that is our return address into it, which is to say it's just
> > after the call insn to the do_st8_mmu helper. The 'ra' argument
> > to the helper function is going to be a number slightly lower
> > than that, because it points within the main lump of generated
> > code for the TB, whereas the helper call is done as part of
> > an out-of-line lump of common code at the end of the TB.
> > 
> > The 'ra' here is fine -- the problem is that we don't
> > pass it all the way down the callstack and instead end
> > up using 0 as a 'ra' within the ptw code.
> > 
> > -- PMM  
> 
> Is there any particular reason not to, as below?
> ~Gregory
> 
One patch blindly applied. 
New exciting trace...
Thread 5 "qemu-system-x86" received signal SIGABRT, Aborted.
[Switching to Thread 0x74efe6c0 (LWP 16503)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=) at 
./nptl/pthread_kill.c:44
Download failed: Invalid argument.  Continuing without source file 
./nptl/./nptl/pthread_kill.c.
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=) 
at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=) at 
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=, signo=signo@entry=6) at 
./nptl/pthread_kill.c:89
#3  0x777c43b6 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/posix/raise.c:26
#4  0x777aa87c in __GI_abort () at ./stdlib/abort.c:79
#5  0x77b2ed1e in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6  0x77b9622e in g_assertion_message_expr () at 
/lib/x86_64-linux-gnu/libglib-2.0.so.0
#7  0x55ab1929 in bql_lock_impl (file=0x56049122 
"../../accel/tcg/cputlb.c", line=2033) at ../../system/cpus.c:524
#8  bql_lock_impl (file=file@entry=0x56049122 "../../accel/tcg/cputlb.c", 
line=line@entry=2033) at ../../system/cpus.c:520
#9  0x55c9f7d6 in do_ld_mmio_beN (cpu=0x578e0cb0, 
full=0x7ffe88012950, ret_be=ret_be@entry=0, addr=19595792376, 
size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at 
../../accel/tcg/cputlb.c:2033
#10 0x55ca0fbd in do_ld_8 (cpu=cpu@entry=0x578e0cb0, 
p=p@entry=0x74efd1d0, mmu_idx=, 
type=type@entry=MMU_DATA_LOAD, memop=, ra=ra@entry=0) at 
../../accel/tcg/cputlb.c:2356
#11 0x55ca341f in do_ld8_mmu (cpu=cpu@entry=0x578e0cb0, 
addr=addr@entry=19595792376, oi=oi@entry=52, ra=0, ra@entry=52, 
access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2439
#12 0x55ca5f59 in cpu_ldq_mmu (ra=52, oi=52, addr=19595792376, 
env=0x578e3470) at ../../accel/tcg/ldst_common.c.inc:169
#13 cpu_ldq_le_mmuidx_ra (env=0x578e3470, addr=19595792376, 
mmu_idx=, ra=ra@entry=0) at ../../accel/tcg/ldst_common.c.inc:301
#14 0x55b4b5fc in ptw_ldq (ra=0, in=0x74efd320) at 
../../target/i386/tcg/sysemu/excp_helper.c:98
#15 ptw_ldq (ra=0, in=0x74efd320) at 
../../target/i386/tcg/sysemu/excp_helper.c:93
#16 mmu_translate (env=env@entry=0x578e3470, in=0x74efd3e0, 
out=0x74efd3b0, err=err@entry=0x74efd3c0, ra=ra@entry=0) at 
../../target/i386/tcg/sysemu/excp_helper.c:174
#17 0x55b4c4b3 in get_physical_address (ra=0, err=0x74efd3c0, 
out=0x74efd3b0, mmu_idx=0, access_type=MMU_DATA_LOAD, 
addr=18446741874686299840, env=0x578e3470) at 
../../target/i386/tcg/sysemu/excp_helper.c:580
#18 x86_cpu_tlb_fill (cs=0x578e0cb0, addr=18446741874686299840, 
size=, access_type=MMU_DATA_LOAD, mmu_idx=0, probe=, retaddr=0) at ../../target/i386/tcg/sysemu/excp_helper.c:606
#19 0x55ca0ee9 in tlb_fill (retaddr=0, mmu_idx=0, 
access_type=MMU_DATA_LOAD, size=, addr=18446741874686299840, 
cpu=0x74efd540) at ../../accel/tcg/cputlb.c:1315
#20 mmu_lookup1 (cpu=cpu@entry=0x578e0cb0, data=data@entry=0x74efd540, 
mmu_idx=0, access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at 
../../accel/tcg/cputlb.c:1713
#21 0x55ca2c61 in mmu_lookup (cpu=cpu@entry=0x578e0cb0, 

Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Gregory Price
On Thu, Feb 01, 2024 at 06:04:26PM +, Peter Maydell wrote:
> On Thu, 1 Feb 2024 at 17:25, Alex Bennée  wrote:
> >
> > Jonathan Cameron  writes:
> > >> > #21 0x55ca3e5d in do_st8_mmu (cpu=0x578e0cb0, addr=23937, 
> > >> > val=18386491784638059520, oi=6, ra=140736029817822) at 
> > >> > ../../accel/tcg/cputlb.c:2853
> > >> > #22 0x7fffa9107c63 in code_gen_buffer ()
> > >>
> > >> No thats different - we are actually writing to the MMIO region here.
> > >> But the fact we hit cpu_abort because we can't find the TB we are
> > >> executing is a little problematic.
> > >>
> > >> Does ra properly point to the code buffer here?
> > >
> > > Err.  How would I tell?
> >
> > (gdb) p/x 140736029817822
> > $1 = 0x7fffa9107bde
> >
> > seems off because code_gen_buffer starts at 0x7fffa9107c63
> 
> The code_gen_buffer doesn't *start* at 0x7fffa9107c63 --
> that is our return address into it, which is to say it's just
> after the call insn to the do_st8_mmu helper. The 'ra' argument
> to the helper function is going to be a number slightly lower
> than that, because it points within the main lump of generated
> code for the TB, whereas the helper call is done as part of
> an out-of-line lump of common code at the end of the TB.
> 
> The 'ra' here is fine -- the problem is that we don't
> pass it all the way down the callstack and instead end
> up using 0 as a 'ra' within the ptw code.
> 
> -- PMM

Is there any particular reason not to, as below?
~Gregory


diff --git a/target/i386/tcg/sysemu/excp_helper.c 
b/target/i386/tcg/sysemu/excp_helper.c
index 5b86f439ad..2f581b9bfb 100644
--- a/target/i386/tcg/sysemu/excp_helper.c
+++ b/target/i386/tcg/sysemu/excp_helper.c
@@ -59,14 +59,14 @@ typedef struct PTETranslate {
 hwaddr gaddr;
 } PTETranslate;

-static bool ptw_translate(PTETranslate *inout, hwaddr addr)
+static bool ptw_translate(PTETranslate *inout, hwaddr addr, uint64_t ra)
 {
 CPUTLBEntryFull *full;
 int flags;

 inout->gaddr = addr;
 flags = probe_access_full(inout->env, addr, 0, MMU_DATA_STORE,
-  inout->ptw_idx, true, &inout->haddr, &full, 0);
+  inout->ptw_idx, true, &inout->haddr, &full, ra);

 if (unlikely(flags & TLB_INVALID_MASK)) {
 TranslateFault *err = inout->err;
@@ -82,20 +82,20 @@ static bool ptw_translate(PTETranslate *inout, hwaddr addr)
 return true;
 }

-static inline uint32_t ptw_ldl(const PTETranslate *in)
+static inline uint32_t ptw_ldl(const PTETranslate *in, uint64_t ra)
 {
 if (likely(in->haddr)) {
 return ldl_p(in->haddr);
 }
-return cpu_ldl_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, 0);
+return cpu_ldl_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, ra);
 }

-static inline uint64_t ptw_ldq(const PTETranslate *in)
+static inline uint64_t ptw_ldq(const PTETranslate *in, uint64_t ra)
 {
 if (likely(in->haddr)) {
 return ldq_p(in->haddr);
 }
-return cpu_ldq_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, 0);
+return cpu_ldq_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, ra);
 }

 /*
@@ -132,7 +132,8 @@ static inline bool ptw_setl(const PTETranslate *in, 
uint32_t old, uint32_t set)
 }

 static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
-  TranslateResult *out, TranslateFault *err)
+  TranslateResult *out, TranslateFault *err,
+  uint64_t ra)
 {
 const int32_t a20_mask = x86_get_a20_mask(env);
 const target_ulong addr = in->addr;
@@ -166,11 +167,11 @@ static bool mmu_translate(CPUX86State *env, const 
TranslateParams *in,
  */
 pte_addr = ((in->cr3 & ~0xfff) +
 (((addr >> 48) & 0x1ff) << 3)) & a20_mask;
-if (!ptw_translate(&pte_trans, pte_addr)) {
+if (!ptw_translate(&pte_trans, pte_addr, ra)) {
 return false;
 }
 restart_5:
-pte = ptw_ldq(&pte_trans);
+pte = ptw_ldq(&pte_trans, ra);
 if (!(pte & PG_PRESENT_MASK)) {
 goto do_fault;
 }
@@ -191,11 +192,11 @@ static bool mmu_translate(CPUX86State *env, const 
TranslateParams *in,
  */
 pte_addr = ((pte & PG_ADDRESS_MASK) +
 (((addr >> 39) & 0x1ff) << 3)) & a20_mask;
-if (!ptw_translate(&pte_trans, pte_addr)) {
+if (!ptw_translate(&pte_trans, pte_addr, ra)) {
 return false;
 }
 restart_4:
-pte = ptw_ldq(&pte_trans);
+pte = ptw_ldq(&pte_trans, ra);
 if (!(pte & PG_PRESENT_MASK)) {
 goto do_fault;
 }
@@ -212,11 +213,11 @@ static bool mmu_translate(CPUX86State *env, const 
TranslateParams *in,
  */
 pte_addr = ((pte & PG_ADDRESS_MASK) +
 (((addr >> 30) & 0x1ff) << 3)) & a20_mask;
-if (!ptw_translate(&pte_trans, 

Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Peter Maydell
On Thu, 1 Feb 2024 at 17:25, Alex Bennée  wrote:
>
> Jonathan Cameron  writes:
> >> > #21 0x55ca3e5d in do_st8_mmu (cpu=0x578e0cb0, addr=23937, 
> >> > val=18386491784638059520, oi=6, ra=140736029817822) at 
> >> > ../../accel/tcg/cputlb.c:2853
> >> > #22 0x7fffa9107c63 in code_gen_buffer ()
> >>
> >> No thats different - we are actually writing to the MMIO region here.
> >> But the fact we hit cpu_abort because we can't find the TB we are
> >> executing is a little problematic.
> >>
> >> Does ra properly point to the code buffer here?
> >
> > Err.  How would I tell?
>
> (gdb) p/x 140736029817822
> $1 = 0x7fffa9107bde
>
> seems off because code_gen_buffer starts at 0x7fffa9107c63

The code_gen_buffer doesn't *start* at 0x7fffa9107c63 --
that is our return address into it, which is to say it's just
after the call insn to the do_st8_mmu helper. The 'ra' argument
to the helper function is going to be a number slightly lower
than that, because it points within the main lump of generated
code for the TB, whereas the helper call is done as part of
an out-of-line lump of common code at the end of the TB.

The 'ra' here is fine -- the problem is that we don't
pass it all the way down the callstack and instead end
up using 0 as a 'ra' within the ptw code.

-- PMM
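
For context, a sketch of how that return address is normally threaded through
helpers (the helper itself is hypothetical, but GETPC() and the *_ra load
accessors are the real mechanism):

/* GETPC() must be evaluated in the helper called directly from generated code,
 * and the value passed down unchanged.  Passing 0 instead - as the i386 ptw
 * code effectively does - tells the unwinder there is no TB context, which is
 * what leads to the cpu_io_recompile()/cpu_abort() paths seen in this thread. */
uint64_t helper_example_load(CPUArchState *env, uint64_t addr)
{
    uintptr_t ra = GETPC();                 /* return address into code_gen_buffer */
    return cpu_ldq_data_ra(env, addr, ra);  /* keep forwarding ra downwards */
}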



Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Jonathan Cameron via
On Thu, 1 Feb 2024 17:21:49 +
Peter Maydell  wrote:

> On Thu, 1 Feb 2024 at 17:08, Jonathan Cameron
>  wrote:
> >
> > On Thu, 01 Feb 2024 16:45:30 +
> > Alex Bennée  wrote:
> >  
> > > Jonathan Cameron  writes:
> > >  
> > > > On Thu, 1 Feb 2024 16:00:56 +
> > > > Peter Maydell  wrote:
> > > >  
> > > >> On Thu, 1 Feb 2024 at 15:17, Alex Bennée  
> > > >> wrote:  
> > > >> >
> > > >> > Peter Maydell  writes:  
> > > >> > > So, that looks like:
> > > >> > >  * we call cpu_tb_exec(), which executes some generated code
> > > >> > >  * that generated code calls the lookup_tb_ptr helper to see
> > > >> > >if we have a generated TB already for the address we're going
> > > >> > >to execute next
> > > >> > >  * lookup_tb_ptr probes the TLB to see if we know the host RAM
> > > >> > >address for the guest address
> > > >> > >  * this results in a TLB walk for an instruction fetch
> > > >> > >  * the page table descriptor load is to IO memory
> > > >> > >  * io_prepare assumes it needs to do a TLB recompile, because
> > > >> > >can_do_io is clear
> > > >> > >
> > > >> > > I am not surprised that the corner case of "the guest put its
> > > >> > > page tables in an MMIO device" has not yet come up :-)
> > > >> > >
> > > >> > > I'm really not sure how the icount handling should interact
> > > >> > > with that...  
> > > >> >
> > > >> > Its not just icount - we need to handle it for all modes now. That 
> > > >> > said
> > > >> > seeing as we are at the end of a block shouldn't can_do_io be set?  
> > > >>
> > > >> The lookup_tb_ptr helper gets called from tcg_gen_goto_tb(),
> > > >> which happens earlier than the tb_stop callback (it can
> > > >> happen in the trans function for branch etc insns, for
> > > >> example).
> > > >>
> > > >> I think it should be OK to clear can_do_io at the start
> > > >> of the lookup_tb_ptr helper, something like:
> > > >> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > > >> index 977576ca143..7818537f318 100644
> > > >> --- a/accel/tcg/cpu-exec.c
> > > >> +++ b/accel/tcg/cpu-exec.c
> > > >> @@ -396,6 +396,15 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState 
> > > >> *env)
> > > >>  uint64_t cs_base;
> > > >>  uint32_t flags, cflags;
> > > >>
> > > >> +/*
> > > >> + * By definition we've just finished a TB, so I/O is OK.
> > > >> + * Avoid the possibility of calling cpu_io_recompile() if
> > > >> + * a page table walk triggered by tb_lookup() calling
> > > >> + * probe_access_internal() happens to touch an MMIO device.
> > > >> + * The next TB, if we chain to it, will clear the flag again.
> > > >> + */
> > > >> +cpu->neg.can_do_io = true;
> > > >> +
> > >>  cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> > > >>
> > > >>  cflags = curr_cflags(cpu);
> > > >>
> > > >> -- PMM  
> > > >
> > > > No joy.  Seems like a very similar backtrace.
> > > >
> > > > Thread 5 "qemu-system-x86" received signal SIGABRT, Aborted.
> > > > [Switching to Thread 0x74efe6c0 (LWP 23937)]
> > > > __pthread_kill_implementation (no_tid=0, signo=6, threadid= > > > out>) at ./nptl/pthread_kill.c:44
> > > > 44  ./nptl/pthread_kill.c: No such file or directory.
> > > > (gdb) bt
> > > > #0  __pthread_kill_implementation (no_tid=0, signo=6, 
> > > > threadid=) at ./nptl/pthread_kill.c:44
> > > > #1  __pthread_kill_internal (signo=6, threadid=) at 
> > > > ./nptl/pthread_kill.c:78
> > > > #2  __GI___pthread_kill (threadid=, signo=signo@entry=6) 
> > > > at ./nptl/pthread_kill.c:89
> > > > #3  0x777c43b6 in __GI_raise (sig=sig@entry=6) at 
> > > > ../sysdeps/posix/raise.c:26
> > > > #4  0x777aa87c in __GI_abort () at ./stdlib/abort.c:79
> > > > #5  0x55c4d19e in cpu_abort (cpu=cpu@entry=0x578e0cb0, 
> > > > fmt=fmt@entry=0x56048ee8 "cpu_io_recompile: could not find TB for 
> > > > pc=%p") at ../../cpu-target.c:373
> > > > #6  0x55c9cb25 in cpu_io_recompile 
> > > > (cpu=cpu@entry=0x578e0cb0, retaddr=retaddr@entry=0) at 
> > > > ../../accel/tcg/translate-all.c:611
> > > > #7  0x55c9f744 in io_prepare (retaddr=0, addr=19595790664, 
> > > > attrs=..., xlat=, cpu=0x578e0cb0, 
> > > > out_offset=) at ../../accel/tcg/cputlb.c:1339
> > > > #8  do_ld_mmio_beN (cpu=0x578e0cb0, full=0x7ffe88012890, 
> > > > ret_be=ret_be@entry=0, addr=19595790664, size=size@entry=8, mmu_idx=4, 
> > > > type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2030
> > > > #9  0x55ca0ecd in do_ld_8 (cpu=cpu@entry=0x578e0cb0, 
> > > > p=p@entry=0x74efcdd0, mmu_idx=, 
> > > > type=type@entry=MMU_DATA_LOAD, memop=, ra=ra@entry=0) at 
> > > > ../../accel/tcg/cputlb.c:2356
> > > > #10 0x55ca332f in do_ld8_mmu (cpu=cpu@entry=0x578e0cb0, 
> > > > addr=addr@entry=19595790664, oi=oi@entry=52, ra=ra@entry=0, 
> > > > access_type=access_type@entry=MMU_DATA_LOAD) at 
> > > > ../../accel/tcg/cputlb.c:2439
> > > > #11 0x55ca5e69 in cpu_ldq_mmu (ra=0, oi=52, addr=19595790664, 
> > > 

Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Gregory Price
On Thu, Feb 01, 2024 at 05:07:31PM +, Peter Maydell wrote:
> On Thu, 1 Feb 2024 at 17:04, Gregory Price  wrote:
> >
> > On Thu, Feb 01, 2024 at 04:45:30PM +, Alex Bennée wrote:
> 
> > > No thats different - we are actually writing to the MMIO region here.
> > > But the fact we hit cpu_abort because we can't find the TB we are
> > > executing is a little problematic.
> > >
> > > Does ra properly point to the code buffer here?
> > >
> >
> > What if the code block is ALSO in CXL (MMIO)? :D
> 
> In that case the TB is supposed to be a single insn,
> so the insn will by definition be the last one in its
> TB, and IO should be OK for it -- so can_do_io ought
> to be true and we shouldn't get into the io_recompile.
> 
> -- PMM

We saw a bug early on in CXL emulation with instructions hosted on CXL
that split a page boundary (e.g. 0xEB|0xFE).  I'm wondering about a
code block that splits a page boundary and whether there's a similar
corner case.

~Gregory



Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Alex Bennée
Jonathan Cameron  writes:

> On Thu, 01 Feb 2024 16:45:30 +
> Alex Bennée  wrote:
>
>> Jonathan Cameron  writes:
>> 
>> > On Thu, 1 Feb 2024 16:00:56 +
>> > Peter Maydell  wrote:
>> >  
>> >> On Thu, 1 Feb 2024 at 15:17, Alex Bennée  wrote:  
>> >> >
>> >> > Peter Maydell  writes:
>> >> > > So, that looks like:
>> >> > >  * we call cpu_tb_exec(), which executes some generated code
>> >> > >  * that generated code calls the lookup_tb_ptr helper to see
>> >> > >if we have a generated TB already for the address we're going
>> >> > >to execute next
>> >> > >  * lookup_tb_ptr probes the TLB to see if we know the host RAM
>> >> > >address for the guest address
>> >> > >  * this results in a TLB walk for an instruction fetch
>> >> > >  * the page table descriptor load is to IO memory
>> >> > >  * io_prepare assumes it needs to do a TLB recompile, because
>> >> > >can_do_io is clear
>> >> > >
>> >> > > I am not surprised that the corner case of "the guest put its
>> >> > > page tables in an MMIO device" has not yet come up :-)
>> >> > >
>> >> > > I'm really not sure how the icount handling should interact
>> >> > > with that...
>> >> >
>> >> > Its not just icount - we need to handle it for all modes now. That said
>> >> > seeing as we are at the end of a block shouldn't can_do_io be set?
>> >> 
>> >> The lookup_tb_ptr helper gets called from tcg_gen_goto_tb(),
>> >> which happens earlier than the tb_stop callback (it can
>> >> happen in the trans function for branch etc insns, for
>> >> example).
>> >> 
>> >> I think it should be OK to clear can_do_io at the start
>> >> of the lookup_tb_ptr helper, something like:
>> >> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
>> >> index 977576ca143..7818537f318 100644
>> >> --- a/accel/tcg/cpu-exec.c
>> >> +++ b/accel/tcg/cpu-exec.c
>> >> @@ -396,6 +396,15 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
>> >>  uint64_t cs_base;
>> >>  uint32_t flags, cflags;
>> >> 
>> >> +/*
>> >> + * By definition we've just finished a TB, so I/O is OK.
>> >> + * Avoid the possibility of calling cpu_io_recompile() if
>> >> + * a page table walk triggered by tb_lookup() calling
>> >> + * probe_access_internal() happens to touch an MMIO device.
>> >> + * The next TB, if we chain to it, will clear the flag again.
>> >> + */
>> >> +cpu->neg.can_do_io = true;
>> >> +
>> >>  cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>> >> 
>> >>  cflags = curr_cflags(cpu);
>> >> 
>> >> -- PMM  
>> >
>> > No joy.  Seems like a very similar backtrace.
>> >
>> > Thread 5 "qemu-system-x86" received signal SIGABRT, Aborted.
>> > [Switching to Thread 0x74efe6c0 (LWP 23937)]
>> > __pthread_kill_implementation (no_tid=0, signo=6, threadid=> > out>) at ./nptl/pthread_kill.c:44
>> > 44  ./nptl/pthread_kill.c: No such file or directory.
>> > (gdb) bt
>> > #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=> > out>) at ./nptl/pthread_kill.c:44
>> > #1  __pthread_kill_internal (signo=6, threadid=) at 
>> > ./nptl/pthread_kill.c:78
>> > #2  __GI___pthread_kill (threadid=, signo=signo@entry=6) at 
>> > ./nptl/pthread_kill.c:89
>> > #3  0x777c43b6 in __GI_raise (sig=sig@entry=6) at 
>> > ../sysdeps/posix/raise.c:26
>> > #4  0x777aa87c in __GI_abort () at ./stdlib/abort.c:79
>> > #5  0x55c4d19e in cpu_abort (cpu=cpu@entry=0x578e0cb0, 
>> > fmt=fmt@entry=0x56048ee8 "cpu_io_recompile: could not find TB for 
>> > pc=%p") at ../../cpu-target.c:373
>> > #6  0x55c9cb25 in cpu_io_recompile (cpu=cpu@entry=0x578e0cb0, 
>> > retaddr=retaddr@entry=0) at ../../accel/tcg/translate-all.c:611
>> > #7  0x55c9f744 in io_prepare (retaddr=0, addr=19595790664, 
>> > attrs=..., xlat=, cpu=0x578e0cb0, out_offset=> > pointer>) at ../../accel/tcg/cputlb.c:1339
>> > #8  do_ld_mmio_beN (cpu=0x578e0cb0, full=0x7ffe88012890, 
>> > ret_be=ret_be@entry=0, addr=19595790664, size=size@entry=8, mmu_idx=4, 
>> > type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2030
>> > #9  0x55ca0ecd in do_ld_8 (cpu=cpu@entry=0x578e0cb0, 
>> > p=p@entry=0x74efcdd0, mmu_idx=, 
>> > type=type@entry=MMU_DATA_LOAD, memop=, ra=ra@entry=0) at 
>> > ../../accel/tcg/cputlb.c:2356
>> > #10 0x55ca332f in do_ld8_mmu (cpu=cpu@entry=0x578e0cb0, 
>> > addr=addr@entry=19595790664, oi=oi@entry=52, ra=ra@entry=0, 
>> > access_type=access_type@entry=MMU_DATA_LOAD) at 
>> > ../../accel/tcg/cputlb.c:2439
>> > #11 0x55ca5e69 in cpu_ldq_mmu (ra=0, oi=52, addr=19595790664, 
>> > env=0x578e3470) at ../../accel/tcg/ldst_common.c.inc:169
>> > #12 cpu_ldq_le_mmuidx_ra (env=0x578e3470, addr=19595790664, 
>> > mmu_idx=, ra=ra@entry=0) at 
>> > ../../accel/tcg/ldst_common.c.inc:301
>> > #13 0x55b4b5de in ptw_ldq (in=0x74efcf10) at 
>> > ../../target/i386/tcg/sysemu/excp_helper.c:98
>> > #14 ptw_ldq (in=0x74efcf10) at 
>> > 

Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Peter Maydell
On Thu, 1 Feb 2024 at 17:08, Jonathan Cameron
 wrote:
>
> On Thu, 01 Feb 2024 16:45:30 +
> Alex Bennée  wrote:
>
> > Jonathan Cameron  writes:
> >
> > > On Thu, 1 Feb 2024 16:00:56 +
> > > Peter Maydell  wrote:
> > >
> > >> On Thu, 1 Feb 2024 at 15:17, Alex Bennée  wrote:
> > >> >
> > >> > Peter Maydell  writes:
> > >> > > So, that looks like:
> > >> > >  * we call cpu_tb_exec(), which executes some generated code
> > >> > >  * that generated code calls the lookup_tb_ptr helper to see
> > >> > >if we have a generated TB already for the address we're going
> > >> > >to execute next
> > >> > >  * lookup_tb_ptr probes the TLB to see if we know the host RAM
> > >> > >address for the guest address
> > >> > >  * this results in a TLB walk for an instruction fetch
> > >> > >  * the page table descriptor load is to IO memory
> > >> > >  * io_prepare assumes it needs to do a TLB recompile, because
> > >> > >can_do_io is clear
> > >> > >
> > >> > > I am not surprised that the corner case of "the guest put its
> > >> > > page tables in an MMIO device" has not yet come up :-)
> > >> > >
> > >> > > I'm really not sure how the icount handling should interact
> > >> > > with that...
> > >> >
> > >> > Its not just icount - we need to handle it for all modes now. That said
> > >> > seeing as we are at the end of a block shouldn't can_do_io be set?
> > >>
> > >> The lookup_tb_ptr helper gets called from tcg_gen_goto_tb(),
> > >> which happens earlier than the tb_stop callback (it can
> > >> happen in the trans function for branch etc insns, for
> > >> example).
> > >>
> > >> I think it should be OK to clear can_do_io at the start
> > >> of the lookup_tb_ptr helper, something like:
> > >> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > >> index 977576ca143..7818537f318 100644
> > >> --- a/accel/tcg/cpu-exec.c
> > >> +++ b/accel/tcg/cpu-exec.c
> > >> @@ -396,6 +396,15 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
> > >>  uint64_t cs_base;
> > >>  uint32_t flags, cflags;
> > >>
> > >> +/*
> > >> + * By definition we've just finished a TB, so I/O is OK.
> > >> + * Avoid the possibility of calling cpu_io_recompile() if
> > >> + * a page table walk triggered by tb_lookup() calling
> > >> + * probe_access_internal() happens to touch an MMIO device.
> > >> + * The next TB, if we chain to it, will clear the flag again.
> > >> + */
> > >> +cpu->neg.can_do_io = true;
> > >> +
> > >>  cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> > >>
> > >>  cflags = curr_cflags(cpu);
> > >>
> > >> -- PMM
> > >
> > > No joy.  Seems like a very similar backtrace.
> > >
> > > Thread 5 "qemu-system-x86" received signal SIGABRT, Aborted.
> > > [Switching to Thread 0x74efe6c0 (LWP 23937)]
> > > __pthread_kill_implementation (no_tid=0, signo=6, threadid= > > out>) at ./nptl/pthread_kill.c:44
> > > 44  ./nptl/pthread_kill.c: No such file or directory.
> > > (gdb) bt
> > > #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid= > > out>) at ./nptl/pthread_kill.c:44
> > > #1  __pthread_kill_internal (signo=6, threadid=) at 
> > > ./nptl/pthread_kill.c:78
> > > #2  __GI___pthread_kill (threadid=, signo=signo@entry=6) 
> > > at ./nptl/pthread_kill.c:89
> > > #3  0x777c43b6 in __GI_raise (sig=sig@entry=6) at 
> > > ../sysdeps/posix/raise.c:26
> > > #4  0x777aa87c in __GI_abort () at ./stdlib/abort.c:79
> > > #5  0x55c4d19e in cpu_abort (cpu=cpu@entry=0x578e0cb0, 
> > > fmt=fmt@entry=0x56048ee8 "cpu_io_recompile: could not find TB for 
> > > pc=%p") at ../../cpu-target.c:373
> > > #6  0x55c9cb25 in cpu_io_recompile (cpu=cpu@entry=0x578e0cb0, 
> > > retaddr=retaddr@entry=0) at ../../accel/tcg/translate-all.c:611
> > > #7  0x55c9f744 in io_prepare (retaddr=0, addr=19595790664, 
> > > attrs=..., xlat=, cpu=0x578e0cb0, 
> > > out_offset=) at ../../accel/tcg/cputlb.c:1339
> > > #8  do_ld_mmio_beN (cpu=0x578e0cb0, full=0x7ffe88012890, 
> > > ret_be=ret_be@entry=0, addr=19595790664, size=size@entry=8, mmu_idx=4, 
> > > type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2030
> > > #9  0x55ca0ecd in do_ld_8 (cpu=cpu@entry=0x578e0cb0, 
> > > p=p@entry=0x74efcdd0, mmu_idx=, 
> > > type=type@entry=MMU_DATA_LOAD, memop=, ra=ra@entry=0) at 
> > > ../../accel/tcg/cputlb.c:2356
> > > #10 0x55ca332f in do_ld8_mmu (cpu=cpu@entry=0x578e0cb0, 
> > > addr=addr@entry=19595790664, oi=oi@entry=52, ra=ra@entry=0, 
> > > access_type=access_type@entry=MMU_DATA_LOAD) at 
> > > ../../accel/tcg/cputlb.c:2439
> > > #11 0x55ca5e69 in cpu_ldq_mmu (ra=0, oi=52, addr=19595790664, 
> > > env=0x578e3470) at ../../accel/tcg/ldst_common.c.inc:169
> > > #12 cpu_ldq_le_mmuidx_ra (env=0x578e3470, addr=19595790664, 
> > > mmu_idx=, ra=ra@entry=0) at 
> > > ../../accel/tcg/ldst_common.c.inc:301
> > > #13 0x55b4b5de in ptw_ldq (in=0x74efcf10) at 
> > > 

Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Gregory Price
On Thu, Feb 01, 2024 at 04:45:30PM +, Alex Bennée wrote:
> Jonathan Cameron  writes:
> 
> > On Thu, 1 Feb 2024 16:00:56 +
> > Peter Maydell  wrote:
> >
> >> On Thu, 1 Feb 2024 at 15:17, Alex Bennée  wrote:
> >> >
> >> > Peter Maydell  writes:  
> >> > > So, that looks like:
> >> > >  * we call cpu_tb_exec(), which executes some generated code
> >> > >  * that generated code calls the lookup_tb_ptr helper to see
> >> > >if we have a generated TB already for the address we're going
> >> > >to execute next
> >> > >  * lookup_tb_ptr probes the TLB to see if we know the host RAM
> >> > >address for the guest address
> >> > >  * this results in a TLB walk for an instruction fetch
> >> > >  * the page table descriptor load is to IO memory
> >> > >  * io_prepare assumes it needs to do a TLB recompile, because
> >> > >can_do_io is clear
> >> > >
> >> > > I am not surprised that the corner case of "the guest put its
> >> > > page tables in an MMIO device" has not yet come up :-)
> >> > >
> >> > > I'm really not sure how the icount handling should interact
> >> > > with that...  
> >> >
> >> > Its not just icount - we need to handle it for all modes now. That said
> >> > seeing as we are at the end of a block shouldn't can_do_io be set?  
> >> 
> >> The lookup_tb_ptr helper gets called from tcg_gen_goto_tb(),
> >> which happens earlier than the tb_stop callback (it can
> >> happen in the trans function for branch etc insns, for
> >> example).
> >> 
> >> I think it should be OK to set can_do_io at the start
> >> of the lookup_tb_ptr helper, something like:
> >> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> >> index 977576ca143..7818537f318 100644
> >> --- a/accel/tcg/cpu-exec.c
> >> +++ b/accel/tcg/cpu-exec.c
> >> @@ -396,6 +396,15 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
> >>  uint64_t cs_base;
> >>  uint32_t flags, cflags;
> >> 
> >> +/*
> >> + * By definition we've just finished a TB, so I/O is OK.
> >> + * Avoid the possibility of calling cpu_io_recompile() if
> >> + * a page table walk triggered by tb_lookup() calling
> >> + * probe_access_internal() happens to touch an MMIO device.
> >> + * The next TB, if we chain to it, will clear the flag again.
> >> + */
> >> +cpu->neg.can_do_io = true;
> >> +
> >>  cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> >> 
> >>  cflags = curr_cflags(cpu);
> >> 
> >> -- PMM
> >
> > No joy.  Seems like a very similar backtrace.
> >
> > Thread 5 "qemu-system-x86" received signal SIGABRT, Aborted.
> > [Switching to Thread 0x74efe6c0 (LWP 23937)]
> > __pthread_kill_implementation (no_tid=0, signo=6, threadid=) 
> > at ./nptl/pthread_kill.c:44
> > 44  ./nptl/pthread_kill.c: No such file or directory.
> > (gdb) bt
> > #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
> > #1  __pthread_kill_internal (signo=6, threadid=) at 
> > ./nptl/pthread_kill.c:78
> > #2  __GI___pthread_kill (threadid=, signo=signo@entry=6) at 
> > ./nptl/pthread_kill.c:89
> > #3  0x777c43b6 in __GI_raise (sig=sig@entry=6) at 
> > ../sysdeps/posix/raise.c:26
> > #4  0x777aa87c in __GI_abort () at ./stdlib/abort.c:79
> > #5  0x55c4d19e in cpu_abort (cpu=cpu@entry=0x578e0cb0, 
> > fmt=fmt@entry=0x56048ee8 "cpu_io_recompile: could not find TB for 
> > pc=%p") at ../../cpu-target.c:373
> > #6  0x55c9cb25 in cpu_io_recompile (cpu=cpu@entry=0x578e0cb0, 
> > retaddr=retaddr@entry=0) at ../../accel/tcg/translate-all.c:611
> > #7  0x55c9f744 in io_prepare (retaddr=0, addr=19595790664, 
> > attrs=..., xlat=<optimized out>, cpu=0x578e0cb0, out_offset=<synthetic pointer>) at ../../accel/tcg/cputlb.c:1339
> > #8  do_ld_mmio_beN (cpu=0x578e0cb0, full=0x7ffe88012890, 
> > ret_be=ret_be@entry=0, addr=19595790664, size=size@entry=8, mmu_idx=4, 
> > type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2030
> > #9  0x55ca0ecd in do_ld_8 (cpu=cpu@entry=0x578e0cb0, 
> > p=p@entry=0x74efcdd0, mmu_idx=, 
> > type=type@entry=MMU_DATA_LOAD, memop=, ra=ra@entry=0) at 
> > ../../accel/tcg/cputlb.c:2356
> > #10 0x55ca332f in do_ld8_mmu (cpu=cpu@entry=0x578e0cb0, 
> > addr=addr@entry=19595790664, oi=oi@entry=52, ra=ra@entry=0, 
> > access_type=access_type@entry=MMU_DATA_LOAD) at 
> > ../../accel/tcg/cputlb.c:2439
> > #11 0x55ca5e69 in cpu_ldq_mmu (ra=0, oi=52, addr=19595790664, 
> > env=0x578e3470) at ../../accel/tcg/ldst_common.c.inc:169
> > #12 cpu_ldq_le_mmuidx_ra (env=0x578e3470, addr=19595790664, 
> > mmu_idx=, ra=ra@entry=0) at 
> > ../../accel/tcg/ldst_common.c.inc:301
> > #13 0x55b4b5de in ptw_ldq (in=0x74efcf10) at 
> > ../../target/i386/tcg/sysemu/excp_helper.c:98
> > #14 ptw_ldq (in=0x74efcf10) at 
> > ../../target/i386/tcg/sysemu/excp_helper.c:93
> > #15 mmu_translate (env=env@entry=0x578e3470, in=0x74efcfd0, 
> > out=0x74efcfa0, err=err@entry=0x74efcfb0) at 
> > 

Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Jonathan Cameron via
On Thu, 01 Feb 2024 16:45:30 +
Alex Bennée  wrote:

> Jonathan Cameron  writes:
> 
> > On Thu, 1 Feb 2024 16:00:56 +
> > Peter Maydell  wrote:
> >  
> >> On Thu, 1 Feb 2024 at 15:17, Alex Bennée  wrote:  
> >> >
> >> > Peter Maydell  writes:
> >> > > So, that looks like:
> >> > >  * we call cpu_tb_exec(), which executes some generated code
> >> > >  * that generated code calls the lookup_tb_ptr helper to see
> >> > >if we have a generated TB already for the address we're going
> >> > >to execute next
> >> > >  * lookup_tb_ptr probes the TLB to see if we know the host RAM
> >> > >address for the guest address
> >> > >  * this results in a TLB walk for an instruction fetch
> >> > >  * the page table descriptor load is to IO memory
> >> > >  * io_prepare assumes it needs to do a TLB recompile, because
> >> > >can_do_io is clear
> >> > >
> >> > > I am not surprised that the corner case of "the guest put its
> >> > > page tables in an MMIO device" has not yet come up :-)
> >> > >
> >> > > I'm really not sure how the icount handling should interact
> >> > > with that...
> >> >
> >> > Its not just icount - we need to handle it for all modes now. That said
> >> > seeing as we are at the end of a block shouldn't can_do_io be set?
> >> 
> >> The lookup_tb_ptr helper gets called from tcg_gen_goto_tb(),
> >> which happens earlier than the tb_stop callback (it can
> >> happen in the trans function for branch etc insns, for
> >> example).
> >> 
> >> I think it should be OK to set can_do_io at the start
> >> of the lookup_tb_ptr helper, something like:
> >> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> >> index 977576ca143..7818537f318 100644
> >> --- a/accel/tcg/cpu-exec.c
> >> +++ b/accel/tcg/cpu-exec.c
> >> @@ -396,6 +396,15 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
> >>  uint64_t cs_base;
> >>  uint32_t flags, cflags;
> >> 
> >> +/*
> >> + * By definition we've just finished a TB, so I/O is OK.
> >> + * Avoid the possibility of calling cpu_io_recompile() if
> >> + * a page table walk triggered by tb_lookup() calling
> >> + * probe_access_internal() happens to touch an MMIO device.
> >> + * The next TB, if we chain to it, will clear the flag again.
> >> + */
> >> +cpu->neg.can_do_io = true;
> >> +
> >>  cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> >> 
> >>  cflags = curr_cflags(cpu);
> >> 
> >> -- PMM  
> >
> > No joy.  Seems like a very similar backtrace.
> >
> > Thread 5 "qemu-system-x86" received signal SIGABRT, Aborted.
> > [Switching to Thread 0x74efe6c0 (LWP 23937)]
> > __pthread_kill_implementation (no_tid=0, signo=6, threadid=) 
> > at ./nptl/pthread_kill.c:44
> > 44  ./nptl/pthread_kill.c: No such file or directory.
> > (gdb) bt
> > #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
> > #1  __pthread_kill_internal (signo=6, threadid=) at 
> > ./nptl/pthread_kill.c:78
> > #2  __GI___pthread_kill (threadid=, signo=signo@entry=6) at 
> > ./nptl/pthread_kill.c:89
> > #3  0x777c43b6 in __GI_raise (sig=sig@entry=6) at 
> > ../sysdeps/posix/raise.c:26
> > #4  0x777aa87c in __GI_abort () at ./stdlib/abort.c:79
> > #5  0x55c4d19e in cpu_abort (cpu=cpu@entry=0x578e0cb0, 
> > fmt=fmt@entry=0x56048ee8 "cpu_io_recompile: could not find TB for 
> > pc=%p") at ../../cpu-target.c:373
> > #6  0x55c9cb25 in cpu_io_recompile (cpu=cpu@entry=0x578e0cb0, 
> > retaddr=retaddr@entry=0) at ../../accel/tcg/translate-all.c:611
> > #7  0x55c9f744 in io_prepare (retaddr=0, addr=19595790664, 
> > attrs=..., xlat=<optimized out>, cpu=0x578e0cb0, out_offset=<synthetic pointer>) at ../../accel/tcg/cputlb.c:1339
> > #8  do_ld_mmio_beN (cpu=0x578e0cb0, full=0x7ffe88012890, 
> > ret_be=ret_be@entry=0, addr=19595790664, size=size@entry=8, mmu_idx=4, 
> > type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2030
> > #9  0x55ca0ecd in do_ld_8 (cpu=cpu@entry=0x578e0cb0, 
> > p=p@entry=0x74efcdd0, mmu_idx=, 
> > type=type@entry=MMU_DATA_LOAD, memop=, ra=ra@entry=0) at 
> > ../../accel/tcg/cputlb.c:2356
> > #10 0x55ca332f in do_ld8_mmu (cpu=cpu@entry=0x578e0cb0, 
> > addr=addr@entry=19595790664, oi=oi@entry=52, ra=ra@entry=0, 
> > access_type=access_type@entry=MMU_DATA_LOAD) at 
> > ../../accel/tcg/cputlb.c:2439
> > #11 0x55ca5e69 in cpu_ldq_mmu (ra=0, oi=52, addr=19595790664, 
> > env=0x578e3470) at ../../accel/tcg/ldst_common.c.inc:169
> > #12 cpu_ldq_le_mmuidx_ra (env=0x578e3470, addr=19595790664, 
> > mmu_idx=, ra=ra@entry=0) at 
> > ../../accel/tcg/ldst_common.c.inc:301
> > #13 0x55b4b5de in ptw_ldq (in=0x74efcf10) at 
> > ../../target/i386/tcg/sysemu/excp_helper.c:98
> > #14 ptw_ldq (in=0x74efcf10) at 
> > ../../target/i386/tcg/sysemu/excp_helper.c:93
> > #15 mmu_translate (env=env@entry=0x578e3470, in=0x74efcfd0, 
> > out=0x74efcfa0, err=err@entry=0x74efcfb0) at 
> > 

Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Peter Maydell
On Thu, 1 Feb 2024 at 17:04, Gregory Price  wrote:
>
> On Thu, Feb 01, 2024 at 04:45:30PM +, Alex Bennée wrote:

> > No, that's different - we are actually writing to the MMIO region here.
> > But the fact we hit cpu_abort because we can't find the TB we are
> > executing is a little problematic.
> >
> > Does ra properly point to the code buffer here?
> >
>
> What if the code block is ALSO in CXL (MMIO)? :D

In that case the TB is supposed to be a single insn,
so the insn will by definition be the last one in its
TB, and IO should be OK for it -- so can_do_io ought
to be true and we shouldn't get into the io_recompile.

-- PMM
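
[Editor's note: for readers tracing the abort, the following is a hedged
sketch -- not verbatim QEMU source -- of the shape of the checks the
backtraces in this thread keep hitting.  The names io_prepare(),
cpu->neg.can_do_io, cpu_io_recompile(), tcg_tb_lookup() and cpu_abort()
correspond to the frames shown in the traces; the bodies are abridged.]

/*
 * Sketch: io_prepare() is reached for any load/store that lands on MMIO.
 * If the CPU is not currently allowed to perform I/O, it requests an
 * io-recompile of the current TB.  That needs a host return address
 * (retaddr) pointing back into generated code so the TB can be found;
 * retaddr == 0 means "not called from a TB".
 */
static void io_prepare_sketch(CPUState *cpu, uintptr_t retaddr)
{
    if (!cpu->neg.can_do_io) {
        cpu_io_recompile(cpu, retaddr);
    }
}

/*
 * Sketch: cpu_io_recompile() looks the TB up by that host return address.
 * With retaddr == 0, as in frames #6/#7 of the traces, no TB matches,
 * which is exactly the "cpu_io_recompile: could not find TB for pc=(nil)"
 * abort being debugged here.
 */
static void cpu_io_recompile_sketch(CPUState *cpu, uintptr_t retaddr)
{
    TranslationBlock *tb = tcg_tb_lookup(retaddr);

    if (tb == NULL) {
        cpu_abort(cpu, "cpu_io_recompile: could not find TB for pc=%p",
                  (void *)retaddr);
    }
    /* ...otherwise the TB would be regenerated to end at the I/O insn... */
}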



Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Alex Bennée
Jonathan Cameron  writes:

> On Thu, 1 Feb 2024 16:00:56 +
> Peter Maydell  wrote:
>
>> On Thu, 1 Feb 2024 at 15:17, Alex Bennée  wrote:
>> >
>> > Peter Maydell  writes:  
>> > > So, that looks like:
>> > >  * we call cpu_tb_exec(), which executes some generated code
>> > >  * that generated code calls the lookup_tb_ptr helper to see
>> > >if we have a generated TB already for the address we're going
>> > >to execute next
>> > >  * lookup_tb_ptr probes the TLB to see if we know the host RAM
>> > >address for the guest address
>> > >  * this results in a TLB walk for an instruction fetch
>> > >  * the page table descriptor load is to IO memory
>> > >  * io_prepare assumes it needs to do a TLB recompile, because
>> > >can_do_io is clear
>> > >
>> > > I am not surprised that the corner case of "the guest put its
>> > > page tables in an MMIO device" has not yet come up :-)
>> > >
>> > > I'm really not sure how the icount handling should interact
>> > > with that...  
>> >
>> > Its not just icount - we need to handle it for all modes now. That said
>> > seeing as we are at the end of a block shouldn't can_do_io be set?  
>> 
>> The lookup_tb_ptr helper gets called from tcg_gen_goto_tb(),
>> which happens earlier than the tb_stop callback (it can
>> happen in the trans function for branch etc insns, for
>> example).
>> 
>> I think it should be OK to set can_do_io at the start
>> of the lookup_tb_ptr helper, something like:
>> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
>> index 977576ca143..7818537f318 100644
>> --- a/accel/tcg/cpu-exec.c
>> +++ b/accel/tcg/cpu-exec.c
>> @@ -396,6 +396,15 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
>>  uint64_t cs_base;
>>  uint32_t flags, cflags;
>> 
>> +/*
>> + * By definition we've just finished a TB, so I/O is OK.
>> + * Avoid the possibility of calling cpu_io_recompile() if
>> + * a page table walk triggered by tb_lookup() calling
>> + * probe_access_internal() happens to touch an MMIO device.
>> + * The next TB, if we chain to it, will clear the flag again.
>> + */
>> +cpu->neg.can_do_io = true;
>> +
>>  cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>> 
>>  cflags = curr_cflags(cpu);
>> 
>> -- PMM
>
> No joy.  Seems like a very similar backtrace.
>
> Thread 5 "qemu-system-x86" received signal SIGABRT, Aborted.
> [Switching to Thread 0x74efe6c0 (LWP 23937)]
> __pthread_kill_implementation (no_tid=0, signo=6, threadid=) 
> at ./nptl/pthread_kill.c:44
> 44  ./nptl/pthread_kill.c: No such file or directory.
> (gdb) bt
> #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
> #1  __pthread_kill_internal (signo=6, threadid=) at 
> ./nptl/pthread_kill.c:78
> #2  __GI___pthread_kill (threadid=, signo=signo@entry=6) at 
> ./nptl/pthread_kill.c:89
> #3  0x777c43b6 in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/posix/raise.c:26
> #4  0x777aa87c in __GI_abort () at ./stdlib/abort.c:79
> #5  0x55c4d19e in cpu_abort (cpu=cpu@entry=0x578e0cb0, 
> fmt=fmt@entry=0x56048ee8 "cpu_io_recompile: could not find TB for pc=%p") 
> at ../../cpu-target.c:373
> #6  0x55c9cb25 in cpu_io_recompile (cpu=cpu@entry=0x578e0cb0, 
> retaddr=retaddr@entry=0) at ../../accel/tcg/translate-all.c:611
> #7  0x55c9f744 in io_prepare (retaddr=0, addr=19595790664, attrs=..., 
> xlat=, cpu=0x578e0cb0, out_offset=) at 
> ../../accel/tcg/cputlb.c:1339
> #8  do_ld_mmio_beN (cpu=0x578e0cb0, full=0x7ffe88012890, 
> ret_be=ret_be@entry=0, addr=19595790664, size=size@entry=8, mmu_idx=4, 
> type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2030
> #9  0x55ca0ecd in do_ld_8 (cpu=cpu@entry=0x578e0cb0, 
> p=p@entry=0x74efcdd0, mmu_idx=, 
> type=type@entry=MMU_DATA_LOAD, memop=, ra=ra@entry=0) at 
> ../../accel/tcg/cputlb.c:2356
> #10 0x55ca332f in do_ld8_mmu (cpu=cpu@entry=0x578e0cb0, 
> addr=addr@entry=19595790664, oi=oi@entry=52, ra=ra@entry=0, 
> access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2439
> #11 0x55ca5e69 in cpu_ldq_mmu (ra=0, oi=52, addr=19595790664, 
> env=0x578e3470) at ../../accel/tcg/ldst_common.c.inc:169
> #12 cpu_ldq_le_mmuidx_ra (env=0x578e3470, addr=19595790664, 
> mmu_idx=, ra=ra@entry=0) at 
> ../../accel/tcg/ldst_common.c.inc:301
> #13 0x55b4b5de in ptw_ldq (in=0x74efcf10) at 
> ../../target/i386/tcg/sysemu/excp_helper.c:98
> #14 ptw_ldq (in=0x74efcf10) at 
> ../../target/i386/tcg/sysemu/excp_helper.c:93
> #15 mmu_translate (env=env@entry=0x578e3470, in=0x74efcfd0, 
> out=0x74efcfa0, err=err@entry=0x74efcfb0) at 
> ../../target/i386/tcg/sysemu/excp_helper.c:173
> #16 0x55b4c3f3 in get_physical_address (err=0x74efcfb0, 
> out=0x74efcfa0, mmu_idx=0, access_type=MMU_DATA_STORE, 
> addr=18386491786698339392, env=0x578e3470) at 
> ../../target/i386/tcg/sysemu/excp_helper.c:578
> #17 

Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Jonathan Cameron via
On Thu, 1 Feb 2024 16:00:56 +
Peter Maydell  wrote:

> On Thu, 1 Feb 2024 at 15:17, Alex Bennée  wrote:
> >
> > Peter Maydell  writes:  
> > > So, that looks like:
> > >  * we call cpu_tb_exec(), which executes some generated code
> > >  * that generated code calls the lookup_tb_ptr helper to see
> > >if we have a generated TB already for the address we're going
> > >to execute next
> > >  * lookup_tb_ptr probes the TLB to see if we know the host RAM
> > >address for the guest address
> > >  * this results in a TLB walk for an instruction fetch
> > >  * the page table descriptor load is to IO memory
> > >  * io_prepare assumes it needs to do a TLB recompile, because
> > >can_do_io is clear
> > >
> > > I am not surprised that the corner case of "the guest put its
> > > page tables in an MMIO device" has not yet come up :-)
> > >
> > > I'm really not sure how the icount handling should interact
> > > with that...  
> >
> > Its not just icount - we need to handle it for all modes now. That said
> > seeing as we are at the end of a block shouldn't can_do_io be set?  
> 
> The lookup_tb_ptr helper gets called from tcg_gen_goto_tb(),
> which happens earlier than the tb_stop callback (it can
> happen in the trans function for branch etc insns, for
> example).
> 
> I think it should be OK to set can_do_io at the start
> of the lookup_tb_ptr helper, something like:
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 977576ca143..7818537f318 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -396,6 +396,15 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
>  uint64_t cs_base;
>  uint32_t flags, cflags;
> 
> +/*
> + * By definition we've just finished a TB, so I/O is OK.
> + * Avoid the possibility of calling cpu_io_recompile() if
> + * a page table walk triggered by tb_lookup() calling
> + * probe_access_internal() happens to touch an MMIO device.
> + * The next TB, if we chain to it, will clear the flag again.
> + */
> +cpu->neg.can_do_io = true;
> +
>  cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> 
>  cflags = curr_cflags(cpu);
> 
> -- PMM

No joy.  Seems like a very similar backtrace.

Thread 5 "qemu-system-x86" received signal SIGABRT, Aborted.
[Switching to Thread 0x74efe6c0 (LWP 23937)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at 
./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) 
at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at 
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at 
./nptl/pthread_kill.c:89
#3  0x777c43b6 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/posix/raise.c:26
#4  0x777aa87c in __GI_abort () at ./stdlib/abort.c:79
#5  0x55c4d19e in cpu_abort (cpu=cpu@entry=0x578e0cb0, 
fmt=fmt@entry=0x56048ee8 "cpu_io_recompile: could not find TB for pc=%p") 
at ../../cpu-target.c:373
#6  0x55c9cb25 in cpu_io_recompile (cpu=cpu@entry=0x578e0cb0, 
retaddr=retaddr@entry=0) at ../../accel/tcg/translate-all.c:611
#7  0x55c9f744 in io_prepare (retaddr=0, addr=19595790664, attrs=..., 
xlat=<optimized out>, cpu=0x578e0cb0, out_offset=<synthetic pointer>) at 
../../accel/tcg/cputlb.c:1339
#8  do_ld_mmio_beN (cpu=0x578e0cb0, full=0x7ffe88012890, 
ret_be=ret_be@entry=0, addr=19595790664, size=size@entry=8, mmu_idx=4, 
type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2030
#9  0x55ca0ecd in do_ld_8 (cpu=cpu@entry=0x578e0cb0, 
p=p@entry=0x74efcdd0, mmu_idx=<optimized out>, 
type=type@entry=MMU_DATA_LOAD, memop=<optimized out>, ra=ra@entry=0) at 
../../accel/tcg/cputlb.c:2356
#10 0x55ca332f in do_ld8_mmu (cpu=cpu@entry=0x578e0cb0, 
addr=addr@entry=19595790664, oi=oi@entry=52, ra=ra@entry=0, 
access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2439
#11 0x55ca5e69 in cpu_ldq_mmu (ra=0, oi=52, addr=19595790664, 
env=0x578e3470) at ../../accel/tcg/ldst_common.c.inc:169
#12 cpu_ldq_le_mmuidx_ra (env=0x578e3470, addr=19595790664, 
mmu_idx=<optimized out>, ra=ra@entry=0) at ../../accel/tcg/ldst_common.c.inc:301
#13 0x55b4b5de in ptw_ldq (in=0x74efcf10) at 
../../target/i386/tcg/sysemu/excp_helper.c:98
#14 ptw_ldq (in=0x74efcf10) at ../../target/i386/tcg/sysemu/excp_helper.c:93
#15 mmu_translate (env=env@entry=0x578e3470, in=0x74efcfd0, 
out=0x74efcfa0, err=err@entry=0x74efcfb0) at 
../../target/i386/tcg/sysemu/excp_helper.c:173
#16 0x55b4c3f3 in get_physical_address (err=0x74efcfb0, 
out=0x74efcfa0, mmu_idx=0, access_type=MMU_DATA_STORE, 
addr=18386491786698339392, env=0x578e3470) at 
../../target/i386/tcg/sysemu/excp_helper.c:578
#17 x86_cpu_tlb_fill (cs=0x578e0cb0, addr=18386491786698339392, 
size=<optimized out>, access_type=MMU_DATA_STORE, mmu_idx=0, probe=<optimized out>, retaddr=140736029817822) at ../../target/i386/tcg/sysemu/excp_helper.c:604
#18 

Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Peter Maydell
On Thu, 1 Feb 2024 at 15:17, Alex Bennée  wrote:
>
> Peter Maydell  writes:
> > So, that looks like:
> >  * we call cpu_tb_exec(), which executes some generated code
> >  * that generated code calls the lookup_tb_ptr helper to see
> >if we have a generated TB already for the address we're going
> >to execute next
> >  * lookup_tb_ptr probes the TLB to see if we know the host RAM
> >address for the guest address
> >  * this results in a TLB walk for an instruction fetch
> >  * the page table descriptor load is to IO memory
> >  * io_prepare assumes it needs to do a TLB recompile, because
> >can_do_io is clear
> >
> > I am not surprised that the corner case of "the guest put its
> > page tables in an MMIO device" has not yet come up :-)
> >
> > I'm really not sure how the icount handling should interact
> > with that...
>
> Its not just icount - we need to handle it for all modes now. That said
> seeing as we are at the end of a block shouldn't can_do_io be set?

The lookup_tb_ptr helper gets called from tcg_gen_goto_tb(),
which happens earlier than the tb_stop callback (it can
happen in the trans function for branch etc insns, for
example).

I think it should be OK to set can_do_io at the start
of the lookup_tb_ptr helper, something like:
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 977576ca143..7818537f318 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -396,6 +396,15 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
 uint64_t cs_base;
 uint32_t flags, cflags;

+/*
+ * By definition we've just finished a TB, so I/O is OK.
+ * Avoid the possibility of calling cpu_io_recompile() if
+ * a page table walk triggered by tb_lookup() calling
+ * probe_access_internal() happens to touch an MMIO device.
+ * The next TB, if we chain to it, will clear the flag again.
+ */
+cpu->neg.can_do_io = true;
+
 cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);

 cflags = curr_cflags(cpu);

-- PMM



Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Jonathan Cameron via
On Thu, 01 Feb 2024 15:17:53 +
Alex Bennée  wrote:

> Peter Maydell  writes:
> 
> > On Thu, 1 Feb 2024 at 14:01, Jonathan Cameron
> >  wrote:  
> >> > Can you run QEMU under gdb and give the backtrace when it stops
> >> > on the abort() ? That will probably have a helpful clue. I
> >> > suspect something is failing to pass a valid retaddr in
> >> > when it calls a load/store function.  
> >  
> >> [Switching to Thread 0x756ff6c0 (LWP 21916)]
> >> __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
> >> 44  ./nptl/pthread_kill.c: No such file or directory.
> >> (gdb) bt
> >> #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
> >> #1  __pthread_kill_internal (signo=6, threadid=) at 
> >> ./nptl/pthread_kill.c:78
> >> #2  __GI___pthread_kill (threadid=, signo=signo@entry=6) at 
> >> ./nptl/pthread_kill.c:89
> >> #3  0x777c43b6 in __GI_raise (sig=sig@entry=6) at 
> >> ../sysdeps/posix/raise.c:26
> >> #4  0x777aa87c in __GI_abort () at ./stdlib/abort.c:79
> >> #5  0x55c0d4ce in cpu_abort
> >> (cpu=cpu@entry=0x56fd9000, fmt=fmt@entry=0x55fe3378 
> >> "cpu_io_recompile: could not find TB for pc=%p")
> >> at ../../cpu-target.c:359
> >> #6  0x55c59435 in cpu_io_recompile (cpu=cpu@entry=0x56fd9000, 
> >> retaddr=retaddr@entry=0) at ../../accel/tcg/translate-all.c:611
> >> #7  0x55c5c956 in io_prepare
> >> (retaddr=0, addr=19595792376, attrs=..., xlat=, 
> >> cpu=0x56fd9000, out_offset=)
> >> at ../../accel/tcg/cputlb.c:1339  
> 
> >> #21 tb_htable_lookup (cpu=, 
> >> pc=pc@entry=18446744072116178925, cs_base=0, flags=415285936, 
> >> cflags=4278353920)
> >> at ../../accel/tcg/cpu-exec.c:231
> >> #22 0x55c50c08 in tb_lookup
> >> (cpu=cpu@entry=0x56fd9000, pc=pc@entry=18446744072116178925, 
> >> cs_base=cs_base@entry=0, flags=, cflags=) at 
> >> ../../accel/tcg/cpu-exec.c:267
> >> #23 0x55c51e23 in helper_lookup_tb_ptr (env=0x56fdb7c0) at 
> >> ../../accel/tcg/cpu-exec.c:423
> >> #24 0x7fffa9076ead in code_gen_buffer ()
> >> #25 0x55c50fab in cpu_tb_exec (cpu=cpu@entry=0x56fd9000, 
> >> itb=, tb_exit=tb_exit@entry=0x756fe708)
> >> at ../../accel/tcg/cpu-exec.c:458
> >> #26 0x55c51492 in cpu_loop_exec_tb
> >> (tb_exit=0x756fe708, last_tb=, 
> >> pc=18446744072116179169, tb=, cpu=0x56fd9000)
> >> at ../../accel/tcg/cpu-exec.c:920
> >> #27 cpu_exec_loop (cpu=cpu@entry=0x56fd9000, 
> >> sc=sc@entry=0x756fe7a0) at ../../accel/tcg/cpu-exec.c:1041
> >> #28 0x55c51d11 in cpu_exec_setjmp (cpu=cpu@entry=0x56fd9000, 
> >> sc=sc@entry=0x756fe7a0) at ../../accel/tcg/cpu-exec.c:1058
> >> #29 0x55c523b4 in cpu_exec (cpu=cpu@entry=0x56fd9000) at 
> >> ../../accel/tcg/cpu-exec.c:1084
> >> #30 0x55c74053 in tcg_cpus_exec (cpu=cpu@entry=0x56fd9000) at 
> >> ../../accel/tcg/tcg-accel-ops.c:76
> >> #31 0x55c741a0 in mttcg_cpu_thread_fn 
> >> (arg=arg@entry=0x56fd9000) at ../../accel/tcg/tcg-accel-ops-mttcg.c:95
> >> #32 0x55dfb580 in qemu_thread_start (args=0x5703c3e0) at 
> >> ../../util/qemu-thread-posix.c:541
> >> #33 0x778176ba in start_thread (arg=) at 
> >> ./nptl/pthread_create.c:444
> >> #34 0x778a60d0 in clone3 () at 
> >> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81  
> >
> > So, that looks like:
> >  * we call cpu_tb_exec(), which executes some generated code
> >  * that generated code calls the lookup_tb_ptr helper to see
> >if we have a generated TB already for the address we're going
> >to execute next
> >  * lookup_tb_ptr probes the TLB to see if we know the host RAM
> >address for the guest address
> >  * this results in a TLB walk for an instruction fetch
> >  * the page table descriptor load is to IO memory
> >  * io_prepare assumes it needs to do a TLB recompile, because
> >can_do_io is clear
> >
> > I am not surprised that the corner case of "the guest put its
> > page tables in an MMIO device" has not yet come up :-)
> >
> > I'm really not sure how the icount handling should interact
> > with that...  
> 
> Its not just icount - we need to handle it for all modes now. That said
> seeing as we are at the end of a block shouldn't can_do_io be set?
> 
> Does:
> 
> modified   accel/tcg/translator.c
> @@ -201,6 +201,8 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, 
> int *max_insns,
>  }
>  }
>  
> +set_can_do_io(db, true);
> +
>  /* Emit code to exit the TB, as indicated by db->is_jmp.  */
>  ops->tb_stop(db, cpu);
>  gen_tb_end(tb, cflags, icount_start_insn, db->num_insns);
> 
> do the trick?

no :(

> 
> >
> > -- PMM  
> 




Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Alex Bennée
Peter Maydell  writes:

> On Thu, 1 Feb 2024 at 14:01, Jonathan Cameron
>  wrote:
>> > Can you run QEMU under gdb and give the backtrace when it stops
>> > on the abort() ? That will probably have a helpful clue. I
>> > suspect something is failing to pass a valid retaddr in
>> > when it calls a load/store function.
>
>> [Switching to Thread 0x756ff6c0 (LWP 21916)]
>> __pthread_kill_implementation (no_tid=0, signo=6, threadid=) 
>> at ./nptl/pthread_kill.c:44
>> 44  ./nptl/pthread_kill.c: No such file or directory.
>> (gdb) bt
>> #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
>> #1  __pthread_kill_internal (signo=6, threadid=) at 
>> ./nptl/pthread_kill.c:78
>> #2  __GI___pthread_kill (threadid=, signo=signo@entry=6) at 
>> ./nptl/pthread_kill.c:89
>> #3  0x777c43b6 in __GI_raise (sig=sig@entry=6) at 
>> ../sysdeps/posix/raise.c:26
>> #4  0x777aa87c in __GI_abort () at ./stdlib/abort.c:79
>> #5  0x55c0d4ce in cpu_abort
>> (cpu=cpu@entry=0x56fd9000, fmt=fmt@entry=0x55fe3378 
>> "cpu_io_recompile: could not find TB for pc=%p")
>> at ../../cpu-target.c:359
>> #6  0x55c59435 in cpu_io_recompile (cpu=cpu@entry=0x56fd9000, 
>> retaddr=retaddr@entry=0) at ../../accel/tcg/translate-all.c:611
>> #7  0x55c5c956 in io_prepare
>> (retaddr=0, addr=19595792376, attrs=..., xlat=, 
>> cpu=0x56fd9000, out_offset=)
>> at ../../accel/tcg/cputlb.c:1339

>> #21 tb_htable_lookup (cpu=, pc=pc@entry=18446744072116178925, 
>> cs_base=0, flags=415285936, cflags=4278353920)
>> at ../../accel/tcg/cpu-exec.c:231
>> #22 0x55c50c08 in tb_lookup
>> (cpu=cpu@entry=0x56fd9000, pc=pc@entry=18446744072116178925, 
>> cs_base=cs_base@entry=0, flags=, cflags=) at 
>> ../../accel/tcg/cpu-exec.c:267
>> #23 0x55c51e23 in helper_lookup_tb_ptr (env=0x56fdb7c0) at 
>> ../../accel/tcg/cpu-exec.c:423
>> #24 0x7fffa9076ead in code_gen_buffer ()
>> #25 0x55c50fab in cpu_tb_exec (cpu=cpu@entry=0x56fd9000, 
>> itb=, tb_exit=tb_exit@entry=0x756fe708)
>> at ../../accel/tcg/cpu-exec.c:458
>> #26 0x55c51492 in cpu_loop_exec_tb
>> (tb_exit=0x756fe708, last_tb=, 
>> pc=18446744072116179169, tb=, cpu=0x56fd9000)
>> at ../../accel/tcg/cpu-exec.c:920
>> #27 cpu_exec_loop (cpu=cpu@entry=0x56fd9000, sc=sc@entry=0x756fe7a0) 
>> at ../../accel/tcg/cpu-exec.c:1041
>> #28 0x55c51d11 in cpu_exec_setjmp (cpu=cpu@entry=0x56fd9000, 
>> sc=sc@entry=0x756fe7a0) at ../../accel/tcg/cpu-exec.c:1058
>> #29 0x55c523b4 in cpu_exec (cpu=cpu@entry=0x56fd9000) at 
>> ../../accel/tcg/cpu-exec.c:1084
>> #30 0x55c74053 in tcg_cpus_exec (cpu=cpu@entry=0x56fd9000) at 
>> ../../accel/tcg/tcg-accel-ops.c:76
>> #31 0x55c741a0 in mttcg_cpu_thread_fn (arg=arg@entry=0x56fd9000) 
>> at ../../accel/tcg/tcg-accel-ops-mttcg.c:95
>> #32 0x55dfb580 in qemu_thread_start (args=0x5703c3e0) at 
>> ../../util/qemu-thread-posix.c:541
>> #33 0x778176ba in start_thread (arg=) at 
>> ./nptl/pthread_create.c:444
>> #34 0x778a60d0 in clone3 () at 
>> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
>
> So, that looks like:
>  * we call cpu_tb_exec(), which executes some generated code
>  * that generated code calls the lookup_tb_ptr helper to see
>if we have a generated TB already for the address we're going
>to execute next
>  * lookup_tb_ptr probes the TLB to see if we know the host RAM
>address for the guest address
>  * this results in a TLB walk for an instruction fetch
>  * the page table descriptor load is to IO memory
>  * io_prepare assumes it needs to do a TLB recompile, because
>can_do_io is clear
>
> I am not surprised that the corner case of "the guest put its
> page tables in an MMIO device" has not yet come up :-)
>
> I'm really not sure how the icount handling should interact
> with that...

It's not just icount - we need to handle it for all modes now. That said,
seeing as we are at the end of a block, shouldn't can_do_io be set?

Does:

modified   accel/tcg/translator.c
@@ -201,6 +201,8 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, 
int *max_insns,
 }
 }
 
+set_can_do_io(db, true);
+
 /* Emit code to exit the TB, as indicated by db->is_jmp.  */
 ops->tb_stop(db, cpu);
 gen_tb_end(tb, cflags, icount_start_insn, db->num_insns);

do the trick?

>
> -- PMM

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Peter Maydell
On Thu, 1 Feb 2024 at 14:01, Jonathan Cameron
 wrote:
> > Can you run QEMU under gdb and give the backtrace when it stops
> > on the abort() ? That will probably have a helpful clue. I
> > suspect something is failing to pass a valid retaddr in
> > when it calls a load/store function.

> [Switching to Thread 0x756ff6c0 (LWP 21916)]
> __pthread_kill_implementation (no_tid=0, signo=6, threadid=) 
> at ./nptl/pthread_kill.c:44
> 44  ./nptl/pthread_kill.c: No such file or directory.
> (gdb) bt
> #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
> #1  __pthread_kill_internal (signo=6, threadid=) at 
> ./nptl/pthread_kill.c:78
> #2  __GI___pthread_kill (threadid=, signo=signo@entry=6) at 
> ./nptl/pthread_kill.c:89
> #3  0x777c43b6 in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/posix/raise.c:26
> #4  0x777aa87c in __GI_abort () at ./stdlib/abort.c:79
> #5  0x55c0d4ce in cpu_abort
> (cpu=cpu@entry=0x56fd9000, fmt=fmt@entry=0x55fe3378 
> "cpu_io_recompile: could not find TB for pc=%p")
> at ../../cpu-target.c:359
> #6  0x55c59435 in cpu_io_recompile (cpu=cpu@entry=0x56fd9000, 
> retaddr=retaddr@entry=0) at ../../accel/tcg/translate-all.c:611
> #7  0x55c5c956 in io_prepare
> (retaddr=0, addr=19595792376, attrs=..., xlat=, 
> cpu=0x56fd9000, out_offset=)
> at ../../accel/tcg/cputlb.c:1339
> #8  do_ld_mmio_beN
> (cpu=0x56fd9000, full=0x7fffee0d96e0, ret_be=ret_be@entry=0, 
> addr=19595792376, size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at 
> ../../accel/tcg/cputlb.c:2030
> #9  0x55c5dfad in do_ld_8
> (cpu=cpu@entry=0x56fd9000, p=p@entry=0x756fddc0, 
> mmu_idx=<optimized out>, type=type@entry=MMU_DATA_LOAD, memop=<optimized out>, ra=ra@entry=0) at ../../accel/tcg/cputlb.c:2356
> #10 0x55c6026f in do_ld8_mmu
> (cpu=cpu@entry=0x56fd9000, addr=addr@entry=19595792376, 
> oi=oi@entry=52, ra=ra@entry=0, access_type=access_type@entry=MMU_DATA_LOAD) 
> at ../../accel/tcg/cputlb.c:2439
> #11 0x55c629f9 in cpu_ldq_mmu (ra=0, oi=52, addr=19595792376, 
> env=0x56fdb7c0) at ../../accel/tcg/ldst_common.c.inc:169
> #12 cpu_ldq_le_mmuidx_ra (env=0x56fdb7c0, addr=19595792376, 
> mmu_idx=, ra=ra@entry=0)
> at ../../accel/tcg/ldst_common.c.inc:301
> #13 0x55b18ede in ptw_ldq (in=0x756fdf00) at 
> ../../target/i386/tcg/sysemu/excp_helper.c:98
> #14 ptw_ldq (in=0x756fdf00) at 
> ../../target/i386/tcg/sysemu/excp_helper.c:93
> #15 mmu_translate (env=env@entry=0x56fdb7c0, in=in@entry=0x756fdfa0, 
> out=out@entry=0x756fdf70, err=err@entry=0x756fdf80)
> at ../../target/i386/tcg/sysemu/excp_helper.c:173
> #16 0x55b19c95 in get_physical_address
> (err=0x756fdf80, out=0x756fdf70, mmu_idx=0, 
> access_type=MMU_INST_FETCH, addr=18446744072116178925, env=0x56fdb7c0)
> at ../../target/i386/tcg/sysemu/excp_helper.c:578
> #17 x86_cpu_tlb_fill
> (cs=0x56fd9000, addr=18446744072116178925, size=, 
> access_type=MMU_INST_FETCH, mmu_idx=0, probe=, retaddr=0) at 
> ../../target/i386/tcg/sysemu/excp_helper.c:604
> #18 0x55c5dd2b in probe_access_internal
> (cpu=, addr=18446744072116178925, 
> fault_size=fault_size@entry=1, access_type=access_type@entry=MMU_INST_FETCH, 
> mmu_idx=0, nonfault=nonfault@entry=false, phost=0x756fe0d0, 
> pfull=0x756fe0c8, retaddr=0, check_mem_cbs=false)
> at ../../accel/tcg/cputlb.c:1432
> #19 0x55c61ff8 in get_page_addr_code_hostp (env=, 
> addr=addr@entry=18446744072116178925, hostp=hostp@entry=0x0)
> at ../../accel/tcg/cputlb.c:1603
> #20 0x55c50a2b in get_page_addr_code (addr=18446744072116178925, 
> env=)
> at /home/jic23/src/qemu/include/exec/exec-all.h:594
> #21 tb_htable_lookup (cpu=, pc=pc@entry=18446744072116178925, 
> cs_base=0, flags=415285936, cflags=4278353920)
> at ../../accel/tcg/cpu-exec.c:231
> #22 0x55c50c08 in tb_lookup
> (cpu=cpu@entry=0x56fd9000, pc=pc@entry=18446744072116178925, 
> cs_base=cs_base@entry=0, flags=, cflags=) at 
> ../../accel/tcg/cpu-exec.c:267
> #23 0x55c51e23 in helper_lookup_tb_ptr (env=0x56fdb7c0) at 
> ../../accel/tcg/cpu-exec.c:423
> #24 0x7fffa9076ead in code_gen_buffer ()
> #25 0x55c50fab in cpu_tb_exec (cpu=cpu@entry=0x56fd9000, 
> itb=, tb_exit=tb_exit@entry=0x756fe708)
> at ../../accel/tcg/cpu-exec.c:458
> #26 0x55c51492 in cpu_loop_exec_tb
> (tb_exit=0x756fe708, last_tb=, 
> pc=18446744072116179169, tb=, cpu=0x56fd9000)
> at ../../accel/tcg/cpu-exec.c:920
> #27 cpu_exec_loop (cpu=cpu@entry=0x56fd9000, sc=sc@entry=0x756fe7a0) 
> at ../../accel/tcg/cpu-exec.c:1041
> #28 0x55c51d11 in cpu_exec_setjmp (cpu=cpu@entry=0x56fd9000, 
> sc=sc@entry=0x756fe7a0) at ../../accel/tcg/cpu-exec.c:1058
> #29 0x55c523b4 in cpu_exec (cpu=cpu@entry=0x56fd9000) at 
> 

Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Jonathan Cameron via
On Thu, 1 Feb 2024 13:12:23 +
Peter Maydell  wrote:

> On Thu, 1 Feb 2024 at 13:04, Jonathan Cameron via  
> wrote:
> >  
> 
> 
> 
> 
> > root@localhost:~/devmem2# numactl --membind=1 touch a
> > qemu: fatal: cpu_io_recompile: could not find TB for pc=(nil)  
> 
> Can you run QEMU under gdb and give the backtrace when it stops
> on the abort() ? That will probably have a helpful clue. I
> suspect something is failing to pass a valid retaddr in
> when it calls a load/store function.
> 
> thanks
> -- PMM

[Switching to Thread 0x756ff6c0 (LWP 21916)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at 
./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) 
at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at 
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at 
./nptl/pthread_kill.c:89
#3  0x777c43b6 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/posix/raise.c:26
#4  0x777aa87c in __GI_abort () at ./stdlib/abort.c:79
#5  0x55c0d4ce in cpu_abort
(cpu=cpu@entry=0x56fd9000, fmt=fmt@entry=0x55fe3378 
"cpu_io_recompile: could not find TB for pc=%p")
at ../../cpu-target.c:359
#6  0x55c59435 in cpu_io_recompile (cpu=cpu@entry=0x56fd9000, 
retaddr=retaddr@entry=0) at ../../accel/tcg/translate-all.c:611
#7  0x55c5c956 in io_prepare
(retaddr=0, addr=19595792376, attrs=..., xlat=<optimized out>, 
cpu=0x56fd9000, out_offset=<synthetic pointer>)
at ../../accel/tcg/cputlb.c:1339
#8  do_ld_mmio_beN
(cpu=0x56fd9000, full=0x7fffee0d96e0, ret_be=ret_be@entry=0, 
addr=19595792376, size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at 
../../accel/tcg/cputlb.c:2030
#9  0x55c5dfad in do_ld_8
(cpu=cpu@entry=0x56fd9000, p=p@entry=0x756fddc0, mmu_idx=<optimized out>, type=type@entry=MMU_DATA_LOAD, memop=<optimized out>, ra=ra@entry=0) at 
../../accel/tcg/cputlb.c:2356
#10 0x55c6026f in do_ld8_mmu
(cpu=cpu@entry=0x56fd9000, addr=addr@entry=19595792376, oi=oi@entry=52, 
ra=ra@entry=0, access_type=access_type@entry=MMU_DATA_LOAD) at 
../../accel/tcg/cputlb.c:2439
#11 0x55c629f9 in cpu_ldq_mmu (ra=0, oi=52, addr=19595792376, 
env=0x56fdb7c0) at ../../accel/tcg/ldst_common.c.inc:169
#12 cpu_ldq_le_mmuidx_ra (env=0x56fdb7c0, addr=19595792376, 
mmu_idx=<optimized out>, ra=ra@entry=0)
at ../../accel/tcg/ldst_common.c.inc:301
#13 0x55b18ede in ptw_ldq (in=0x756fdf00) at 
../../target/i386/tcg/sysemu/excp_helper.c:98
#14 ptw_ldq (in=0x756fdf00) at ../../target/i386/tcg/sysemu/excp_helper.c:93
#15 mmu_translate (env=env@entry=0x56fdb7c0, in=in@entry=0x756fdfa0, 
out=out@entry=0x756fdf70, err=err@entry=0x756fdf80)
at ../../target/i386/tcg/sysemu/excp_helper.c:173
#16 0x55b19c95 in get_physical_address
(err=0x756fdf80, out=0x756fdf70, mmu_idx=0, 
access_type=MMU_INST_FETCH, addr=18446744072116178925, env=0x56fdb7c0)
at ../../target/i386/tcg/sysemu/excp_helper.c:578
#17 x86_cpu_tlb_fill
(cs=0x56fd9000, addr=18446744072116178925, size=<optimized out>, 
access_type=MMU_INST_FETCH, mmu_idx=0, probe=<optimized out>, retaddr=0) at 
../../target/i386/tcg/sysemu/excp_helper.c:604
#18 0x55c5dd2b in probe_access_internal
(cpu=<optimized out>, addr=18446744072116178925, 
fault_size=fault_size@entry=1, access_type=access_type@entry=MMU_INST_FETCH, 
mmu_idx=0, nonfault=nonfault@entry=false, phost=0x756fe0d0, 
pfull=0x756fe0c8, retaddr=0, check_mem_cbs=false)
at ../../accel/tcg/cputlb.c:1432
#19 0x55c61ff8 in get_page_addr_code_hostp (env=<optimized out>, 
addr=addr@entry=18446744072116178925, hostp=hostp@entry=0x0)
at ../../accel/tcg/cputlb.c:1603
#20 0x55c50a2b in get_page_addr_code (addr=18446744072116178925, 
env=<optimized out>)
at /home/jic23/src/qemu/include/exec/exec-all.h:594
#21 tb_htable_lookup (cpu=<optimized out>, pc=pc@entry=18446744072116178925, 
cs_base=0, flags=415285936, cflags=4278353920)
at ../../accel/tcg/cpu-exec.c:231
#22 0x55c50c08 in tb_lookup
(cpu=cpu@entry=0x56fd9000, pc=pc@entry=18446744072116178925, 
cs_base=cs_base@entry=0, flags=<optimized out>, cflags=<optimized out>) at 
../../accel/tcg/cpu-exec.c:267
#23 0x55c51e23 in helper_lookup_tb_ptr (env=0x56fdb7c0) at 
../../accel/tcg/cpu-exec.c:423
#24 0x7fffa9076ead in code_gen_buffer ()
#25 0x55c50fab in cpu_tb_exec (cpu=cpu@entry=0x56fd9000, 
itb=<optimized out>, tb_exit=tb_exit@entry=0x756fe708)
at ../../accel/tcg/cpu-exec.c:458
#26 0x55c51492 in cpu_loop_exec_tb
(tb_exit=0x756fe708, last_tb=<optimized out>, 
pc=18446744072116179169, tb=<optimized out>, cpu=0x56fd9000)
at ../../accel/tcg/cpu-exec.c:920
#27 cpu_exec_loop (cpu=cpu@entry=0x56fd9000, sc=sc@entry=0x756fe7a0) at 
../../accel/tcg/cpu-exec.c:1041
#28 0x55c51d11 in cpu_exec_setjmp (cpu=cpu@entry=0x56fd9000, 
sc=sc@entry=0x756fe7a0) at ../../accel/tcg/cpu-exec.c:1058
#29 0x55c523b4 in cpu_exec 

Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Peter Maydell
On Thu, 1 Feb 2024 at 13:04, Jonathan Cameron via  wrote:
>




> root@localhost:~/devmem2# numactl --membind=1 touch a
> qemu: fatal: cpu_io_recompile: could not find TB for pc=(nil)

Can you run QEMU under gdb and give the backtrace when it stops
on the abort() ? That will probably have a helpful clue. I
suspect something is failing to pass a valid retaddr in
when it calls a load/store function.

thanks
-- PMM
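
[Editor's note: a brief illustration of the retaddr convention Peter refers
to here -- a hedged sketch, not code from the tree.  cpu_ldq_le_mmuidx_ra()
and GETPC() are real QEMU interfaces; sketch_guest_load() and
sketch_ptw_load() are made-up stand-ins for a TCG helper and for a
page-table-walker step like the one seen in the backtrace elsewhere in this
thread (frame #12: cpu_ldq_le_mmuidx_ra(..., ra=0) called from ptw_ldq()).]

/*
 * The final "ra" argument of the cpu_ld*_mmuidx_ra() accessors is the host
 * return address into the generated-code buffer.  Helpers called from
 * TCG-generated code pass GETPC() so the memory core can map the access
 * back to a TB (for unwinding or an io-recompile).  Code that is not
 * entered from a TB passes 0, which is only safe if the access can never
 * require an io-recompile.
 */
static uint64_t sketch_guest_load(CPUArchState *env, uint64_t addr,
                                  int mmu_idx)
{
    /* Called directly from generated code: a valid retaddr is available. */
    return cpu_ldq_le_mmuidx_ra(env, addr, mmu_idx, GETPC());
}

static uint64_t sketch_ptw_load(CPUArchState *env, uint64_t pte_addr,
                                int mmu_idx)
{
    /* Page-table walk started outside a TB: ra = 0, no TB to fall back on. */
    return cpu_ldq_le_mmuidx_ra(env, pte_addr, mmu_idx, 0);
}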



Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Jonathan Cameron via
On Tue, 30 Jan 2024 13:50:18 +0530
Sajjan Rao  wrote:

> Hi Jonathan,
> 
> The QEMU command line in the original email was corrected back in
> August 2023 based on the subsequent responses.
> 
> My current QEMU command line is below. As you can see, I am not
> assigning a NUMA node to the CXL memory object.
> 
> qemu-system-x86_64 \
>  -hda /var/lib/libvirt/images/CXL-Test_1.qcow2 \
>  -machine type=q35,nvdimm=on,cxl=on \
>  -accel tcg,thread=single \
>  -m 4G \
>  -smp cpus=4 \
>  -object memory-backend-ram,size=4G,id=m0 \
>  -object memory-backend-ram,size=256M,id=cxl-mem1 \
>  -object memory-backend-ram,size=256M,id=cxl-mem2 \
>  -numa node,memdev=m0,cpus=0-3,nodeid=0 \
>  -netdev 
> user,id=net0,net=192.168.0.0/24,dhcpstart=192.168.0.9,hostfwd=tcp::-:22
> \
>  -device virtio-net-pci,netdev=net0 \
>  -device pxb-cxl,bus_nr=2,bus=pcie.0,id=cxl.1,hdm_for_passthrough=true \
>  -device cxl-rp,port=0,bus=cxl.1,id=cxl_rp_port0,chassis=0,slot=2 \
>  -device cxl-upstream,bus=cxl_rp_port0,id=us0,addr=0.0,multifunction=on, \
>  -device cxl-switch-mailbox-cci,bus=cxl_rp_port0,addr=0.2,target=us0 \
>  -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
>  -device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=8 \
>  -device cxl-type3,bus=swport0,volatile-memdev=cxl-mem1,id=cxl-vmem1 \
>  -device cxl-type3,bus=swport1,volatile-memdev=cxl-mem2,id=cxl-vmem2 \
>  -M 
> cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=512M,cxl-fmw.0.interleave-granularity=2k
> \
>  -D /tmp/qemu.log \
>  -nographic
> 
> Until I moved to QEMU version 8.2 recently, I was able to create
> regions and run native Linux commands on CXL memory using
> #numactl --membind  top
> 
> You had advised me to turn off KVM and use TCG, since the membind
> command will run code out of CXL memory, which KVM does not support.
> With KVM disabled, the membind command worked fine.
> However, with QEMU version 8.2 the same membind command results in a
> hard kernel crash.

Just to check: is it the kernel that crashes, or QEMU?

I've probably replicated this, and it seems to be QEMU that is going down
with a TCG issue.

Bisection underway.

This may take a while.
Our use of TCG is unusual in that the CXL memory is what QEMU thinks of as
IO memory, so we tend to run into corners no one else cares about.
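
[Editor's note: a hedged illustration of the "IO memory" point above -- an
illustrative sketch, not the actual hw/cxl code.  The pattern and the
memory_region_init_io()/MemoryRegionOps API are real; the cxl_window_*
names are made up for this example.]

/*
 * A CXL fixed memory window is backed by read/write callbacks that route
 * each access through the emulated CXL topology, i.e. it is registered as
 * an MMIO region rather than as plain RAM.  TCG therefore treats every
 * access to it -- data, instruction fetch, even a page-table walk -- as
 * an I/O access, which is where the unusual corners come from.
 */
static uint64_t cxl_window_read(void *opaque, hwaddr offset, unsigned size)
{
    return 0; /* placeholder: real code forwards to the CXL device */
}

static void cxl_window_write(void *opaque, hwaddr offset, uint64_t value,
                             unsigned size)
{
    /* placeholder: real code forwards to the CXL device */
}

static const MemoryRegionOps cxl_window_ops = {
    .read = cxl_window_read,
    .write = cxl_window_write,
    .endianness = DEVICE_LITTLE_ENDIAN,
};

static void cxl_window_init_sketch(MemoryRegion *mr, Object *owner,
                                   void *opaque, uint64_t size)
{
    /* MMIO registration: contrast with memory_region_init_ram() for RAM. */
    memory_region_init_io(mr, owner, &cxl_window_ops, opaque,
                          "cxl-fixed-memory-window", size);
}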

Richard, +CC'd on the off chance you can guess what has happened and save
me a bisection run.

x86 machine pretty much as described above

root@localhost:~/devmem2# numactl --membind=1 touch a
qemu: fatal: cpu_io_recompile: could not find TB for pc=(nil)
RAX=00d6b969c000 RBX=ff294696c000 RCX=0028 
RDX=
RSI=0275 RDI= RBP=00049000 
RSP=ff4f8767805d3d20
R8 = R9 =ff4f8767805d3cdc R10= 
R11=0040
R12=ff294696c0044980 R13= R14=ff294696c51d 
R15=
RIP=9d270fed RFL=0007 [-PC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =   
CS =0010   00af9b00 DPL=0 CS64 [-RA]
SS =0018   00cf9300 DPL=0 DS   [-WA]
DS =   
FS =   
GS = ff2946973bc0  
LDT=   8200 DPL=0 LDT
TR =0040 fe37d29e7000 4087 8900 DPL=0 TSS64-avl
GDT= fe37d29e5000 007f
IDT= fe00 0fff
CR0=80050033 CR2=7f2972bdc450 CR3=00049000 CR4=00751ef0
DR0= DR1= DR2= 
DR3=
DR6=0ff0 DR7=0400
CCS=00d6b969c000 CCD=00049000 CCO=ADDQ
EFER=0d01
FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80
FPR0=  FPR1= 
FPR2=  FPR3= 
FPR4=  FPR5= 
FPR6=  FPR7= 
YMM00=  3a3a3a3a3a3a3a3a 3a3a3a3a3a3a3a3a
YMM01=   
YMM02=   
YMM03=  00ff00ff 
YMM04=  5f796c7261655f63 62696c5f5f004554
YMM05=   0060
YMM06=   
YMM07=  0909090909090909 0909090909090909
YMM08=   
YMM09=   
YMM10=   
YMM11=   
YMM12=   
YMM13=