Re: [PATCH v2 0/5] linux-user: Rewrite target_shmat

2024-02-29 Thread Richard Purdie
On Wed, 2024-02-28 at 10:25 -1000, Richard Henderson wrote:
> There are multiple issues with the implementation of shmat().
> 
> (1) With reserved_va, which is the default for 32-on-64-bit, we mmap
> the
>     entire guest address space.  Unlike mmap, shmat refuses to
> replace an
>     existing mapping without setting SHM_REMAP.  This is the original
>     subject of issue #115, though it quicky gets distracted by
>     something else.
> 
> (2) With target page size > host page size, and a shm area
>     that is not a multiple of the target page size, we leave
>     an unmapped hole that the target expects to be mapped.
>     This is the subject of 
> 
>   
> https://lore.kernel.org/qemu-devel/2no4imvz2zrar5kchz2l3oddqbgpj77jg
> wcuf7aritkn2ok763@i2mvpcihztho/
> 
>     wherein qemu itself expects a mapping to exist, and
>     dies in open_self_maps_2.
> 
> So: reimplement the thing.
> 
> Changes for v2:
>   - Include Ilya's test case, which caught extra errors: Yay!
>   - Include x86_64 /proc/self/maps fix, which the test triggers.
>   - Dropped r-b for the shmat rewrite due to number of changes.

I tested these against our problem with webkitgkt and an happy to
report it does solve the segfault we were seeing, thanks!

Cheers,

Richard



Re: [PULL 05/13] linux-user: Use walk_memory_regions for open_self_maps

2024-02-05 Thread Richard Purdie
On Mon, 2024-02-05 at 13:05 +1000, Richard Henderson wrote:
> On 1/26/24 23:52, Richard Purdie wrote:
> > On Fri, 2024-01-26 at 16:33 +0300, Michael Tokarev wrote:
> > > 26.01.2024 16:03, Richard Purdie wrote:
> > > > I've run into a problem with this change.
> > > > 
> > > > We (Yocto Project) upgraded to qemu 8.2.0 recently and after that we
> > > > started seeing errors cross compiling webkitgtk on x86_64 for x86_64
> > > > during the introspection code which runs under user mode qemu.
> > > 
> > > Besides your observations, please be aware there's quite a few issues in 
> > > 8.2.0.
> > > Please take a look at https://gitlab.com/mjt0k/qemu/-/commits/staging-8.2/
> > > (and https://gitlab.com/qemu-project/qemu/-/commits/staging-8.2/ which is 
> > > updated
> > > less often) for fixes already queued up, if you haven't looked there 
> > > already.
> > > 8.2.1 stable/bugfix release is scheduled for the beginning of the next 
> > > week.
> > 
> > Thanks.
> > 
> > I should note that I did test the staging-8.2 branch and nothing there
> > helped. The issue was also present with master as of yesterday.
> > 
> > https://bugzilla.yoctoproject.org/show_bug.cgi?id=15367 is Yocto
> > Projects tracking of the issue which has the commits for master and
> > staging-8.2 that I tested.
> 
> The yocto logs referenced here are not helpful for reproducing the problem.

It took me a couple of days I didn't have to workout which commit
caused it, which versions showed the issue and how to work around it.

It looks host kernel specific since it doesn't happen on some systems
so even with the binaries/command/environment vars, it may not be
enough.

I was hoping the indication of the cause might help point to the fix as
there is quite a bit of work in trying to extract this into a
reproducer. The failure is 20 mins into a webkitgtk compile on a remote
CI system which no longer has the context on it.

> Please extract a binary to run, inputs, and command-line.

I wish I could say that to the bug reports I get! :)

I'll do my best but finding the time is going to be a challenge.

Cheers,

Richard



Re: [PULL 05/13] linux-user: Use walk_memory_regions for open_self_maps

2024-01-26 Thread Richard Purdie
Hi Michael,

On Fri, 2024-01-26 at 16:33 +0300, Michael Tokarev wrote:
> 26.01.2024 16:03, Richard Purdie wrote:
> > I've run into a problem with this change.
> > 
> > We (Yocto Project) upgraded to qemu 8.2.0 recently and after that we
> > started seeing errors cross compiling webkitgtk on x86_64 for x86_64
> > during the introspection code which runs under user mode qemu.
> 
> Besides your observations, please be aware there's quite a few issues in 
> 8.2.0.
> Please take a look at https://gitlab.com/mjt0k/qemu/-/commits/staging-8.2/
> (and https://gitlab.com/qemu-project/qemu/-/commits/staging-8.2/ which is 
> updated
> less often) for fixes already queued up, if you haven't looked there already.
> 8.2.1 stable/bugfix release is scheduled for the beginning of the next week.

Thanks.

I should note that I did test the staging-8.2 branch and nothing there
helped. The issue was also present with master as of yesterday.

https://bugzilla.yoctoproject.org/show_bug.cgi?id=15367 is Yocto
Projects tracking of the issue which has the commits for master and
staging-8.2 that I tested.

Cheers,

Richard





Re: [PULL 05/13] linux-user: Use walk_memory_regions for open_self_maps

2024-01-26 Thread Richard Purdie
Hi,

I've run into a problem with this change.

We (Yocto Project) upgraded to qemu 8.2.0 recently and after that we
started seeing errors cross compiling webkitgtk on x86_64 for x86_64
during the introspection code which runs under user mode qemu.

The error we see is:

qemu-x86_64: QEMU internal SIGSEGV {code=MAPERR, addr=0x20}
Segmentation fault

e.g. here:

https://autobuilder.yoctoproject.org/typhoon/#/builders/40/builds/8488/steps/11/logs/stdio

This usually seems to happen on our debian 11 based autobuilder
machines.

I took one of the broken builds and bisected it to this change (commit
7b7a3366e142d3baeb3fd1d3660a50e7956c19eb).

There was a change in output from commit
7dfd3ca8d95f9962cdd2ebdfcdd699279b98fa18, before that it was:

ERROR:../git/accel/tcg/cpu-exec.c:532:cpu_exec_longjmp_cleanup: assertion 
failed: (cpu == current_cpu)
Bail out! ERROR:../git/accel/tcg/cpu-exec.c:532:cpu_exec_longjmp_cleanup: 
assertion failed: (cpu == current_cpu)

After digging into the code and trying to work out what is going on, I
realised that n is NULL when it fails so this makes the problem "go
away":

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index e384e14248..2577fb770d 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8085,6 +8085,9 @@ static int open_self_maps_2(void *opaque, target_ulong 
guest_start,
 while (1) {
 IntervalTreeNode *n =
 interval_tree_iter_first(d->host_maps, host_start, host_start);
+if (!n) {
+return 0;
+}
 MapInfo *mi = container_of(n, MapInfo, itree);
 uintptr_t this_hlast = MIN(host_last, n->last);
 target_ulong this_gend = h2g(this_hlast) + 1;


I'm hoping that might be enough to give you an idea of what is going on
and what the correct fix may be?

I haven't managed to make an isolated test to reproduce the issue yet.

Cheers,

Richard



Re: mips system emulation failure with virtio

2023-09-06 Thread Richard Purdie
On Wed, 2023-09-06 at 17:50 +0200, Philippe Mathieu-Daudé wrote:
> +rth/pm215/dhildenb
> 
> On 5/9/23 16:50, Richard Purdie wrote:
> > On Tue, 2023-09-05 at 14:59 +0100, Alex Bennée wrote:
> > > Richard Purdie  writes:
> > > 
> > > > With qemu 8.1.0 we see boot hangs fox x86-64 targets.
> > > > 
> > > > These are fixed by 0d58c660689f6da1e3feff8a997014003d928b3b (softmmu:
> > > > Use async_run_on_cpu in tcg_commit) but if I add that commit, mips and
> > > > mips64 break, hanging at boot unable to find a rootfs.
> > > 
> > > (Widen CC list)
> > > 
> > > > 
> > > > We use virtio for network and disk and both of those change in the
> > > > bootlog from messages like:
> > > > 
> > > > [1.726118] virtio-pci :00:13.0: enabling device ( -> 0003)
> > > > [1.728864] virtio-pci :00:14.0: enabling device ( -> 0003)
> > > > [1.729948] virtio-pci :00:15.0: enabling device ( -> 0003)
> > > > ...
> > > > [2.162148] virtio_blk virtio2: 1/0/0 default/read/poll queues
> > > > [2.168311] virtio_blk virtio2: [vda] 1184242 512-byte logical
> > > > 
> > > > to:
> > > > 
> > > > [1.777051] virtio-pci :00:13.0: enabling device ( -> 0003)
> > > > [1.779822] virtio-pci :00:14.0: enabling device ( -> 0003)
> > > > [1.780926] virtio-pci :00:15.0: enabling device ( -> 0003)
> > > > ...
> > > > [1.894852] virtio_rng: probe of virtio1 failed with error -28
> > > > ...
> > > > [2.063553] virtio_blk virtio2: 1/0/0 default/read/poll queues
> > > > [2.064260] virtio_blk: probe of virtio2 failed with error -28
> > > > [2.069080] virtio_net: probe of virtio0 failed with error -28
> > > > 
> > > > 
> > > > i.e. the virtio drivers no longer work.
> > > 
> > > Interesting, as you say this seems to be VirtIO specific as the baseline
> > > tests (using IDE) work fine:
> > > 
> > >➜  ./tests/venv/bin/avocado run 
> > > ./tests/avocado/tuxrun_baselines.py:test_mips64
> > >JOB ID : 71f3e3b7080164b78ef1c8c1bb6bc880932d8c9b
> > >JOB LOG: 
> > > /home/alex/avocado/job-results/job-2023-09-05T15.01-71f3e3b/job.log
> > > (1/2) 
> > > ./tests/avocado/tuxrun_baselines.py:TuxRunBaselineTest.test_mips64: PASS 
> > > (12.19 s)
> > > (2/2) 
> > > ./tests/avocado/tuxrun_baselines.py:TuxRunBaselineTest.test_mips64el: 
> > > PASS (11.78 s)
> > >RESULTS: PASS 2 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 
> > > | CANCEL 0
> > >JOB TIME   : 24.79 s
> > > 
> > > > I tested with current qemu master
> > > > (17780edd81d27fcfdb7a802efc870a99788bd2fc) and mips is still broken
> > > > there.
> > > > 
> > > > Is this issue known about?
> > > 
> > > Could you raise a bug at:
> > > 
> > >https://gitlab.com/qemu-project/qemu/-/issues
> > 
> > Done, https://gitlab.com/qemu-project/qemu/-/issues/1866
> > 
> > > I'm curious why MIPS VirtIO is affected but nothing else is...
> > 
> > Me too, it seems there is a real code issue somewhere in this...
> 
> This seems to fix the issue for me, but I'm not really sure what
> I'm doing after various hours debugging, so sharing here before
> I take some rest:
> 
> -- >8 --
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 18277ddd67..ec31ebcb56 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -2517,7 +2517,7 @@ static void tcg_commit(MemoryListener *listener)
>* That said, the listener is also called during realize, before
>* all of the tcg machinery for run-on is initialized: thus halt_cond.
>*/
> -if (cpu->halt_cond) {
> +if (cpu->halt_cond && !qemu_cpu_is_self(cpu)) {
>   async_run_on_cpu(cpu, tcg_commit_cpu, RUN_ON_CPU_HOST_PTR(cpuas));
>   } else {
>   tcg_commit_cpu(cpu, RUN_ON_CPU_HOST_PTR(cpuas));

I tested with the above and confirmed it does fix 8.1.0 for the mips
test I was using.

Thanks!

Richard



Re: mips system emulation failure with virtio

2023-09-05 Thread Richard Purdie
On Tue, 2023-09-05 at 18:46 +0200, Philippe Mathieu-Daudé wrote:
> On 5/9/23 17:53, Richard Purdie wrote:
> > On Tue, 2023-09-05 at 17:12 +0200, Philippe Mathieu-Daudé wrote:
> > > Hi Richard,
> > > 
> > > On 5/9/23 16:50, Richard Purdie wrote:
> > > > On Tue, 2023-09-05 at 14:59 +0100, Alex Bennée wrote:
> > > > > Richard Purdie  writes:
> > > > > 
> > > > > > With qemu 8.1.0 we see boot hangs fox x86-64 targets.
> > > > > > 
> > > > > > These are fixed by 0d58c660689f6da1e3feff8a997014003d928b3b 
> > > > > > (softmmu:
> > > > > > Use async_run_on_cpu in tcg_commit) but if I add that commit, mips 
> > > > > > and
> > > > > > mips64 break, hanging at boot unable to find a rootfs.
> > > 
> > > Are you testing mipsel / mips64el?
> > 
> > No, it was mips/mips64, i.e. big endian.
> 
> Sorry my question was not clear. I meant: Do you also
> test mipsel / mips64el guests, and if so, do they work?
> (IOW, is this bug only big-endian guest specific?)

Sorry, I misunderstood. We don't test mipsel/mips64el so I don't know
if that is working or not unfortunately.

Cheers,

Richard



Re: mips system emulation failure with virtio

2023-09-05 Thread Richard Purdie
On Tue, 2023-09-05 at 17:12 +0200, Philippe Mathieu-Daudé wrote:
> Hi Richard,
> 
> On 5/9/23 16:50, Richard Purdie wrote:
> > On Tue, 2023-09-05 at 14:59 +0100, Alex Bennée wrote:
> > > Richard Purdie  writes:
> > > 
> > > > With qemu 8.1.0 we see boot hangs fox x86-64 targets.
> > > > 
> > > > These are fixed by 0d58c660689f6da1e3feff8a997014003d928b3b (softmmu:
> > > > Use async_run_on_cpu in tcg_commit) but if I add that commit, mips and
> > > > mips64 break, hanging at boot unable to find a rootfs.
> 
> Are you testing mipsel / mips64el?

No, it was mips/mips64, i.e. big endian.

Cheers,

Richard



Re: mips system emulation failure with virtio

2023-09-05 Thread Richard Purdie
On Tue, 2023-09-05 at 14:59 +0100, Alex Bennée wrote:
> Richard Purdie  writes:
> 
> > With qemu 8.1.0 we see boot hangs fox x86-64 targets. 
> > 
> > These are fixed by 0d58c660689f6da1e3feff8a997014003d928b3b (softmmu:
> > Use async_run_on_cpu in tcg_commit) but if I add that commit, mips and
> > mips64 break, hanging at boot unable to find a rootfs. 
> 
> (Widen CC list)
> 
> > 
> > We use virtio for network and disk and both of those change in the
> > bootlog from messages like:
> > 
> > [1.726118] virtio-pci :00:13.0: enabling device ( -> 0003)
> > [1.728864] virtio-pci :00:14.0: enabling device ( -> 0003)
> > [1.729948] virtio-pci :00:15.0: enabling device ( -> 0003)
> > ...
> > [2.162148] virtio_blk virtio2: 1/0/0 default/read/poll queues
> > [2.168311] virtio_blk virtio2: [vda] 1184242 512-byte logical 
> > 
> > to:
> > 
> > [1.777051] virtio-pci :00:13.0: enabling device ( -> 0003)
> > [1.779822] virtio-pci :00:14.0: enabling device ( -> 0003)
> > [1.780926] virtio-pci :00:15.0: enabling device ( -> 0003)
> > ...
> > [1.894852] virtio_rng: probe of virtio1 failed with error -28
> > ...
> > [2.063553] virtio_blk virtio2: 1/0/0 default/read/poll queues
> > [2.064260] virtio_blk: probe of virtio2 failed with error -28
> > [2.069080] virtio_net: probe of virtio0 failed with error -28
> > 
> > 
> > i.e. the virtio drivers no longer work.
> 
> Interesting, as you say this seems to be VirtIO specific as the baseline
> tests (using IDE) work fine:
> 
>   ➜  ./tests/venv/bin/avocado run 
> ./tests/avocado/tuxrun_baselines.py:test_mips64
>   JOB ID : 71f3e3b7080164b78ef1c8c1bb6bc880932d8c9b
>   JOB LOG: 
> /home/alex/avocado/job-results/job-2023-09-05T15.01-71f3e3b/job.log
>(1/2) ./tests/avocado/tuxrun_baselines.py:TuxRunBaselineTest.test_mips64: 
> PASS (12.19 s)
>(2/2) 
> ./tests/avocado/tuxrun_baselines.py:TuxRunBaselineTest.test_mips64el: PASS 
> (11.78 s)
>   RESULTS: PASS 2 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | 
> CANCEL 0
>   JOB TIME   : 24.79 s
> 
> > I tested with current qemu master
> > (17780edd81d27fcfdb7a802efc870a99788bd2fc) and mips is still broken
> > there.
> > 
> > Is this issue known about?
> 
> Could you raise a bug at:
> 
>   https://gitlab.com/qemu-project/qemu/-/issues

Done, https://gitlab.com/qemu-project/qemu/-/issues/1866

> I'm curious why MIPS VirtIO is affected but nothing else is...

Me too, it seems there is a real code issue somewhere in this...

Cheers,

Richard






mips system emulation failure with virtio

2023-09-05 Thread Richard Purdie
With qemu 8.1.0 we see boot hangs fox x86-64 targets. 

These are fixed by 0d58c660689f6da1e3feff8a997014003d928b3b (softmmu:
Use async_run_on_cpu in tcg_commit) but if I add that commit, mips and
mips64 break, hanging at boot unable to find a rootfs. 

We use virtio for network and disk and both of those change in the
bootlog from messages like:

[1.726118] virtio-pci :00:13.0: enabling device ( -> 0003)
[1.728864] virtio-pci :00:14.0: enabling device ( -> 0003)
[1.729948] virtio-pci :00:15.0: enabling device ( -> 0003)
...
[2.162148] virtio_blk virtio2: 1/0/0 default/read/poll queues
[2.168311] virtio_blk virtio2: [vda] 1184242 512-byte logical 

to:

[1.777051] virtio-pci :00:13.0: enabling device ( -> 0003)
[1.779822] virtio-pci :00:14.0: enabling device ( -> 0003)
[1.780926] virtio-pci :00:15.0: enabling device ( -> 0003)
...
[1.894852] virtio_rng: probe of virtio1 failed with error -28
...
[2.063553] virtio_blk virtio2: 1/0/0 default/read/poll queues
[2.064260] virtio_blk: probe of virtio2 failed with error -28
[2.069080] virtio_net: probe of virtio0 failed with error -28


i.e. the virtio drivers no longer work.

I tested with current qemu master
(17780edd81d27fcfdb7a802efc870a99788bd2fc) and mips is still broken
there.

Is this issue known about?

Cheers,

Richard



qemu-system-ppc failures with glibc 2.38

2023-08-31 Thread Richard Purdie
Hi,

Yocto Project's CI is noticing a lot of issues with qemu-system-ppc
emulation on loaded systems after we switch glibc to 2.38.

This is manifesting as hangs in the emulated system and for example,
systemd units then timeout and restart. If we have long running
commands running over ssh (e.g. configure or kernel module builds),
these are then are then disconnected.

It does appear the system does 'freeze' periodically. It also seems
related to the load on the system running the emulation. On an
otherwise idle system it is fine, when there is other load, we hit the
freezes, watchdog resets and resulting faiilures.

We do sometimes see kernel rcu stalls too, the frequency of them may be
higher with the new glibc, it is hard to tell.

Can anyone think of something that changed in glibc 2.38 which might
affect qemu-system-ppc in this way? Everything else we test seems ok.

I've tested with qemu 8.0.0, 8.0.3, 8.0.4 and 8.1.0 with similar
results. The latter has other problems unfortunately which we're still
trying to debug (x86 hangs, we then tried the "softmmu: Assert data in
bounds in iotlb_to_section" patches but that breaks mips). 

Cheers,

Richard



Re: [PULL 00/10] ppc queue

2023-05-29 Thread Richard Purdie
On Mon, 2023-05-29 at 16:30 +1000, Nicholas Piggin wrote:
> On Mon May 29, 2023 at 4:01 PM AEST, Michael Tokarev wrote:
> > 29.05.2023 05:18, Nicholas Piggin wrote:
> > ..
> > 
> > > > 01/10 target/ppc: Fix fallback to MFSS for MFFS* instructions on pre 
> > > > 3.0 ISAs
> > > > 02/10 target/ppc: Fix width of some 32-bit SPRs
> > > > 03/10 target/ppc: Alignment faults do not set DSISR in ISA v3.0 onward
> > > > 05/10 hw/ppc/prep: Fix wiring of PIC -> CPU interrupt
> > > > 
> > > > Or are these not important for -stable?  Or maybe there are other 
> > > > changes
> > > > which should be picked too?
> > > 
> > > They certainly fix some parts of target emulation, but what is the
> > > guidance for backporting those type of fixes? Most of the patches I sent
> > > including 2,3 were just found from inspection or new test code and not
> > > real software failing.
> > > 
> > > Should just simple ones go in? 32-bit SPRs do not fix entirely the
> > > behaviour of all SPRs, just one aspect. In another fix I had (that
> > > didn't make it in this merge), was a bit more complicated and the
> > > first iteration caused a deadlock that didn't show up in basic test
> > > like booting Linux.
> > > 
> > > My guess is that fixes that correct an issue with real software running
> > > on the target should be ported to stable. Perhaps "obviously correct"
> > > small fixes as well. But not sure about larger changes.
> > 
> > This is exactly why I asked, - because I don't clearly understand how
> > important these to have in -stable. And also to remind that -stable
> > exist, just in case.. ;)
> 
> Ah okay, makes sense. I was just clarifying myself since I wasn't
> too sure.
> 
> > So be it, no actual issue so not applying to -stable.
> 
> I will think about it and try to keep -stable in mind. Of my patches
> there are one or two coming up that could probably go in there, if
> not these ones.

1/10 from me (fallback to MFSS) did fix software failures for Yocto
Project so might be a good candidate for stable. We're carrying that
patch against the last release for now.

Cheers,

Richard





[PATCH v3] target/ppc: Fix fallback to MFSS for MFFS* instructions on pre 3.0 ISAs

2023-05-10 Thread Richard Purdie
The following commits changed the code such that the fallback to MFSS for 
MFFSCRN,
MFFSCRNI, MFFSCE and MFFSL on pre 3.0 ISAs was removed and became an illegal 
instruction:

  bf8adfd88b547680aa857c46098f3a1e94373160 - target/ppc: Move mffscrn[i] to 
decodetree
  394c2e2fda70da722f20fb60412d6c0ca4bfaa03 - target/ppc: Move mffsce to 
decodetree
  3e5bce70efe6bd1f684efbb21fd2a316cbf0657e - target/ppc: Move mffsl to 
decodetree

The hardware will handle them as a MFFS instruction as the code did previously.
This means applications that were segfaulting under qemu when encountering these
instructions which is used in glibc libm functions for example.

The fallback for MFFSCDRN and MFFSCDRNI added in a later patch was also missing.

This patch restores the fallback to MFSS for these instructions on pre 3.0s ISAs
as the hardware decoder would, fixing the segfaulting libm code. It doesn't have
the fallback for 3.0 onwards to match hardware behaviour.

Signed-off-by: Richard Purdie 
---
 target/ppc/insn32.decode   | 20 +---
 target/ppc/translate/fp-impl.c.inc | 22 --
 2 files changed, 29 insertions(+), 13 deletions(-)

v3 - drop fallback to MFFS for 3.0 ISA to match hardware
v2 - switch to use decodetree pattern groups per feedback

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index f8f589e9fd..4fcf3af8d0 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -390,13 +390,19 @@ SETNBCR 01 . . - 00 -   
@X_bi
 
 ### Move To/From FPSCR
 
-MFFS11 . 0 - 1001000111 .   @X_t_rc
-MFFSCE  11 . 1 - 1001000111 -   @X_t
-MFFSCRN 11 . 10110 . 1001000111 -   @X_tb
-MFFSCDRN11 . 10100 . 1001000111 -   @X_tb
-MFFSCRNI11 . 10111 ---.. 1001000111 -   @X_imm2
-MFFSCDRNI   11 . 10101 --... 1001000111 -   @X_imm3
-MFFSL   11 . 11000 - 1001000111 -   @X_t
+{
+  # Before Power ISA v3.0, MFFS bits 11~15 were reserved and should be ignored
+  MFFS_ISA207 11 . - - 1001000111 .   @X_t_rc
+  [
+MFFS11 . 0 - 1001000111 .   @X_t_rc
+MFFSCE  11 . 1 - 1001000111 -   @X_t
+MFFSCRN 11 . 10110 . 1001000111 -   @X_tb
+MFFSCDRN11 . 10100 . 1001000111 -   @X_tb
+MFFSCRNI11 . 10111 ---.. 1001000111 -   @X_imm2
+MFFSCDRNI   11 . 10101 --... 1001000111 -   @X_imm3
+MFFSL   11 . 11000 - 1001000111 -   @X_t
+  ]
+}
 
 ### Decimal Floating-Point Arithmetic Instructions
 
diff --git a/target/ppc/translate/fp-impl.c.inc 
b/target/ppc/translate/fp-impl.c.inc
index 57d8437851..874774eade 100644
--- a/target/ppc/translate/fp-impl.c.inc
+++ b/target/ppc/translate/fp-impl.c.inc
@@ -568,6 +568,22 @@ static void store_fpscr_masked(TCGv_i64 fpscr, uint64_t 
clear_mask,
 gen_helper_store_fpscr(cpu_env, fpscr_masked, st_mask);
 }
 
+static bool trans_MFFS_ISA207(DisasContext *ctx, arg_X_t_rc *a)
+{
+if (!(ctx->insns_flags2 & PPC2_ISA300)) {
+/*
+ * Before Power ISA v3.0, MFFS bits 11~15 were reserved, any 
instruction
+ * with OPCD=63 and XO=583 should be decoded as MFFS.
+ */
+return trans_MFFS(ctx, a);
+}
+/*
+ * For Power ISA v3.0+, return false and let the pattern group
+ * select the correct instruction.
+ */
+return false;
+}
+
 static bool trans_MFFS(DisasContext *ctx, arg_X_t_rc *a)
 {
 REQUIRE_FPU(ctx);
@@ -584,7 +600,6 @@ static bool trans_MFFSCE(DisasContext *ctx, arg_X_t *a)
 {
 TCGv_i64 fpscr;
 
-REQUIRE_INSNS_FLAGS2(ctx, ISA300);
 REQUIRE_FPU(ctx);
 
 gen_reset_fpstatus();
@@ -597,7 +612,6 @@ static bool trans_MFFSCRN(DisasContext *ctx, arg_X_tb *a)
 {
 TCGv_i64 t1, fpscr;
 
-REQUIRE_INSNS_FLAGS2(ctx, ISA300);
 REQUIRE_FPU(ctx);
 
 t1 = tcg_temp_new_i64();
@@ -614,7 +628,6 @@ static bool trans_MFFSCDRN(DisasContext *ctx, arg_X_tb *a)
 {
 TCGv_i64 t1, fpscr;
 
-REQUIRE_INSNS_FLAGS2(ctx, ISA300);
 REQUIRE_FPU(ctx);
 
 t1 = tcg_temp_new_i64();
@@ -631,7 +644,6 @@ static bool trans_MFFSCRNI(DisasContext *ctx, arg_X_imm2 *a)
 {
 TCGv_i64 t1, fpscr;
 
-REQUIRE_INSNS_FLAGS2(ctx, ISA300);
 REQUIRE_FPU(ctx);
 
 t1 = tcg_temp_new_i64();
@@ -647,7 +659,6 @@ static bool trans_MFFSCDRNI(DisasContext *ctx, arg_X_imm3 
*a)
 {
 TCGv_i64 t1, fpscr;
 
-REQUIRE_INSNS_FLAGS2(ctx, ISA300);
 REQUIRE_FPU(ctx);
 
 t1 = tcg_temp_new_i64();
@@ -661,7 +672,6 @@ static bool trans_MFFSCDRNI(DisasContext *ctx, arg_X_imm3 
*a)
 
 static bool trans_MFFSL(DisasContext *ctx, arg_X_t *a)
 {
-REQUIRE_INSNS_FLAGS2(ctx, ISA300);
 REQUIRE_FPU(ctx);
 
 gen_reset_fpstatus();
-- 
2.39.2




[PATCH v2] target/ppc: Fix fallback to MFSS for MFFS* instructions on pre 3.0 ISAs

2023-05-06 Thread Richard Purdie
The following commits changed the code such that the fallback to MFSS for 
MFFSCRN,
MFFSCRNI, MFFSCE and MFFSL on pre 3.0 ISAs was removed and became an illegal 
instruction:

  bf8adfd88b547680aa857c46098f3a1e94373160 - target/ppc: Move mffscrn[i] to 
decodetree
  394c2e2fda70da722f20fb60412d6c0ca4bfaa03 - target/ppc: Move mffsce to 
decodetree
  3e5bce70efe6bd1f684efbb21fd2a316cbf0657e - target/ppc: Move mffsl to 
decodetree

The hardware will handle them as a MFFS instruction as the code did previously.
This means applications that were segfaulting under qemu when encountering these
instructions which is used in glibc libm functions for example.

The fallback for MFFSCDRN and MFFSCDRNI added in a later patch was also missing.

This patch restores the fallback to MFSS for these instructions on pre 3.0s ISAs
as the hardware decoder would, fixing the segfaulting libm code. It and also 
ensures
the MFSS instruction is used for currently reserved bits to handle other 
potential
ISA additions more correctly.

Signed-off-by: Richard Purdie 
---
 target/ppc/insn32.decode   | 19 ---
 target/ppc/translate/fp-impl.c.inc | 30 --
 2 files changed, 36 insertions(+), 13 deletions(-)

v2 - switch to use decodetree pattern groups per feedback

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index f8f589e9fd..3c4e2c2fc2 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -390,13 +390,18 @@ SETNBCR 01 . . - 00 -   
@X_bi
 
 ### Move To/From FPSCR
 
-MFFS11 . 0 - 1001000111 .   @X_t_rc
-MFFSCE  11 . 1 - 1001000111 -   @X_t
-MFFSCRN 11 . 10110 . 1001000111 -   @X_tb
-MFFSCDRN11 . 10100 . 1001000111 -   @X_tb
-MFFSCRNI11 . 10111 ---.. 1001000111 -   @X_imm2
-MFFSCDRNI   11 . 10101 --... 1001000111 -   @X_imm3
-MFFSL   11 . 11000 - 1001000111 -   @X_t
+{ 
+  # Before Power ISA v3.0, MFFS bits 11~15 were reserved and should be ignored
+  [
+MFFSCE  11 . 1 - 1001000111 -   @X_t
+MFFSCRN 11 . 10110 . 1001000111 -   @X_tb
+MFFSCDRN11 . 10100 . 1001000111 -   @X_tb
+MFFSCRNI11 . 10111 ---.. 1001000111 -   @X_imm2
+MFFSCDRNI   11 . 10101 --... 1001000111 -   @X_imm3
+MFFSL   11 . 11000 - 1001000111 -   @X_t
+  ]
+  MFFS11 . - - 1001000111 .   @X_t_rc
+}
 
 ### Decimal Floating-Point Arithmetic Instructions
 
diff --git a/target/ppc/translate/fp-impl.c.inc 
b/target/ppc/translate/fp-impl.c.inc
index 57d8437851..10dfd91aa4 100644
--- a/target/ppc/translate/fp-impl.c.inc
+++ b/target/ppc/translate/fp-impl.c.inc
@@ -584,7 +584,10 @@ static bool trans_MFFSCE(DisasContext *ctx, arg_X_t *a)
 {
 TCGv_i64 fpscr;
 
-REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+if (!(ctx->insns_flags2 & PPC2_ISA300)) {
+return false;
+}
+
 REQUIRE_FPU(ctx);
 
 gen_reset_fpstatus();
@@ -597,7 +600,10 @@ static bool trans_MFFSCRN(DisasContext *ctx, arg_X_tb *a)
 {
 TCGv_i64 t1, fpscr;
 
-REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+if (!(ctx->insns_flags2 & PPC2_ISA300)) {
+return false;
+}
+
 REQUIRE_FPU(ctx);
 
 t1 = tcg_temp_new_i64();
@@ -614,7 +620,10 @@ static bool trans_MFFSCDRN(DisasContext *ctx, arg_X_tb *a)
 {
 TCGv_i64 t1, fpscr;
 
-REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+if (!(ctx->insns_flags2 & PPC2_ISA300)) {
+return false;
+}
+
 REQUIRE_FPU(ctx);
 
 t1 = tcg_temp_new_i64();
@@ -631,7 +640,10 @@ static bool trans_MFFSCRNI(DisasContext *ctx, arg_X_imm2 
*a)
 {
 TCGv_i64 t1, fpscr;
 
-REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+if (!(ctx->insns_flags2 & PPC2_ISA300)) {
+return false;
+}
+
 REQUIRE_FPU(ctx);
 
 t1 = tcg_temp_new_i64();
@@ -647,7 +659,10 @@ static bool trans_MFFSCDRNI(DisasContext *ctx, arg_X_imm3 
*a)
 {
 TCGv_i64 t1, fpscr;
 
-REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+if (!(ctx->insns_flags2 & PPC2_ISA300)) {
+return false;
+}
+
 REQUIRE_FPU(ctx);
 
 t1 = tcg_temp_new_i64();
@@ -661,7 +676,10 @@ static bool trans_MFFSCDRNI(DisasContext *ctx, arg_X_imm3 
*a)
 
 static bool trans_MFFSL(DisasContext *ctx, arg_X_t *a)
 {
-REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+if (!(ctx->insns_flags2 & PPC2_ISA300)) {
+return false;
+}
+
 REQUIRE_FPU(ctx);
 
 gen_reset_fpstatus();
-- 
2.39.2




[PATCH] target/ppc: Fix fallback to MFSS for MFFSCRN, MFFSCRNI, MFFSCE and MFFSL

2023-05-04 Thread Richard Purdie
The following commits changed the code such that these instructions became 
invalid
on pre 3.0 ISAs:

  bf8adfd88b547680aa857c46098f3a1e94373160 - target/ppc: Move mffscrn[i] to 
decodetree
  394c2e2fda70da722f20fb60412d6c0ca4bfaa03 - target/ppc: Move mffsce to 
decodetree
  3e5bce70efe6bd1f684efbb21fd2a316cbf0657e - target/ppc: Move mffsl to 
decodetree

The hardware will handle them as a MFFS instruction as the code did previously.
Restore that behaviour. This means applications that were segfaulting under qemu
when encountering these instructions now operate correctly. The instruction
is used in glibc libm functions for example.

Signed-off-by: Richard Purdie 
---
 target/ppc/translate/fp-impl.c.inc | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/target/ppc/translate/fp-impl.c.inc 
b/target/ppc/translate/fp-impl.c.inc
index 57d8437851..cb86381c3f 100644
--- a/target/ppc/translate/fp-impl.c.inc
+++ b/target/ppc/translate/fp-impl.c.inc
@@ -584,7 +584,10 @@ static bool trans_MFFSCE(DisasContext *ctx, arg_X_t *a)
 {
 TCGv_i64 fpscr;
 
-REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+if (unlikely(!(ctx->insns_flags2 & PPC2_ISA300))) {
+return trans_MFFS(ctx, a);
+}
+
 REQUIRE_FPU(ctx);
 
 gen_reset_fpstatus();
@@ -597,7 +600,10 @@ static bool trans_MFFSCRN(DisasContext *ctx, arg_X_tb *a)
 {
 TCGv_i64 t1, fpscr;
 
-REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+if (unlikely(!(ctx->insns_flags2 & PPC2_ISA300))) {
+return trans_MFFS(ctx, a);
+}
+
 REQUIRE_FPU(ctx);
 
 t1 = tcg_temp_new_i64();
@@ -631,7 +637,10 @@ static bool trans_MFFSCRNI(DisasContext *ctx, arg_X_imm2 
*a)
 {
 TCGv_i64 t1, fpscr;
 
-REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+if (unlikely(!(ctx->insns_flags2 & PPC2_ISA300))) {
+return trans_MFFS(ctx, a);
+}
+
 REQUIRE_FPU(ctx);
 
 t1 = tcg_temp_new_i64();
@@ -661,7 +670,10 @@ static bool trans_MFFSCDRNI(DisasContext *ctx, arg_X_imm3 
*a)
 
 static bool trans_MFFSL(DisasContext *ctx, arg_X_t *a)
 {
-REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+if (unlikely(!(ctx->insns_flags2 & PPC2_ISA300))) {
+return trans_MFFS(ctx, a);
+}
+
 REQUIRE_FPU(ctx);
 
 gen_reset_fpstatus();




Re: VMs hanging with rcu stall problems

2021-06-29 Thread Richard Purdie
On Thu, 2021-06-03 at 11:02 +0100, Richard Purdie wrote:
> We're seeing intermittent rcu stalls in qemu system mode emulation which is 
> causing CI issues for us (Yocto Project). We're struggling to understand
> the cause and have tried several things to mitigate without much success. 
> The stalls are odd in that the log messages look incomplete. An example
> from last night:

To answer my own question, the issue is a kernel bug in RCU stall handling.
The fix:

https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=rcu/next=c583bcb8f5ed

and patch which introduced the problem:

https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=rcu/next=c583bcb8f5ed

Cheers,

Richard




VMs hanging with rcu stall problems

2021-06-03 Thread Richard Purdie
Hi,

We're seeing intermittent rcu stalls in qemu system mode emulation which is 
causing CI issues for us (Yocto Project). We're struggling to understand
the cause and have tried several things to mitigate without much success. 
The stalls are odd in that the log messages look incomplete. An example
from last night:

[  133.333475] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  133.337109]  (detected by 2, t=25864 jiffies, g=1529, q=10)
[  133.339025] rcu: All QSes seen, last rcu_preempt kthread activity 4865 
(4294800423-4294795558), jiffies_till_next_fqs=3, root ->qsmask 0x0
[  133.343445] rcu: rcu_preempt kthread starved for 4870 jiffies! g1529 f0x2 
RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=2
[  133.346976] rcu: Unless rcu_preempt kthread gets sufficient CPU time, 
OOM is now expected behavior.
[  133.350262] rcu: RCU grace-period kthread stack dump:
[  133.352704] task:rcu_preempt state:R stack:0 pid:   13 ppid: 2 
flags:0x4000
[  133.355581] Call Trace:
[  133.356488]  __schedule+0x1dc/0x570
[  133.357693]  ? __mod_timer+0x220/0x3c0
[  133.359018]  schedule+0x68/0xe0
[  133.36]  schedule_timeout+0x8f/0x160
[  133.361267]  ? force_qs_rnp+0x8d/0x1c0
[  133.362515]  ? __next_timer_interrupt+0x100/0x100
[  133.364264]  rcu_gp_kthread+0x55f/0xba0
[  133.365701]  ? note_gp_changes+0x70/0x70
[  133.367356]  kthread+0x145/0x170
[  133.368597]  ? kthread_associate_blkcg+0xc0/0xc0
[  133.370686]  ret_from_fork+0x22/0x30
[  133.371976] BUG: scheduling while atomic: swapper/2/0/0x0002
[  133.374066] Modules linked in: bnep
[  133.375324] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.10.41-yocto-standard 
#1
[  133.377813] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  133.381882] Call Trace:
[  133.382744]  dump_stack+0x5e/0x74
[  133.384027]  __schedule_bug.cold+0x4b/0x59
[  133.385362]  __schedule+0x3f6/0x570
[  133.386655]  schedule_idle+0x2c/0x40
[  133.388033]  do_idle+0x15a/0x250
[  133.389257]  ? complete+0x3f/0x50
[  133.390406]  cpu_startup_entry+0x20/0x30
[  133.391827]  start_secondary+0xf1/0x100
[  133.393143]  secondary_startup_64_no_verify+0xc2/0xcb
[  191.482302] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  255.155323] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:

(full kernel+boot log is below)

What strikes me as odd is the scheduling whilst atomic (which we don't normally 
see) and the lack of task info (which is never there).

We have some evidence that the qemus are using say 350% of cpu when this happens
so it doesn't actually appear to be staved of cpu in reality. Unfortunately 
catching
that data is hard.

We only see this issue intermittently, it does seem to coincide with load but
equally, the kernel messages seem odd and ps output suggests the qemu processes
are getting ample cpu time.

This is for x86_64 and using kvm but this isn't kvm on kvm. We have many
different base distros (Centos, Debian, OpenSUSE, Ubuntu), it happens on them 
all.

We're using qemu 6.0.0 although we saw the same issue with 5.2.0. Our
kernel is mainly a 5.10 although we do have a 5.4 kernel and have also tested
a 5.12 and they all seem to do this.

We can have periods of significant load on the autobuilders from build processes
however we have taken steps to mitigate this. The qemu processes are priority 
-5, 
the builds are at 5 and we also ioprio 2,0 for qemu. We don't use cgroups since
not all of our builders have support for v2 and it complicates permission issues
as our builds are not virtualised and don't run as root. I'm not convinced it 
would
really help.

In an effort to try and mitigate issues we moved the images we're booting onto
a tmpfs. We have also ended up "pre-caching" all the mmaped libraries used by 
the binary just in case something was stalling on IO whilst qemu was running. 
To do that we start with qmp, wait for the port, read through the files in 
/proc//map_files.

We also did try disabling kvmclock but that doesn't appear to help.

We also did change the systems from a single cpu to having 4 cpus. This changed
the trace output to be multi cpu but the basic issue seems to remain. We wanted
to use multiple cores anyway so will keep that but it doesn't help this issue.

The systems running these qemu images are 48 core dual socket Xeons E5-2697
with 128GB memory.

We have seen some stalls in arm/mips/ppc qemu system testing without KVM too
although not since the above mitigations were added. I've suspected there are 
multiple causes.

The corresponding build in CI was this:
https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/3538
I've detailed more about the background on how we're using this below.

Does anyone have any idea what might be causing this and how we might 
fix or mitigate it?

We're at the point where a component of our CI fails on pretty much every run
and I'm struggling with ideas on how to proceed at this point other than 
reducing 

[PATCH v2] linux-user/mmap: Return EFAULT/EINVAL for invalid addresses

2021-02-16 Thread Richard Purdie
When using qemu-i386 to build qemux86 webkitgtk on musl, it sits in an
infinite loop of mremap calls of ever decreasing/increasing addresses.

I suspect something in the musl memory allocation code loops
indefinitely if it only sees ENOMEM and only exits when it hits other
errors such as EFAULT or EINVAL.

According to the docs, trying to mremap outside the address space
can/should return EFAULT and changing this allows the build to succeed.

A better return value for the other cases of invalid addresses is
EINVAL rather than ENOMEM so adjust the other part of the test to this.

Signed-off-by: Richard Purdie 

Re: [RFC PATCH] linux-user/mmap: Return EFAULT for invalid addresses

2021-02-16 Thread Richard Purdie
On Sat, 2021-02-13 at 18:40 +0100, Laurent Vivier wrote:
> Le 08/01/2021 à 18:46, Richard Purdie a écrit :
> > When using qemu-i386 to run gobject introspection parts of a webkitgtk 
> > build using musl as libc on a 64 bit host, it sits in an infinite loop 
> > of mremap calls of ever decreasing/increasing addresses.
> > 
> > I suspect something in the musl memory allocation code loops indefinitely
> > if it only sees ENOMEM and only exits when it hits EFAULT.
> > 
> > According to the docs, trying to mremap outside the address space
> > can/should return EFAULT and changing this allows the build to succeed.
> > 
> > There was previous discussion of this as it used to work before qemu 2.11
> > and we've carried hacks to work around it since, this appears to be a
> > better fix of the real issue?
> > 
> > Signed-off-by: Richard Purdie  > 
> > Index: qemu-5.2.0/linux-user/mmap.c
> > ===
> > --- qemu-5.2.0.orig/linux-user/mmap.c
> > +++ qemu-5.2.0/linux-user/mmap.c
> > @@ -727,7 +727,7 @@ abi_long target_mremap(abi_ulong old_add
> >   !guest_range_valid(new_addr, new_size)) ||
> >  ((flags & MREMAP_MAYMOVE) == 0 &&
> >   !guest_range_valid(old_addr, new_size))) {
> > -errno = ENOMEM;
> > +errno = EFAULT;
> >  return -1;
> >  }
> >  
> > 
> > 
> > 
> 
> I agree with that, the ENOMEM is returned when there is not enough virtual 
> memory (the
> mmap_find_vma() case).
> 
> According to the manpage, EFAULT is returned when old_addr and old_addr + 
> old_size is an invalid
> address space.
> 
> So:
> 
> if (!guest_range_valid(old_addr, old_size)) {
> errno = EFAULT;
> return -1;
> }
> 
> But in the case of new_size and new_addr, it seems the good value to use is 
> EINVAL.
> 
> So:
> 
>    if (((flags & MREMAP_FIXED) && !guest_range_valid(new_addr, new_size)) ||
>    ((flags & MREMAP_MAYMOVE) == 0 && !guest_range_valid(old_addr, 
> new_size))) {
> errno = EINVAL;
> return -1;
> }
> 
> Did you try that?

Its taken me a short while to reproduce the test environment but I did
so and can confirm that using EINVAL works just as well as EFAULT in
the test case we have. The above would therefore seem to make sense to
me and would fix the case we found.

Cheers,

Richard






Re: [Qemu-devel] [PATCH v2 07/11] chardev: Let IOReadHandler use unsigned type

2021-01-22 Thread Richard Purdie
On Fri, 2021-01-22 at 14:55 +0100, Philippe Mathieu-Daudé wrote:
> Hi Prasad, Richard.
> 
> On 1/22/21 12:52 PM, P J P wrote:
> > +-- On Fri, 22 Jan 2021, Richard Purdie wrote --+
> > > If so can anyone point me at that change?
> > > 
> > > I ask since CVE-2018-18438 is marked as affecting all qemu versions
> > > (https://nvd.nist.gov/vuln/detail/CVE-2018-18438).
> > > 
> > > If it was fixed, the version mask could be updated. If the fix wasn't 
> > > deemed 
> > > worthwhile for some reason that is also fine and I can mark this one as 
> > > such 
> > > in our system. I'm being told we only need one of the patches in this 
> > > series 
> > > which I also don't believe as I suspect we either need the set or none of 
> > > them!
> > > 
> > > Any info would be most welcome.
> > 
> >   -> https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg02239.html
> >   -> https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg02231.html
> > 
> > * Yes, the type change fix had come up during patch reviews above, and this 
> >   series implemented the change.
> > 
> > * Series is required IIUC, didn't realise it's not merged.
> 
> Audit from Marc-André pointed that this is unlikely, we asked the
> reporter for a reproducer and got not news, and eventually closed
> this as NOTABUG (not even WONTFIX):
> https://bugzilla.redhat.com/show_bug.cgi?id=1609015

I guessed there some resolution like this but couldn't find it thanks
for the pointer. It's now clear in the archives and I can handle
appropriately rejecting carrying those patches, thanks!

Cheers,

Richard




Re: [Qemu-devel] [PATCH v2 07/11] chardev: Let IOReadHandler use unsigned type

2021-01-22 Thread Richard Purdie
On Fri, 2018-10-12 at 02:22 +0200, Philippe Mathieu-Daudé wrote:
> The number of bytes can not be negative nor zero.
> 
> Fixed 2 format string:
> - hw/char/spapr_vty.c
> - hw/usb/ccid-card-passthru.c
> 
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Philippe Mathieu-Daudé 
> Acked-by: Alberto Garcia 

Sorry to drag up an old patch series. As far as I can see this series
was never applied. I suspect a better way of solving the issue may have
been found? If so can anyone point me at that change?

I ask since CVE-2018-18438 is marked as affecting all qemu versions
(https://nvd.nist.gov/vuln/detail/CVE-2018-18438).

If it was fixed, the version mask could be updated. If the fix wasn't
deemed worthwhile for some reason that is also fine and I can mark this
one as such in our system. I'm being told we only need one of the
patches in this series which I also don't believe as I suspect we
either need the set or none of them!

Any info would be most welcome.

Cheers,

Richard








Re: [RFC PATCH] linux-user/mmap: Return EFAULT for invalid addresses

2021-01-22 Thread Richard Purdie
On Fri, 2021-01-08 at 17:46 +, Richard Purdie wrote:
> When using qemu-i386 to run gobject introspection parts of a webkitgtk 
> build using musl as libc on a 64 bit host, it sits in an infinite loop 
> of mremap calls of ever decreasing/increasing addresses.
> 
> I suspect something in the musl memory allocation code loops indefinitely
> if it only sees ENOMEM and only exits when it hits EFAULT.
> 
> According to the docs, trying to mremap outside the address space
> can/should return EFAULT and changing this allows the build to succeed.
> 
> There was previous discussion of this as it used to work before qemu 2.11
> and we've carried hacks to work around it since, this appears to be a
> better fix of the real issue?
> 
> Signed-off-by: Richard Purdie  
> Index: qemu-5.2.0/linux-user/mmap.c
> ===
> --- qemu-5.2.0.orig/linux-user/mmap.c
> +++ qemu-5.2.0/linux-user/mmap.c
> @@ -727,7 +727,7 @@ abi_long target_mremap(abi_ulong old_add
>   !guest_range_valid(new_addr, new_size)) ||
>  ((flags & MREMAP_MAYMOVE) == 0 &&
>   !guest_range_valid(old_addr, new_size))) {
> -errno = ENOMEM;
> +errno = EFAULT;
>  return -1;
>  }

Any comments on this? I believe its a valid issue that needs fixing and
multiple distros appear to be carrying fixes in this area related to
this.

Cheers,

Richard




Re: [PATCH] linux-user/mmap: Avoid asserts for out of range mremap calls

2021-01-22 Thread Richard Purdie
On Fri, 2021-01-08 at 17:42 +, Richard Purdie wrote:
> If mremap() is called without the MREMAP_MAYMOVE flag with a start address
> just before the end of memory (reserved_va) where new_size would exceed 
> it (and GUEST_ADDR_MAX), the assert(end - 1 <= GUEST_ADDR_MAX) in 
> page_set_flags() would trigger.
> 
> Add an extra guard to the guest_range_valid() checks to prevent this and
> avoid asserting binaries when reserved_va is set.
> 
> This meant a bug I was seeing locally now gives the same behaviour 
> regardless of whether reserved_va is set or not.
> 
> Signed-off-by: Richard Purdie  
> Index: qemu-5.2.0/linux-user/mmap.c
> ===
> --- qemu-5.2.0.orig/linux-user/mmap.c
> +++ qemu-5.2.0/linux-user/mmap.c
> @@ -727,7 +727,9 @@ abi_long target_mremap(abi_ulong old_add
>  
> 
>  if (!guest_range_valid(old_addr, old_size) ||
>  ((flags & MREMAP_FIXED) &&
> - !guest_range_valid(new_addr, new_size))) {
> + !guest_range_valid(new_addr, new_size)) ||
> +((flags & MREMAP_MAYMOVE) == 0 &&
> + !guest_range_valid(old_addr, new_size))) {
>  errno = ENOMEM;
>  return -1;
>  }
> 
> 

Any comments on this? I believe its a valid issue that needs fixing and
multiple distros appear to be carrying fixes in this area related to
this.

Cheers,

Richard




[RFC PATCH] linux-user/mmap: Return EFAULT for invalid addresses

2021-01-08 Thread Richard Purdie
When using qemu-i386 to run gobject introspection parts of a webkitgtk 
build using musl as libc on a 64 bit host, it sits in an infinite loop 
of mremap calls of ever decreasing/increasing addresses.

I suspect something in the musl memory allocation code loops indefinitely
if it only sees ENOMEM and only exits when it hits EFAULT.

According to the docs, trying to mremap outside the address space
can/should return EFAULT and changing this allows the build to succeed.

There was previous discussion of this as it used to work before qemu 2.11
and we've carried hacks to work around it since, this appears to be a
better fix of the real issue?

Signed-off-by: Richard Purdie 

[PATCH] linux-user/mmap: Avoid asserts for out of range mremap calls

2021-01-08 Thread Richard Purdie
If mremap() is called without the MREMAP_MAYMOVE flag with a start address
just before the end of memory (reserved_va) where new_size would exceed 
it (and GUEST_ADDR_MAX), the assert(end - 1 <= GUEST_ADDR_MAX) in 
page_set_flags() would trigger.

Add an extra guard to the guest_range_valid() checks to prevent this and
avoid asserting binaries when reserved_va is set.

This meant a bug I was seeing locally now gives the same behaviour 
regardless of whether reserved_va is set or not.

Signed-off-by: Richard Purdie 

Re: [RFC PATCH 0/3] target/mips: Make the number of TLB entries a CPU property

2020-10-14 Thread Richard Purdie
On Tue, 2020-10-13 at 19:22 -0700, Richard Henderson wrote:
> On 10/13/20 4:11 PM, Richard Henderson wrote:
> > On 10/13/20 6:25 AM, Philippe Mathieu-Daudé wrote:
> > > Yocto developers have expressed interest in running MIPS32
> > > CPU with custom number of TLB:
> > > https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03428.html
> > > 
> > > Help them by making the number of TLB entries a CPU property,
> > > keeping our set of CPU definitions in sync with real hardware.
> > 
> > You mean keeping the 34kf model within qemu in sync, rather than
> > creating a
> > nonsense model that doesn't exist.
> > 
> > Question: is this cpu parameter useful for anything else?
> > 
> > Because the ideal solution for a CI loop is to use one of the mips
> > cpu models
> > that has the hw page table walker (CP0C3_PW).  Having qemu being
> > able to refill
> > the tlb itself is massively faster.
> > 
> > We do not currently implement a mips cpu that has the PW.  When I
> > downloaded
> 
> Bah, "mips32 cpu".
> 
> We do have the P5600 that does has it, though the code is wrapped up
> in TARGET_MIPS64.  I'll also note that the code could be better
> placed [*]
> 
> > (1) anyone know if the PW incompatible with mips32?
> 
> I've since found a copy of the mips32-pra in the wayback machine and
> have answered this as "no" -- PW is documented for mips32.
> 
> > (2) if not, was there any mips32 hw built with PW
> > that we could model?
> 
> But I still don't know about this.
> 
> A further question for the Yocto folks: could you make use of a 64-
> bit kernel in testing a 32-bit userspace?

We run testing of 64 bit kernel with 64 bit userspace and 32 bit kernel
with 32 bit userspace, we've tested that for years. I know some of our
users do use 64 bit kernels with 32 bit userspace and we do limited
testing of that for multilib support.

I think we did try testing an R2 setup but found little performance
change and I think it may have been unreliable so we didn't make the
switch. We did move to 34kf relatively recently as that did perform
marginally better and matched qemu's recommendations.

We've also run into a lot of problems with 32 bit mips in general if we
go over 256MB memory since that seems to trigger highmem and hangs
regularly for us. We're working on infrastructure to save out those
hung VMs to help us debug such issues but don't have that yet. Its not
regular enough and we don't have the expertise to debug it properly as
yet unfortunately.

There is a question of how valid a 32 bit kernel is these days,
particularly given the memory issues we seem to run into there with
larger images.

Cheers,

Richard





Re: [RFC PATCH 0/3] target/mips: Make the number of TLB entries a CPU property

2020-10-14 Thread Richard Purdie
On Wed, 2020-10-14 at 01:36 +, Victor Kamensky (kamensky) wrote:
> Thank you very much for looking at this. I gave a spin to
> your 3 patch series in original setup, and as expected with
> '-cpu 34Kf,tlb-entries=64' option it works great.
> 
> If nobody objects, and your patches could be merged, we
> would greatly appreciate it.

Speaking as one of the Yocto Project maintainers, this is really
helpful for us, thanks!

qemumips is one of our slowest platforms for automated testing so this
performance improvement helps a lot.

Cheers,

Richard




Re: [Qemu-devel] [Qemu-ppc] [PATCH v2] target/ppc: Fix system lockups caused by interrupt_request state corruption

2017-12-04 Thread Richard Purdie
On Mon, 2017-12-04 at 12:44 +1100, David Gibson wrote:
> On Mon, Dec 04, 2017 at 12:00:40PM +1100, David Gibson wrote:
> > 
> > On Fri, Dec 01, 2017 at 03:49:07PM +, Richard Purdie wrote:
> > > 
> > > Occasionally in Linux guests on x86_64 we're seeing logs like:
> > > 
> > > ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending
> > > 0100req 0004
> > > 
> > > when they should read:
> > > 
> > > ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending
> > > 0100req 0002
> > > 
> > > The "0004" is CPU_INTERRUPT_EXITTB yet the code calls
> > > cpu_interrupt(cs, CPU_INTERRUPT_HARD) ("0002") in this
> > > function
> > > just before the log message. Something is causing the HARD bit
> > > setting
> > > to get lost.
> > > 
> > > The knock on effect of losing that bit is the decrementer timer
> > > interrupts
> > > don't get delivered which causes the guest to sit idle in its
> > > idle handler
> > > and 'hang'.
> > > 
> > > The issue occurs due to races from code which sets
> > > CPU_INTERRUPT_EXITTB.
> > > 
> > > Rather than poking directly into cs->interrupt_request, that code
> > > needs to:
> > > 
> > > a) hold BQL
> > > b) use the cpu_interrupt() helper
> > > 
> > > This patch fixes the call sites to do this, fixing the hang.
> > > 
> > > Signed-off-by: Richard Purdie <richard.pur...@linuxfoundation.org
> > > >
> > I strongly suspect there's a better way to do this long term - a
> > lot
> > of that old ppc TCG code is really crufty.  But as best I can tell,
> > this is certainly a fix over what we had.  So, applied to
> > ppc-for-2.11.
> I take that back.  Running make check with this patch results in:
> 
>   GTESTER check-qtest-ppc64
> **
> ERROR:/home/dwg/src/qemu/cpus.c:1582:qemu_mutex_lock_iothread:
> assertion failed: (!qemu_mutex_iothread_locked())
> Broken pipe
> qemu-system-ppc64: RP: Received invalid message 0x length 0x
> GTester: last random seed: R02S895b0f4813776bf68c147bf987e73f7b
> make: *** [/home/dwg/src/qemu/tests/Makefile.include:852: check-
> qtest-ppc64] Error 1
> 
> So, I've reverted it.

Sorry about that. I tried to stress the code paths and no issues showed
up in our testing but hadn't realised those tests existed.

Given there do seem to be mixed locked and unlocked code paths I've
taken a different approach and sent a v3 which passes those tests.

I do agree with you that there is probably a better way but that would
need someone with a better understanding of the bigger picture than I
have. This patch does stop our image tests locking up so does seem to
fix a valid/real problem which people can run into.

Cheers,

Richard




[Qemu-devel] [PATCH v3] target/ppc: Fix system lockups caused by interrupt_request state corruption

2017-12-04 Thread Richard Purdie
Occasionally in Linux guests on x86_64 we're seeing logs like:

ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 0100req 0004

when they should read:

ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 0100req 0002

The "0004" is CPU_INTERRUPT_EXITTB yet the code calls
cpu_interrupt(cs, CPU_INTERRUPT_HARD) ("0002") in this function
just before the log message. Something is causing the HARD bit setting
to get lost.

The knock on effect of losing that bit is the decrementer timer interrupts
don't get delivered which causes the guest to sit idle in its idle handler
and 'hang'.

The issue occurs due to races from code which sets CPU_INTERRUPT_EXITTB.

Rather than poking directly into cs->interrupt_request, that code needs to:

a) hold BQL
b) use the cpu_interrupt() helper

This patch fixes the call sites to do this, fixing the hang. The calls
are made from a variety of contexts so a helper function is added to handle
the necessary locking. This can likely be improved and optimised in the future
but it ensures the code is correct and doesn't lockup as it stands today.

Signed-off-by: Richard Purdie <richard.pur...@linuxfoundation.org>
---
 target/ppc/excp_helper.c |  7 +++
 target/ppc/helper_regs.h | 17 +++--
 2 files changed, 18 insertions(+), 6 deletions(-)

v3: Fix make check failures
v2: Fixes a compile issue with master and ensures BQL is held in one case
where it potentially wasn't.


diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index e6009e70e5..37d2410726 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -207,7 +207,7 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int 
excp_model, int excp)
 "Entering checkstop state\n");
 }
 cs->halted = 1;
-cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+cpu_interrupt_exittb(cs);
 }
 if (env->msr_mask & MSR_HVB) {
 /* ISA specifies HV, but can be delivered to guest with HV clear
@@ -940,7 +940,7 @@ void helper_store_msr(CPUPPCState *env, target_ulong val)
 
 if (excp != 0) {
 CPUState *cs = CPU(ppc_env_get_cpu(env));
-cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+cpu_interrupt_exittb(cs);
 raise_exception(env, excp);
 }
 }
@@ -995,8 +995,7 @@ static inline void do_rfi(CPUPPCState *env, target_ulong 
nip, target_ulong msr)
 /* No need to raise an exception here,
  * as rfi is always the last insn of a TB
  */
-cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
-
+cpu_interrupt_exittb(cs);
 /* Reset the reservation */
 env->reserve_addr = -1;
 
diff --git a/target/ppc/helper_regs.h b/target/ppc/helper_regs.h
index 2627a70176..84fd30c2db 100644
--- a/target/ppc/helper_regs.h
+++ b/target/ppc/helper_regs.h
@@ -20,6 +20,8 @@
 #ifndef HELPER_REGS_H
 #define HELPER_REGS_H
 
+#include "qemu/main-loop.h"
+
 /* Swap temporary saved registers with GPRs */
 static inline void hreg_swap_gpr_tgpr(CPUPPCState *env)
 {
@@ -96,6 +98,17 @@ static inline void hreg_compute_hflags(CPUPPCState *env)
 env->hflags |= env->hflags_nmsr;
 }
 
+static inline void cpu_interrupt_exittb(CPUState *cs)
+{
+if (!qemu_mutex_iothread_locked()) {
+qemu_mutex_lock_iothread();
+cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
+qemu_mutex_unlock_iothread();
+} else {
+cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
+}
+}
+
 static inline int hreg_store_msr(CPUPPCState *env, target_ulong value,
  int alter_hv)
 {
@@ -114,11 +127,11 @@ static inline int hreg_store_msr(CPUPPCState *env, 
target_ulong value,
 }
 if (((value >> MSR_IR) & 1) != msr_ir ||
 ((value >> MSR_DR) & 1) != msr_dr) {
-cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+cpu_interrupt_exittb(cs);
 }
 if ((env->mmu_model & POWERPC_MMU_BOOKE) &&
 ((value >> MSR_GS) & 1) != msr_gs) {
-cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+cpu_interrupt_exittb(cs);
 }
 if (unlikely((env->flags & POWERPC_FLAG_TGPR) &&
  ((value ^ env->msr) & (1 << MSR_TGPR {
-- 
2.14.1




[Qemu-devel] [PATCH v2] target/ppc: Fix system lockups caused by interrupt_request state corruption

2017-12-01 Thread Richard Purdie
Occasionally in Linux guests on x86_64 we're seeing logs like:

ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 0100req 0004

when they should read:

ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 0100req 0002

The "0004" is CPU_INTERRUPT_EXITTB yet the code calls
cpu_interrupt(cs, CPU_INTERRUPT_HARD) ("0002") in this function
just before the log message. Something is causing the HARD bit setting
to get lost.

The knock on effect of losing that bit is the decrementer timer interrupts
don't get delivered which causes the guest to sit idle in its idle handler
and 'hang'.

The issue occurs due to races from code which sets CPU_INTERRUPT_EXITTB.

Rather than poking directly into cs->interrupt_request, that code needs to:

a) hold BQL
b) use the cpu_interrupt() helper

This patch fixes the call sites to do this, fixing the hang.

Signed-off-by: Richard Purdie <richard.pur...@linuxfoundation.org>
---
 target/ppc/excp_helper.c | 16 +---
 target/ppc/helper_regs.h | 10 --
 2 files changed, 21 insertions(+), 5 deletions(-)

v2: Fixes a compile issue with master and ensures BQL is held in one case
where it potentially wasn't.

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index e6009e7..8040277 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -207,7 +207,13 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int 
excp_model, int excp)
 "Entering checkstop state\n");
 }
 cs->halted = 1;
-cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+if (!qemu_mutex_iothread_locked()) {
+qemu_mutex_lock_iothread();
+cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
+qemu_mutex_unlock_iothread();
+} else {
+cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
+}
 }
 if (env->msr_mask & MSR_HVB) {
 /* ISA specifies HV, but can be delivered to guest with HV clear
@@ -940,7 +946,9 @@ void helper_store_msr(CPUPPCState *env, target_ulong val)
 
 if (excp != 0) {
 CPUState *cs = CPU(ppc_env_get_cpu(env));
-cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+qemu_mutex_lock_iothread();
+cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
+qemu_mutex_unlock_iothread();
 raise_exception(env, excp);
 }
 }
@@ -995,7 +1003,9 @@ static inline void do_rfi(CPUPPCState *env, target_ulong 
nip, target_ulong msr)
 /* No need to raise an exception here,
  * as rfi is always the last insn of a TB
  */
-cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+qemu_mutex_lock_iothread();
+cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
+qemu_mutex_unlock_iothread();
 
 /* Reset the reservation */
 env->reserve_addr = -1;
diff --git a/target/ppc/helper_regs.h b/target/ppc/helper_regs.h
index 2627a70..0beaad5 100644
--- a/target/ppc/helper_regs.h
+++ b/target/ppc/helper_regs.h
@@ -20,6 +20,8 @@
 #ifndef HELPER_REGS_H
 #define HELPER_REGS_H
 
+#include "qemu/main-loop.h"
+
 /* Swap temporary saved registers with GPRs */
 static inline void hreg_swap_gpr_tgpr(CPUPPCState *env)
 {
@@ -114,11 +116,15 @@ static inline int hreg_store_msr(CPUPPCState *env, 
target_ulong value,
 }
 if (((value >> MSR_IR) & 1) != msr_ir ||
 ((value >> MSR_DR) & 1) != msr_dr) {
-cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+qemu_mutex_lock_iothread();
+cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
+qemu_mutex_unlock_iothread();
 }
 if ((env->mmu_model & POWERPC_MMU_BOOKE) &&
 ((value >> MSR_GS) & 1) != msr_gs) {
-cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+qemu_mutex_lock_iothread();
+cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
+qemu_mutex_unlock_iothread();
 }
 if (unlikely((env->flags & POWERPC_FLAG_TGPR) &&
  ((value ^ env->msr) & (1 << MSR_TGPR {
-- 
2.7.4




Re: [Qemu-devel] qemu-system-ppc hangs

2017-11-21 Thread Richard Purdie
On Tue, 2017-11-21 at 09:55 +, Alex Bennée wrote:
> Richard Purdie <richard.pur...@linuxfoundation.org> writes:
> > At this point I therefore wanted to seek advice on what the real
> > issue
> > is here and how to fix it!
> Code should be using cpu_interrupt() to change things but we have
> plenty
> of examples in the code of messing with cpu->interrupt_request
> directly
> which is often why the assert() in cpu_interrupt() doesn't get a
> chance
> to fire.
> 
> The two most prolific direct users seems to be i386 and ppc.
> 
> The i386 cases are all most likely OK as it tends to be in KVM
> emulation
> code where by definition the BQL is already held by the time you get
> there. For PPC it will depend on how you got there. The
> multi-thread-tcg.txt document describes the approach:
> 
>   Emulated hardware state
>   ---
> 
>   Currently thanks to KVM work any access to IO memory is
> automatically
>   protected by the global iothread mutex, also known as the BQL (Big
>   Qemu Lock). Any IO region that doesn't use global mutex is expected
> to
>   do its own locking.
> 
>   However IO memory isn't the only way emulated hardware state can be
>   modified. Some architectures have model specific registers that
>   trigger hardware emulation features. Generally any translation
> helper
>   that needs to update more than a single vCPUs of state should take
> the
>   BQL.
> 
>   As the BQL, or global iothread mutex is shared across the system we
>   push the use of the lock as far down into the TCG code as possible
> to
>   minimise contention.
> 
>   (Current solution)
> 
>   MMIO access automatically serialises hardware emulation by way of
> the
>   BQL. Currently ARM targets serialise all ARM_CP_IO register
> accesses
>   and also defer the reset/startup of vCPUs to the vCPU context by
> way
>   of async_run_on_cpu().
> 
>   Updates to interrupt state are also protected by the BQL as they
> can
>   often be cross vCPU.
> 
> So basically it comes down to the call-path into your final
> cpu_interrupt() call which is why I guess your doing the:
> 
>   if (!qemu_mutex_iothread_locked()) {
>  qemu_mutex_lock_iothread();
>  cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
>  qemu_mutex_unlock_iothread();
>   }
> 
> dance. I suspect the helper functions are called both from device
> emulation (where BQL is held) and from vCPU helpers (where it is
> currently not).
> 
> So I suggest the fix is:
> 
>  1. Convert sites doing their own manipulation of
>  cpu->interrupt_request() to call the helper instead
>  2. If helper directly called from TCG code:
>   - take BQL before calling cpu_interrupt(), release after
> Else if helper shared between MMIO/TCG paths
>   - take BQL from TCG path, call helper, release after
> 
> It might be there is some sensible re-factoring that could be done to
> make things clearer but I'll leave that to the PPC experts to weigh
> in
> on.
> 
> Hope that helps!

It does indeed, thanks. I've sent out a proposed patch which does the
above so people can review it and see if its the right thing to do.
Certainly its working fine locally.

Cheers,

Richard



[Qemu-devel] [PATCH] target/ppc: Fix system lockups caused by interrupt_request state corruption

2017-11-21 Thread Richard Purdie
Occasionally in Linux guests on x86_64 we're seeing logs like:

ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 0100req 0004

when they should read:

ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 0100req 0002

The "0004" is CPU_INTERRUPT_EXITTB yet the code calls
cpu_interrupt(cs, CPU_INTERRUPT_HARD) ("0002") in this function
just before the log message. Something is causing the HARD bit setting
to get lost.

The knock on effect of losing that bit is the decrementer timer interrupts
don't get delivered which causes the guest to sit idle in its idle handler
and 'hang'.

The issue occurs due to races from code which sets CPU_INTERRUPT_EXITTB.

Rather than poking directly into cs->interrupt_request, that code needs to:

a) hold BQL
b) use the cpu_interrupt() helper

This patch fixes the call sites to do this, fixing the hang.

Signed-off-by: Richard Purdie <richard.pur...@linuxfoundation.org>
---
 target/ppc/excp_helper.c | 12 +---
 target/ppc/helper_regs.h |  8 ++--
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index e6009e7..f175c21 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -207,7 +207,9 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int 
excp_model, int excp)
 "Entering checkstop state\n");
 }
 cs->halted = 1;
-cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+qemu_mutex_lock_iothread();
+cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
+qemu_mutex_unlock_iothread();
 }
 if (env->msr_mask & MSR_HVB) {
 /* ISA specifies HV, but can be delivered to guest with HV clear
@@ -940,7 +942,9 @@ void helper_store_msr(CPUPPCState *env, target_ulong val)
 
 if (excp != 0) {
 CPUState *cs = CPU(ppc_env_get_cpu(env));
-cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+qemu_mutex_lock_iothread();
+cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
+qemu_mutex_unlock_iothread();
 raise_exception(env, excp);
 }
 }
@@ -995,7 +999,9 @@ static inline void do_rfi(CPUPPCState *env, target_ulong 
nip, target_ulong msr)
 /* No need to raise an exception here,
  * as rfi is always the last insn of a TB
  */
-cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+qemu_mutex_lock_iothread();
+cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
+qemu_mutex_unlock_iothread();
 
 /* Reset the reservation */
 env->reserve_addr = -1;
diff --git a/target/ppc/helper_regs.h b/target/ppc/helper_regs.h
index 2627a70..13dd0b8 100644
--- a/target/ppc/helper_regs.h
+++ b/target/ppc/helper_regs.h
@@ -114,11 +114,15 @@ static inline int hreg_store_msr(CPUPPCState *env, 
target_ulong value,
 }
 if (((value >> MSR_IR) & 1) != msr_ir ||
 ((value >> MSR_DR) & 1) != msr_dr) {
-cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+qemu_mutex_lock_iothread();
+cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
+qemu_mutex_unlock_iothread();
 }
 if ((env->mmu_model & POWERPC_MMU_BOOKE) &&
 ((value >> MSR_GS) & 1) != msr_gs) {
-cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+qemu_mutex_lock_iothread();
+cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
+qemu_mutex_unlock_iothread();
 }
 if (unlikely((env->flags & POWERPC_FLAG_TGPR) &&
  ((value ^ env->msr) & (1 << MSR_TGPR {
-- 
2.7.4




Re: [Qemu-devel] qemu-system-ppc hangs

2017-11-21 Thread Richard Purdie
On Tue, 2017-11-21 at 07:50 +, Mark Cave-Ayland wrote:
> On 21/11/17 00:00, Richard Purdie wrote:
> > I work on the Yocto Project and we use qemu to test boot our Linux
> > images and run tests against them. We've been noticing some
> > instability
> > for ppc where the images sometimes hang, usually around udevd bring
> > up
> > time so just after booting into userspace.
> > 
> > To cut a long story short, I've tracked down what I think is the
> > problem. I believe the decrementer timer stops receiving interrupts
> > so
> > tasks in our images hang indefinitely as the timer stopped. 
> > 
> > It can be summed up with this line of debug:
> > 
> > ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 0100req
> > 0004
> > 
> > It should normally read:
> > 
> > ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 0100req
> > 0002
> > 
> > The question is why CPU_INTERRUPT_EXITTB ends up being set when the
> > lines above this log message clearly sets CPU_INTERRUPT_HARD (via 
> > cpu_interrupt() ).
> > 
> > I note in cpu.h:
> > 
> > /* updates protected by BQL */
> > uint32_t interrupt_request;
> > 
> > (for struct CPUState)
> > 
> > The ppc code does "cs->interrupt_request |= CPU_INTERRUPT_EXITTB"
> > in 5
> > places, 3 in excp_helper.c and 2 in helper_regs.h. In all cases,  
> > g_assert(qemu_mutex_iothread_locked()); fails. If I do something
> > like:
> > 
> > if (!qemu_mutex_iothread_locked()) {
> > qemu_mutex_lock_iothread();
> > cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
> > qemu_mutex_unlock_iothread();
> > } else {
> > cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
> > }
> > 
> > in these call sites then I can no longer lock qemu up with my test
> > case.
> > 
> > I suspect the _HARD setting gets overwritten which stops the 
> > decrementer interrupts being delivered.
> > 
> > I don't know if taking this lock in these situations is going to be
> > bad
> > for performance and whether such a patch would be right/wrong.
> > 
> > At this point I therefore wanted to seek advice on what the real
> > issue
> > is here and how to fix it!
>
> Thanks for the report - given that a lot of work has been done on
> MTTCG and iothread over the past few releases, it wouldn't be a
> complete surprise if something had crept in here.
> 
> Firstly let's start off with some basics: what is your host
> architecture, QEMU version and full command line being used to launch
> QEMU?

I'm running this on x86_64, I'm using qemu 2.10.1 and the commandline
being used for qemu is:

qemu-system-ppc -device virtio-net-pci,netdev=net0,mac=52:54:00:12:34:02 
   -netdev tap,id=net0,ifname=tap0,script=no,downscript=no 
   -drive file=./core-image-sato-qemuppc.rootfs.ext4,if=virtio,format=raw 
   -show-cursor -device virtio-rng-pci  -nographic -pidfile /tmp/zzqemu.1.pid 
   -d unimp,guest_errors,int -D /tmp/qemu.1 -monitor pty -machine mac99 
   -cpu G4 -m 256 -snapshot -serial mon:stdio -serial null 
   -kernel /tmp/repro/vmlinux-qemuppc.bin -append 'root=/dev/vda rw highres=off 
 
console=ttyS0 mem=256M ip=192.168.7.2::192.168.7.1:255.255.255.0 
console=tty 
console=ttyS0  udev.log-priority=debug powersave=off'

> Would it also be possible for you to make your test image available
> for other people to see if they can recreate the same issue?

I've shared the image, kernel and my "reproducer" script: 

http://www.rpsys.net/wp/rp/qemuppc-hang-reproducer.tgz

We needed to find a way to reproduce this at will and it doesn't seem
to happen often. The scripts in there are partly extracted from our
test setup and partly ugly brute forcing. To run them you'd do
something like:

cc tunctl.c -o tunctl
sudo ./runqemu-gen-tapdevs 1000 1000 50
(This sets up tap0-tap49 accessible by user/group 1000/1000, only need
to do this once - its how we enable easy networking without needing
sudo on our test infrastructure)

vi core-image-sato-qemuppc.qemuboot.conf
[set the last three lines to point at where qemu-system-ppc lives]
vi ./runqemu-parallel.py
[set mydir to wherever you extracted it to]
python3 ./runqemu-parallel.py

This will launch 50 copies of qemu, dumping logging and output into
/tmp/qemu.X and /tmp/*runqemu* files and than monitor the logs to see
which if any "stall". Its normal for the image to stall for a few
seconds towards the end of boot but if any are printing stalled
messages for a minute, they've properly hung. You'll see a:

ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 0100req
0004

in the logs of a hung qemu. The image output would usually stop with a
ep_poll4. The kernel I've provided there is a very verbose debug kernel
which is hang to tell when its hung, if a bit slower to boot.

I didn't promise it was neat, sorry :)

Cheers,

Richard




[Qemu-devel] qemu-system-ppc hangs

2017-11-20 Thread Richard Purdie
Hi,

I work on the Yocto Project and we use qemu to test boot our Linux
images and run tests against them. We've been noticing some instability
for ppc where the images sometimes hang, usually around udevd bring up
time so just after booting into userspace.

To cut a long story short, I've tracked down what I think is the
problem. I believe the decrementer timer stops receiving interrupts so
tasks in our images hang indefinitely as the timer stopped. 

It can be summed up with this line of debug:

ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 0100req 0004

It should normally read:

ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 0100req 0002

The question is why CPU_INTERRUPT_EXITTB ends up being set when the
lines above this log message clearly sets CPU_INTERRUPT_HARD (via 
cpu_interrupt() ).

I note in cpu.h:

/* updates protected by BQL */
uint32_t interrupt_request;

(for struct CPUState)

The ppc code does "cs->interrupt_request |= CPU_INTERRUPT_EXITTB" in 5
places, 3 in excp_helper.c and 2 in helper_regs.h. In all cases,  
g_assert(qemu_mutex_iothread_locked()); fails. If I do something like:

if (!qemu_mutex_iothread_locked()) {
qemu_mutex_lock_iothread();
cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
qemu_mutex_unlock_iothread();
} else {
cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
}

in these call sites then I can no longer lock qemu up with my test
case.

I suspect the _HARD setting gets overwritten which stops the 
decrementer interrupts being delivered.

I don't know if taking this lock in these situations is going to be bad
for performance and whether such a patch would be right/wrong.

At this point I therefore wanted to seek advice on what the real issue
is here and how to fix it!

Cheers,

Richard






Re: [Qemu-devel] [RFC PATCH v1 0/8] QOM prop overloading + ARM MPCore CPUs

2015-09-18 Thread Richard Purdie
On Fri, 2015-09-18 at 11:14 -0700, Peter Crosthwaite wrote:
> On Fri, Sep 18, 2015 at 10:23 AM, Richard Purdie
> <richard.pur...@linuxfoundation.org> wrote:
> > On Fri, 2015-09-18 at 09:46 -0700, Peter Crosthwaite wrote:
> >> >> My biggest fear is testing of the changes for the affected boards.
> >> >> Peter, do you much coverage of these boards in your regressions? Do you
> >> >> have automated tests in a git repo somewhere?
> >> >
> >> > The answers to these questions are "nowhere near enough" and
> >> > "unfortunately not"...
> >> >
> >>
> >> How hard would it be to do something Yocto powered? AFAIK Yocto only
> >> supports the one ARM board (Vexpress), three (+ZynqMP, +Zynq) with the
> >> Meta-Xilinx layer and there may be more with other layers (anything in
> >> meta-linaro?). Can we bitbake something that builds out a large number
> >> of ARM machines and tests them all on QEMU?
> >
> > Running our standard ARM board tests is a case of:
> >
> > git clone http://git.yoctoproject.org/git/poky
> > cd poky
> > source oe-init-build-env
> > echo 'INHERIT += "testimage"' >> ./conf/local.conf
> > MACHINE=qemuarm bitbake core-image-sato
> > MACHINE=qemuarm bitbake core-image-sato -c testimage
> >
> 
> So qemuarm is implicitly vexpress, I guess we would want to add more,
> such as qemuarm-zynq, qemuarm-zaurus, qemuarm-virt etc. Can a single
> bitbake core-image-foo build out multiple MACHINEs or does it not work
> like that?

You'd usually just MACHINE=X bitbake ; MACHINE=Y bitbake .
If the configuration shares a common set of compiler optimisation flags,
it will reuse the image binaries.

> > You could replace core-image-sato -> core-image-minimal for a smaller
> > image and fewer tests or try core-image-sato-sdk or core-image-lsb-sdk
> > for more.
> >
> > The Quick Start guide is at
> > http://www.yoctoproject.org/docs/1.8/yocto-project-qs/yocto-project-qs.html 
> > and has various things like precanned lists of prerequisites for the 
> > package manager.
> >
> > Not sure which other boards you could try booting but I know the Zaurus
> > machines did work a long time ago as we submitted the qemu code. They
> > are now in their own layer and I've not tried them in a long time.
> 
> Do these multiple vendor layers conflict with each other and is
> merging all the different ARM machines to poky mainline feasible?

The layer model is intentional like a plugin architecture. OE was once a
monolithic repository, it grew too large and painful to work. We
therefore now have layers which have separate maintainership. The
quality does vary a bit but it did with the monolithic repo too.

In theory you can just plug the layers you want into the core and use
them, they shouldn't conflict.

> Something else that is on topic, is we should consider changing
> (subject to backwards compat) the default qemuarm machine to virt, as
> this machine is well maintained and intended for use as a pure virtual
> machine (which is intent of Yocto qemu specific target IIUC).

That would need some wider discussion with the OE community and our
kernel maintainers since I know we build a specific qemuarm kernel but
in principle it could be done.

There are also qemux86, qemux86-64, qemuppc, qemumips, qemuarm64 and
qemumips64 fwiw. I'm not sure we make the best use of qemu in these so
I'd be interested in any opinions on what we're doing there... :)

FWIW we did leave qemuarm as an armv5 cpu since we tend to test a lot of
of armv7 on real hardware.

Cheers,

Richard





Re: [Qemu-devel] [RFC PATCH v1 0/8] QOM prop overloading + ARM MPCore CPUs

2015-09-18 Thread Richard Purdie
On Fri, 2015-09-18 at 09:46 -0700, Peter Crosthwaite wrote:
> >> My biggest fear is testing of the changes for the affected boards.
> >> Peter, do you much coverage of these boards in your regressions? Do you
> >> have automated tests in a git repo somewhere?
> >
> > The answers to these questions are "nowhere near enough" and
> > "unfortunately not"...
> >
> 
> How hard would it be to do something Yocto powered? AFAIK Yocto only
> supports the one ARM board (Vexpress), three (+ZynqMP, +Zynq) with the
> Meta-Xilinx layer and there may be more with other layers (anything in
> meta-linaro?). Can we bitbake something that builds out a large number
> of ARM machines and tests them all on QEMU?

Running our standard ARM board tests is a case of:

git clone http://git.yoctoproject.org/git/poky
cd poky
source oe-init-build-env
echo 'INHERIT += "testimage"' >> ./conf/local.conf
MACHINE=qemuarm bitbake core-image-sato
MACHINE=qemuarm bitbake core-image-sato -c testimage

You could replace core-image-sato -> core-image-minimal for a smaller
image and fewer tests or try core-image-sato-sdk or core-image-lsb-sdk
for more.

The Quick Start guide is at
http://www.yoctoproject.org/docs/1.8/yocto-project-qs/yocto-project-qs.html and 
has various things like precanned lists of prerequisites for the package 
manager.

Not sure which other boards you could try booting but I know the Zaurus
machines did work a long time ago as we submitted the qemu code. They
are now in their own layer and I've not tried them in a long time.

The above will build its own qemu-native as there are some patches we
rely on (like the network fixes). You can point the qemu recipe at
different source easily enough.

Cheers,

Richard




Re: [Qemu-devel] [RFT PATCH v1 0/3] net: smc91c111 can_receive fixes

2015-09-15 Thread Richard Purdie
On Tue, 2015-09-15 at 11:02 -0700, Peter Crosthwaite wrote:
> On Mon, Sep 14, 2015 at 2:09 PM, Richard Purdie
> <richard.pur...@linuxfoundation.org> wrote:
> > Hi Peter,
> >
> > On Thu, 2015-09-10 at 21:23 -0700, Peter Crosthwaite wrote:
> >> This should hopefully fix your bug, while addressing the extra concern
> >> I raised.
> >>
> >> There was also inconsistent behaviour with corking packets through a
> >> soft reset which I notice and fixed.
> >>
> >> Please let me know if this works for you.
> >
> > I've run it through a few builds/tests in place of my own patches and so
> > far it seems to be working, thanks!
> >
> 
> Can we take that as a formal Tested-by? :)

Tested-by: Richard Purdie <richard.pur...@linuxfoundation.org>

if that helps :)

Cheers,

Richard




Re: [Qemu-devel] [RFT PATCH v1 0/3] net: smc91c111 can_receive fixes

2015-09-14 Thread Richard Purdie
Hi Peter,

On Thu, 2015-09-10 at 21:23 -0700, Peter Crosthwaite wrote:
> This should hopefully fix your bug, while addressing the extra concern
> I raised.
> 
> There was also inconsistent behaviour with corking packets through a
> soft reset which I notice and fixed.
> 
> Please let me know if this works for you.

I've run it through a few builds/tests in place of my own patches and so
far it seems to be working, thanks!

Cheers,

Richard




Re: [Qemu-devel] Segfault using qemu-system-arm in smc91c111

2015-09-07 Thread Richard Purdie
On Sun, 2015-09-06 at 17:48 -0700, Peter Crosthwaite wrote:
> On Sun, Sep 6, 2015 at 4:26 PM, Richard Purdie
> <richard.pur...@linuxfoundation.org> wrote:
> > On Sun, 2015-09-06 at 11:37 -0700, Peter Crosthwaite wrote:
> > I tested an assert in _recieve() which confirms it can be called when
> > can_receive() says it isn't ready.
> >
> 
> A backtrace of this would be handy.

This is the trace with my assert against smc91c111_can_receive(nc) being
false when we're in receive(), before we allocate_packet:

#0  0x7f355f276267 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:55
#1  0x7f355f277eca in __GI_abort () at abort.c:89
#2  0x7f355f26f03d in __assert_fail_base (fmt=0x7f355f3d1028 "%s%s%s:%u: 
%s%sAssertion `%s' failed.\n%n", 
assertion=assertion@entry=0x7f3562158ed7 "smc91c111_can_receive(nc)", 
file=file@entry=0x7f3562158dc8 
"/media/build1/poky/build/tmp-lsb/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/hw/net/smc91c111.c",
 line=line@entry=680, function=function@entry=0x7f35621591b0 
<__PRETTY_FUNCTION__.26130> "smc91c111_receive") at assert.c:92
#3  0x7f355f26f0f2 in __GI___assert_fail 
(assertion=assertion@entry=0x7f3562158ed7 "smc91c111_can_receive(nc)", 
file=file@entry=0x7f3562158dc8 
"/media/build1/poky/build/tmp-lsb/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/hw/net/smc91c111.c",
 line=line@entry=680, function=function@entry=0x7f35621591b0 
<__PRETTY_FUNCTION__.26130> "smc91c111_receive")
at assert.c:101
#4  0x7f3561fca4d0 in smc91c111_receive (nc=0x7f3563604d10, 
buf=0x7f353c09d028 "RT", size=1514)
at 
/media/build1/poky/build/tmp-lsb/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/hw/net/smc91c111.c:680
#5  0x7f356203058b in qemu_deliver_packet (sender=, 
flags=, data=data@entry=0x7f353c09d028 "RT", 
size=, opaque=0x7f3563604d10)
at 
/media/build1/poky/build/tmp-lsb/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/net/net.c:577
#6  0x7f3562031eaa in qemu_net_queue_deliver (size=, 
data=, flags=, 
sender=, queue=0x7f3563604e70)
at 
/media/build1/poky/build/tmp-lsb/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/net/queue.c:157
#7  qemu_net_queue_flush (queue=0x7f3563604e70)
at 
/media/build1/poky/build/tmp-lsb/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/net/queue.c:254
#8  0x7f356202fc7c in qemu_flush_or_purge_queued_packets 
(nc=0x7f3563604d10, purge=)
at 
/media/build1/poky/build/tmp-lsb/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/net/net.c:606
#9  0x7f3561fcacec in smc91c111_writeb (opaque=0x7f35635178f0, 
offset=, value=128)
at 
/media/build1/poky/build/tmp-lsb/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/hw/net/smc91c111.c:382
#10 0x7f3561fcaf84 in smc91c111_writew (opaque=0x7f35635178f0, offset=0, 
value=128)
at 
/media/build1/poky/build/tmp-lsb/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/hw/net/smc91c111.c:612
#11 0x7f3561e52cc5 in memory_region_oldmmio_write_accessor (mr=, addr=, value=, 
size=, shift=, mask=, 
attrs=...)
at 
/media/build1/poky/build/tmp-lsb/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/memory.c:434
#12 0x7f3561e5227d in access_with_adjusted_size (addr=addr@entry=0, 
value=value@entry=0x7f35572d93f8, size=size@entry=2, 
access_size_min=, access_size_max=, 
access=0x7f3561e52c90 , 
mr=0x7f356351bc80, attrs=...)
at 
/media/build1/poky/build/tmp-lsb/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/memory.c:506
#13 0x7f3561e53d5b in memory_region_dispatch_write 
(mr=mr@entry=0x7f356351bc80, addr=0, data=128, size=size@entry=2, 
attrs=attrs@entry=...) at 
/media/build1/poky/build/tmp-lsb/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/memory.c:1171
#14 0x7f3561e1fc51 in address_space_rw (as=, addr=268500992, 
attrs=..., buf=buf@entry=0x7f35572d94c0 "\200", 
len=2, is_write=is_write@entry=true)
at 
/media/build1/poky/build/tmp-lsb/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/exec.c:2445
#15 0x7f3561e1fda0 in address_space_write (len=, 
buf=0x7f35572d94c0 "\200", attrs=..., addr=, 
as=) at 
/media/build1/poky/build/tmp-lsb/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/exec.c:2521
#16 subpage_write (opaque=, addr=, 
value=, len=, attrs=...)
at 
/media/build1/poky/build/tmp-lsb/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/exec.c:2081
#17 0x7f3561e5227d in access_with_adjusted_size (addr=addr@entry=0, 
value=value@entry=0x7f35572d9568, size=size@entry=2, 
access_size_min=, access_size_max=, 
access=0x7f3561e521a0 , 
mr=0x7f356375e360, attrs=...)
at 
/media/build1/poky/build/tmp-lsb/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/memory.c:506
#18 0x7f3561e53d5b in memory_region_dispatch_write (mr=0x7f356375e360, 
addr=0, data=128, size=2, attrs=...)
at 
/media/

Re: [Qemu-devel] Segfault using qemu-system-arm in smc91c111

2015-09-07 Thread Richard Purdie
On Sun, 2015-09-06 at 17:48 -0700, Peter Crosthwaite wrote:
> On Sun, Sep 6, 2015 at 4:26 PM, Richard Purdie
> <richard.pur...@linuxfoundation.org> wrote:
> > On Sun, 2015-09-06 at 11:37 -0700, Peter Crosthwaite wrote:
> > It seems can_receive isn't enough, we'd need to put some checks into
> > receive itself since once can_receive says "yes", multiple packets can
> > arrive to _receive without further checks of can_receive.
> 
> This doesn't sound right. There are other network controllers that
> rely of can_receive catching all cases properly. Is this a regression?
> Looking at logs, I see some refactoring of QEMU net framework around
> June timeframe, if you rewind to QEMU 2.3 (or earlier) does the bug go
> away?

We weren't seeing this problem until we upgraded to 2.4.

> 
> > I've either
> > messed up my previous test or been lucky.
> >
> > I tested an assert in _recieve() which confirms it can be called when
> > can_receive() says it isn't ready.
> >
> 
> A backtrace of this would be handy.
> 
> What is your replicator? I have core-image-minimal handy but it has no
> scp or sshd so all I can think of to stress network is wget, but that
> doesn't replicate.

I've been using a core-image-sato and using the "bitbake core-image-sato
-c testimage" which runs a set of tests against the target image. It
invariably crashes on the scp test when I put an assert in receive().

To make it simpler, if I just "runqemu qemuarm" to boot the
core-image-sato, then scp a 5MB file consisting of zeros into the image,
it hits the assert after around 2% transferred.

Cheers,

Richard




Re: [Qemu-devel] Segfault using qemu-system-arm in smc91c111

2015-09-07 Thread Richard Purdie
On Sun, 2015-09-06 at 17:48 -0700, Peter Crosthwaite wrote:
> This doesn't sound right. There are other network controllers that
> rely of can_receive catching all cases properly. Is this a regression?
> Looking at logs, I see some refactoring of QEMU net framework around
> June timeframe, if you rewind to QEMU 2.3 (or earlier) does the bug go
> away?

I did find an interesting comment in this commit:

http://git.qemu.org/?p=qemu.git;a=commitdiff;h=625de449fc5597f2e1aff9cb586e249e198f03c9

"""
Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and friends,
net queues need to be explicitly flushed after qemu_can_send_packet()
returns false, because the netdev side will disable the polling of fd.
"""

smc91x111 is calling flush functions when it knows can_receive
would/should return false. I believe that is the bug here.

I suspect the driver needs:

* can_receive to actually return the right value
* the locations of the flush calls to be when there is receive space 

This could explain what changed to break this and why moving the flush
calls works in my patch.

Cheers,

Richard




Re: [Qemu-devel] Segfault using qemu-system-arm in smc91c111

2015-09-06 Thread Richard Purdie
On Sun, 2015-09-06 at 11:37 -0700, Peter Crosthwaite wrote:
> On Sun, Sep 6, 2015 at 7:21 AM, Richard Purdie
> <richard.pur...@linuxfoundation.org> wrote:
> > On Sat, 2015-09-05 at 13:30 -0700, Peter Crosthwaite wrote:
> >> On Fri, Sep 4, 2015 at 10:30 AM, Peter Maydell <peter.mayd...@linaro.org> 
> >> wrote:
> >> > On 4 September 2015 at 18:20, Richard Purdie
> >> > <richard.pur...@linuxfoundation.org> wrote:
> >> >> On Fri, 2015-09-04 at 13:43 +0100, Richard Purdie wrote:
> >> >>> On Fri, 2015-09-04 at 12:31 +0100, Peter Maydell wrote:
> >> >>> > On 4 September 2015 at 12:24, Richard Purdie
> >> >>> > <richard.pur...@linuxfoundation.org> wrote:
> >> >>> > > So just based on that, yes, seems that the rx_fifo looks to be
> >> >>> > > overrunning. I can add the asserts but I think it would just 
> >> >>> > > confirm
> >> >>> > > this.
> >> >>> >
> >> >>> > Yes, the point of adding assertions is to confirm a hypothesis.
> >> >>>
> >> >>> I've now confirmed that it does indeed trigger the assert in
> >> >>> smc91c111_receive().
> >> >>
> >> >> I just tried an experiment where I put:
> >> >>
> >> >> if (s->rx_fifo_len >= NUM_PACKETS)
> >> >> return -1;
> >> >>
> >> >> into smc91c111_receive() and my reproducer stops reproducing the
> >> >> problem.
> >>
> >> Does it just stop the crash or does it eliminate the problem
> >> completely with a fully now-working network?
> >
> > It stops the crash, the network works great.
> >
> >> >> I also noticed can_receive() could also have a check on buffer
> >> >> availability. Would one of these changes be the correct fix here?
> >> >
> >> > The interesting question is why smc91c111_allocate_packet() doesn't
> >> > fail in this situation. We only have NUM_PACKETS worth of storage,
> >> > shared between the tx and rx buffers, so how could we both have
> >> > already filled the rx_fifo and have a spare packet for the allocate
> >> > function to return?
> >>
> >> Maybe this:
> >>
> >> case 5: /* Release.  */
> >> smc91c111_release_packet(s, s->packet_num);
> >> break;
> >>
> >> The guest is able to free an allocated packet without the accompanying
> >> pop of tx/rx fifo. This may suggest some sort of guest error?
> >>
> >> The fix depends on the behaviour of the real hardware. If that MMIO op
> >> is supposed to dequeue the corresponding queue entry then we may need
> >> to patch that logic to do search the queues and dequeue it. Otherwise
> >> we need to find out the genuine length of the rx queue, and clamp it
> >> without something like Richards patch. There are a few other bits and
> >> pieces that suggest the guest can have independent control of the
> >> queues and allocated buffers but i'm confused as to how the rx fifo
> >> length can get up to 10 in any case.
> >
> > I think I have a handle on what is going on. smc91c111_release_packet()
> > changes s->allocated() but not rx_fifo. can_receive() only looks at
> > s->allocated. We can trigger new network packets to arrive from
> > smc91c111_release_packet() which calls qemu_flush_queued_packets()
> > *before* we change rx_fifo and this can loop.
> >
> > The patch below which explicitly orders the qemu_flush_queued_packets()
> > call resolved the test case I was able to reproduce this problem in.
> >
> > So there are three ways to fix this, either can_receive() needs to check
> > both s->allocated() and rx_fifo,
> 
> This is probably the winner for me.
> 
> > or the code is more explicit about when
> > qemu_flush_queued_packets() is called (as per my patch below), or the
> > case 4 where smc91c111_release_packet() and then
> > smc91c111_pop_rx_fifo(s) is called is reversed. I also tested the latter
> > which also works, albeit with more ugly code.

It seems can_receive isn't enough, we'd need to put some checks into
receive itself since once can_receive says "yes", multiple packets can
arrive to _receive without further checks of can_receive. I've either
messed up my previous test or been lucky.

I tested an assert in _recieve() which confirms it can be called when
can_receive() says it isn't ready.

If we return -1 in _receive, the code will stop sending packets and all
works as it should, it recovers just fine. So I think that is looking
like the correct fix. I'd note that it already effectively has half this
check in the allocate_packet call, its just missing the rx_fifo_len one.

Cheers,

Richard




Re: [Qemu-devel] Segfault using qemu-system-arm in smc91c111

2015-09-06 Thread Richard Purdie
On Sat, 2015-09-05 at 13:30 -0700, Peter Crosthwaite wrote:
> On Fri, Sep 4, 2015 at 10:30 AM, Peter Maydell <peter.mayd...@linaro.org> 
> wrote:
> > On 4 September 2015 at 18:20, Richard Purdie
> > <richard.pur...@linuxfoundation.org> wrote:
> >> On Fri, 2015-09-04 at 13:43 +0100, Richard Purdie wrote:
> >>> On Fri, 2015-09-04 at 12:31 +0100, Peter Maydell wrote:
> >>> > On 4 September 2015 at 12:24, Richard Purdie
> >>> > <richard.pur...@linuxfoundation.org> wrote:
> >>> > > So just based on that, yes, seems that the rx_fifo looks to be
> >>> > > overrunning. I can add the asserts but I think it would just confirm
> >>> > > this.
> >>> >
> >>> > Yes, the point of adding assertions is to confirm a hypothesis.
> >>>
> >>> I've now confirmed that it does indeed trigger the assert in
> >>> smc91c111_receive().
> >>
> >> I just tried an experiment where I put:
> >>
> >> if (s->rx_fifo_len >= NUM_PACKETS)
> >> return -1;
> >>
> >> into smc91c111_receive() and my reproducer stops reproducing the
> >> problem.
> 
> Does it just stop the crash or does it eliminate the problem
> completely with a fully now-working network?

It stops the crash, the network works great.

> >> I also noticed can_receive() could also have a check on buffer
> >> availability. Would one of these changes be the correct fix here?
> >
> > The interesting question is why smc91c111_allocate_packet() doesn't
> > fail in this situation. We only have NUM_PACKETS worth of storage,
> > shared between the tx and rx buffers, so how could we both have
> > already filled the rx_fifo and have a spare packet for the allocate
> > function to return?
> 
> Maybe this:
> 
> case 5: /* Release.  */
> smc91c111_release_packet(s, s->packet_num);
> break;
> 
> The guest is able to free an allocated packet without the accompanying
> pop of tx/rx fifo. This may suggest some sort of guest error?
> 
> The fix depends on the behaviour of the real hardware. If that MMIO op
> is supposed to dequeue the corresponding queue entry then we may need
> to patch that logic to do search the queues and dequeue it. Otherwise
> we need to find out the genuine length of the rx queue, and clamp it
> without something like Richards patch. There are a few other bits and
> pieces that suggest the guest can have independent control of the
> queues and allocated buffers but i'm confused as to how the rx fifo
> length can get up to 10 in any case.

I think I have a handle on what is going on. smc91c111_release_packet()
changes s->allocated() but not rx_fifo. can_receive() only looks at
s->allocated. We can trigger new network packets to arrive from
smc91c111_release_packet() which calls qemu_flush_queued_packets()
*before* we change rx_fifo and this can loop.

The patch below which explicitly orders the qemu_flush_queued_packets()
call resolved the test case I was able to reproduce this problem in.

So there are three ways to fix this, either can_receive() needs to check
both s->allocated() and rx_fifo, or the code is more explicit about when
qemu_flush_queued_packets() is called (as per my patch below), or the
case 4 where smc91c111_release_packet() and then
smc91c111_pop_rx_fifo(s) is called is reversed. I also tested the latter
which also works, albeit with more ugly code.

The problem is much more reproducible with the assert btw, booting a
qemu image with this and hitting the network interface with scp of a few
large files is usually enough.

So which patch would be preferred? :)

Cheers,

Richard



Index: qemu-2.4.0/hw/net/smc91c111.c
===
--- qemu-2.4.0.orig/hw/net/smc91c111.c
+++ qemu-2.4.0/hw/net/smc91c111.c
@@ -185,7 +185,6 @@ static void smc91c111_release_packet(smc
 s->allocated &= ~(1 << packet);
 if (s->tx_alloc == 0x80)
 smc91c111_tx_alloc(s);
-qemu_flush_queued_packets(qemu_get_queue(s->nic));
 }
 
 /* Flush the TX FIFO.  */
@@ -237,9 +236,11 @@ static void smc91c111_do_tx(smc91c111_st
 }
 }
 #endif
-if (s->ctr & CTR_AUTO_RELEASE)
+if (s->ctr & CTR_AUTO_RELEASE) {
 /* Race?  */
 smc91c111_release_packet(s, packetnum);
+qemu_flush_queued_packets(qemu_get_queue(s->nic));
+}
 else if (s->tx_fifo_done_len < NUM_PACKETS)
 s->tx_fifo_done[s->tx_fifo_done_len++] = packetnum;
 qemu_send_packet(qemu_get_queue(s->nic), p, len);
@@ -379,9 +380,11 @@ static void smc91c111_w

[Qemu-devel] Segfault using qemu-system-arm in smc91c111

2015-09-04 Thread Richard Purdie
We're seeing repeated segfaults in qemu-system-arm when we heavily use
the network. I have a coredump backtrace:

Reading symbols from 
/home/pokybuild/yocto-autobuilder/yocto-worker/nightly-arm-lsb/build/build/tmp/sysroots/x86_64-linux/usr/bin/qemu-system-arm...done.
[New LWP 4536]
[New LWP 4534]
[New LWP 4530]
[New LWP 4537]
[New LWP 6396]

warning: Corrupted shared library list: 0x7f8d5f27e540 != 0x619822507f8d
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by 
`/home/pokybuild/yocto-autobuilder/yocto-worker/nightly-arm-lsb/build/build/tmp/'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  smc91c111_pop_tx_fifo_done (s=0x7f8d6158b560)
at 
/home/pokybuild/yocto-autobuilder/yocto-worker/nightly-arm-lsb/build/build/tmp/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/hw/net/smc91c111.c:179
179 s->tx_fifo_done[i] = s->tx_fifo_done[i + 1];
(gdb) bt
#0  smc91c111_pop_tx_fifo_done (s=0x7f8d6158b560)
at 
/home/pokybuild/yocto-autobuilder/yocto-worker/nightly-arm-lsb/build/build/tmp/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/hw/net/smc91c111.c:179
#1  smc91c111_writeb (opaque=0x7f8d6158b560, offset=12, value=)
at 
/home/pokybuild/yocto-autobuilder/yocto-worker/nightly-arm-lsb/build/build/tmp/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/hw/net/smc91c111.c:431
#2  0x7f8d5ecacd65 in memory_region_oldmmio_write_accessor (mr=, addr=, value=, 
size=, shift=, mask=, 
attrs=...)
at 
/home/pokybuild/yocto-autobuilder/yocto-worker/nightly-arm-lsb/build/build/tmp/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/memory.c:434
#3  0x7f8d5ecac5dd in access_with_adjusted_size (addr=140245200319840, 
addr@entry=12, value=0xc, value@entry=0x7f8d52ac63e8, 
size=1, access_size_min=2031671516, access_size_max=32, 
access=0x7f8d5ecacd30 , 
mr=0x7f8d6158f8f0, attrs=...)
at 
/home/pokybuild/yocto-autobuilder/yocto-worker/nightly-arm-lsb/build/build/tmp/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/memory.c:506
#4  0x7f8d5ecae08b in memory_region_dispatch_write 
(mr=mr@entry=0x7f8d6158f8f0, addr=12, data=2, size=size@entry=1, 
attrs=attrs@entry=...)
at 
/home/pokybuild/yocto-autobuilder/yocto-worker/nightly-arm-lsb/build/build/tmp/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/memory.c:1171
#5  0x7f8d5ec7b78f in address_space_rw (as=0x7f8d5f408600 
, addr=268501004, attrs=..., 
buf=buf@entry=0x7f8d52ac64b0 "\002", len=1, is_write=is_write@entry=true)
at 
/home/pokybuild/yocto-autobuilder/yocto-worker/nightly-arm-lsb/build/build/tmp/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/exec.c:2451
#6  0x7f8d5ec7b9e0 in address_space_write (len=, 
buf=0x7f8d52ac64b0 "\002", attrs=..., addr=, 
as=)
at 
/home/pokybuild/yocto-autobuilder/yocto-worker/nightly-arm-lsb/build/build/tmp/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/exec.c:2521
#7  subpage_write (opaque=, addr=, 
value=, len=, attrs=...)
at 
/home/pokybuild/yocto-autobuilder/yocto-worker/nightly-arm-lsb/build/build/tmp/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/exec.c:2081
#8  0x7f8d5ecac5dd in access_with_adjusted_size (addr=140245200319840, 
addr@entry=12, value=0xc, value@entry=0x7f8d52ac6558, 
size=1, access_size_min=2031671516, access_size_max=32, 
access=0x7f8d5ecac500 , 
mr=0x7f8d618d5750, attrs=...)
at 
/home/pokybuild/yocto-autobuilder/yocto-worker/nightly-arm-lsb/build/build/tmp/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/memory.c:506
#9  0x7f8d5ecae08b in memory_region_dispatch_write (mr=0x7f8d618d5750, 
addr=12, data=2, size=1, attrs=...)
at 
/home/pokybuild/yocto-autobuilder/yocto-worker/nightly-arm-lsb/build/build/tmp/work/x86_64-linux/qemu-native/2.4.0-r1/qemu-2.4.0/memory.c:1171
#10 0x7f8d5584b512 in ?? ()

(gdb) print s->tx_fifo_done
$1 = {99614720, 99614720, 99614720, 99614720}
(gdb) print s->tx_fifo_done_len
$2 = 99614719

so it looks like tx_fifo_done_len has been corrupted, going beyond that
is harder for me to figure out. Does anyone happen to know what might be
going on here? This is with qemu 2.4.0.

Cheers,

Richard





Re: [Qemu-devel] Segfault using qemu-system-arm in smc91c111

2015-09-04 Thread Richard Purdie
On Fri, 2015-09-04 at 12:31 +0100, Peter Maydell wrote:
> On 4 September 2015 at 12:24, Richard Purdie
> <richard.pur...@linuxfoundation.org> wrote:
> > So just based on that, yes, seems that the rx_fifo looks to be
> > overrunning. I can add the asserts but I think it would just confirm
> > this.
> 
> Yes, the point of adding assertions is to confirm a hypothesis.

I've now confirmed that it does indeed trigger the assert in
smc91c111_receive().

> >> Also, do you have a more specific reproduce case so I can try
> >> to replicate the problem here?
> >
> > Not sure how familiar you are with the yocto project?
> 
> I don't really know about the Yocto project, no. What I need
> is a set of instructions I can follow ("download this, run this
> command line", etc) so I can reproduce it on my machine.

I'm running some tests at the moment to see which environments this is
reproducible in and will try and distil reproduction steps based on
that.

Cheers,

Richard




Re: [Qemu-devel] Segfault using qemu-system-arm in smc91c111

2015-09-04 Thread Richard Purdie
On Fri, 2015-09-04 at 11:45 +0100, Peter Maydell wrote:
> On 4 September 2015 at 11:25, Richard Purdie
> <richard.pur...@linuxfoundation.org> wrote:
> > We're seeing repeated segfaults in qemu-system-arm when we heavily use
> > the network. I have a coredump backtrace:
> 
> > (gdb) print s->tx_fifo_done
> > $1 = {99614720, 99614720, 99614720, 99614720}
> > (gdb) print s->tx_fifo_done_len
> > $2 = 99614719
> >
> > so it looks like tx_fifo_done_len has been corrupted, going beyond that
> > is harder for me to figure out. Does anyone happen to know what might be
> > going on here? This is with qemu 2.4.0.
> 
> That would suggest the rx_fifo buffer is overrunning (assuming
> none of the other fields in the struct look like they've
> been corrupted). Can you try putting
> assert(s->rx_fifo_len < NUM_PACKETS);
>  before
> s->rx_fifo[s->rx_fifo_len++] = packetnum;
> in smc91c111_receive(), and see if you hit that assertion?

(gdb) print s->tx_fifo_len
$2 = 0
(gdb) print s->rx_fifo_len
$3 = 10

So just based on that, yes, seems that the rx_fifo looks to be
overrunning. I can add the asserts but I think it would just confirm
this.

> Also, do you have a more specific reproduce case so I can try
> to replicate the problem here?

Not sure how familiar you are with the yocto project? Basically we build
a core-image-sato rootfs image, then boot it under qemu and run some
tests against it. This seems to reliably fail for arm, particularly on
our debian8 autobuilder for reasons as yet unknown. The build logs for a
couple of example failures are:

https://autobuilder.yoctoproject.org/main/builders/nightly-oecore/builds/477
https://autobuilder.yoctoproject.org/main/builders/nightly-arm-lsb/builds/465

There is an issue where the tests don't stop running after qemu
segfaults, they continue to try and connect to it which is an issue
we'll work separately. The is a segfault/coredump showing the same
backtrace for both the above builds.

So if you had an OE build environment, you could download (or build) a
core-image-sato, then just run the tests against it (bitbake
core-image-sato -c testimage). We've yet to figure out exactly which
environments trigger it but it does seem to fail fairly regularly (>50%
of the time) when running these tests.

I appreciate its not exactly an easy reproducer but the setup is
designed to be replicated and you did ask! :)

Cheers,

Richard






Re: [Qemu-devel] Segfault using qemu-system-arm in smc91c111

2015-09-04 Thread Richard Purdie
On Fri, 2015-09-04 at 13:43 +0100, Richard Purdie wrote:
> On Fri, 2015-09-04 at 12:31 +0100, Peter Maydell wrote:
> > On 4 September 2015 at 12:24, Richard Purdie
> > <richard.pur...@linuxfoundation.org> wrote:
> > > So just based on that, yes, seems that the rx_fifo looks to be
> > > overrunning. I can add the asserts but I think it would just confirm
> > > this.
> > 
> > Yes, the point of adding assertions is to confirm a hypothesis.
> 
> I've now confirmed that it does indeed trigger the assert in
> smc91c111_receive().

I just tried an experiment where I put:

if (s->rx_fifo_len >= NUM_PACKETS)
return -1;

into smc91c111_receive() and my reproducer stops reproducing the
problem. I also noticed can_receive() could also have a check on buffer
availability. Would one of these changes be the correct fix here?

(Still working on a reproducer, ended up fixing the other test
continuation issues so the failure is more obvious).

Cheers,

Richard




[Qemu-devel] [PATCH] hw/usb/wacom: Add missing HID descriptor

2014-11-27 Thread Richard Purdie
The USB wacom device is missing a HID descriptor which causes it
to fail to operate with recent kernels (e.g. 3.17).

This patch adds a HID desriptor to the device, based upon one from 
real wacom device and allows the device to work properly again.

Signed-off-by: Richard Purdie richard.pur...@linuxfoundation.org

Index: qemu-2.1.0/hw/usb/dev-wacom.c
===
--- qemu-2.1.0.orig/hw/usb/dev-wacom.c  2014-08-01 15:12:17.0 +0100
+++ qemu-2.1.0/hw/usb/dev-wacom.c   2014-10-12 12:13:30.540306042 +0100
@@ -68,6 +68,89 @@
 [STR_SERIALNUMBER] = 1,
 };
 
+static const uint8_t qemu_tablet_hid_report_descriptor[] = {
+0x05, 0x01,/* Usage Page (Generic Desktop) */
+0x09, 0x02,/* Usage (Mouse) */
+0xa1, 0x01,/* Collection (Application) */
+0x85, 0x01,/*   Report ID (1) */ 
+0x09, 0x01,/*   Usage (Pointer) */
+0xa1, 0x00,/*   Collection (Physical) */
+0x05, 0x09,/* Usage Page (Button) */
+0x19, 0x01,/* Usage Minimum (1) */
+0x29, 0x05,/* Usage Maximum (5) */
+0x15, 0x00,/* Logical Minimum (0) */
+0x25, 0x01,/* Logical Maximum (1) */
+0x95, 0x05,/* Report Count (5) */
+0x75, 0x01,/* Report Size (1) */
+0x81, 0x02,/* Input (Data, Variable, Absolute) */
+0x95, 0x01,/* Report Count (1) */
+0x75, 0x03,/* Report Size (3) */
+0x81, 0x01,/* Input (Constant) */
+0x05, 0x01,/* Usage Page (Generic Desktop) */
+0x09, 0x30,/* Usage (X) */
+0x09, 0x31,/* Usage (Y) */
+0x15, 0x81,/* Logical Minimum (-127) */
+0x25, 0x7f,/* Logical Maximum (127) */
+0x75, 0x08,/* Report Size (8) */
+0x95, 0x02,/* Report Count (2) */
+0x81, 0x06,/* Input (Data, Variable, Relative) */
+0xc0,  /*   End Collection */
+0xc0,  /* End Collection */
+0x05, 0x0d,/* Usage Page (Digitizer) */
+0x09, 0x01,/* Usage (Digitizer) */
+0xa1, 0x01,/* Collection (Application) */
+0x85, 0x02,/*   Report ID (2) */ 
+0xa1, 0x00,/*   Collection (Physical) */
+0x06, 0x00, 0xff,   /*   Usage Page (Vendor 0xff00) */
+0x09, 0x01,/*   Usage (Digitizer) */
+0x15, 0x00,/* Logical Minimum (0) */
+0x26, 0xff, 0x00,  /* Logical Maximum (255) */
+0x75, 0x08,/* Report Size (8) */
+0x95, 0x08,/* Report Count (8) */
+0x81, 0x02,/* Input (Data, Variable, Absolute) */
+0xc0,  /*   End Collection */
+0x09, 0x01,/*   Usage (Digitizer) */
+0x85, 0x02,/*   Report ID (2) */ 
+0x95, 0x01,/*   Report Count (1) */
+0xb1, 0x02,/*   FEATURE (2) */
+0xc0,  /* End Collection */
+0x06, 0x00, 0xff,  /* Usage Page (Vendor 0xff00) */
+0x09, 0x01,/* Usage (Digitizer) */
+0xa1, 0x01,/* Collection (Application) */
+0x85, 0x02,/*   Report ID (2) */ 
+0x05, 0x0d,/*   Usage Page (Digitizer)  */
+0x09, 0x22,/*   Usage (Finger) */
+0xa1, 0x00,/*   Collection (Physical) */
+0x06, 0x00, 0xff,  /*   Usage Page (Vendor 0xff00) */
+0x09, 0x01,/* Usage (Digitizer) */
+0x15, 0x00,/* Logical Minimum (0) */
+0x26, 0xff, 0x00,  /* Logical Maximum */
+0x75, 0x08,/* Report Size (8) */
+0x95, 0x02,/* Report Count (2) */
+0x81, 0x02,/* Input (Data, Variable, Absolute) */
+0x05, 0x01,/* Usage Page (Generic Desktop) */
+0x09, 0x30,/* Usage (X) */
+0x35, 0x00,/* Physical Minimum */
+0x46, 0xe0, 0x2e,  /* Physical Maximum */
+0x26, 0xe0, 0x01,   /* Logical Maximum */
+0x75, 0x10,/* Report Size (16) */
+0x95, 0x01,/* Report Count (1) */
+0x81, 0x02,/* Input (Data, Variable, Absolute) */
+0x09, 0x31,/* Usage (Y) */
+0x46, 0x40, 0x1f,  /* Physical Maximum */
+0x26, 0x40, 0x01,  /* Logical Maximum */
+0x81, 0x02,/* Input (Data, Variable, Absolute) */
+0x06, 0x00, 0xff,  /* Usage Page (Vendor 0xff00) */
+0x09, 0x01,/* Usage (Digitizer) */
+0x26, 0xff, 0x00,  /* Logical Maximum */
+0x75

Re: [Qemu-devel] [BUG/PATCH] Fix i386 SSE status flag corruption

2013-10-13 Thread Richard Purdie
On Mon, 2013-09-30 at 22:20 +0100, Richard Purdie wrote:
 This is a combination of bug report and patch. I'm not sure if you'll want to 
 fix it 
 like this but it illustrates the problem and should be easy to fix based on 
 this.
 
 When we restore the mxcsr register with FXRSTOR, we need to update the 
 various SSE
 status flags in CPUX86State by calling update_sse_status(). If we don't, we 
 end up 
 using the status bits from some other context with interesting results.
 
 I used a function prototype since it makes the fix clear, some code 
 rearrangement
 might be needed ultimately.
 
 Signed-off-by: Richard Purdie richard.pur...@linuxfoundation.org

Ping?

This seems to be quite nasty SSE register corruption and it would be
nice to get it fixed. Happy to rework the patch if needed, am just not
sure the best way to handle moving the function if that is what is
needed.

Cheers,

Richard

 Index: qemu-1.5.0/target-i386/fpu_helper.c
 ===
 --- qemu-1.5.0.orig/target-i386/fpu_helper.c  2013-09-30 18:46:39.283377648 
 +
 +++ qemu-1.5.0/target-i386/fpu_helper.c   2013-09-30 18:46:56.895377232 
 +
 @@ -1149,6 +1149,8 @@
  }
  }
  
 +static void update_sse_status(CPUX86State *env);
 +
  void helper_fxrstor(CPUX86State *env, target_ulong ptr, int data64)
  {
  int i, fpus, fptag, nb_xmm_regs;
 @@ -1180,6 +1182,7 @@
  if (env-cr[4]  CR4_OSFXSR_MASK) {
  /* XXX: finish it */
  env-mxcsr = cpu_ldl_data(env, ptr + 0x18);
 +update_sse_status(env);
  /* cpu_ldl_data(env, ptr + 0x1c); */
  if (env-hflags  HF_CS64_MASK) {
  nb_xmm_regs = 16;
 
 





[Qemu-devel] [BUG/PATCH] Fix i386 SSE status flag corruption

2013-09-30 Thread Richard Purdie
This is a combination of bug report and patch. I'm not sure if you'll want to 
fix it 
like this but it illustrates the problem and should be easy to fix based on 
this.

When we restore the mxcsr register with FXRSTOR, we need to update the various 
SSE
status flags in CPUX86State by calling update_sse_status(). If we don't, we end 
up 
using the status bits from some other context with interesting results.

I used a function prototype since it makes the fix clear, some code 
rearrangement
might be needed ultimately.

Signed-off-by: Richard Purdie richard.pur...@linuxfoundation.org

Index: qemu-1.5.0/target-i386/fpu_helper.c
===
--- qemu-1.5.0.orig/target-i386/fpu_helper.c2013-09-30 18:46:39.283377648 
+
+++ qemu-1.5.0/target-i386/fpu_helper.c 2013-09-30 18:46:56.895377232 +
@@ -1149,6 +1149,8 @@
 }
 }
 
+static void update_sse_status(CPUX86State *env);
+
 void helper_fxrstor(CPUX86State *env, target_ulong ptr, int data64)
 {
 int i, fpus, fptag, nb_xmm_regs;
@@ -1180,6 +1182,7 @@
 if (env-cr[4]  CR4_OSFXSR_MASK) {
 /* XXX: finish it */
 env-mxcsr = cpu_ldl_data(env, ptr + 0x18);
+update_sse_status(env);
 /* cpu_ldl_data(env, ptr + 0x1c); */
 if (env-hflags  HF_CS64_MASK) {
 nb_xmm_regs = 16;




[Qemu-devel] Fix writev syscall emulation

2008-02-04 Thread Richard Purdie
Hi,

OpenEmbedded/Poky use qemu for locale generation when cross compiling.
When we upgraded to qemu 0.9.1 it started giving locale generation
errors on all 64 bit machines and some 32 bit ones.

I've traced it to the writev syscall failing. localedef passes several
{ NULL, 0 } iovec entries which trip up lock_iovec(). That function is
returning an error for these but the return value isn't checked. The
syscall is therefore always made but sometimes with a iovec that has
only been half copied. If the total writes exceed size_t, EINVAL is
returned by the kernel and glibc code emulates the call with a write
which is most likely to happen on 32 bit so it sometimes works. size_t
is unlikely to be exceeded on 64 bit so that returns an EFAULT and
always corrupts/fails.

Anyhow, it seems 0 length iovec entries are allowed and we shouldn't
care about the addresses in those cases. The patch below is one way to
fix this. Ideally the return value of lock_iovec() needs be be checked
too.

Regards,

Richard

---
 linux-user/syscall.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: qemu-0.9.1/linux-user/syscall.c
===
--- qemu-0.9.1.orig/linux-user/syscall.c2008-02-03 00:00:00.0 
+
+++ qemu-0.9.1/linux-user/syscall.c 2008-02-03 00:00:38.0 +
@@ -1048,7 +1048,7 @@ static abi_long lock_iovec(int type, str
 base = tswapl(target_vec[i].iov_base);
 vec[i].iov_len = tswapl(target_vec[i].iov_len);
 vec[i].iov_base = lock_user(type, base, vec[i].iov_len, copy);
-   if (!vec[i].iov_base) 
+   if (!vec[i].iov_base  vec[i].iov_len) 
 goto fail;
 }
 unlock_user (target_vec, target_addr, 0);