Re: Display update issue on M1 Macs

2023-02-03 Thread Akihiko Odaki

On 2023/02/03 22:45, BALATON Zoltan wrote:

On Fri, 3 Feb 2023, Akihiko Odaki wrote:

On 2023/02/02 19:51, BALATON Zoltan wrote:

On Tue, 31 Jan 2023, BALATON Zoltan wrote:

On Tue, 31 Jan 2023, Akihiko Odaki wrote:

[...]
To summarise previous discussion:

- There's a problem on Apple M1 Macs with sm501 and ati-vga 2d accel 
functions drawing from device model into the video memory of the 
emulated card which is not shown on screen when the display update 
callback is called from another thread. This works on x86_64 host so 
I suspect it may be related to missing memory synchronisation that 
ARM may need.


- This can be reproduced running AmigaOS4 on sam460ex or MorphOS 
(demo iso downloadable from their web site) on sam460ex, pegasos2 or 
mac99,via=pmu with -device ati-vga,romfile="" as described here: 
http://zero.eik.bme.hu/~balaton/qemu/amiga/


- I can't test it myself lacking hardware so I have to rely on 
reports from people who have this hardware so there may be some 
uncertainty in the info I get.


- We have confirmed it's not related to a known race condition as 
disabling dirty tracking and always doing full updates of whole 
screen did not fix it:


But there is an exception: 
memory_region_snapshot_and_clear_dirty() releases iothread 
lock, and that broke raspi3b display device:

https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/

It is unexpected that gfx_update() callback releases iothread 
lock so it may break things in peculiar ways.


Peter, is there any change in the situation regarding the 
race introduced by memory_region_snapshot_and_clear_dirty()?


For now, to work around the issue, I think you can create 
another mutex and make the entire sm501_2d_engine_write() and 
sm501_update_display() critical sections.
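
A minimal sketch of that workaround, written outside QEMU so it can be compiled on its own. All names here (vram, sm501_demo_lock, demo_2d_engine_write(), demo_update_display()) are hypothetical stand-ins, not the real sm501.c code, and plain pthreads is assumed in place of QEMU's own mutex wrappers. The point is only that the writer and the display update each take the same device-local lock for their whole body:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define VRAM_SIZE 4096
#define ROUNDS 1000

/* Hypothetical stand-in for the emulated card's video memory. */
static uint8_t vram[VRAM_SIZE];
/* The extra device-local mutex suggested above (illustrative name). */
static pthread_mutex_t sm501_demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Plays the role of sm501_2d_engine_write(): fills the whole buffer. */
static void demo_2d_engine_write(uint8_t value)
{
    pthread_mutex_lock(&sm501_demo_lock);
    memset(vram, value, sizeof(vram));
    pthread_mutex_unlock(&sm501_demo_lock);
}

/* Plays the role of sm501_update_display(): returns 1 on a torn frame. */
static int demo_update_display(void)
{
    int torn = 0;

    pthread_mutex_lock(&sm501_demo_lock);
    for (size_t i = 1; i < sizeof(vram); i++) {
        if (vram[i] != vram[0]) {
            torn = 1;
            break;
        }
    }
    pthread_mutex_unlock(&sm501_demo_lock);
    return torn;
}

static void *writer_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        demo_2d_engine_write((uint8_t)i);
    }
    return NULL;
}

/* Runs writer and reader concurrently; returns the number of torn frames. */
int run_mutex_demo(void)
{
    pthread_t writer;
    int torn = 0;

    if (pthread_create(&writer, NULL, writer_thread, NULL) != 0) {
        return -1;
    }
    for (int i = 0; i < ROUNDS; i++) {
        torn += demo_update_display();
    }
    pthread_join(writer, NULL);
    return torn;
}
```

With both sides holding the lock, the reader should never observe a half-written buffer, so run_mutex_demo() is expected to return 0. Whether the same holds inside QEMU depends on every path that touches the video memory taking the lock, which this sketch does not show.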


Interesting thread but not sure it's the same problem so this 
workaround may not be enough to fix my issue. Here's a video 
posted by one of the people who reported it showing the 
problem on M1 Mac:


https://www.youtube.com/watch?v=FDqoNbp6PQs

and here's how it looks on other machines:

https://www.youtube.com/watch?v=ML7-F4HNFKQ

There are also videos showing it running on RPi 4 and G5 Mac 
without this issue so it seems to only happen on Apple Silicon 
M1 Macs. What's strange is that graphics elements are not just 
delayed, which is what I would expect from missing thread 
synchronisation: the update callback would miss some pixels 
rendered while it is running, but subsequent update callbacks 
would eventually draw those, wouldn't they? Also, setting 
full_update to 1 in the sm501_update_display() callback to 
disable dirty tracking does not fix the problem. So it looks 
as if sm501_2d_operation() running on one CPU core only 
writes data to the local cache of that core which 
sm501_update_display() running on other core can't see, so 
maybe some cache synchronisation is needed in 
memory_region_set_dirty() or if that's already there maybe I 
should call it for all changes not only those in the visible 
display area? I'm still not sure I understand the problem and 
don't know what could be a fix for it so anything to test to 
identify the issue better might also bring us closer to a 
solution.


If you set full_update to 1, you may also comment out 
memory_region_snapshot_and_clear_dirty() and 
memory_region_snapshot_get_dirty() to avoid the iothread mutex 
being unlocked. The iothread mutex should ensure cache 
coherency as well.


But as you say, it's weird that the rendered result is not just 
delayed but missed. That may imply other possibilities (e.g., 
the results are overwritten by someone else). If the problem 
persists after commenting out 
memory_region_snapshot_and_clear_dirty() and 
memory_region_snapshot_get_dirty(), I think you can assume the 
inter-thread coherency between sm501_2d_operation() and 
sm501_update_display() is not causing the problem.


I've asked people who reported and can reproduce it to test this 
but it did not change anything so confirmed it's not that race 
condition but looks more like some cache inconsistency maybe. 
Any other ideas?


I can come up with two important differences between x86 and Arm 
which can affect the execution of QEMU:
1. Memory model. Arm uses a memory model more relaxed than x86 so 
it is more sensitive to synchronization failures among threads.
2. Different instructions. TCG uses JIT so differences in 
instructions matter.


We should be able to exclude 1) as a potential cause of the 
problem. The iothread mutex should take care of race conditions and 
even cache coherency problems, since a mutex includes memory barrier 
functionality.
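
To make the memory model point concrete, here is a small standalone sketch of the ordering a lock (or an explicit barrier) provides. It is not QEMU code: payload, ready, producer() and run_publish_demo() are hypothetical names, and C11 atomics with pthreads are assumed. The release store publishes the plain write to payload, and the acquire load on the other thread guarantees that write is visible once the flag is seen. Without that pairing, a weakly ordered host such as Arm is allowed to show the flag before the data, while x86 would usually hide the mistake:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* plain data, like pixels in video memory */
static atomic_int ready;   /* flag telling the reader the data is there */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release: everything written before this store becomes visible
     * to a thread that observes ready == 1 with an acquire load. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Starts the producer, waits for the flag, returns the value observed. */
int run_publish_demo(void)
{
    pthread_t t;
    int seen;

    payload = 0;
    atomic_store(&ready, 0);
    if (pthread_create(&t, NULL, producer, NULL) != 0) {
        return -1;
    }
    /* Acquire: pairs with the release store in producer(). */
    while (!atomic_load_explicit(&ready, memory_order_acquire)) {
        /* spin until the producer has published */
    }
    seen = payload;
    pthread_join(t, NULL);
    return seen;
}
```

A mutex unlock followed by a lock of the same mutex on another thread gives at least this release/acquire ordering, which is why holding the iothread lock on both sides should rule out the memory model as the cause.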

[...]
For difference 2), you may try to use TCI. You can find details 
of TCI in tcg/tci/README.


This was tested, and TCI gave the same results, just much 
slower.


Common sense says, however, that the memory model is usually the 
cause of the problem when you see behavioral differences between 
x86 and Arm, and 

Re: Display update issue on M1 Macs

2023-02-03 Thread BALATON Zoltan

On Fri, 3 Feb 2023, Akihiko Odaki wrote:

On 2023/02/02 19:51, BALATON Zoltan wrote:

On Tue, 31 Jan 2023, BALATON Zoltan wrote:

On Tue, 31 Jan 2023, Akihiko Odaki wrote:

[...]
To summarise previous discussion:

- There's a problem on Apple M1 Macs with sm501 and ati-vga 2d accel 
functions drawing from device model into the video memory of the emulated 
card which is not shown on screen when the display update callback is 
called from another thread. This works on x86_64 host so I suspect it may 
be related to missing memory synchronisation that ARM may need.


- This can be reproduced running AmigaOS4 on sam460ex or MorphOS (demo iso 
downloadable from their web site) on sam460ex, pegasos2 or mac99,via=pmu 
with -device ati-vga,romfile="" as described here: 
http://zero.eik.bme.hu/~balaton/qemu/amiga/


- I can't test it myself lacking hardware so I have to rely on reports from 
people who have this hardware so there may be some uncertainty in the info 
I get.


- We have confirmed it's not related to a known race condition as disabling 
dirty tracking and always doing full updates of whole screen did not fix 
it:


But there is an exception: memory_region_snapshot_and_clear_dirty() 
releases iothread lock, and that broke raspi3b display device:

https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/

It is unexpected that gfx_update() callback releases iothread lock 
so it may break things in peculiar ways.


Peter, is there any change in the situation regarding the race 
introduced by memory_region_snapshot_and_clear_dirty()?


For now, to work around the issue, I think you can create another 
mutex and make the entire sm501_2d_engine_write() and 
sm501_update_display() critical sections.


Interesting thread but not sure it's the same problem so this 
workaround may not be enough to fix my issue. Here's a video posted 
by one of the people who reported it showing the problem on M1 Mac:


https://www.youtube.com/watch?v=FDqoNbp6PQs

and here's how it looks on other machines:

https://www.youtube.com/watch?v=ML7-F4HNFKQ

There are also videos showing it running on RPi 4 and G5 Mac without 
this issue so it seems to only happen on Apple Silicon M1 Macs. 
What's strange is that graphics elements are not just delayed, which 
is what I would expect from missing thread synchronisation: the 
update callback would miss some pixels rendered while it is running, 
but subsequent update callbacks would eventually draw those, wouldn't 
they? Also, setting full_update to 1 in the sm501_update_display() 
callback to disable dirty tracking does not fix the problem. So it 
looks as if sm501_2d_operation() running on one CPU core only 
writes data to the local cache of that core which 
sm501_update_display() running on other core can't see, so maybe 
some cache synchronisation is needed in memory_region_set_dirty() or 
if that's already there maybe I should call it for all changes not 
only those in the visible display area? I'm still not sure I 
understand the problem and don't know what could be a fix for it so 
anything to test to identify the issue better might also bring us 
closer to a solution.


If you set full_update to 1, you may also comment out 
memory_region_snapshot_and_clear_dirty() and 
memory_region_snapshot_get_dirty() to avoid the iothread mutex being 
unlocked. The iothread mutex should ensure cache coherency as well.


But as you say, it's weird that the rendered result is not just 
delayed but missed. That may imply other possibilities (e.g., the 
results are overwritten by someone else). If the problem persists 
after commenting out memory_region_snapshot_and_clear_dirty() and 
memory_region_snapshot_get_dirty(), I think you can assume the 
inter-thread coherency between sm501_2d_operation() and 
sm501_update_display() is not causing the problem.


I've asked people who reported and can reproduce it to test this but 
it did not change anything so confirmed it's not that race condition 
but looks more like some cache inconsistency maybe. Any other ideas?


I can come up with two important differences between x86 and Arm which 
can affect the execution of QEMU:
1. Memory model. Arm uses a memory model more relaxed than x86 so it is 
more sensitive to synchronization failures among threads.
2. Different instructions. TCG uses JIT so differences in instructions 
matter.


We should be able to exclude 1) as a potential cause of the problem. 
The iothread mutex should take care of race conditions and even cache 
coherency problems, since a mutex includes memory barrier functionality.

[...]
For difference 2), you may try to use TCI. You can find details of TCI 
in tcg/tci/README.


This was tested, and TCI gave the same results, just much slower.

Common sense says, however, that the memory model is usually the cause 
of the problem when you see behavioral differences between x86 and Arm, 
and TCG should work fine with both of x86 and Arm as they 

Re: Display update issue on M1 Macs

2023-02-03 Thread Akihiko Odaki

On 2023/02/02 19:51, BALATON Zoltan wrote:

On Tue, 31 Jan 2023, BALATON Zoltan wrote:

On Tue, 31 Jan 2023, Akihiko Odaki wrote:

[...]
To summarise previous discussion:

- There's a problem on Apple M1 Macs with sm501 and ati-vga 2d accel 
functions drawing from device model into the video memory of the 
emulated card which is not shown on screen when the display update 
callback is called from another thread. This works on x86_64 host so I 
suspect it may be related to missing memory synchronisation that ARM may 
need.


- This can be reproduced running AmigaOS4 on sam460ex or MorphOS (demo 
iso downloadable from their web site) on sam460ex, pegasos2 or 
mac99,via=pmu with -device ati-vga,romfile="" as described here: 
http://zero.eik.bme.hu/~balaton/qemu/amiga/


- I can't test it myself lacking hardware so I have to rely on reports 
from people who have this hardware so there may be some uncertainty in 
the info I get.


- We have confirmed it's not related to a known race condition as 
disabling dirty tracking and always doing full updates of whole screen 
did not fix it:


But there is an exception: 
memory_region_snapshot_and_clear_dirty() releases iothread 
lock, and that broke raspi3b display device:

https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/

It is unexpected that gfx_update() callback releases iothread 
lock so it may break things in peculiar ways.


Peter, is there any change in the situation regarding the race 
introduced by memory_region_snapshot_and_clear_dirty()?


For now, to work around the issue, I think you can create 
another mutex and make the entire sm501_2d_engine_write() and 
sm501_update_display() critical sections.


Interesting thread but not sure it's the same problem so this 
workaround may not be enough to fix my issue. Here's a video 
posted by one of the people who reported it showing the problem 
on M1 Mac:


https://www.youtube.com/watch?v=FDqoNbp6PQs

and here's how it looks on other machines:

https://www.youtube.com/watch?v=ML7-F4HNFKQ

There are also videos showing it running on RPi 4 and G5 Mac 
without this issue so it seems to only happen on Apple Silicon 
M1 Macs. What's strange is that graphics elements are not just 
delayed, which is what I would expect from missing thread 
synchronisation: the update callback would miss some pixels 
rendered while it is running, but subsequent update callbacks 
would eventually draw those, wouldn't they? Also, setting 
full_update to 1 in the sm501_update_display() callback to disable 
dirty tracking does not fix the problem. So it looks as if 
sm501_2d_operation() running on one CPU core only writes data to 
the local cache of that core which sm501_update_display() 
running on other core can't see, so maybe some cache 
synchronisation is needed in memory_region_set_dirty() or if 
that's already there maybe I should call it for all changes not 
only those in the visible display area? I'm still not sure I 
understand the problem and don't know what could be a fix for it 
so anything to test to identify the issue better might also 
bring us closer to a solution.


If you set full_update to 1, you may also comment out 
memory_region_snapshot_and_clear_dirty() and 
memory_region_snapshot_get_dirty() to avoid the iothread mutex 
being unlocked. The iothread mutex should ensure cache coherency 
as well.


But as you say, it's weird that the rendered result is not just 
delayed but missed. That may imply other possibilities (e.g., the 
results are overwritten by someone else). If the problem persists 
after commenting out memory_region_snapshot_and_clear_dirty() and 
memory_region_snapshot_get_dirty(), I think you can assume the 
inter-thread coherency between sm501_2d_operation() and 
sm501_update_display() is not causing the problem.


I've asked people who reported and can reproduce it to test this 
but it did not change anything so confirmed it's not that race 
condition but looks more like some cache inconsistency maybe. Any 
other ideas?


I can come up with two important differences between x86 and Arm 
which can affect the execution of QEMU:
1. Memory model. Arm uses a memory model more relaxed than x86 so 
it is more sensitive to synchronization failures among threads.
2. Different instructions. TCG uses JIT so differences in 
instructions matter.


We should be able to exclude 1) as a potential cause of the 
problem. The iothread mutex should take care of race conditions and even 
cache coherency problems, since a mutex includes memory barrier functionality.

[...]
For difference 2), you may try to use TCI. You can find details of 
TCI in tcg/tci/README.


This was tested, and TCI gave the same results, just much 
slower.


Common sense says, however, that the memory model is usually the 
cause of the problem when you see behavioral differences between 
x86 and Arm, and TCG should work fine with both of x86 and Arm as 
they should have been tested well.

[...]

Re: Display update issue on M1 Macs

2023-02-02 Thread BALATON Zoltan

On Tue, 31 Jan 2023, BALATON Zoltan wrote:

On Tue, 31 Jan 2023, Akihiko Odaki wrote:

[...]
To summarise previous discussion:

- There's a problem on Apple M1 Macs with sm501 and ati-vga 2d accel 
functions drawing from device model into the video memory of the emulated 
card which is not shown on screen when the display update callback is 
called from another thread. This works on x86_64 host so I suspect it may 
be related to missing memory synchronisation that ARM may need.


- This can be reproduced running AmigaOS4 on sam460ex or MorphOS (demo iso 
downloadable from their web site) on sam460ex, pegasos2 or mac99,via=pmu 
with -device ati-vga,romfile="" as described here: 
http://zero.eik.bme.hu/~balaton/qemu/amiga/


- I can't test it myself lacking hardware so I have to rely on reports 
from people who have this hardware so there may be some uncertainty in 
the info I get.


- We have confirmed it's not related to a known race condition as 
disabling dirty tracking and always doing full updates of whole screen 
did not fix it:


But there is an exception: memory_region_snapshot_and_clear_dirty() 
releases iothread lock, and that broke raspi3b display device:

https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/

It is unexpected that gfx_update() callback releases iothread lock so 
it may break things in peculiar ways.


Peter, is there any change in the situation regarding the race 
introduced by memory_region_snapshot_and_clear_dirty()?


For now, to work around the issue, I think you can create another 
mutex and make the entire sm501_2d_engine_write() and 
sm501_update_display() critical sections.


Interesting thread but not sure it's the same problem so this 
workaround may not be enough to fix my issue. Here's a video posted by 
one of the people who reported it showing the problem on M1 Mac:


https://www.youtube.com/watch?v=FDqoNbp6PQs

and here's how it looks on other machines:

https://www.youtube.com/watch?v=ML7-F4HNFKQ

There are also videos showing it running on RPi 4 and G5 Mac without 
this issue so it seems to only happen on Apple Silicon M1 Macs. What's 
strange is that graphics elements are not just delayed, which is what I 
would expect from missing thread synchronisation: the update callback 
would miss some pixels rendered while it is running, but subsequent 
update callbacks would eventually draw those, wouldn't they? Also, 
setting full_update to 1 in the sm501_update_display() callback to 
disable dirty tracking does not fix the problem. So it looks as 
if sm501_2d_operation() running on one CPU core only writes data to 
the local cache of that core which sm501_update_display() running on 
other core can't see, so maybe some cache synchronisation is needed in 
memory_region_set_dirty() or if that's already there maybe I should 
call it for all changes not only those in the visible display area? 
I'm still not sure I understand the problem and don't know what could 
be a fix for it so anything to test to identify the issue better might 
also bring us closer to a solution.


If you set full_update to 1, you may also comment out 
memory_region_snapshot_and_clear_dirty() and 
memory_region_snapshot_get_dirty() to avoid the iothread mutex being 
unlocked. The iothread mutex should ensure cache coherency as well.


But as you say, it's weird that the rendered result is not just delayed 
but missed. That may imply other possibilities (e.g., the results are 
overwritten by someone else). If the problem persists after commenting 
out memory_region_snapshot_and_clear_dirty() and 
memory_region_snapshot_get_dirty(), I think you can assume the 
inter-thread coherency between sm501_2d_operation() and 
sm501_update_display() is not causing the problem.


I've asked people who reported and can reproduce it to test this but it 
did not change anything so confirmed it's not that race condition but 
looks more like some cache inconsistency maybe. Any other ideas?


I can come up with two important differences between x86 and Arm which 
can affect the execution of QEMU:
1. Memory model. Arm uses a memory model more relaxed than x86 so it is 
more sensitive to synchronization failures among threads.
2. Different instructions. TCG uses JIT so differences in instructions 
matter.


We should be able to exclude 1) as a potential cause of the problem. 
The iothread mutex should take care of race conditions and even cache 
coherency problems, since a mutex includes memory barrier functionality.

[...]
For difference 2), you may try to use TCI. You can find details of TCI in 
tcg/tci/README.


This was tested, and TCI gave the same results, just much slower.

Common sense says, however, that the memory model is usually the cause of 
the problem when you see behavioral differences between x86 and Arm, and 
TCG should work fine with both of x86 and Arm as they should have been 
tested well.

[...]
Fortunately macOS provides Rosetta 2 for x86 

Re: Display update issue on M1 Macs

2023-01-31 Thread BALATON Zoltan

On Tue, 31 Jan 2023, Akihiko Odaki wrote:

On 2023/01/31 8:58, BALATON Zoltan wrote:

On Sat, 28 Jan 2023, Akihiko Odaki wrote:

On 2023/01/23 8:28, BALATON Zoltan wrote:

On Thu, 19 Jan 2023, Akihiko Odaki wrote:

On 2023/01/15 3:11, BALATON Zoltan wrote:

On Sat, 14 Jan 2023, Akihiko Odaki wrote:

On 2023/01/13 22:43, BALATON Zoltan wrote:

On Thu, 5 Jan 2023, BALATON Zoltan wrote:

Hello,

I got reports from several users trying to run AmigaOS4 on sam460ex 
on Apple silicon Macs that they get missing graphics that I can't 
reproduce on x86_64. With help from the users who get the problem 
we've narrowed it down to the following:


It looks like data written to the sm501's RAM in 
qemu/hw/display/sm501.c::sm501_2d_operation() is then not seen from 
sm501_update_display() in the same file. The sm501_2d_operation() 
function is called when the guest accesses the emulated card so it 
may run in a different thread than sm501_update_display() which is 
called by the ui backend but I'm not sure how QEMU calls these. Is 
device code running in iothread and display update in main thread? 
The problem is also independent of the display backend and was 
reproduced with both -display cocoa and -display sdl.


We have confirmed it's not the pixman routines that 
sm501_2d_operation() uses as the same issue is seen also with QEMU 
4.x where pixman wasn't used and with all versions up to 7.2 so it's 
also not some bisectable change in QEMU. It also happens with 
--enable-debug so it doesn't seem to be related to optimisation 
either and I don't get it on x86_64 but even x86_64 QEMU builds run 
on Apple M1 with Rosetta 2 show the problem. It also only seems to 
affect graphics written from sm501_2d_operation() which AmigaOS4 
uses extensively but other OSes don't and just render graphics with 
the vcpu which work without problem also on the M1 Macs that show 
this problem with AmigaOS4. Theoretically this could be some missing 
synchronisation which is something ARM and PPC may need while x86 
doesn't, but I don't know if this is really the reason and if so 
where and how to fix it. Any idea what may cause this and what 
could be a fix to try?


Any idea anyone? At least some explanation if the above is plausible 
or if there's an option to disable the iothread and run everything in 
a single thread to verify the theory could help. I've got reports 
from at least 3 people getting this problem but I can't do much to 
fix it without some help.



(Info on how to run it is here:
http://zero.eik.bme.hu/~balaton/qemu/amiga/#amigaos
but AmigaOS4 is not freely distributable so it's a bit hard to 
reproduce. Some Linux X servers that support sm501/sm502 may also 
use the card's 2d engine but I don't know about any live CDs that 
readily run on sam460ex.)


Thank you,
BALATON Zoltan


Sorry, I missed the email.

Indeed the ui backend should call sm501_update_display() in the main 
thread, which should be different from the thread calling 
sm501_2d_operation(). However, if I understand it correctly, both of 
the functions should be called with iothread lock held so there should 
be no race condition in theory.


But there is an exception: memory_region_snapshot_and_clear_dirty() 
releases iothread lock, and that broke raspi3b display device:

https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/

It is unexpected that gfx_update() callback releases iothread lock so 
it may break things in peculiar ways.


Peter, is there any change in the situation regarding the race 
introduced by memory_region_snapshot_and_clear_dirty()?


For now, to work around the issue, I think you can create another mutex 
and make the entire sm501_2d_engine_write() and sm501_update_display() 
critical sections.


Interesting thread but not sure it's the same problem so this 
workaround may not be enough to fix my issue. Here's a video posted by 
one of the people who reported it showing the problem on M1 Mac:


https://www.youtube.com/watch?v=FDqoNbp6PQs

and here's how it looks on other machines:

https://www.youtube.com/watch?v=ML7-F4HNFKQ

There are also videos showing it running on RPi 4 and G5 Mac without 
this issue so it seems to only happen on Apple Silicon M1 Macs. What's 
strange is that graphics elements are not just delayed, which is what I 
would expect from missing thread synchronisation: the update callback 
would miss some pixels rendered while it is running, but subsequent 
update callbacks would eventually draw those, wouldn't they? Also, 
setting full_update to 1 in the sm501_update_display() callback to 
disable dirty tracking does not fix the problem. So it looks as if 
sm501_2d_operation() running on one CPU core only writes data to the 
local cache of that core which sm501_update_display() running on other 
core can't see, so maybe some cache synchronisation is needed in 
memory_region_set_dirty() or if that's already there maybe I should 
call it for all changes not only 

Re: Display update issue on M1 Macs

2023-01-30 Thread Akihiko Odaki

On 2023/01/31 8:58, BALATON Zoltan wrote:

On Sat, 28 Jan 2023, Akihiko Odaki wrote:

On 2023/01/23 8:28, BALATON Zoltan wrote:

On Thu, 19 Jan 2023, Akihiko Odaki wrote:

On 2023/01/15 3:11, BALATON Zoltan wrote:

On Sat, 14 Jan 2023, Akihiko Odaki wrote:

On 2023/01/13 22:43, BALATON Zoltan wrote:

On Thu, 5 Jan 2023, BALATON Zoltan wrote:

Hello,

I got reports from several users trying to run AmigaOS4 on 
sam460ex on Apple silicon Macs that they get missing graphics 
that I can't reproduce on x86_64. With help from the users who 
get the problem we've narrowed it down to the following:


It looks like data written to the sm501's RAM in 
qemu/hw/display/sm501.c::sm501_2d_operation() is then not seen 
from sm501_update_display() in the same file. The 
sm501_2d_operation() function is called when the guest accesses 
the emulated card so it may run in a different thread than 
sm501_update_display() which is called by the ui backend but I'm 
not sure how QEMU calls these. Is device code running in 
iothread and display update in main thread? The problem is also 
independent of the display backend and was reproduced with both 
-display cocoa and -display sdl.


We have confirmed it's not the pixman routines that 
sm501_2d_operation() uses as the same issue is seen also with 
QEMU 4.x where pixman wasn't used and with all versions up to 
7.2 so it's also not some bisectable change in QEMU. It also 
happens with --enable-debug so it doesn't seem to be related to 
optimisation either and I don't get it on x86_64 but even x86_64 
QEMU builds run on Apple M1 with Rosetta 2 show the problem. It 
also only seems to affect graphics written from 
sm501_2d_operation() which AmigaOS4 uses extensively but other 
OSes don't and just render graphics with the vcpu which work 
without problem also on the M1 Macs that show this problem with 
AmigaOS4. Theoretically this could be some missing 
synchronisation which is something ARM and PPC may need while x86 
doesn't, but I don't know if this is really the reason and if so 
where and how to fix it. Any idea what may cause this and what 
could be a fix to try?


Any idea anyone? At least some explanation if the above is 
plausible or if there's an option to disable the iothread and run 
everything in a single thread to verify the theory could help. 
I've got reports from at least 3 people getting this problem but 
I can't do much to fix it without some help.



(Info on how to run it is here:
http://zero.eik.bme.hu/~balaton/qemu/amiga/#amigaos
but AmigaOS4 is not freely distributable so it's a bit hard to 
reproduce. Some Linux X servers that support sm501/sm502 may 
also use the card's 2d engine but I don't know about any live 
CDs that readily run on sam460ex.)


Thank you,
BALATON Zoltan


Sorry, I missed the email.

Indeed the ui backend should call sm501_update_display() in the 
main thread, which should be different from the thread calling 
sm501_2d_operation(). However, if I understand it correctly, both 
of the functions should be called with iothread lock held so there 
should be no race condition in theory.


But there is an exception: 
memory_region_snapshot_and_clear_dirty() releases iothread lock, 
and that broke raspi3b display device:

https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/

It is unexpected that gfx_update() callback releases iothread lock 
so it may break things in peculiar ways.


Peter, is there any change in the situation regarding the race 
introduced by memory_region_snapshot_and_clear_dirty()?


For now, to work around the issue, I think you can create another 
mutex and make the entire sm501_2d_engine_write() and 
sm501_update_display() critical sections.


Interesting thread but not sure it's the same problem so this 
workaround may not be enough to fix my issue. Here's a video posted 
by one of the people who reported it showing the problem on M1 Mac:


https://www.youtube.com/watch?v=FDqoNbp6PQs

and here's how it looks on other machines:

https://www.youtube.com/watch?v=ML7-F4HNFKQ

There are also videos showing it running on RPi 4 and G5 Mac 
without this issue so it seems to only happen on Apple Silicon M1 
Macs. What's strange is that graphics elements are not just delayed, 
which is what I would expect from missing thread synchronisation: 
the update callback would miss some pixels rendered while it is 
running, but subsequent update callbacks would eventually draw 
those, wouldn't they? Also, setting full_update to 1 in the 
sm501_update_display() callback to disable dirty tracking does not 
fix the problem. So it looks as if sm501_2d_operation() 
running on one CPU core only writes data to the local cache of that 
core which sm501_update_display() running on other core can't see, 
so maybe some cache synchronisation is needed in 
memory_region_set_dirty() or if that's already there maybe I should 
call it for all changes not only those in the visible display area? 
I'm 

Re: Display update issue on M1 Macs

2023-01-30 Thread BALATON Zoltan

On Sat, 28 Jan 2023, Akihiko Odaki wrote:

On 2023/01/23 8:28, BALATON Zoltan wrote:

On Thu, 19 Jan 2023, Akihiko Odaki wrote:

On 2023/01/15 3:11, BALATON Zoltan wrote:

On Sat, 14 Jan 2023, Akihiko Odaki wrote:

On 2023/01/13 22:43, BALATON Zoltan wrote:

On Thu, 5 Jan 2023, BALATON Zoltan wrote:

Hello,

I got reports from several users trying to run AmigaOS4 on sam460ex on 
Apple silicon Macs that they get missing graphics that I can't 
reproduce on x86_64. With help from the users who get the problem 
we've narrowed it down to the following:


It looks like data written to the sm501's RAM in 
qemu/hw/display/sm501.c::sm501_2d_operation() is then not seen from 
sm501_update_display() in the same file. The sm501_2d_operation() 
function is called when the guest accesses the emulated card so it may 
run in a different thread than sm501_update_display() which is called 
by the ui backend but I'm not sure how QEMU calls these. Is device 
code running in iothread and display update in main thread? The 
problem is also independent of the display backend and was reproduced 
with both -display cocoa and -display sdl.


We have confirmed it's not the pixman routines that 
sm501_2d_operation() uses as the same issue is seen also with QEMU 4.x 
where pixman wasn't used and with all versions up to 7.2 so it's also 
not some bisectable change in QEMU. It also happens with 
--enable-debug so it doesn't seem to be related to optimisation either 
and I don't get it on x86_64 but even x86_64 QEMU builds run on Apple 
M1 with Rosetta 2 show the problem. It also only seems to affect 
graphics written from sm501_2d_operation() which AmigaOS4 uses 
extensively but other OSes don't and just render graphics with the 
vcpu which work without problem also on the M1 Macs that show this 
problem with AmigaOS4. Theoretically this could be some missing 
synchronisation which is something ARM and PPC may need while x86 
doesn't but I don't know if this is really the reason and if so where 
and how to fix it. Any idea what may cause this and what could be a 
fix to try?


Any idea anyone? At least some explanation if the above is plausible or 
if there's an option to disable the iothread and run everything in a 
single thread to verify the theory could help. I've got reports from at 
least 3 people getting this problem but I can't do much to fix it 
without some help.



(Info on how to run it is here:
http://zero.eik.bme.hu/~balaton/qemu/amiga/#amigaos
but AmigaOS4 is not freely distributable so it's a bit hard to 
reproduce. Some Linux X servers that support sm501/sm502 may also use 
the card's 2d engine but I don't know about any live CDs that readily 
run on sam460ex.)


Thank you,
BALATON Zoltan


Sorry, I missed the email.

Indeed the ui backend should call sm501_update_display() in the main 
thread, which should be different from the thread calling 
sm501_2d_operation(). However, if I understand it correctly, both of the 
functions should be called with iothread lock held so there should be no 
race condition in theory.


But there is an exception: memory_region_snapshot_and_clear_dirty() 
releases iothread lock, and that broke raspi3b display device:

https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/

It is unexpected that gfx_update() callback releases iothread lock so it 
may break things in peculiar ways.


Peter, is there any change in the situation regarding the race 
introduced by memory_region_snapshot_and_clear_dirty()?


For now, to work around the issue, I think you can create another mutex 
and make the entire sm501_2d_engine_write() and sm501_update_display() 
critical sections.


Interesting thread but not sure it's the same problem so this workaround 
may not be enough to fix my issue. Here's a video posted by one of the 
people who reported it showing the problem on M1 Mac:


https://www.youtube.com/watch?v=FDqoNbp6PQs

and here's what it looks like on other machines:

https://www.youtube.com/watch?v=ML7-F4HNFKQ

There are also videos showing it running on RPi 4 and G5 Mac without this 
issue so it seems to only happen on Apple Silicon M1 Macs. What's strange 
is that graphics elements are not just delayed which I think should 
happen with missing thread synchronisation where the update callback 
would miss some pixels rendered while it is running but subsequent update 
callbacks would eventually draw those, wouldn't they? Also setting 
full_update to 1 in sm501_update_display() callback to disable dirty 
tracking does not fix the problem. So it looks as if 
sm501_2d_operation() running on one CPU core only writes data to the 
local cache of that core which sm501_update_display() running on another 
core can't see, so maybe some cache synchronisation is needed in 
memory_region_set_dirty() or if that's already there maybe I should call 
it for all changes not only those in the visible display area? I'm still 
not sure I understand the problem and 

Re: Display update issue on M1 Macs

2023-01-27 Thread Akihiko Odaki




On 2023/01/23 8:28, BALATON Zoltan wrote:

On Thu, 19 Jan 2023, Akihiko Odaki wrote:

On 2023/01/15 3:11, BALATON Zoltan wrote:

On Sat, 14 Jan 2023, Akihiko Odaki wrote:

On 2023/01/13 22:43, BALATON Zoltan wrote:

On Thu, 5 Jan 2023, BALATON Zoltan wrote:

Hello,

I got reports from several users trying to run AmigaOS4 on 
sam460ex on Apple silicon Macs that they get missing graphics that 
I can't reproduce on x86_64. With help from the users who get the 
problem we've narrowed it down to the following:


It looks like data written to the sm501's RAM in 
qemu/hw/display/sm501.c::sm501_2d_operation() is then not seen 
from sm501_update_display() in the same file. The 
sm501_2d_operation() function is called when the guest accesses 
the emulated card so it may run in a different thread than 
sm501_update_display() which is called by the ui backend but I'm 
not sure how QEMU calls these. Is device code running in iothread 
and display update in main thread? The problem is also independent 
of the display backend and was reproduced with both -display cocoa 
and -display sdl.


We have confirmed it's not the pixman routines that 
sm501_2d_operation() uses as the same issue is seen also with QEMU 
4.x where pixman wasn't used and with all versions up to 7.2 so 
it's also not some bisectable change in QEMU. It also happens with 
--enable-debug so it doesn't seem to be related to optimisation 
either and I don't get it on x86_64 but even x86_64 QEMU builds 
run on Apple M1 with Rosetta 2 show the problem. It also only 
seems to affect graphics written from sm501_2d_operation() which 
AmigaOS4 uses extensively but other OSes don't and just render 
graphics with the vcpu which work without problem also on the M1 
Macs that show this problem with AmigaOS4. Theoretically this 
could be some missing synchronisation which is something ARM and 
PPC may need while x86 doesn't but I don't know if this is really 
the reason and if so where and how to fix it. Any idea what may 
cause this and what could be a fix to try?


Any idea anyone? At least some explanation if the above is 
plausible or if there's an option to disable the iothread and run 
everything in a single thread to verify the theory could help. I've 
got reports from at least 3 people getting this problem but I can't 
do much to fix it without some help.



(Info on how to run it is here:
http://zero.eik.bme.hu/~balaton/qemu/amiga/#amigaos
but AmigaOS4 is not freely distributable so it's a bit hard to 
reproduce. Some Linux X servers that support sm501/sm502 may also 
use the card's 2d engine but I don't know about any live CDs that 
readily run on sam460ex.)


Thank you,
BALATON Zoltan


Sorry, I missed the email.

Indeed the ui backend should call sm501_update_display() in the main 
thread, which should be different from the thread calling 
sm501_2d_operation(). However, if I understand it correctly, both of 
the functions should be called with iothread lock held so there 
should be no race condition in theory.


But there is an exception: memory_region_snapshot_and_clear_dirty() 
releases iothread lock, and that broke raspi3b display device:

https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/

It is unexpected that gfx_update() callback releases iothread lock 
so it may break things in peculiar ways.


Peter, is there any change in the situation regarding the race 
introduced by memory_region_snapshot_and_clear_dirty()?


For now, to work around the issue, I think you can create another 
mutex and make the entire sm501_2d_engine_write() and 
sm501_update_display() critical sections.


Interesting thread but not sure it's the same problem so this 
workaround may not be enough to fix my issue. Here's a video posted 
by one of the people who reported it showing the problem on M1 Mac:


https://www.youtube.com/watch?v=FDqoNbp6PQs

and here's what it looks like on other machines:

https://www.youtube.com/watch?v=ML7-F4HNFKQ

There are also videos showing it running on RPi 4 and G5 Mac without 
this issue so it seems to only happen on Apple Silicon M1 Macs. 
What's strange is that graphics elements are not just delayed which I 
think should happen with missing thread synchronisation where the 
update callback would miss some pixels rendered while it is running 
but subsequent update callbacks would eventually draw those, wouldn't 
they? Also setting full_update to 1 in sm501_update_display() 
callback to disable dirty tracking does not fix the problem. So it 
looks as if sm501_2d_operation() running on one CPU core only 
writes data to the local cache of that core which 
sm501_update_display() running on another core can't see, so maybe some 
cache synchronisation is needed in memory_region_set_dirty() or if 
that's already there maybe I should call it for all changes not only 
those in the visible display area? I'm still not sure I understand 
the problem and don't know what could be a fix for it so 

Re: Display update issue on M1 Macs

2023-01-22 Thread BALATON Zoltan

On Thu, 19 Jan 2023, Akihiko Odaki wrote:

On 2023/01/15 3:11, BALATON Zoltan wrote:

On Sat, 14 Jan 2023, Akihiko Odaki wrote:

On 2023/01/13 22:43, BALATON Zoltan wrote:

On Thu, 5 Jan 2023, BALATON Zoltan wrote:

Hello,

I got reports from several users trying to run AmigaOS4 on sam460ex on 
Apple silicon Macs that they get missing graphics that I can't reproduce 
on x86_64. With help from the users who get the problem we've narrowed 
it down to the following:


It looks like data written to the sm501's RAM in 
qemu/hw/display/sm501.c::sm501_2d_operation() is then not seen from 
sm501_update_display() in the same file. The sm501_2d_operation() 
function is called when the guest accesses the emulated card so it may 
run in a different thread than sm501_update_display() which is called by 
the ui backend but I'm not sure how QEMU calls these. Is device code 
running in iothread and display update in main thread? The problem is 
also independent of the display backend and was reproduced with both 
-display cocoa and -display sdl.


We have confirmed it's not the pixman routines that sm501_2d_operation() 
uses as the same issue is seen also with QEMU 4.x where pixman wasn't 
used and with all versions up to 7.2 so it's also not some bisectable 
change in QEMU. It also happens with --enable-debug so it doesn't seem 
to be related to optimisation either and I don't get it on x86_64 but 
even x86_64 QEMU builds run on Apple M1 with Rosetta 2 show the problem. 
It also only seems to affect graphics written from sm501_2d_operation() 
which AmigaOS4 uses extensively but other OSes don't and just render 
graphics with the vcpu which work without problem also on the M1 Macs 
that show this problem with AmigaOS4. Theoretically this could be some 
missing synchronisation which is something ARM and PPC may need while x86 
doesn't but I don't know if this is really the reason and if so where 
and how to fix it. Any idea what may cause this and what could be a fix 
to try?


Any idea anyone? At least some explanation if the above is plausible or 
if there's an option to disable the iothread and run everything in a 
single thread to verify the theory could help. I've got reports from at 
least 3 people getting this problem but I can't do much to fix it without 
some help.



(Info on how to run it is here:
http://zero.eik.bme.hu/~balaton/qemu/amiga/#amigaos
but AmigaOS4 is not freely distributable so it's a bit hard to 
reproduce. Some Linux X servers that support sm501/sm502 may also use 
the card's 2d engine but I don't know about any live CDs that readily 
run on sam460ex.)


Thank you,
BALATON Zoltan


Sorry, I missed the email.

Indeed the ui backend should call sm501_update_display() in the main 
thread, which should be different from the thread calling 
sm501_2d_operation(). However, if I understand it correctly, both of the 
functions should be called with iothread lock held so there should be no 
race condition in theory.


But there is an exception: memory_region_snapshot_and_clear_dirty() 
releases iothread lock, and that broke raspi3b display device:

https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/

It is unexpected that gfx_update() callback releases iothread lock so it 
may break things in peculiar ways.


Peter, is there any change in the situation regarding the race introduced 
by memory_region_snapshot_and_clear_dirty()?


For now, to work around the issue, I think you can create another mutex and 
make the entire sm501_2d_engine_write() and sm501_update_display() 
critical sections.


Interesting thread but not sure it's the same problem so this workaround 
may not be enough to fix my issue. Here's a video posted by one of the 
people who reported it showing the problem on M1 Mac:


https://www.youtube.com/watch?v=FDqoNbp6PQs

and here's what it looks like on other machines:

https://www.youtube.com/watch?v=ML7-F4HNFKQ

There are also videos showing it running on RPi 4 and G5 Mac without this 
issue so it seems to only happen on Apple Silicon M1 Macs. What's strange 
is that graphics elements are not just delayed which I think should happen 
with missing thread synchronisation where the update callback would miss 
some pixels rendered while it is running but subsequent update callbacks 
would eventually draw those, wouldn't they? Also setting full_update to 1 in 
sm501_update_display() callback to disable dirty tracking does not fix the 
problem. So it looks as if sm501_2d_operation() running on one CPU 
core only writes data to the local cache of that core which 
sm501_update_display() running on another core can't see, so maybe some cache 
synchronisation is needed in memory_region_set_dirty() or if that's already 
there maybe I should call it for all changes not only those in the visible 
display area? I'm still not sure I understand the problem and don't know 
what could be a fix for it so anything to test to identify the issue better 

Re: Display update issue on M1 Macs

2023-01-19 Thread Akihiko Odaki

On 2023/01/15 3:11, BALATON Zoltan wrote:

On Sat, 14 Jan 2023, Akihiko Odaki wrote:

On 2023/01/13 22:43, BALATON Zoltan wrote:

On Thu, 5 Jan 2023, BALATON Zoltan wrote:

Hello,

I got reports from several users trying to run AmigaOS4 on sam460ex 
on Apple silicon Macs that they get missing graphics that I can't 
reproduce on x86_64. With help from the users who get the problem 
we've narrowed it down to the following:


It looks like data written to the sm501's RAM in 
qemu/hw/display/sm501.c::sm501_2d_operation() is then not seen from 
sm501_update_display() in the same file. The sm501_2d_operation() 
function is called when the guest accesses the emulated card so it 
may run in a different thread than sm501_update_display() which is 
called by the ui backend but I'm not sure how QEMU calls these. Is 
device code running in iothread and display update in main thread? 
The problem is also independent of the display backend and was 
reproduced with both -display cocoa and -display sdl.


We have confirmed it's not the pixman routines that 
sm501_2d_operation() uses as the same issue is seen also with QEMU 
4.x where pixman wasn't used and with all versions up to 7.2 so it's 
also not some bisectable change in QEMU. It also happens with 
--enable-debug so it doesn't seem to be related to optimisation 
either and I don't get it on x86_64 but even x86_64 QEMU builds run 
on Apple M1 with Rosetta 2 show the problem. It also only seems to 
affect graphics written from sm501_2d_operation() which AmigaOS4 
uses extensively but other OSes don't and just render graphics with 
the vcpu which work without problem also on the M1 Macs that show 
this problem with AmigaOS4. Theoretically this could be some missing 
synchronisation which is something ARM and PPC may need while x86 
doesn't but I don't know if this is really the reason and if so 
where and how to fix it. Any idea what may cause this and what 
could be a fix to try?


Any idea anyone? At least some explanation if the above is plausible 
or if there's an option to disable the iothread and run everything in 
a single thread to verify the theory could help. I've got reports 
from at least 3 people getting this problem but I can't do much to 
fix it without some help.



(Info on how to run it is here:
http://zero.eik.bme.hu/~balaton/qemu/amiga/#amigaos
but AmigaOS4 is not freely distributable so it's a bit hard to 
reproduce. Some Linux X servers that support sm501/sm502 may also 
use the card's 2d engine but I don't know about any live CDs that 
readily run on sam460ex.)


Thank you,
BALATON Zoltan


Sorry, I missed the email.

Indeed the ui backend should call sm501_update_display() in the main 
thread, which should be different from the thread calling 
sm501_2d_operation(). However, if I understand it correctly, both of 
the functions should be called with iothread lock held so there should 
be no race condition in theory.


But there is an exception: memory_region_snapshot_and_clear_dirty() 
releases iothread lock, and that broke raspi3b display device:

https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/

It is unexpected that gfx_update() callback releases iothread lock so 
it may break things in peculiar ways.


Peter, is there any change in the situation regarding the race 
introduced by memory_region_snapshot_and_clear_dirty()?


For now, to work around the issue, I think you can create another mutex 
and make the entire sm501_2d_engine_write() and sm501_update_display() 
critical sections.


Interesting thread but not sure it's the same problem so this workaround 
may not be enough to fix my issue. Here's a video posted by one of the 
people who reported it showing the problem on M1 Mac:


https://www.youtube.com/watch?v=FDqoNbp6PQs

and here's what it looks like on other machines:

https://www.youtube.com/watch?v=ML7-F4HNFKQ

There are also videos showing it running on RPi 4 and G5 Mac without 
this issue so it seems to only happen on Apple Silicon M1 Macs. What's 
strange is that graphics elements are not just delayed which I think 
should happen with missing thread synchronisation where the update 
callback would miss some pixels rendered while it is running but 
subsequent update callbacks would eventually draw those, wouldn't they? 
Also setting full_update to 1 in sm501_update_display() callback to 
disable dirty tracking does not fix the problem. So it looks as if 
sm501_2d_operation() running on one CPU core only writes data to the 
local cache of that core which sm501_update_display() running on another 
core can't see, so maybe some cache synchronisation is needed in 
memory_region_set_dirty() or if that's already there maybe I should call 
it for all changes not only those in the visible display area? I'm still 
not sure I understand the problem and don't know what could be a fix for 
it so anything to test to identify the issue better might also bring us 
closer to a solution.


Re: Display update issue on M1 Macs

2023-01-14 Thread BALATON Zoltan

On Sat, 14 Jan 2023, Akihiko Odaki wrote:

On 2023/01/13 22:43, BALATON Zoltan wrote:

On Thu, 5 Jan 2023, BALATON Zoltan wrote:

Hello,

I got reports from several users trying to run AmigaOS4 on sam460ex on 
Apple silicon Macs that they get missing graphics that I can't reproduce 
on x86_64. With help from the users who get the problem we've narrowed it 
down to the following:


It looks like data written to the sm501's RAM in 
qemu/hw/display/sm501.c::sm501_2d_operation() is then not seen from 
sm501_update_display() in the same file. The sm501_2d_operation() function 
is called when the guest accesses the emulated card so it may run in a 
different thread than sm501_update_display() which is called by the ui 
backend but I'm not sure how QEMU calls these. Is device code running in 
iothread and display update in main thread? The problem is also 
independent of the display backend and was reproduced with both -display 
cocoa and -display sdl.


We have confirmed it's not the pixman routines that sm501_2d_operation() 
uses as the same issue is seen also with QEMU 4.x where pixman wasn't used 
and with all versions up to 7.2 so it's also not some bisectable change in 
QEMU. It also happens with --enable-debug so it doesn't seem to be related 
to optimisation either and I don't get it on x86_64 but even x86_64 QEMU 
builds run on Apple M1 with Rosetta 2 show the problem. It also only seems 
to affect graphics written from sm501_2d_operation() which AmigaOS4 uses 
extensively but other OSes don't and just render graphics with the vcpu 
which work without problem also on the M1 Macs that show this problem with 
AmigaOS4. Theoretically this could be some missing synchronisation which is 
something ARM and PPC may need while x86 doesn't but I don't know if this 
is really the reason and if so where and how to fix it. Any idea what may 
cause this and what could be a fix to try?


Any idea anyone? At least some explanation if the above is plausible or if 
there's an option to disable the iothread and run everything in a single 
thread to verify the theory could help. I've got reports from at least 3 
people getting this problem but I can't do much to fix it without some 
help.



(Info on how to run it is here:
http://zero.eik.bme.hu/~balaton/qemu/amiga/#amigaos
but AmigaOS4 is not freely distributable so it's a bit hard to reproduce. 
Some Linux X servers that support sm501/sm502 may also use the card's 2d 
engine but I don't know about any live CDs that readily run on sam460ex.)


Thank you,
BALATON Zoltan


Sorry, I missed the email.

Indeed the ui backend should call sm501_update_display() in the main thread, 
which should be different from the thread calling sm501_2d_operation(). 
However, if I understand it correctly, both of the functions should be called 
with iothread lock held so there should be no race condition in theory.


But there is an exception: memory_region_snapshot_and_clear_dirty() releases 
iothread lock, and that broke raspi3b display device:

https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/

It is unexpected that gfx_update() callback releases iothread lock so it may 
break things in peculiar ways.


Peter, is there any change in the situation regarding the race introduced by 
memory_region_snapshot_and_clear_dirty()?


For now, to work around the issue, I think you can create another mutex and 
make the entire sm501_2d_engine_write() and sm501_update_display() critical 
sections.


Interesting thread but not sure it's the same problem so this workaround 
may not be enough to fix my issue. Here's a video posted by one of the 
people who reported it showing the problem on M1 Mac:


https://www.youtube.com/watch?v=FDqoNbp6PQs

and here's what it looks like on other machines:

https://www.youtube.com/watch?v=ML7-F4HNFKQ

There are also videos showing it running on RPi 4 and G5 Mac without this 
issue so it seems to only happen on Apple Silicon M1 Macs. What's strange 
is that graphics elements are not just delayed which I think should happen 
with missing thread synchronisation where the update callback would miss 
some pixels rendered while it is running but subsequent update callbacks 
would eventually draw those, wouldn't they? Also setting full_update to 1 
in sm501_update_display() callback to disable dirty tracking does not fix 
the problem. So it looks as if sm501_2d_operation() running on one 
CPU core only writes data to the local cache of that core which 
sm501_update_display() running on another core can't see, so maybe some 
cache synchronisation is needed in memory_region_set_dirty() or if that's 
already there maybe I should call it for all changes not only those in the 
visible display area? I'm still not sure I understand the problem and 
don't know what could be a fix for it so anything to test to identify the 
issue better might also bring us closer to a solution.
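To make the cache theory above easier to test outside QEMU, here is a minimal sketch of the ordering that would be needed if one thread fills a buffer and another reads it without any shared lock. Everything in it (fb, frame_ready, blit_thread(), count_visible_pixels()) is a hypothetical stand-in, not QEMU code, and it is not a claim about what sm501.c or memory_region_set_dirty() actually do. It uses a C11 release store in the writer paired with an acquire load in the reader:

```c
/* Sketch only: hypothetical stand-ins, not code from QEMU's sm501.c. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define FB_PIXELS 64
#define FILL_COLOUR 0xffffffffu

static uint32_t fb[FB_PIXELS];   /* plain, non-atomic "video memory" */
static atomic_int frame_ready;   /* hypothetical publish flag, starts at 0 */

/* Writer (stand-in for a 2D blit): fill the buffer, then publish it. */
static void *blit_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < FB_PIXELS; i++) {
        fb[i] = FILL_COLOUR;
    }
    /* Release: all stores to fb above become visible before the flag. */
    atomic_store_explicit(&frame_ready, 1, memory_order_release);
    return NULL;
}

/* Reader (stand-in for a display update): wait, then count filled pixels. */
static int count_visible_pixels(void)
{
    int n = 0;

    /* Acquire: pairs with the release store in blit_thread(). */
    while (!atomic_load_explicit(&frame_ready, memory_order_acquire)) {
        /* spin until the writer has published the frame */
    }
    for (int i = 0; i < FB_PIXELS; i++) {
        if (fb[i] == FILL_COLOUR) {
            n++;
        }
    }
    return n;
}

/* Run the writer on a second thread and read the result on this one. */
static int demo_release_acquire(void)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, blit_thread, NULL);
    int n;

    assert(rc == 0);
    (void)rc;
    n = count_visible_pixels();
    pthread_join(t, NULL);
    return n;
}
```

With the release/acquire pair in place the reader is guaranteed to see every pixel once it has seen the flag. Unlocking and then locking the same mutex gives the same guarantee, so if both functions really run with the iothread lock held, no extra barrier should be needed on any host.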


Regards,
BALATON Zoltan



Re: Display update issue on M1 Macs

2023-01-13 Thread Akihiko Odaki

On 2023/01/13 22:43, BALATON Zoltan wrote:

On Thu, 5 Jan 2023, BALATON Zoltan wrote:

Hello,

I got reports from several users trying to run AmigaOS4 on sam460ex on 
Apple silicon Macs that they get missing graphics that I can't 
reproduce on x86_64. With help from the users who get the problem 
we've narrowed it down to the following:


It looks like data written to the sm501's RAM in 
qemu/hw/display/sm501.c::sm501_2d_operation() is then not seen from 
sm501_update_display() in the same file. The sm501_2d_operation() 
function is called when the guest accesses the emulated card so it may 
run in a different thread than sm501_update_display() which is called 
by the ui backend but I'm not sure how QEMU calls these. Is device 
code running in iothread and display update in main thread? The 
problem is also independent of the display backend and was reproduced 
with both -display cocoa and -display sdl.


We have confirmed it's not the pixman routines that 
sm501_2d_operation() uses as the same issue is seen also with QEMU 4.x 
where pixman wasn't used and with all versions up to 7.2 so it's also 
not some bisectable change in QEMU. It also happens with 
--enable-debug so it doesn't seem to be related to optimisation either 
and I don't get it on x86_64 but even x86_64 QEMU builds run on Apple 
M1 with Rosetta 2 show the problem. It also only seems to affect 
graphics written from sm501_2d_operation() which AmigaOS4 uses 
extensively but other OSes don't and just render graphics with the 
vcpu which work without problem also on the M1 Macs that show this 
problem with AmigaOS4. Theoretically this could be some missing 
synchronisation which is something ARM and PPC may need while x86 
doesn't but I don't know if this is really the reason and if so where 
and how to fix it. Any idea what may cause this and what could be a 
fix to try?


Any idea anyone? At least some explanation if the above is plausible or 
if there's an option to disable the iothread and run everything in a 
single thread to verify the theory could help. I've got reports from at 
least 3 people getting this problem but I can't do much to fix it 
without some help.



(Info on how to run it is here:
http://zero.eik.bme.hu/~balaton/qemu/amiga/#amigaos
but AmigaOS4 is not freely distributable so it's a bit hard to 
reproduce. Some Linux X servers that support sm501/sm502 may also use 
the card's 2d engine but I don't know about any live CDs that readily 
run on sam460ex.)


Thank you,
BALATON Zoltan


Sorry, I missed the email.

Indeed the ui backend should call sm501_update_display() in the main 
thread, which should be different from the thread calling 
sm501_2d_operation(). However, if I understand it correctly, both of the 
functions should be called with iothread lock held so there should be no 
race condition in theory.


But there is an exception: memory_region_snapshot_and_clear_dirty() 
releases iothread lock, and that broke raspi3b display device:

https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/

It is unexpected that gfx_update() callback releases iothread lock so it 
may break things in peculiar ways.


Peter, is there any change in the situation regarding the race 
introduced by memory_region_snapshot_and_clear_dirty()?


For now, to work around the issue, I think you can create another mutex 
and make the entire sm501_2d_engine_write() and sm501_update_display() 
critical sections.
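
As a rough illustration of that workaround, here is a minimal sketch in plain C with pthreads. FakeSM501, fake_2d_engine_write() and fake_update_display() are hypothetical stand-ins for the device state and the two sm501 functions, not the real code; in QEMU itself this would presumably use QemuMutex rather than pthreads directly. Each function takes the extra mutex for its whole body:

```c
/* Sketch only: hypothetical stand-ins, not code from QEMU's sm501.c. */
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define FB_PIXELS 64
#define FILL_COLOUR 0x00ff00ffu

/* Hypothetical device state with the additional mutex. */
typedef struct {
    pthread_mutex_t lock;       /* the "another mutex" from the suggestion */
    uint32_t vram[FB_PIXELS];   /* written by the 2D engine path */
    uint32_t shown[FB_PIXELS];  /* what the display path last copied out */
} FakeSM501;

static void fake_sm501_init(FakeSM501 *s)
{
    int rc = pthread_mutex_init(&s->lock, NULL);

    assert(rc == 0);
    (void)rc;
    memset(s->vram, 0, sizeof(s->vram));
    memset(s->shown, 0, sizeof(s->shown));
}

/* Stand-in for sm501_2d_engine_write(): one critical section. */
static void fake_2d_engine_write(FakeSM501 *s, uint32_t colour)
{
    pthread_mutex_lock(&s->lock);
    for (int i = 0; i < FB_PIXELS; i++) {
        s->vram[i] = colour;
    }
    pthread_mutex_unlock(&s->lock);
}

/* Stand-in for sm501_update_display(): also one critical section. */
static void fake_update_display(FakeSM501 *s)
{
    pthread_mutex_lock(&s->lock);
    memcpy(s->shown, s->vram, sizeof(s->shown));
    pthread_mutex_unlock(&s->lock);
}

static void *writer_thread(void *opaque)
{
    fake_2d_engine_write((FakeSM501 *)opaque, FILL_COLOUR);
    return NULL;
}

/* Fill on a second thread, update on this one; 1 if every pixel arrived. */
static int demo_fill_is_visible(void)
{
    FakeSM501 s;
    pthread_t t;
    int ok = 1;

    fake_sm501_init(&s);
    if (pthread_create(&t, NULL, writer_thread, &s) != 0) {
        pthread_mutex_destroy(&s.lock);
        return 0;
    }
    pthread_join(t, NULL);
    fake_update_display(&s);
    for (int i = 0; i < FB_PIXELS; i++) {
        if (s.shown[i] != FILL_COLOUR) {
            ok = 0;
        }
    }
    pthread_mutex_destroy(&s.lock);
    return ok;
}
```

Since the update path never reads video memory while the 2D path is in the middle of writing it, a display update sees either none or all of a given operation.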


Regards,
Akihiko Odaki



Re: Display update issue on M1 Macs

2023-01-13 Thread BALATON Zoltan

On Thu, 5 Jan 2023, BALATON Zoltan wrote:

Hello,

I got reports from several users trying to run AmigaOS4 on sam460ex on Apple 
silicon Macs that they get missing graphics that I can't reproduce on x86_64. 
With help from the users who get the problem we've narrowed it down to the 
following:


It looks like data written to the sm501's RAM in 
qemu/hw/display/sm501.c::sm501_2d_operation() is then not seen from 
sm501_update_display() in the same file. The sm501_2d_operation() function is 
called when the guest accesses the emulated card so it may run in a different 
thread than sm501_update_display() which is called by the ui backend but I'm 
not sure how QEMU calls these. Is device code running in iothread and display 
update in main thread? The problem is also independent of the display backend 
and was reproduced with both -display cocoa and -display sdl.


We have confirmed it's not the pixman routines that sm501_2d_operation() uses 
as the same issue is seen also with QEMU 4.x where pixman wasn't used and 
with all versions up to 7.2 so it's also not some bisectable change in QEMU. 
It also happens with --enable-debug so it doesn't seem to be related to 
optimisation either and I don't get it on x86_64 but even x86_64 QEMU builds 
run on Apple M1 with Rosetta 2 show the problem. It also only seems to affect 
graphics written from sm501_2d_operation() which AmigaOS4 uses extensively 
but other OSes don't and just render graphics with the vcpu which work 
without problem also on the M1 Macs that show this problem with AmigaOS4. 
Theoretically this could be some missing synchronisation which is something 
ARM and PPC may need while x86 doesn't but I don't know if this is really the 
reason and if so where and how to fix it. Any idea what may cause this and 
what could be a fix to try?


Any idea anyone? At least some explanation if the above is plausible or if 
there's an option to disable the iothread and run everything in a single 
thread to verify the theory could help. I've got reports from at least 3 
people getting this problem but I can't do much to fix it without some 
help.



(Info on how to run it is here:
http://zero.eik.bme.hu/~balaton/qemu/amiga/#amigaos
but AmigaOS4 is not freely distributable so it's a bit hard to reproduce. 
Some Linux X servers that support sm501/sm502 may also use the card's 2d 
engine but I don't know about any live CDs that readily run on sam460ex.)


Thank you,
BALATON Zoltan




Display update issue on M1 Macs

2023-01-04 Thread BALATON Zoltan

Hello,

I got reports from several users trying to run AmigaOS4 on sam460ex on 
Apple silicon Macs that they get missing graphics that I can't reproduce 
on x86_64. With help from the users who get the problem we've narrowed it 
down to the following:


It looks like data written to the sm501's RAM in 
qemu/hw/display/sm501.c::sm501_2d_operation() is then not seen from 
sm501_update_display() in the same file. The sm501_2d_operation() function 
is called when the guest accesses the emulated card so it may run in a 
different thread than sm501_update_display() which is called by the ui 
backend but I'm not sure how QEMU calls these. Is device code running in 
iothread and display update in main thread? The problem is also 
independent of the display backend and was reproduced with both -display 
cocoa and -display sdl.


We have confirmed it's not the pixman routines that sm501_2d_operation() 
uses as the same issue is seen also with QEMU 4.x where pixman wasn't used 
and with all versions up to 7.2 so it's also not some bisectable change in 
QEMU. It also happens with --enable-debug so it doesn't seem to be related 
to optimisation either and I don't get it on x86_64 but even x86_64 QEMU 
builds run on Apple M1 with Rosetta 2 show the problem. It also only seems 
to affect graphics written from sm501_2d_operation() which AmigaOS4 uses 
extensively but other OSes don't and just render graphics with the vcpu 
which work without problem also on the M1 Macs that show this problem with 
AmigaOS4. Theoretically this could be some missing synchronisation which is 
something ARM and PPC may need while x86 doesn't but I don't know if this 
is really the reason and if so where and how to fix it. Any idea what may 
cause this and what could be a fix to try?


(Info on how to run it is here:
http://zero.eik.bme.hu/~balaton/qemu/amiga/#amigaos
but AmigaOS4 is not freely distributable so it's a bit hard to reproduce. 
Some Linux X servers that support sm501/sm502 may also use the card's 2d 
engine but I don't know about any live CDs that readily run on sam460ex.)


Thank you,
BALATON Zoltan