Re: Display update issue on M1 Macs
On 2023/02/03 22:45, BALATON Zoltan wrote: On Fri, 3 Feb 2023, Akihiko Odaki wrote: On 2023/02/02 19:51, BALATON Zoltan wrote: On Tue, 31 Jan 2023, BALATON Zoltan wrote: On Tue, 31 Jan 2023, Akihiko Odaki wrote: [...]

To summarise previous discussion:
- There's a problem on Apple M1 Macs with the sm501 and ati-vga 2d accel functions: what the device model draws into the video memory of the emulated card is not shown on screen when the display update callback is called from another thread. This works on an x86_64 host so I suspect it may be related to missing memory synchronisation that ARM may need.
- This can be reproduced by running AmigaOS4 on sam460ex or MorphOS (demo iso downloadable from their web site) on sam460ex, pegasos2 or mac99,via=pmu with -device ati-vga,romfile="" as described here: http://zero.eik.bme.hu/~balaton/qemu/amiga/
- I can't test it myself lacking hardware so I have to rely on reports from people who have this hardware, so there may be some uncertainty in the info I get.
- We have confirmed it's not related to a known race condition as disabling dirty tracking and always doing full updates of the whole screen did not fix it:

But there is an exception: memory_region_snapshot_and_clear_dirty() releases iothread lock, and that broke raspi3b display device: https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/ It is unexpected that gfx_update() callback releases iothread lock so it may break things in peculiar ways. Peter, is there any change in the situation regarding the race introduced by memory_region_snapshot_and_clear_dirty()? For now, to work around the issue, I think you can create another mutex and make the entire sm501_2d_engine_write() and sm501_update_display() critical sections.

Interesting thread but not sure it's the same problem so this workaround may not be enough to fix my issue.
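For anyone who wants to try the "another mutex" workaround suggested above, here is a minimal sketch of the idea. It is not QEMU code: FakeCard, fake_2d_fill(), fake_update_display() and run_mutex_demo() are hypothetical stand-ins written with plain pthreads, on the assumption that the real change would wrap the bodies of sm501_2d_engine_write() and sm501_update_display() in the same way (inside QEMU one would presumably use QemuMutex rather than pthread_mutex_t).

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the emulated card; not QEMU's real state struct. */
enum { VRAM_SIZE = 64 };

typedef struct {
    pthread_mutex_t lock;       /* the extra mutex suggested in the thread */
    uint8_t vram[VRAM_SIZE];    /* emulated video memory */
    uint8_t screen[VRAM_SIZE];  /* what the display backend last copied out */
} FakeCard;

static void fake_card_init(FakeCard *c)
{
    assert(c != NULL);
    pthread_mutex_init(&c->lock, NULL);
    memset(c->vram, 0, sizeof(c->vram));
    memset(c->screen, 0, sizeof(c->screen));
}

/* Plays the role of the 2d engine write: the whole fill is one critical section. */
static void fake_2d_fill(FakeCard *c, uint8_t colour)
{
    assert(c != NULL);
    pthread_mutex_lock(&c->lock);
    memset(c->vram, colour, sizeof(c->vram));
    pthread_mutex_unlock(&c->lock);
}

/* Plays the role of the display update callback: copies under the same lock. */
static void fake_update_display(FakeCard *c)
{
    assert(c != NULL);
    pthread_mutex_lock(&c->lock);
    memcpy(c->screen, c->vram, sizeof(c->screen));
    pthread_mutex_unlock(&c->lock);
}

static void *writer_thread(void *opaque)
{
    fake_2d_fill(opaque, 0xAA);
    return NULL;
}

/* Fills from a second thread, then refreshes; returns 1 if the screen shows the fill. */
static int run_mutex_demo(void)
{
    FakeCard card;
    pthread_t t;
    int ok = 1;

    fake_card_init(&card);
    if (pthread_create(&t, NULL, writer_thread, &card) != 0) {
        pthread_mutex_destroy(&card.lock);
        return 0;
    }
    pthread_join(t, NULL);
    fake_update_display(&card);
    for (int i = 0; i < VRAM_SIZE; i++) {
        if (card.screen[i] != 0xAA) {
            ok = 0;
        }
    }
    pthread_mutex_destroy(&card.lock);
    return ok;
}
```

The point of the sketch is only that both sides take the same lock around their whole body, so a refresh can never observe a half-finished fill. Whether that is enough for the M1 problem is exactly what is in doubt in this thread.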
Here's a video posted by one of the people who reported it showing the problem on M1 Mac: https://www.youtube.com/watch?v=FDqoNbp6PQs and here's how it looks on other machines: https://www.youtube.com/watch?v=ML7-F4HNFKQ There are also videos showing it running on RPi 4 and G5 Mac without this issue so it seems to only happen on Apple Silicon M1 Macs.

What's strange is that graphics elements are not just delayed, which is what I think should happen with missing thread synchronisation: the update callback would miss some pixels rendered while it's running but subsequent update callbacks would eventually draw those, wouldn't they? Also setting full_update to 1 in the sm501_update_display() callback to disable dirty tracking does not fix the problem. So it looks as if sm501_2d_operation() running on one CPU core only writes data to the local cache of that core, which sm501_update_display() running on another core can't see, so maybe some cache synchronisation is needed in memory_region_set_dirty() or, if that's already there, maybe I should call it for all changes, not only those in the visible display area? I'm still not sure I understand the problem and don't know what could be a fix for it so anything to test to identify the issue better might also bring us closer to a solution.

If you set full_update to 1, you may also comment out memory_region_snapshot_and_clear_dirty() and memory_region_snapshot_get_dirty() to avoid the iothread mutex being unlocked. The iothread mutex should ensure cache coherency as well. But as you say, it's weird that the rendered result is not just delayed but missed. That may imply other possibilities (e.g., the results are overwritten by someone else). If the problem persists after commenting out memory_region_snapshot_and_clear_dirty() and memory_region_snapshot_get_dirty(), I think you can assume the inter-thread coherency between sm501_2d_operation() and sm501_update_display() is not causing the problem.
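To make the point about inter-thread visibility a bit more concrete, here is a small sketch of what ordering between a writer and a reader thread looks like when spelled out with C11 atomics instead of a mutex. Everything in it (pixel, ready, producer(), run_publish_demo()) is made up for illustration and has nothing to do with the sm501 code. Taking and releasing a mutex gives the same kind of release/acquire ordering implicitly, which is why the iothread lock is expected to cover this.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

static uint32_t pixel;    /* plain data, standing in for a pixel in video memory */
static atomic_int ready;  /* flag used to publish the data to the other thread */

/* Writer side: store the data, then publish it with release ordering. */
static void *producer(void *arg)
{
    (void)arg;
    pixel = 0x00ff00;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Reader side: wait for the flag with acquire ordering, then read the data. */
static uint32_t run_publish_demo(void)
{
    pthread_t t;
    uint32_t seen;

    pixel = 0;
    atomic_store(&ready, 0);
    if (pthread_create(&t, NULL, producer, NULL) != 0) {
        return 0;
    }
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
        /* spin until the writer has published */
    }
    seen = pixel;  /* the acquire load above guarantees the writer's store is visible */
    pthread_join(t, NULL);
    return seen;
}
```

If both accesses to ready were memory_order_relaxed, the reader would be allowed to see the flag set and still read the old pixel value. x86 hardware does not reorder stores with other stores, so this kind of bug is often hidden there, while Arm is allowed to reorder them. That only matters when there really is an unsynchronised path between the two threads, though.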
I've asked people who reported and can reproduce it to test this but it did not change anything, so that confirmed it's not that race condition but looks more like some cache inconsistency maybe. Any other ideas?

I can come up with two important differences between x86 and Arm which can affect the execution of QEMU:
1. Memory model. Arm uses a memory model more relaxed than x86 so it is more sensitive to synchronization failures among threads.
2. Different instructions. TCG uses JIT so differences in instructions matter.

We should be able to exclude 1) as a potential cause of the problem. iothread mutex should take care of race condition and even cache coherency problem; mutex includes memory barrier functionality. [...] For difference 2), you may try to use TCI. You can find details of TCI in tcg/tci/README.

This was tested and also with TCI got the same results just much slower.

The common sense tells, however, the memory model is usually the cause of the problem when you see behavioral differences between x86 and Arm, and
Re: Display update issue on M1 Macs
On Tue, 31 Jan 2023, BALATON Zoltan wrote: On Tue, 31 Jan 2023, Akihiko Odaki wrote: [...]

The common sense tells, however, the memory model is usually the cause of the problem when you see behavioral differences between x86 and Arm, and TCG should work fine with both of x86 and Arm as they should have been tested well. [...] Fortunately macOS provides Rosetta 2 for x86
Re: Display update issue on M1 Macs
On Tue, 31 Jan 2023, Akihiko Odaki wrote: On 2023/01/31 8:58, BALATON Zoltan wrote: On Sat, 28 Jan 2023, Akihiko Odaki wrote: On 2023/01/23 8:28, BALATON Zoltan wrote: On Thu, 19 Jan 2023, Akihiko Odaki wrote: On 2023/01/15 3:11, BALATON Zoltan wrote: On Sat, 14 Jan 2023, Akihiko Odaki wrote: On 2023/01/13 22:43, BALATON Zoltan wrote: On Thu, 5 Jan 2023, BALATON Zoltan wrote:

Hello,

I got reports from several users trying to run AmigaOS4 on sam460ex on Apple silicon Macs that they get missing graphics that I can't reproduce on x86_64. With help from the users who get the problem we've narrowed it down to the following:

It looks like data written to the sm501's ram in qemu/hw/display/sm501.c::sm501_2d_operation() is then not seen from sm501_update_display() in the same file. The sm501_2d_operation() function is called when the guest accesses the emulated card so it may run in a different thread than sm501_update_display(), which is called by the ui backend, but I'm not sure how QEMU calls these. Is device code running in iothread and display update in main thread? The problem is also independent of the display backend and was reproduced with both -display cocoa and -display sdl.

We have confirmed it's not the pixman routines that sm501_2d_operation() uses, as the same issue is seen also with QEMU 4.x where pixman wasn't used and with all versions up to 7.2, so it's also not some bisectable change in QEMU. It also happens with --enable-debug so it doesn't seem to be related to optimisation either. I don't get it on x86_64, but even x86_64 QEMU builds run on Apple M1 with Rosetta 2 show the problem. It also only seems to affect graphics written from sm501_2d_operation(), which AmigaOS4 uses extensively; other OSes don't and just render graphics with the vcpu, which works without problem also on the M1 Macs that show this problem with AmigaOS4.
Theoretically this could be some missing synchronisation, which is something ARM and PPC may need while x86 doesn't, but I don't know if this is really the reason and if so where and how to fix it. Any idea what may cause this and what could be a fix to try?

Any idea anyone? At least some explanation of whether the above is plausible, or whether there's an option to disable the iothread and run everything in a single thread to verify the theory, could help. I've got reports from at least 3 people getting this problem but I can't do much to fix it without some help. (Info on how to run it is here: http://zero.eik.bme.hu/~balaton/qemu/amiga/#amigaos but AmigaOS4 is not freely distributable so it's a bit hard to reproduce. Some Linux X servers that support sm501/sm502 may also use the card's 2d engine but I don't know about any live CDs that readily run on sam460ex.)

Thank you, BALATON Zoltan

Sorry, I missed the email. Indeed the ui backend should call sm501_update_display() in the main thread, which should be different from the thread calling sm501_2d_operation(). However, if I understand it correctly, both of the functions should be called with iothread lock held so there should be no race condition in theory. But there is an exception: memory_region_snapshot_and_clear_dirty() releases iothread lock, and that broke raspi3b display device: https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/ It is unexpected that gfx_update() callback releases iothread lock so it may break things in peculiar ways. Peter, is there any change in the situation regarding the race introduced by memory_region_snapshot_and_clear_dirty()? For now, to work around the issue, I think you can create another mutex and make the entire sm501_2d_engine_write() and sm501_update_display() critical sections.

Interesting thread but not sure it's the same problem so this workaround may not be enough to fix my issue.
Here's a video posted by one of the people who reported it showing the problem on M1 Mac: https://www.youtube.com/watch?v=FDqoNbp6PQs and here's how it looks on other machines: https://www.youtube.com/watch?v=ML7-F4HNFKQ There are also videos showing it running on RPi 4 and G5 Mac without this issue so it seems to only happen on Apple Silicon M1 Macs.

What's strange is that graphics elements are not just delayed, which is what I think should happen with missing thread synchronisation: the update callback would miss some pixels rendered while it's running but subsequent update callbacks would eventually draw those, wouldn't they? Also setting full_update to 1 in the sm501_update_display() callback to disable dirty tracking does not fix the problem. So it looks as if sm501_2d_operation() running on one CPU core only writes data to the local cache of that core, which sm501_update_display() running on another core can't see, so maybe some cache synchronisation is needed in memory_region_set_dirty() or, if that's already there, maybe I should call it for all changes, not only
Re: Display update issue on M1 Macs
On 2023/01/23 8:28, BALATON Zoltan wrote: On Thu, 19 Jan 2023, Akihiko Odaki wrote: On 2023/01/15 3:11, BALATON Zoltan wrote: On Sat, 14 Jan 2023, Akihiko Odaki wrote: On 2023/01/13 22:43, BALATON Zoltan wrote: On Thu, 5 Jan 2023, BALATON Zoltan wrote: [...]
Re: Display update issue on M1 Macs
On Thu, 19 Jan 2023, Akihiko Odaki wrote: On 2023/01/15 3:11, BALATON Zoltan wrote: On Sat, 14 Jan 2023, Akihiko Odaki wrote: On 2023/01/13 22:43, BALATON Zoltan wrote: On Thu, 5 Jan 2023, BALATON Zoltan wrote: [...]
Re: Display update issue on M1 Macs
On 2023/01/15 3:11, BALATON Zoltan wrote: On Sat, 14 Jan 2023, Akihiko Odaki wrote: On 2023/01/13 22:43, BALATON Zoltan wrote: On Thu, 5 Jan 2023, BALATON Zoltan wrote: [...]
Re: Display update issue on M1 Macs
On Sat, 14 Jan 2023, Akihiko Odaki wrote: On 2023/01/13 22:43, BALATON Zoltan wrote: On Thu, 5 Jan 2023, BALATON Zoltan wrote: [...]

But there is an exception: memory_region_snapshot_and_clear_dirty() releases the iothread lock, and that broke the raspi3b display device: https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/ It is unexpected that the gfx_update() callback releases the iothread lock, so it may break things in peculiar ways. Peter, is there any change in the situation regarding the race introduced by memory_region_snapshot_and_clear_dirty()? For now, to work around the issue, I think you can create another mutex and make the entire sm501_2d_engine_write() and sm501_update_display() critical sections.

Interesting thread, but I'm not sure it's the same problem, so this workaround may not be enough to fix my issue. Here's a video posted by one of the people who reported it, showing the problem on an M1 Mac: https://www.youtube.com/watch?v=FDqoNbp6PQs and here's how it looks on other machines: https://www.youtube.com/watch?v=ML7-F4HNFKQ

There are also videos showing it running on RPi 4 and G5 Mac without this issue, so it seems to only happen on Apple Silicon M1 Macs.

What's strange is that graphics elements are not just delayed, which is what I think should happen with missing thread synchronisation: the update callback would miss some pixels rendered while it is running, but subsequent update callbacks would eventually draw those, wouldn't they? Also, setting full_update to 1 in the sm501_update_display() callback to disable dirty tracking does not fix the problem. So it looks as if sm501_2d_operation() running on one CPU core only writes data to the local cache of that core, which sm501_update_display() running on another core can't see. So maybe some cache synchronisation is needed in memory_region_set_dirty() or, if that's already there, maybe I should call it for all changes, not only those in the visible display area? I'm still not sure I understand the problem and don't know what could be a fix for it, so anything to test to identify the issue better might also bring us closer to a solution.

Regards,
BALATON Zoltan
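As an aside on the "missing thread synchronisation" idea discussed above, the following is a minimal, standalone C11 sketch of what properly ordered publication between a rendering thread and a display thread looks like in general. It is not QEMU code and makes no claim about the actual cause of the bug: pixels, frame_ready, writer_thread(), reader_thread() and count_visible_pixels() are all hypothetical names invented for this illustration. The writer fills a plain buffer and then sets a flag with release ordering; the reader waits for the flag with acquire ordering before reading. Without that pairing, a weakly ordered host such as ARM could in principle let the reader see the flag before the pixel stores.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

enum { NPIXELS = 16 };

static uint8_t pixels[NPIXELS];   /* plain, non-atomic "video memory" */
static atomic_int frame_ready;    /* publication flag shared by both threads */

/* Writer: stands in for the thread performing 2D operations. */
static void *writer_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < NPIXELS; i++) {
        pixels[i] = 0xFF;
    }
    /* Release: the pixel stores above are ordered before the flag store. */
    atomic_store_explicit(&frame_ready, 1, memory_order_release);
    return NULL;
}

/* Reader: stands in for the display update callback. */
static void *reader_thread(void *arg)
{
    int *lit = arg;
    /* Acquire: pairs with the release store in the writer. */
    while (!atomic_load_explicit(&frame_ready, memory_order_acquire)) {
        /* spin; a real program would block on a lock or condition instead */
    }
    int count = 0;
    for (int i = 0; i < NPIXELS; i++) {
        count += (pixels[i] == 0xFF);
    }
    *lit = count;
    return NULL;
}

/* Runs one writer and one reader; returns how many pixels the reader saw set. */
int count_visible_pixels(void)
{
    pthread_t w, r;
    int lit = 0;
    int rc;

    memset(pixels, 0, sizeof(pixels));
    atomic_store(&frame_ready, 0);

    rc = pthread_create(&r, NULL, reader_thread, &lit);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, writer_thread, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return lit;
}
```

A mutex held around both the writes and the reads provides the same ordering guarantee, which is why the iothread lock would normally be expected to cover this case.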
Re: Display update issue on M1 Macs
On 2023/01/13 22:43, BALATON Zoltan wrote: On Thu, 5 Jan 2023, BALATON Zoltan wrote: [...]

Sorry, I missed the email.

Indeed the ui backend should call sm501_update_display() in the main thread, which should be different from the thread calling sm501_2d_operation(). However, if I understand it correctly, both of the functions should be called with the iothread lock held, so there should be no race condition in theory.

But there is an exception: memory_region_snapshot_and_clear_dirty() releases the iothread lock, and that broke the raspi3b display device: https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/

It is unexpected that the gfx_update() callback releases the iothread lock, so it may break things in peculiar ways.

Peter, is there any change in the situation regarding the race introduced by memory_region_snapshot_and_clear_dirty()?

For now, to work around the issue, I think you can create another mutex and make the entire sm501_2d_engine_write() and sm501_update_display() critical sections.

Regards,
Akihiko Odaki
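The workaround suggested above (a second mutex that turns both the 2D engine write path and the display update into critical sections) could be sketched roughly as follows. This is a standalone illustration using POSIX threads, not a patch against sm501.c: FakeDisplay, fake_display_init(), fake_2d_fill() and fake_update_display() are hypothetical stand-ins for the device state, sm501_2d_engine_write() and sm501_update_display() respectively.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { VRAM_SIZE = 64 };

/* Hypothetical stand-in for the device state; this is not QEMU's SM501State. */
typedef struct {
    pthread_mutex_t render_lock;   /* the extra mutex proposed in the thread */
    uint8_t vram[VRAM_SIZE];       /* emulated video memory */
    uint8_t screen[VRAM_SIZE];     /* what the display side last copied out */
} FakeDisplay;

void fake_display_init(FakeDisplay *d)
{
    int rc = pthread_mutex_init(&d->render_lock, NULL);
    assert(rc == 0);
    (void)rc;
    memset(d->vram, 0, sizeof(d->vram));
    memset(d->screen, 0, sizeof(d->screen));
}

/* Plays the role of sm501_2d_engine_write(): the whole body is one critical section. */
void fake_2d_fill(FakeDisplay *d, size_t off, size_t len, uint8_t colour)
{
    pthread_mutex_lock(&d->render_lock);
    if (off < VRAM_SIZE) {
        if (len > VRAM_SIZE - off) {
            len = VRAM_SIZE - off;   /* clamp to the end of video memory */
        }
        memset(d->vram + off, colour, len);
    }
    pthread_mutex_unlock(&d->render_lock);
}

/* Plays the role of sm501_update_display(): also entirely inside the lock. */
void fake_update_display(FakeDisplay *d)
{
    pthread_mutex_lock(&d->render_lock);
    memcpy(d->screen, d->vram, VRAM_SIZE);
    pthread_mutex_unlock(&d->render_lock);
}
```

In QEMU itself the same idea would presumably use a QemuMutex with qemu_mutex_lock() and qemu_mutex_unlock() rather than raw pthreads; where the lock would live in the sm501 state is left open here.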
Re: Display update issue on M1 Macs
On Thu, 5 Jan 2023, BALATON Zoltan wrote: [...] Any idea what may cause this and what could be a fix to try?

Any idea anyone? At least some explanation of whether the above is plausible, or whether there's an option to disable the iothread and run everything in a single thread to verify the theory, could help. I've got reports from at least 3 people getting this problem but I can't do much to fix it without some help.

(Info on how to run it is here: http://zero.eik.bme.hu/~balaton/qemu/amiga/#amigaos but AmigaOS4 is not freely distributable so it's a bit hard to reproduce. Some Linux X servers that support sm501/sm502 may also use the card's 2d engine but I don't know of any live CDs that readily run on sam460ex.)

Thank you,
BALATON Zoltan
Display update issue on M1 Macs
Hello,

I got reports from several users trying to run AmigaOS4 on sam460ex on Apple silicon Macs that they get missing graphics that I can't reproduce on x86_64. With help from the users who get the problem we've narrowed it down to the following:

It looks like data written to the sm501's RAM in qemu/hw/display/sm501.c::sm501_2d_operation() is then not seen from sm501_update_display() in the same file. The sm501_2d_operation() function is called when the guest accesses the emulated card, so it may run in a different thread than sm501_update_display(), which is called by the ui backend, but I'm not sure how QEMU calls these. Is device code running in the iothread and display update in the main thread?

The problem is also independent of the display backend and was reproduced with both -display cocoa and -display sdl.

We have confirmed it's not the pixman routines that sm501_2d_operation() uses, as the same issue is also seen with QEMU 4.x, where pixman wasn't used, and with all versions up to 7.2, so it's also not some bisectable change in QEMU. It also happens with --enable-debug, so it doesn't seem to be related to optimisation either. I don't get it on x86_64, but even x86_64 QEMU builds run on Apple M1 with Rosetta 2 show the problem. It also only seems to affect graphics written from sm501_2d_operation(), which AmigaOS4 uses extensively; other OSes don't use it and just render graphics with the vcpu, which works without problems even on the M1 Macs that show this problem with AmigaOS4. Theoretically this could be some missing synchronisation, which is something ARM and PPC may need while x86 doesn't, but I don't know if this is really the reason and, if so, where and how to fix it.

Any idea what may cause this and what could be a fix to try?

(Info on how to run it is here: http://zero.eik.bme.hu/~balaton/qemu/amiga/#amigaos but AmigaOS4 is not freely distributable so it's a bit hard to reproduce. Some Linux X servers that support sm501/sm502 may also use the card's 2d engine but I don't know of any live CDs that readily run on sam460ex.)

Thank you,
BALATON Zoltan