Re: TCG change broke MorphOS boot on sam460ex

2024-05-27 Thread Nicholas Piggin
On Tue May 28, 2024 at 8:23 AM AEST, BALATON Zoltan wrote:
> On Wed, 3 Apr 2024, Nicholas Piggin wrote:
> > On Tue Apr 2, 2024 at 9:32 PM AEST, BALATON Zoltan wrote:
> >> On Thu, 21 Mar 2024, BALATON Zoltan wrote:
> >>> On 27/2/24 17:47, BALATON Zoltan wrote:
>  Hello,
> 
>  Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting
>  MorphOS on sam460ex (this was before 8.2.0 and I thought I've verified it
>  before that release but apparently missed it back then). It can be
>  reproduced with https://www.morphos-team.net/morphos-3.18.iso and
>  following command:
> 
>  qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
>     -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
>     -device ide-cd,drive=cd,bus=ide.1
> >>
> >> Any idea on this one? While MorphOS boots on other machines and other OSes
> >> seem to boot on this machine it may still suggest there's some problem
> >> somewhere as this worked before. So it may be worth investigating it to make
> >> sure there's no bug that could affect other OSes too even if they boot. I
> >> don't know how to debug this so some help would be needed.
> >
> > In the bad case it crashes after running this TB:
> >
> > 
> > IN:
> > 0x00c01354:  38c00040  li   r6, 0x40
> > 0x00c01358:  38e10204  addi r7, r1, 0x204
> > 0x00c0135c:  39010104  addi r8, r1, 0x104
> > 0x00c01360:  39410004  addi r10, r1, 4
> > 0x00c01364:  39200000  li   r9, 0
> > 0x00c01368:  7cc903a6  mtctr    r6
> > 0x00c0136c:  84c70004  lwzu r6, 4(r7)
> > 0x00c01370:  7cc907a4  tlbwehi  r6, r9
> > 0x00c01374:  84c80004  lwzu r6, 4(r8)
> > 0x00c01378:  7cc90fa4  tlbwelo  r6, r9
> > 0x00c0137c:  84ca0004  lwzu r6, 4(r10)
> > 0x00c01380:  7cc917a4  tlbwehi  r6, r9
> > 0x00c01384:  39290001  addi r9, r9, 1
> > 0x00c01388:  4200ffe4  bdnz 0xc0136c
> > 
> > IN:
> > 0x00c01374: unable to read memory
> > 
> >
> > "unable to read memory" is the tracer, it does actually translate
> > the address, but it points to a wayward real address which returns
> > 0 to TCG, which is an invalid instruction.
> >
> > The good case instead doesn't exit the TB after 0x00c01370 but after
> > the complete loop at the bdnz. That looks like this after the same
> > first TB:
> >
> > 
> > IN:
> > 0x00c0136c:  84c70004  lwzu r6, 4(r7)
> > 0x00c01370:  7cc907a4  tlbwehi  r6, r9
> > 0x00c01374:  84c80004  lwzu r6, 4(r8)
> > 0x00c01378:  7cc90fa4  tlbwelo  r6, r9
> > 0x00c0137c:  84ca0004  lwzu r6, 4(r10)
> > 0x00c01380:  7cc917a4  tlbwehi  r6, r9
> > 0x00c01384:  39290001  addi r9, r9, 1
> > 0x00c01388:  4200ffe4  bdnz 0xc0136c
> > 
> > IN:
> > 0x00c0138c:  4c00012c  isync
> >
> > All the tlbwe are executed in the same TB. MMU tracing shows the
> > first tlbwehi creates a new valid(!) TLB for 0x-0x1
> > that has a garbage RPN because the tlbwelo did not run yet.
> >
> > What's happening in the bad case is that the translator breaks
> > and "re-fetches" instructions in the middle of that sequence, and
> > that's where the bogus translation causes 0 to be returned. In the
> > good case the whole block is executed in the same fetch, which
> > creates correct translations.
> >
> > So it looks like a MorphOS bug; the can-do-io change just happens
> > to cause it to re-fetch in that place, but that could happen for
> > a number of reasons, so you can't rely on TLB *only* changing or
> > ifetch *only* re-fetching at a sync point like isync.
> >
> > I would expect code like this to write an invalid entry with tlbwehi,
> > then tlbwelo to set the correct RPN, then make the entry valid with
> > the second tlbwehi. It would probably fix the bug if you just did the
> > first tlbwehi with r6=0 (or at least without the 0x200 bit set).
>
> Revisiting this, I've found in the docs that PPC440 has shadow TLBs so 
> this code can rely upon the TLB not being invalidated until isync and 
> works on real machine but breaks on QEMU.

I never programmed for 440 but it's unclear to me from the docs how
much you can rely on this programmatically (you would have to ensure
no page crossings, disable interrupts, hope for no machine check,
etc).

But it does break real software, so whether or not it is following
the exact letter of the law, it would be good to fix.

> We would either need to make 
> sure the TB runs until the sync or somehow emulate the shadow TLB. I've 
> experimented with the latter but I could not make it work (and 
> unexpectedly keeping a cache of the most recently used entries is slower 
> than always searching through all TLB entries as done now so I've 
> abandoned that idea). The problem is that an entry is modified by multiple 
> tlbwe instructions but these can come in any order (and sometimes only one 
> of them is done like invalidating an entry seems to only do one write) so 
> I don't know when to copy the new 

Re: TCG change broke MorphOS boot on sam460ex

2024-05-27 Thread BALATON Zoltan

On Tue, 28 May 2024, BALATON Zoltan wrote:

On Wed, 3 Apr 2024, Nicholas Piggin wrote:

On Tue Apr 2, 2024 at 9:32 PM AEST, BALATON Zoltan wrote:

On Thu, 21 Mar 2024, BALATON Zoltan wrote:

On 27/2/24 17:47, BALATON Zoltan wrote:

Hello,

Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting
MorphOS on sam460ex (this was before 8.2.0 and I thought I've verified it
before that release but apparently missed it back then). It can be
reproduced with https://www.morphos-team.net/morphos-3.18.iso and following
command:

qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
   -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
   -device ide-cd,drive=cd,bus=ide.1


Any idea on this one? While MorphOS boots on other machines and other OSes
seem to boot on this machine it may still suggest there's some problem
somewhere as this worked before. So it may be worth investigating it to make
sure there's no bug that could affect other OSes too even if they boot. I
don't know how to debug this so some help would be needed.


In the bad case it crashes after running this TB:


IN:
0x00c01354:  38c00040  li   r6, 0x40
0x00c01358:  38e10204  addi r7, r1, 0x204
0x00c0135c:  39010104  addi r8, r1, 0x104
0x00c01360:  39410004  addi r10, r1, 4
0x00c01364:  39200000  li   r9, 0
0x00c01368:  7cc903a6  mtctr    r6
0x00c0136c:  84c70004  lwzu r6, 4(r7)
0x00c01370:  7cc907a4  tlbwehi  r6, r9
0x00c01374:  84c80004  lwzu r6, 4(r8)
0x00c01378:  7cc90fa4  tlbwelo  r6, r9
0x00c0137c:  84ca0004  lwzu r6, 4(r10)
0x00c01380:  7cc917a4  tlbwehi  r6, r9
0x00c01384:  39290001  addi r9, r9, 1
0x00c01388:  4200ffe4  bdnz 0xc0136c

IN:
0x00c01374: unable to read memory


"unable to read memory" is the tracer, it does actually translate
the address, but it points to a wayward real address which returns
0 to TCG, which is an invalid instruction.

The good case instead doesn't exit the TB after 0x00c01370 but after
the complete loop at the bdnz. That looks like this after the same
first TB:


IN:
0x00c0136c:  84c70004  lwzu r6, 4(r7)
0x00c01370:  7cc907a4  tlbwehi  r6, r9
0x00c01374:  84c80004  lwzu r6, 4(r8)
0x00c01378:  7cc90fa4  tlbwelo  r6, r9
0x00c0137c:  84ca0004  lwzu r6, 4(r10)
0x00c01380:  7cc917a4  tlbwehi  r6, r9
0x00c01384:  39290001  addi r9, r9, 1
0x00c01388:  4200ffe4  bdnz 0xc0136c

IN:
0x00c0138c:  4c00012c  isync

All the tlbwe are executed in the same TB. MMU tracing shows the
first tlbwehi creates a new valid(!) TLB for 0x-0x1
that has a garbage RPN because the tlbwelo did not run yet.

What's happening in the bad case is that the translator breaks
and "re-fetches" instructions in the middle of that sequence, and
that's where the bogus translation causes 0 to be returned. In the
good case the whole block is executed in the same fetch, which
creates correct translations.

So it looks like a MorphOS bug; the can-do-io change just happens
to cause it to re-fetch in that place, but that could happen for
a number of reasons, so you can't rely on TLB *only* changing or
ifetch *only* re-fetching at a sync point like isync.

I would expect code like this to write an invalid entry with tlbwehi,
then tlbwelo to set the correct RPN, then make the entry valid with
the second tlbwehi. It would probably fix the bug if you just did the
first tlbwehi with r6=0 (or at least without the 0x200 bit set).


Revisiting this, I've found in the docs that PPC440 has shadow TLBs so this 
code can rely upon the TLB not being invalidated until isync and works on 
real machine but breaks on QEMU. We would either need to make sure the TB 
runs until the sync or somehow emulate the shadow TLB. I've experimented with 
the latter but I could not make it work (and unexpectedly keeping a cache of 
the most recently used entries is slower than always searching through all 
TLB entries as done now so I've abandoned that idea). The problem is that an 
entry is modified by multiple tlbwe instructions but these can come in any 
order (and sometimes only one of them is done like invalidating an entry 
seems to only do one write) so I don't know when to copy the new entry to the 
TLB and when to wait for more parts and keep the old one. Any idea how to fix 
this?


Also I'm not sure if it's related but by running the stream benchmark on 
sam460ex now I can reproduce some memory access problem but I'm not sure what 
causes it. The full output of that benchmark under AmigaOS on sam460ex is 
this:


-
STREAM version $Revision: 5.10 $
-
This system uses 8 bytes per array element.
-
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 

Re: TCG change broke MorphOS boot on sam460ex

2024-05-27 Thread BALATON Zoltan

On Wed, 3 Apr 2024, Nicholas Piggin wrote:

On Tue Apr 2, 2024 at 9:32 PM AEST, BALATON Zoltan wrote:

On Thu, 21 Mar 2024, BALATON Zoltan wrote:

On 27/2/24 17:47, BALATON Zoltan wrote:

Hello,

Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting
MorphOS on sam460ex (this was before 8.2.0 and I thought I've verified it
before that release but apparently missed it back then). It can be
reproduced with https://www.morphos-team.net/morphos-3.18.iso and following
command:

qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
   -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
   -device ide-cd,drive=cd,bus=ide.1


Any idea on this one? While MorphOS boots on other machines and other OSes
seem to boot on this machine it may still suggest there's some problem
somewhere as this worked before. So it may be worth investigating it to make
sure there's no bug that could affect other OSes too even if they boot. I
don't know how to debug this so some help would be needed.


In the bad case it crashes after running this TB:


IN:
0x00c01354:  38c00040  li   r6, 0x40
0x00c01358:  38e10204  addi r7, r1, 0x204
0x00c0135c:  39010104  addi r8, r1, 0x104
0x00c01360:  39410004  addi r10, r1, 4
0x00c01364:  39200000  li   r9, 0
0x00c01368:  7cc903a6  mtctr    r6
0x00c0136c:  84c70004  lwzu r6, 4(r7)
0x00c01370:  7cc907a4  tlbwehi  r6, r9
0x00c01374:  84c80004  lwzu r6, 4(r8)
0x00c01378:  7cc90fa4  tlbwelo  r6, r9
0x00c0137c:  84ca0004  lwzu r6, 4(r10)
0x00c01380:  7cc917a4  tlbwehi  r6, r9
0x00c01384:  39290001  addi r9, r9, 1
0x00c01388:  4200ffe4  bdnz 0xc0136c

IN:
0x00c01374: unable to read memory


"unable to read memory" is the tracer, it does actually translate
the address, but it points to a wayward real address which returns
0 to TCG, which is an invalid instruction.

The good case instead doesn't exit the TB after 0x00c01370 but after
the complete loop at the bdnz. That looks like this after the same
first TB:


IN:
0x00c0136c:  84c70004  lwzu r6, 4(r7)
0x00c01370:  7cc907a4  tlbwehi  r6, r9
0x00c01374:  84c80004  lwzu r6, 4(r8)
0x00c01378:  7cc90fa4  tlbwelo  r6, r9
0x00c0137c:  84ca0004  lwzu r6, 4(r10)
0x00c01380:  7cc917a4  tlbwehi  r6, r9
0x00c01384:  39290001  addi r9, r9, 1
0x00c01388:  4200ffe4  bdnz 0xc0136c

IN:
0x00c0138c:  4c00012c  isync

All the tlbwe are executed in the same TB. MMU tracing shows the
first tlbwehi creates a new valid(!) TLB for 0x-0x1
that has a garbage RPN because the tlbwelo did not run yet.

What's happening in the bad case is that the translator breaks
and "re-fetches" instructions in the middle of that sequence, and
that's where the bogus translation causes 0 to be returned. In the
good case the whole block is executed in the same fetch, which
creates correct translations.

So it looks like a MorphOS bug; the can-do-io change just happens
to cause it to re-fetch in that place, but that could happen for
a number of reasons, so you can't rely on TLB *only* changing or
ifetch *only* re-fetching at a sync point like isync.

I would expect code like this to write an invalid entry with tlbwehi,
then tlbwelo to set the correct RPN, then make the entry valid with
the second tlbwehi. It would probably fix the bug if you just did the
first tlbwehi with r6=0 (or at least without the 0x200 bit set).


Revisiting this, I've found in the docs that PPC440 has shadow TLBs, so
this code can rely on the TLB not being invalidated until isync; it
works on a real machine but breaks on QEMU. We would either need to make
sure the TB runs until the sync or somehow emulate the shadow TLB. I've
experimented with the latter but could not make it work (and,
unexpectedly, keeping a cache of the most recently used entries is slower
than always searching through all TLB entries as done now, so I've
abandoned that idea). The problem is that an entry is modified by multiple
tlbwe instructions, but these can come in any order (and sometimes only one
of them is done, e.g. invalidating an entry seems to need only one write), so
I don't know when to copy the new entry to the TLB and when to wait for
more parts and keep the old one. Any idea how to fix this?
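
To make the "emulate the shadow TLB" option more concrete, here is a toy
model in Python. It is only a sketch under strong assumptions: the class
ShadowedTLB, its single instruction-side shadow slot and the rule that the
shadow is dropped only on isync() are hypothetical simplifications, not a
description of what the 440 or QEMU really do. It avoids the "when is the
entry complete" question by never copying partial entries: tlbwe writes go
straight to the main TLB, and fetches keep using the cached translation
until the next isync. The initial entry contents are guesses based on the
debug output in this thread.

```python
class ShadowedTLB:
    """Main TLB plus a one-entry shadow used only for instruction fetch."""

    def __init__(self, entries):
        # Each entry is a dict with keys: epn, size, rpn, valid.
        self.entries = entries
        self.shadow = None  # cached (epn, size, rpn) tuple for ifetch

    def lookup(self, ea):
        """Walk the main TLB; the first valid matching entry wins."""
        for e in self.entries:
            if e["valid"] and e["epn"] <= ea < e["epn"] + e["size"]:
                return (e["epn"], e["size"], e["rpn"])
        return None

    def fetch(self, ea):
        """Translate an instruction address, preferring the shadow copy."""
        s = self.shadow
        if s is None or not (s[0] <= ea < s[0] + s[1]):
            s = self.lookup(ea)
            if s is None:
                return None
            self.shadow = s
        epn, _size, rpn = s
        return rpn + (ea - epn)

    def tlbwe(self, index, **fields):
        """Update part of an entry; the shadow is deliberately left alone."""
        self.entries[index].update(fields)

    def isync(self):
        """Context synchronisation: drop the shadow copy."""
        self.shadow = None


def make_entry(epn, size, rpn, valid=True):
    return {"epn": epn, "size": size, "rpn": rpn, "valid": valid}


SIZE_256M = 1 << 28
tlb = ShadowedTLB(
    [make_entry(0xF0000000, SIZE_256M, 0x4F0000000)]            # entry 0, stale
    + [make_entry(0, 1024, 0, valid=False) for _ in range(14)]  # entries 1-14
    + [make_entry(0, SIZE_256M, 0)]                             # entry 15, identity
)

pc = 0x00C0136C
before = tlb.fetch(pc)            # fills the shadow from entry 15
tlb.tlbwe(0, epn=0, valid=True)   # entry 0 is now valid with a stale RPN
raw = tlb.lookup(pc + 8)          # what an unshadowed re-fetch would hit
during = tlb.fetch(pc + 8)        # still served from the shadow
tlb.tlbwe(0, rpn=0)               # second write completes entry 0
tlb.isync()
after = tlb.fetch(pc + 8)         # fresh walk sees the finished entry
```

In this model the fetch between the two writes still resolves to
0x00c01374, while a direct walk of the main TLB at the same point would
pick up the stale base 0x4F0000000. The sketch says nothing about the data
side, interrupts or performance, so it does not address the slowdown
mentioned above.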


Also I'm not sure if it's related but by running the stream benchmark on 
sam460ex now I can reproduce some memory access problem but I'm not sure 
what causes it. The full output of that benchmark under AmigaOS on 
sam460ex is this:


-
STREAM version $Revision: 5.10 $
-
This system uses 8 bytes per array element.
-
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 

Re: TCG change broke MorphOS boot on sam460ex

2024-04-03 Thread BALATON Zoltan

On Wed, 3 Apr 2024, Nicholas Piggin wrote:

On Tue Apr 2, 2024 at 9:32 PM AEST, BALATON Zoltan wrote:

On Thu, 21 Mar 2024, BALATON Zoltan wrote:

On 27/2/24 17:47, BALATON Zoltan wrote:

Hello,

Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting
MorphOS on sam460ex (this was before 8.2.0 and I thought I've verified it
before that release but apparently missed it back then). It can be
reproduced with https://www.morphos-team.net/morphos-3.18.iso and following
command:

qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
   -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
   -device ide-cd,drive=cd,bus=ide.1


Any idea on this one? While MorphOS boots on other machines and other OSes
seem to boot on this machine it may still suggest there's some problem
somewhere as this worked before. So it may be worth investigating it to make
sure there's no bug that could affect other OSes too even if they boot. I
don't know how to debug this so some help would be needed.


In the bad case it crashes after running this TB:


IN:
0x00c01354:  38c00040  li   r6, 0x40
0x00c01358:  38e10204  addi r7, r1, 0x204
0x00c0135c:  39010104  addi r8, r1, 0x104
0x00c01360:  39410004  addi r10, r1, 4
0x00c01364:  39200000  li   r9, 0
0x00c01368:  7cc903a6  mtctr    r6
0x00c0136c:  84c70004  lwzu r6, 4(r7)
0x00c01370:  7cc907a4  tlbwehi  r6, r9
0x00c01374:  84c80004  lwzu r6, 4(r8)
0x00c01378:  7cc90fa4  tlbwelo  r6, r9
0x00c0137c:  84ca0004  lwzu r6, 4(r10)
0x00c01380:  7cc917a4  tlbwehi  r6, r9
0x00c01384:  39290001  addi r9, r9, 1
0x00c01388:  4200ffe4  bdnz 0xc0136c

IN:
0x00c01374: unable to read memory


"unable to read memory" is the tracer, it does actually translate
the address, but it points to a wayward real address which returns
0 to TCG, which is an invalid instruction.

The good case instead doesn't exit the TB after 0x00c01370 but after
the complete loop at the bdnz. That looks like this after the same
first TB:


IN:
0x00c0136c:  84c70004  lwzu r6, 4(r7)
0x00c01370:  7cc907a4  tlbwehi  r6, r9
0x00c01374:  84c80004  lwzu r6, 4(r8)
0x00c01378:  7cc90fa4  tlbwelo  r6, r9
0x00c0137c:  84ca0004  lwzu r6, 4(r10)
0x00c01380:  7cc917a4  tlbwehi  r6, r9
0x00c01384:  39290001  addi r9, r9, 1
0x00c01388:  4200ffe4  bdnz 0xc0136c

IN:
0x00c0138c:  4c00012c  isync

All the tlbwe are executed in the same TB. MMU tracing shows the
first tlbwehi creates a new valid(!) TLB for 0x-0x1
that has a garbage RPN because the tlbwelo did not run yet.

What's happening in the bad case is that the translator breaks
and "re-fetches" instructions in the middle of that sequence, and
that's where the bogus translation causes 0 to be returned. In the
good case the whole block is executed in the same fetch, which
creates correct translations.

So it looks like a MorphOS bug; the can-do-io change just happens
to cause it to re-fetch in that place, but that could happen for
a number of reasons, so you can't rely on TLB *only* changing or
ifetch *only* re-fetching at a sync point like isync.


Thanks a lot for the analysis. Probably it works on a real machine due to
cache effects, so maybe it was just luck that this did not break.



I would expect code like this to write an invalid entry with tlbwehi,
then tlbwelo to set the correct RPN, then make the entry valid with
the second tlbwehi. It would probably fix the bug if you just did the
first tlbwehi with r6=0 (or at least without the 0x200 bit set).


I think I had to fix a similar issue in AROS years ago when I first
tried to make sam460ex emulation work and used AROS for testing:

https://github.com/aros-development-team/AROS/commit/586a8ada8a5b861a77cab177d39e01de8c3f4cf5

I can't fix MorphOS as it's not open source but hope MorphOS people will 
get to know about this and do something with it. It still works better on 
other emulated machines such as pegasos2 and mac99 so it's not a big deal, 
just wanted to make sure it would not be a bug that could affect other 
OSes on sam460ex.


Thank you,
BALATON Zoltan

Re: TCG change broke MorphOS boot on sam460ex

2024-04-02 Thread Nicholas Piggin
On Tue Apr 2, 2024 at 9:32 PM AEST, BALATON Zoltan wrote:
> On Thu, 21 Mar 2024, BALATON Zoltan wrote:
> > On 27/2/24 17:47, BALATON Zoltan wrote:
> >> Hello,
> >> 
> >> Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting 
> >> MorphOS on sam460ex (this was before 8.2.0 and I thought I've verified it 
> >> before that release but apparently missed it back then). It can be 
> >> reproduced with https://www.morphos-team.net/morphos-3.18.iso and
> >> following command:
> >> 
> >> qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
> >>    -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
> >>    -device ide-cd,drive=cd,bus=ide.1
>
> Any idea on this one? While MorphOS boots on other machines and other OSes 
> seem to boot on this machine it may still suggest there's some problem 
> somewhere as this worked before. So it may be worth investigating it to make
> sure there's no bug that could affect other OSes too even if they boot. I 
> don't know how to debug this so some help would be needed.

In the bad case it crashes after running this TB:


IN:
0x00c01354:  38c00040  li   r6, 0x40
0x00c01358:  38e10204  addi r7, r1, 0x204
0x00c0135c:  39010104  addi r8, r1, 0x104
0x00c01360:  39410004  addi r10, r1, 4
0x00c01364:  39200000  li   r9, 0
0x00c01368:  7cc903a6  mtctr    r6
0x00c0136c:  84c70004  lwzu r6, 4(r7)
0x00c01370:  7cc907a4  tlbwehi  r6, r9
0x00c01374:  84c80004  lwzu r6, 4(r8)
0x00c01378:  7cc90fa4  tlbwelo  r6, r9
0x00c0137c:  84ca0004  lwzu r6, 4(r10)
0x00c01380:  7cc917a4  tlbwehi  r6, r9
0x00c01384:  39290001  addi r9, r9, 1
0x00c01388:  4200ffe4  bdnz 0xc0136c

IN:
0x00c01374: unable to read memory


"unable to read memory" is the tracer, it does actually translate
the address, but it points to a wayward real address which returns
0 to TCG, which is an invalid instruction.

The good case instead doesn't exit the TB after 0x00c01370 but after
the complete loop at the bdnz. That looks like this after the same
first TB:


IN:
0x00c0136c:  84c70004  lwzu r6, 4(r7)
0x00c01370:  7cc907a4  tlbwehi  r6, r9
0x00c01374:  84c80004  lwzu r6, 4(r8)
0x00c01378:  7cc90fa4  tlbwelo  r6, r9
0x00c0137c:  84ca0004  lwzu r6, 4(r10)
0x00c01380:  7cc917a4  tlbwehi  r6, r9
0x00c01384:  39290001  addi r9, r9, 1
0x00c01388:  4200ffe4  bdnz 0xc0136c

IN:
0x00c0138c:  4c00012c  isync

All the tlbwe are executed in the same TB. MMU tracing shows the
first tlbwehi creates a new valid(!) TLB for 0x-0x1
that has a garbage RPN because the tlbwelo did not run yet.

What's happening in the bad case is that the translator breaks
and "re-fetches" instructions in the middle of that sequence, and
that's where the bogus translation causes 0 to be returned. In the
good case the whole block is executed in the same fetch, which
creates correct translations.

So it looks like a MorphOS bug; the can-do-io change just happens
to cause it to re-fetch in that place, but that could happen for
a number of reasons, so you can't rely on TLB *only* changing or
ifetch *only* re-fetching at a sync point like isync.

I would expect code like this to write an invalid entry with tlbwehi,
then tlbwelo to set the correct RPN, then make the entry valid with
the second tlbwehi. It would probably fix the bug if you just did the
first tlbwehi with r6=0 (or at least without the 0x200 bit set).
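
The ordering issue can be played through in a small Python model. This is
a sketch, not QEMU code: the Entry class, the translate() helper and the
word 0 layout used here (EPN in the upper bits, 0x200 as the valid bit, a
4-bit size field in bits 4-7 giving 1 KiB << (2 * size)) are assumptions
for illustration, and the RPN is kept as one flat physical base instead of
separate RPN/ERPN fields. The starting contents of entries 0 and 15 are
inferred from the MMU debug output posted in this thread, and the third
tlbwe (word 2, attributes) is left out.

```python
from dataclasses import dataclass

V_BIT = 0x200  # valid bit in word 0, as mentioned above


@dataclass
class Entry:
    epn: int = 0        # effective page base
    size: int = 1024    # page size in bytes
    valid: bool = False
    rpn: int = 0        # physical base (flattened; a simplification)

    def write_word0(self, value):
        # Assumed layout: EPN | V (0x200) | ... | SIZE (bits 4-7)
        self.epn = value & 0xFFFFFC00
        self.size = 1024 << (2 * ((value >> 4) & 0xF))
        self.valid = bool(value & V_BIT)

    def write_word1(self, phys_base):
        self.rpn = phys_base


def translate(tlb, ea):
    """Return the physical address from the first valid matching entry."""
    for e in tlb:
        if e.valid and e.epn <= ea < e.epn + e.size:
            return e.rpn + (ea - e.epn)
    return None


def fresh_tlb():
    tlb = [Entry() for _ in range(16)]
    # Entry 0: stale mapping with a high physical base (inferred from the log).
    tlb[0] = Entry(epn=0xF0000000, size=1 << 28, valid=True, rpn=0x4F0000000)
    # Entry 15: identity mapping the boot code is running from.
    tlb[15] = Entry(epn=0, size=1 << 28, valid=True, rpn=0)
    return tlb


PC = 0x00C01374  # instruction following the first tlbwe

# Order used by the guest: word 0 with V set first, word 1 afterwards.
bad = fresh_tlb()
bad[0].write_word0(0x290)
bad_fetch = translate(bad, PC)    # a re-fetch here sees the stale RPN
bad[0].write_word1(0)
bad_after = translate(bad, PC)

# Suggested order: word 0 without V, then word 1, then word 0 with V.
good = fresh_tlb()
good[0].write_word0(0x290 & ~V_BIT)
good_fetch_1 = translate(good, PC)
good[0].write_word1(0)
good_fetch_2 = translate(good, PC)
good[0].write_word0(0x290)
good_after = translate(good, PC)

print(hex(bad_fetch), hex(bad_after))
print(hex(good_fetch_1), hex(good_fetch_2), hex(good_after))
```

With the guest's order, a fetch between the two writes resolves to
0x4F0C01374, the same address as the rejected read in the failing log.
With the invalid-first order every intermediate fetch stays at 0x00c01374.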

Thanks,
Nick



Re: TCG change broke MorphOS boot on sam460ex

2024-04-02 Thread BALATON Zoltan

On Thu, 21 Mar 2024, BALATON Zoltan wrote:

On 27/2/24 17:47, BALATON Zoltan wrote:

Hello,

Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting 
MorphOS on sam460ex (this was before 8.2.0 and I thought I've verified it 
before that release but apparently missed it back then). It can be 
reproduced with https://www.morphos-team.net/morphos-3.18.iso and following 
command:


qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
   -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
   -device ide-cd,drive=cd,bus=ide.1


Any idea on this one? While MorphOS boots on other machines and other OSes 
seem to boot on this machine it may still suggest there's some problem 
somewhere as this worked before. So it may be worth investigating it to make
sure there's no bug that could affect other OSes too even if they boot. I 
don't know how to debug this so some help would be needed.


Regards,
BALATON Zoltan

Although it breaks at the TCG change it may also be related to tlbwe changes 
somehow but I don't really understand it. I've tried to get some more debug 
info in case somebody can tell what's happening. With 18a536f1f8^ (the commit 
before the one it broke at and still works) I get:



IN:
ppcemb_tlb_check: TLB 0 address 00c01000 PID 0 <=> f000 f000 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 1 address 00c01000 PID 0 <=> d000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 2 address 00c01000 PID 0 <=> 8000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 3 address 00c01000 PID 0 <=> 9000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 4 address 00c01000 PID 0 <=> a000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 5 address 00c01000 PID 0 <=> b000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 6 address 00c01000 PID 0 <=> c000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 7 address 00c01000 PID 0 <=> e000 ff00 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 8 address 00c01000 PID 0 <=> e100 ff00 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 9 address 00c01000 PID 0 <=> e300 fc00 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 10 address 00c01000 PID 0 <=> e3001000 fc00 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 11 address 00c01000 PID 0 <=> e400 c000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 12 address 00c01000 PID 0 <=> e500 fff0 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 13 address 00c01000 PID 0 <=> ef00 ff00 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 14 address 00c01000 PID 0 <=> e200 fff0 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 15 address 00c01000 PID 0 <=>  f000 0 7f
mmubooke_check_tlb: good TLB!
mmubooke_get_physical_address: access granted 00c01000 => 00c01000 7 0

0x00c01354:  38c00040  li   r6, 0x40
0x00c01358:  38e10204  addi r7, r1, 0x204
0x00c0135c:  39010104  addi r8, r1, 0x104
0x00c01360:  39410004  addi r10, r1, 4
0x00c01364:  39200000  li   r9, 0
0x00c01368:  7cc903a6  mtctr    r6
0x00c0136c:  84c70004  lwzu r6, 4(r7)
0x00c01370:  7cc907a4  tlbwehi  r6, r9
0x00c01374:  84c80004  lwzu r6, 4(r8)
0x00c01378:  7cc90fa4  tlbwelo  r6, r9
0x00c0137c:  84ca0004  lwzu r6, 4(r10)
0x00c01380:  7cc917a4  tlbwehi  r6, r9
0x00c01384:  39290001  addi r9, r9, 1
0x00c01388:  4200ffe4  bdnz 0xc0136c

helper_440_tlbwe word 0 entry 0 value 0290
ppcemb_tlb_check: TLB 0 address 0df6bfb0 PID 0 <=>  f000 0 7f
mmubooke_check_tlb: good TLB!
mmubooke_get_physical_address: access granted 0df6bfb0 => 0004fdf6bfb0 7 0

Invalid read at addr 0x4FDF6BFB0, size 4, region '(null)', reason: rejected
helper_440_tlbwe word 1 entry 0 value 
ppcemb_tlb_check: TLB 0 address 0df6beb0 PID 0 <=>  f000 0 7f
mmubooke_check_tlb: good TLB!
mmubooke_get_physical_address: access granted 0df6beb0 => 0df6beb0 7 0

helper_440_tlbwe word 2 entry 0 value 003f
ppcemb_tlb_check: TLB 0 address 00c0136c PID 0 <=>  f000 0 7f
mmubooke_check_tlb: good TLB!
mmubooke_get_physical_address: access granted 00c0136c => 00c0136c 7 0



and with commit 18a536f1f8 this changes to


IN:
ppcemb_tlb_check: TLB 0 address 00c01000 PID 0 <=> f000 f000 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 1 address 00c01000 PID 0 <=> d000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 2 address 00c01000 PID 0 <=> 8000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 3 address 00c01000 PID 0 <=> 9000 f000 0 

Re: TCG change broke MorphOS boot on sam460ex

2024-03-21 Thread BALATON Zoltan

On 27/2/24 17:47, BALATON Zoltan wrote:

Hello,

Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting 
MorphOS on sam460ex (this was before 8.2.0 and I thought I've verified 
it before that release but apparently missed it back then). It can be 
reproduced with https://www.morphos-team.net/morphos-3.18.iso and 
following command:


qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
   -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
   -device ide-cd,drive=cd,bus=ide.1


Although it breaks at the TCG change it may also be related to tlbwe 
changes somehow but I don't really understand it. I've tried to get some 
more debug info in case somebody can tell what's happening. With 
18a536f1f8^ (the commit before the one it broke at and still works) I get:



IN:
ppcemb_tlb_check: TLB 0 address 00c01000 PID 0 <=> f000 f000 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 1 address 00c01000 PID 0 <=> d000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 2 address 00c01000 PID 0 <=> 8000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 3 address 00c01000 PID 0 <=> 9000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 4 address 00c01000 PID 0 <=> a000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 5 address 00c01000 PID 0 <=> b000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 6 address 00c01000 PID 0 <=> c000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 7 address 00c01000 PID 0 <=> e000 ff00 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 8 address 00c01000 PID 0 <=> e100 ff00 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 9 address 00c01000 PID 0 <=> e300 fc00 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 10 address 00c01000 PID 0 <=> e3001000 fc00 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 11 address 00c01000 PID 0 <=> e400 c000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 12 address 00c01000 PID 0 <=> e500 fff0 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 13 address 00c01000 PID 0 <=> ef00 ff00 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 14 address 00c01000 PID 0 <=> e200 fff0 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 15 address 00c01000 PID 0 <=>  f000 0 7f
mmubooke_check_tlb: good TLB!
mmubooke_get_physical_address: access granted 00c01000 => 00c01000 7 0
0x00c01354:  38c00040  li   r6, 0x40
0x00c01358:  38e10204  addi r7, r1, 0x204
0x00c0135c:  39010104  addi r8, r1, 0x104
0x00c01360:  39410004  addi r10, r1, 4
0x00c01364:  39200000  li   r9, 0
0x00c01368:  7cc903a6  mtctr    r6
0x00c0136c:  84c70004  lwzu r6, 4(r7)
0x00c01370:  7cc907a4  tlbwehi  r6, r9
0x00c01374:  84c80004  lwzu r6, 4(r8)
0x00c01378:  7cc90fa4  tlbwelo  r6, r9
0x00c0137c:  84ca0004  lwzu r6, 4(r10)
0x00c01380:  7cc917a4  tlbwehi  r6, r9
0x00c01384:  39290001  addi r9, r9, 1
0x00c01388:  4200ffe4  bdnz 0xc0136c

helper_440_tlbwe word 0 entry 0 value 0290
ppcemb_tlb_check: TLB 0 address 0df6bfb0 PID 0 <=>  f000 0 7f
mmubooke_check_tlb: good TLB!
mmubooke_get_physical_address: access granted 0df6bfb0 => 0004fdf6bfb0 7 0
Invalid read at addr 0x4FDF6BFB0, size 4, region '(null)', reason: rejected
helper_440_tlbwe word 1 entry 0 value 
ppcemb_tlb_check: TLB 0 address 0df6beb0 PID 0 <=>  f000 0 7f
mmubooke_check_tlb: good TLB!
mmubooke_get_physical_address: access granted 0df6beb0 => 0df6beb0 7 0
helper_440_tlbwe word 2 entry 0 value 003f
ppcemb_tlb_check: TLB 0 address 00c0136c PID 0 <=>  f000 0 7f
mmubooke_check_tlb: good TLB!
mmubooke_get_physical_address: access granted 00c0136c => 00c0136c 7 0


and with commit 18a536f1f8 this changes to


IN:
ppcemb_tlb_check: TLB 0 address 00c01000 PID 0 <=> f000 f000 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 1 address 00c01000 PID 0 <=> d000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 2 address 00c01000 PID 0 <=> 8000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 3 address 00c01000 PID 0 <=> 9000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 4 address 00c01000 PID 0 <=> a000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 5 address 00c01000 PID 0 <=> b000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 6 address 00c01000 PID 0 <=> c000 f000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 7 address 00c01000 PID 0 <=> 

Re: TCG change broke MorphOS boot on sam460ex

2024-02-27 Thread Philippe Mathieu-Daudé

Hi Zoltan,

On 27/2/24 17:47, BALATON Zoltan wrote:

Hello,

Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting 
MorphOS on sam460ex (this was before 8.2.0 and I thought I've verified 
it before that release but apparently missed it back then). It can be 
reproduced with https://www.morphos-team.net/morphos-3.18.iso and 
following command:


qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
   -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
   -device ide-cd,drive=cd,bus=ide.1

before:
Invalid read at addr 0xC08001216, size 1, region '(null)', reason: rejected
Invalid read at addr 0x216, size 1, region '(null)', reason: rejected
Invalid read at addr 0x4FDF6BFB0, size 4, region '(null)', reason: rejected
Invalid write at addr 0xE1014, size 4, region '(null)', reason: rejected
Invalid write at addr 0xE1214, size 4, region '(null)', reason: rejected
Invalid write at addr 0xE3014, size 4, region '(null)', reason: rejected
Invalid write at addr 0xE3214, size 4, region '(null)', reason: rejected

8.440| sam460_i2c_write: Error while writing, sts 34
8.463|
8.463|
8.463| ABox 1.30 (2.7.2018)...

after:
Invalid read at addr 0xC08001216, size 1, region '(null)', reason: rejected
Invalid read at addr 0x216, size 1, region '(null)', reason: rejected
Invalid read at addr 0x4F0C01374, size 4, region '(null)', reason: rejected
invalid/unsupported opcode: 00 - 00 - 00 - 00 () 00c01374
Invalid read at addr 0x4F700, size 4, region '(null)', reason: rejected
invalid/unsupported opcode: 00 - 00 - 00 - 00 () 0700

Not sure what it's trying to do here, maybe decompressing some code and 
then trying to execute it? Any idea what could be the problem or what to 
check further?


Are you testing with commit cf9b5790db ("accel/tcg: Remove CF_LAST_IO")
included?



TCG change broke MorphOS boot on sam460ex

2024-02-27 Thread BALATON Zoltan

Hello,

Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting 
MorphOS on sam460ex (this was before 8.2.0 and I thought I've verified it 
before that release but apparently missed it back then). It can be 
reproduced with https://www.morphos-team.net/morphos-3.18.iso and 
following command:


qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
  -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
  -device ide-cd,drive=cd,bus=ide.1

before:
Invalid read at addr 0xC08001216, size 1, region '(null)', reason: rejected
Invalid read at addr 0x216, size 1, region '(null)', reason: rejected
Invalid read at addr 0x4FDF6BFB0, size 4, region '(null)', reason: rejected
Invalid write at addr 0xE1014, size 4, region '(null)', reason: rejected
Invalid write at addr 0xE1214, size 4, region '(null)', reason: rejected
Invalid write at addr 0xE3014, size 4, region '(null)', reason: rejected
Invalid write at addr 0xE3214, size 4, region '(null)', reason: rejected
8.440| sam460_i2c_write: Error while writing, sts 34
8.463|
8.463|
8.463| ABox 1.30 (2.7.2018)...

after:
Invalid read at addr 0xC08001216, size 1, region '(null)', reason: rejected
Invalid read at addr 0x216, size 1, region '(null)', reason: rejected
Invalid read at addr 0x4F0C01374, size 4, region '(null)', reason: rejected
invalid/unsupported opcode: 00 - 00 - 00 - 00 () 00c01374
Invalid read at addr 0x4F700, size 4, region '(null)', reason: rejected
invalid/unsupported opcode: 00 - 00 - 00 - 00 () 0700

Not sure what it's trying to do here, maybe decompressing some code and 
then trying to execute it? Any idea what could be the problem or what to 
check further?
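
To see where the two runs part ways, the "before" and "after" output can
be compared line by line. A minimal sketch in Python, assuming the first
lines of each log have been copied into lists (the helper name
first_divergence is made up for this example):

```python
from itertools import zip_longest

before = [
    "Invalid read at addr 0xC08001216, size 1, region '(null)', reason: rejected",
    "Invalid read at addr 0x216, size 1, region '(null)', reason: rejected",
    "Invalid read at addr 0x4FDF6BFB0, size 4, region '(null)', reason: rejected",
]
after = [
    "Invalid read at addr 0xC08001216, size 1, region '(null)', reason: rejected",
    "Invalid read at addr 0x216, size 1, region '(null)', reason: rejected",
    "Invalid read at addr 0x4F0C01374, size 4, region '(null)', reason: rejected",
]


def first_divergence(a, b):
    """Return (index, line_a, line_b) for the first differing line, or None."""
    for i, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return i, x, y
    return None


idx, good_line, bad_line = first_divergence(before, after)
print(idx)
print(good_line)
print(bad_line)
```

The first two lines are identical; the runs diverge at the third line,
where the failing run reads from 0x4F0C01374 and then reports an invalid
opcode at 00c01374.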


Regards,
BALATON Zoltan