Re: [Qemu-devel] linux-user target

2007-04-19 Thread J. Mayer
On Wed, 2007-04-18 at 16:42 -0400, Stuart Anderson wrote:
 On Thu, 19 Apr 2007, Igor Kovalenko wrote:
 
  As discussed before, to do this in dyngen you need to know the context
  better, or you'll skip more than intended; as far as I understand it,
  that amounts to moving a large part of an instruction decoder into dyngen.
 
 Yes, it was a quick hack along w/ visual inspection of the results.
 
 I have since tried the -mtune=nocona change, and it still crashes.

And I checked the code generated on my machine.
I also get the repz at the end of op_goto_tb0 and op_goto_tb1, and it
seems to work well here with the bash version I have.

-- 
J. Mayer [EMAIL PROTECTED]
Never organized





Re: [Qemu-devel] linux-user target

2007-04-19 Thread Stuart Anderson
On Thu, 19 Apr 2007, J. Mayer wrote:

 And I checked the code generated on my machine.
 I also get the repz at the end of op_goto_tb0 and op_goto_tb1, and it
 seems to work well here with the bash version I have.

IIRC from yesterday, they ended up in front of lea instructions, which
I think always resulted in the same value being used, so no harm could
be done.


After more digging last night and this morning, I have discovered that
a 32-bit build from the x86 system, which works fine on the 32-bit system,
will crash when run inside a 32-bit chroot on the x86_64 system (with
the same versions of the libraries in the chroot as on the 32-bit system).
This suggests that something in the execution environment may be causing
the problem. I traced both runs and am comparing the results. The first
difference is where things get placed in the address space. Stuff that is
at 0x5xxx on one is at 0x7xxx on the other. Same for 0xa and 0xfxxx. This
doesn't seem to cause much of a problem, because if I use a simple text
substitution on the log files, the differences almost completely go away.
At least that is true for the first 50,000 lines of the logs. Then, for
some reason I haven't figured out yet, the two instances start executing
different instructions inside the guest. I'll need to dig more to figure
out what is going on when that happens.
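The text-substitution comparison above can be sketched in C roughly as follows. This is a hypothetical helper written for illustration, not a tool from the thread: the names `normalize_prefix()` and `first_difference()` are invented, and it assumes that rewriting the leading hex digit of each address (for example 0x7... to 0x5...) is enough to line the two traces up. The second mapping mentioned would need another pass with a different digit pair.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: in place, rewrite the first digit of every hex
 * literal that starts with "0x<from>" into "0x<to>". */
void normalize_prefix(char *line, char from, char to)
{
    assert(line != NULL);
    for (char *p = strstr(line, "0x"); p != NULL; p = strstr(p + 2, "0x")) {
        if (p[2] == from)
            p[2] = to;
    }
}

/* Normalize both traces the same way, then compare them line by line.
 * Returns the 1-based number of the first differing line, or 0 if the
 * first n lines match. */
size_t first_difference(char **a, char **b, size_t n, char from, char to)
{
    for (size_t i = 0; i < n; i++) {
        normalize_prefix(a[i], from, to);
        normalize_prefix(b[i], from, to);
        if (strcmp(a[i], b[i]) != 0)
            return i + 1;
    }
    return 0;
}
```

Normalizing both traces, rather than only one, keeps the comparison symmetric; the returned line number is where manual inspection would start.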


Another test which might be interesting is to trade executables with someone
that has a working build on an x86_64 system. The crash will either follow
my executable to someone else's system, or their perfectly good executables
will start crashing on my system. I suspect it will be the latter, but it
would be nice to confirm it.

Note that I've had this system running under stress for quite a while now,
including a lot of runtime w/ qemu-arm, so I'm pretty certain it isn't
something mundane like bad RAM.

Sigh... it saddens me to think of the improvements to the rest of linux-user
that I could have finished in the amount of time I've spent on this one
problem 8-(.

Stuart

Stuart R. Anderson   [EMAIL PROTECTED]
Network & Software Engineering   http://www.netsweng.com/
1024D/37A79149:  0791 D3B8 9A4C 2CDC A31F
 BD03 0A62 E534 37A7 9149




Re: [Qemu-devel] linux-user target

2007-04-18 Thread Stuart Anderson

On Tue, 17 Apr 2007, Stuart Anderson wrote:


I've continued to work on this all week, and I still haven't managed to
solve it. I've chased down a lot of paths, but none of them have led to
a solution. Here is a summary of the situation now.

* programs other than bash will run
* bash --version will run
* bash --noediting will run
* occasionally, bash has run if I'm stracing it, but I can't always
  reproduce it.
* when it runs, I occasionally see some odd behavior, but not always.
  The termios patch I just sent cleared up a lot of the oddness.
* when it runs, it hangs on exit. Killing it logs me all the way out
  of the system (ssh connection).
* when it crashes, gdb loses the user-level thread, so I can't do any
  debugging
* I don't see any of the TLS-related system calls being called. I also
  don't see any concrete proof one way or another that TLS is used in
  the executable (i.e. no R_PPC_*TLS relocations). I've been digging in
  the kernel & glibc source, and I don't see a lot of special code to
  support TLS on ppc. It mostly seems to be just taking care not to
  step on R2. Glibc seems to be the only place that knows anything
  specific about TLS, which leads me to think that TLS is mostly
  contained within userspace on PPC.
* I've tried turning on most of the DEBUG_ defines under linux-user,
  but none of them has yielded anything useful or noteworthy.


This morning, I went back and tried a 32-bit x86 host (instead of the
x86_64 host), and discovered that everything works just fine. This makes
me think it's a 64 bit issue, so I took a closer look at the build warnings
that exist on x86_64 but not on x86. This pointed to PPC_OP(goto_tb0) &
PPC_OP(goto_tb1) in target-ppc/op.c. It appears that x86_64 is using the
generic portable code, but one of the fields that it is taking as a
pointer (tb_next) is only an int. Changing it to a ulong didn't fix
things, but it did eliminate the warning.
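The size mismatch behind that warning can be shown with a small hypothetical C sketch (not QEMU code), assuming an LP64 host such as x86_64 Linux: a 32-bit field keeps only the low half of a host address, while a pointer-sized `uintptr_t` keeps all of it. As a guess only: the host addresses that appear in the generated-code log in this message (such as 0x60233840) fit in 32 bits, so truncation would not change them, which would be consistent with the ulong change removing the warning without fixing the crash.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of keeping a host code address in a 32-bit field
 * (like an int-sized tb_next on x86_64): only the low 32 bits survive. */
uint64_t store_in_u32(uint64_t host_addr)
{
    uint32_t field = (uint32_t)host_addr;
    return field;
}

/* The same address kept in a pointer-sized field; on LP64 this is
 * 64 bits wide, so nothing is lost. */
uint64_t store_in_uintptr(uint64_t host_addr)
{
    uintptr_t field = (uintptr_t)host_addr;
    assert(sizeof field == sizeof(void *));  /* pointer-sized by design */
    return (uint64_t)field;
}
```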

After more digging in the qemu.log, I noticed this difference that is
related to those two functions (op_goto_tb0 & op_goto_tb1).

On x86:
0ebf <op_goto_tb0>:
 ebf:   e9 fc ff ff ff          jmp    ec0 <op_goto_tb0+0x1>
 ec4:   c3                      ret

0ec5 <op_goto_tb1>:
 ec5:   e9 fc ff ff ff          jmp    ec6 <op_goto_tb1+0x1>
 eca:   c3                      ret

On x86_64:
154e <op_goto_tb0>:
154e:   8b 05 00 00 00 00       mov    0(%rip),%eax
1554:   ff e0                   jmpq   *%rax
1556:   f3 c3                   repz retq

1558 <op_goto_tb1>:
1558:   8b 05 00 00 00 00       mov    0(%rip),%eax
155e:   ff e0                   jmpq   *%rax
1560:   f3 c3                   repz retq

Note the repz before the retq, which is not in the x86 code or in any other x86_64 op.
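The two listings also differ in how the chaining jump is encoded. On x86 it is a direct `jmp rel32` (opcode e9) whose 32-bit displacement is still the relocation placeholder 0xfffffffc (-4), which is why objdump shows it landing at op_goto_tb0+0x1: ebf + 5 - 4 = ec0. On x86_64 it is an indirect `jmpq *%rax` through a value loaded from memory. As a hedged illustration of the direct form only, the hypothetical helpers below (not QEMU's actual patching code) fill in and decode a rel32 displacement, which is measured from the end of the 5-byte instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: make the 5-byte x86 "jmp rel32" (e9 xx xx xx xx)
 * located at insn_addr jump to target. The displacement is relative to
 * the address of the next instruction, i.e. insn_addr + 5. */
void patch_jmp_rel32(uint8_t insn[5], int64_t insn_addr, int64_t target)
{
    int64_t disp = target - (insn_addr + 5);

    assert(insn[0] == 0xe9);                          /* jmp rel32 opcode */
    assert(disp >= INT32_MIN && disp <= INT32_MAX);   /* must fit in rel32 */

    uint32_t u = (uint32_t)(int32_t)disp;
    insn[1] = (uint8_t)(u & 0xff);          /* x86 stores it little-endian */
    insn[2] = (uint8_t)((u >> 8) & 0xff);
    insn[3] = (uint8_t)((u >> 16) & 0xff);
    insn[4] = (uint8_t)((u >> 24) & 0xff);
}

/* Decode the target of an e9 jump at insn_addr (inverse of the above). */
int64_t jmp_rel32_target(const uint8_t insn[5], int64_t insn_addr)
{
    uint32_t u = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
                 (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;
    return insn_addr + 5 + (int32_t)u;
}
```

Decoding the unpatched bytes e9 fc ff ff ff at 0xebf gives 0xec0, matching the objdump output.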


In use, the micro-ops are:
0x000d: goto_tb1 0x60233800
0x000e: set_T1 0x100a4df8
0x000f: b_T1

For which the generated code becomes:
0x61a5998d:  mov    -25321811(%rip),%eax        # 0x60233840
0x61a59993:  jmpq   *%eax
0x61a59995:  repz lea -1369131941(%rip),%r12d        # 0x100a4df8
0x61a5999d:  mov    %r12d,%eax
0x61a599a0:  and    $0xfffc,%eax
0x61a599a3:  mov    %eax,0xc7f4(%r14)
0x61a599aa:  lea    -25321904(%rip),%r15d        # 0x60233801
0x61a599b1:  retq

The repz left over from the goto_tb1 op is still there, but it now applies
to the lea insn from the set_T1 op.

Is this correct? Would it cause any kind of a problem?
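A rough model of why the prefix migrates: if the generator copies each op body minus its final one-byte ret (0xc3), an op ending in `f3 c3` (repz retq) leaves a stray 0xf3 behind, and the next op's first instruction is decoded with it as a prefix. The sketch below is a simplified, hypothetical model of that copy step, not dyngen's actual code. The goto_tb1 bytes come from the listing above; the set_T1 body is reduced to a single lea whose encoding (44 8d 25 plus a zeroed disp32) is my own assumption, consistent with the 8-byte gap between 0x61a59995 and 0x61a5999d in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical model of a dyngen-style copy step: each op
 * body must end in a one-byte ret (0xc3); everything before it is pasted
 * after the previous op. */
size_t body_len_without_ret(const unsigned char *body, size_t len)
{
    assert(len > 0 && body[len - 1] == 0xc3);
    return len - 1;
}

/* Concatenate several op bodies into out (which must be large enough);
 * returns the number of bytes emitted. */
size_t emit_ops(unsigned char *out, const unsigned char *const ops[],
                const size_t lens[], size_t nops)
{
    size_t pos = 0;
    for (size_t i = 0; i < nops; i++) {
        size_t n = body_len_without_ret(ops[i], lens[i]);
        memcpy(out + pos, ops[i], n);
        pos += n;
    }
    return pos;
}
```

With those two bodies, the emitted stream has 0xf3 at offset 8 immediately followed by the lea at offset 9, which is the `repz lea` seen at 0x61a59995.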




Stuart





Re: [Qemu-devel] linux-user target

2007-04-18 Thread J. Mayer
On Wed, 2007-04-18 at 13:31 -0400, Stuart Anderson wrote:
 On Tue, 17 Apr 2007, Stuart Anderson wrote:
 
  I've continued to work on this all week, and I still haven't managed to
  solve it. I've chased down a lot of paths, but none of them have led to
  a solution. Here is a summary of the situation now.

  * programs other than bash will run
  * bash --version will run
  * bash --noediting will run
  * occasionally, bash has run if I'm stracing it, but I can't always
    reproduce it.
  * when it runs, I occasionally see some odd behavior, but not always.
    The termios patch I just sent cleared up a lot of the oddness.
  * when it runs, it hangs on exit. Killing it logs me all the way out
    of the system (ssh connection).
  * when it crashes, gdb loses the user-level thread, so I can't do any
    debugging
  * I don't see any of the TLS-related system calls being called. I also
    don't see any concrete proof one way or another that TLS is used in
    the executable (i.e. no R_PPC_*TLS relocations). I've been digging in
    the kernel & glibc source, and I don't see a lot of special code to
    support TLS on ppc. It mostly seems to be just taking care not to
    step on R2. Glibc seems to be the only place that knows anything
    specific about TLS, which leads me to think that TLS is mostly
    contained within userspace on PPC.

You're right: I think all TLS specific code is located in the glibc.

  * I've tried turning on most of the DEBUG_ defines under linux-user,
    but none of them has yielded anything useful or noteworthy.
 
 This morning, I went back and tried a 32-bit x86 host (instead of the
 x86_64 host), and discovered that everything works just fine. This makes
 me think it's a 64 bit issue, so I took a closer look at the build warnings
 that exist on x86_64 but not on x86. This pointed to PPC_OP(goto_tb0) &
 PPC_OP(goto_tb1) in target-ppc/op.c. It appears that x86_64 is using the
 generic portable code, but one of the fields that it is taking as a
 pointer (tb_next) is only an int. Changing it to a ulong didn't fix
 things, but it did eliminate the warning.
 
 After more digging in the qemu.log, I noticed this difference that is
 related to those two functions (op_goto_tb0 & op_goto_tb1).
[...]
 Is this correct? Would it cause any kind of a problem?

I don't think this is the problem.
- those functions are also used for system emulation, so the bug would
not be restricted to user mode emulation if this were the source of the
problem
- my development machine is an amd64 one, and the tests I did in a
previous mail were done on this machine, in 64-bit mode. So I'm quite
sure linux-user is able to run on a 64-bit host. There may still be
64-bit related bugs in Qemu, but it seems the major ones have already been
fixed. It may be related to some of the library versions installed in
your 64-bit environment that would not be the same as the ones used in
the 32-bit environment.
- I'm not a specialist in the x86 architecture, but the generated code
seems correct to me, and I don't think the repz instruction is that
important in this particular case (but I may be wrong). One important
detail that may make a big difference: I always use gcc 3.4 to
compile, because I know of several gcc 4.x bugs (crashes while compiling
ISO C compliant code and/or incorrectly generated asm instructions), so I
do not consider gcc 4.x usable for a production environment today. Maybe
this is the reason why Qemu runs OK on my machine but not on yours.
For your information, my testing system is an up-to-date Gentoo Linux
distribution. My 32-bit test environment is another Gentoo distribution
in a chrooted environment, running on the same machine.

-- 
J. Mayer [EMAIL PROTECTED]
Never organized





Re: [Qemu-devel] linux-user target

2007-04-18 Thread Stuart Anderson

On Wed, 18 Apr 2007, J. Mayer wrote:


You're right: I think all TLS specific code is located in the glibc.


In my last tracing through qemu.log, I did check for r2 references, and
there was one store near the beginning that looked like what glibc would
do (r2 = ptr+0x700), and the rest of the accesses were reads of r2.



It may be related to some of the library versions installed in
your 64-bit environment that would not be the same as the ones used in
the 32-bit environment.


Both are current Debian etch systems, one a real x86_64 and one a real x86.
Both are running the same library versions.

ii  libc6  2.3.6.ds1-13 GNU C Library: Shared libraries



One important
detail that may make a big difference: I always use gcc 3.4 to
compile, because I know of several gcc 4.x bugs (crashes while compiling
ISO C compliant code and/or incorrectly generated asm instructions), so I
do not consider gcc 4.x usable for a production environment today.


I'm using gcc-3.4 as well.

ii  gcc-3.4        3.4.6-5 The GNU C compiler
ii  gcc-3.4-base   3.4.6-5 The GNU Compiler Collection (base package)



Stuart





Re: [Qemu-devel] linux-user target

2007-04-18 Thread Igor Kovalenko

On 4/18/07, Stuart Anderson [EMAIL PROTECTED] wrote:

[...]





This should be solved for x86_64 hosts by the -mtune=nocona patch
posted a while ago.
The problem is that dyngen gets confused by the repz retq sequence.

--
Kind regards,
Igor V. Kovalenko




Re: [Qemu-devel] linux-user target

2007-04-18 Thread Stuart Anderson

On Wed, 18 Apr 2007, Igor Kovalenko wrote:


This should be solved for x86_64 hosts by the -mtune=nocona patch
posted a while ago.


I'll go dig that up.


The problem is that dyngen gets confused by the repz retq sequence.


That's what caught my attention earlier today. It was only showing up in
two places in the generated code. I fixed it by hand by tweaking dyngen to
skip the repz as well as the retq, but I still got the crash.

It is of course possible that I'm seeing multiple crashes and not
recognizing when I have actually fixed something.



Stuart





Re: [Qemu-devel] linux-user target

2007-04-18 Thread Igor Kovalenko

On 4/19/07, Stuart Anderson [EMAIL PROTECTED] wrote:

On Wed, 18 Apr 2007, Igor Kovalenko wrote:

 This should be solved for x86_64 hosts by the -mtune=nocona patch
 posted a while ago.

I'll go dig that up.


Here, http://lists.gnu.org/archive/html/qemu-devel/2007-03/msg00198.html


 The problem is that dyngen gets confused by the repz retq sequence.

That's what caught my attention earlier today. It was only showing up in
two places in the generated code. I fixed it by hand by tweaking dyngen to
skip the repz as well as the retq, but I still got the crash.


As discussed before, to do this in dyngen you need to know the context
better, or you'll skip more than intended; as far as I understand it,
that amounts to moving a large part of an instruction decoder into dyngen.
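The point about skipping more than intended can be illustrated with two byte sequences that both end in `f3 c3`. In the first, the 0xf3 really is a repz prefix on the ret; in the second it is the last byte of a `mov $0xf3000000,%eax` immediate followed by a plain ret. The hypothetical C sketch below (not code from dyngen or from the -mtune=nocona patch) shows that a byte-pattern rule cuts the second body one byte short, while a rule that is told where the final instruction starts, which only an instruction decoder can determine, handles both.

```c
#include <assert.h>
#include <stddef.h>

/* Naive, hypothetical rule: if a body ends in f3 c3, assume "repz ret"
 * and drop both bytes; otherwise drop just the ret. Returns bytes kept. */
size_t naive_strip(const unsigned char *body, size_t len)
{
    assert(len >= 1 && body[len - 1] == 0xc3);
    if (len >= 2 && body[len - 2] == 0xf3)
        return len - 2;
    return len - 1;
}

/* Context-aware rule: the caller supplies the offset at which the final
 * instruction starts (e.g. from a decoder), so an 0xf3 that belongs to an
 * earlier instruction's operand is never mistaken for a prefix. */
size_t strip_final_insn(const unsigned char *body, size_t len,
                        size_t last_insn_off)
{
    assert(len >= 1 && body[len - 1] == 0xc3);
    assert(last_insn_off == len - 1 ||
           (len >= 2 && last_insn_off == len - 2 && body[len - 2] == 0xf3));
    return last_insn_off;
}
```

For the mov example, the naive rule keeps 4 bytes instead of 5, truncating the immediate.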


It is of course possible that I'm seeing multiple crashes and not
recognizing when I have actually fixed something.



--
Kind regards,
Igor V. Kovalenko




Re: [Qemu-devel] linux-user target

2007-04-18 Thread Stuart Anderson

On Thu, 19 Apr 2007, Igor Kovalenko wrote:


As discussed before, to do this in dyngen you need to know the context
better, or you'll skip more than intended; as far as I understand it,
that amounts to moving a large part of an instruction decoder into dyngen.


Yes, it was a quick hack along w/ visual inspection of the results.

I have since tried the -mtune=nocona change, and it still crashes.



Stuart





Re: [Qemu-devel] linux-user target

2007-04-17 Thread Stuart Anderson

On Tue, 10 Apr 2007, Jocelyn Mayer wrote:


PPC:

I am unable to get any executable to run.


projects:~/upstream/qemu# ./ppc-linux-user/qemu-ppc -L /mirror0/chroots/ppc/ 
/mirror0/chroots/ppc/bin/bash
init_ppc_proc: PVR 0008 mask  = 0008
Segmentation fault
projects:~/upstream/qemu#


Just checked, on an amd64 host with a random powerpc bash version I got
on my hard disk drive:

./ppc-linux-user/qemu-ppc
-L /mnt/local/hdc/part3/PPC/linux/archives 
/mnt/local/hdc/part3/PPC/linux/archives/bin/bash --version
init_ppc_proc: PVR 0008 mask  = 0008
GNU bash, version 2.05a.0(1)-release (powerpc-unknown-linux-gnu)
Copyright 2001 Free Software Foundation, Inc.

I also tried to really launch the shell and use it and it worked.

... I have to admit there are some strange behaviors with some
features...

But I think recent builds using glibc with TLS/NPTL would not run.



I've continued to work on this all week, and I still haven't managed to
solve it. I've chased down a lot of paths, but none of them have led to
a solution. Here is a summary of the situation now.

 * programs other than bash will run
 * bash --version will run
 * bash --noediting will run
 * occasionally, bash has run if I'm stracing it, but I can't always
   reproduce it.
 * when it runs, I occasionally see some odd behavior, but not always.
   The termios patch I just sent cleared up a lot of the oddness.
 * when it runs, it hangs on exit. Killing it logs me all the way out
   of the system (ssh connection).
 * when it crashes, gdb loses the user-level thread, so I can't do any
   debugging
 * I don't see any of the TLS-related system calls being called. I also
   don't see any concrete proof one way or another that TLS is used in
   the executable (i.e. no R_PPC_*TLS relocations). I've been digging in
   the kernel & glibc source, and I don't see a lot of special code to
   support TLS on ppc. It mostly seems to be just taking care not to
   step on R2. Glibc seems to be the only place that knows anything
   specific about TLS, which leads me to think that TLS is mostly
   contained within userspace on PPC.
 * I've tried turning on most of the DEBUG_ defines under linux-user,
   but none of them has yielded anything useful or noteworthy.

Whew...

I'm in need of a fresh idea or three.


Stuart





Re: [Qemu-devel] linux-user target

2007-04-10 Thread Jocelyn Mayer
On Tue, 2007-04-10 at 09:34 -0400, Stuart Anderson wrote:
 I'm trying to test my fixes to the linux-user emulation on some additional
 architectures now, but I'm running into problems. I can debug these some,
 but any suggestions or guidance, especially from people more familiar
 with the architecture core code, would be appreciated.

[...]

 PPC:
 
 I am unable to get any executable to run.
 
 
 projects:~/upstream/qemu# ./ppc-linux-user/qemu-ppc -L /mirror0/chroots/ppc/ 
 /mirror0/chroots/ppc/bin/bash
 init_ppc_proc: PVR 0008 mask  = 0008
 Segmentation fault
 projects:~/upstream/qemu#
 

Just checked, on an amd64 host with a random powerpc bash version I got
on my hard disk drive:

./ppc-linux-user/qemu-ppc
-L /mnt/local/hdc/part3/PPC/linux/archives 
/mnt/local/hdc/part3/PPC/linux/archives/bin/bash --version
init_ppc_proc: PVR 0008 mask  = 0008
GNU bash, version 2.05a.0(1)-release (powerpc-unknown-linux-gnu)
Copyright 2001 Free Software Foundation, Inc.

./ppc-linux-user/qemu-ppc
-L /mnt/local/hdc/part3/PPC/linux/archives 
/mnt/local/hdc/part3/PPC/linux/archives/bin/bash.static --version
init_ppc_proc: PVR 0008 mask  = 0008
GNU bash, version 2.05b.0(1)-release (powerpc-unknown-linux-gnu)
Copyright (C) 2002 Free Software Foundation, Inc.

I also tried to really launch the shell and use it and it worked.

[EMAIL PROTECTED]:qemu-merge\ ps ww
[...]
29026 pts/57   Ss+0:00 bash -rcfile .bashrc
29269 pts/33   S  0:00 ./ppc-linux-user/qemu-ppc
-L /mnt/local/hdc/part3/PPC/linux/archives 
/mnt/local/hdc/part3/PPC/linux/archives/bin/bash
29930 pts/33   R+ 0:00 ps ww
[EMAIL PROTECTED]:qemu-merge\ ps
  PID TTY  TIME CMD
28941 pts/33   00:00:00 bash
29269 pts/33   00:00:00 qemu-ppc
30017 pts/33   00:00:00 ps
[EMAIL PROTECTED]:qemu-merge\

... I have to admit there are some strange behaviors with some
features...

But I think recent builds using glibc with TLS/NPTL would not run.
Here is some information about the glibc used by this bash version:

./ppc-linux-user/qemu-ppc
-L /mnt/local/hdc/part3/PPC/linux/archives 
/mnt/local/hdc/part3/PPC/linux/archives/usr/bin/objdump -f 
/mnt/local/hdc/part3/PPC/linux/archives/lib/libc-2.2.5.so 
init_ppc_proc: PVR 0008 mask  = 0008

/mnt/local/hdc/part3/PPC/linux/archives/lib/libc-2.2.5.so: file
format elf32-powerpc
architecture: powerpc:common, flags 0x0150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x00025db4

file /mnt/local/hdc/part3/PPC/linux/archives/bin/bash 
/mnt/local/hdc/part3/PPC/linux/archives/bin/bash: ELF 32-bit MSB
executable, PowerPC or cisco 4500, version 1 (SYSV), for GNU/Linux
2.2.0, dynamically linked (uses shared libs), stripped

If this can help.

-- 
Jocelyn Mayer [EMAIL PROTECTED]





Re: [Qemu-devel] linux-user target

2007-04-10 Thread Stuart Anderson
On Tue, 10 Apr 2007, Jocelyn Mayer wrote:

 Just checked, on an amd64 host with a random powerpc bash version I got
 on my hard disk drive:

 I also tried to really launch the shell and use it and it worked.

Interesting...

 But I think recent builds using glibc with TLS/NPTL would not run.

Ahh, that's probably it. The executables I'm using are built with libc-2.3.6.

 If this can help.

Indeed, it has probably narrowed the search space considerably.

Thanks.


Stuart
