Re: [Qemu-devel] linux-user target
On Wed, 2007-04-18 at 16:42 -0400, Stuart Anderson wrote:
> On Thu, 19 Apr 2007, Igor Kovalenko wrote:
> > As discussed before, to do this in dyngen you need to know the context better or you'll skip more than intended; as far as I understand, that amounts to moving a large part of the decoder there.
>
> Yes, it was a quick hack along w/ visual inspection of the results. I have since tried the -mtune=nocona change, and it still crashes.

And I checked the code generated on my machine. I got the repz at the end of the op_goto_tb0 and op_goto_tb1 and it seems to work well here with the bash version I got.

--
J. Mayer [EMAIL PROTECTED]
Never organized
Re: [Qemu-devel] linux-user target
On Thu, 19 Apr 2007, J. Mayer wrote:
> And I checked the code generated on my machine. I got the repz at the end of the op_goto_tb0 and op_goto_tb1 and it seems to work well here with the bash version I got.

IIRC from yesterday, they ended up in front of lea instructions, which I think always resulted in the same value being used, so no harm could be done.

More digging last night, and this morning, and I have discovered that a 32-bit build from the x86 system that works fine on the 32-bit system will crash when run inside a 32-bit chroot on the x86_64 system (with the same version of libraries in the chroot as are on the 32-bit system). This suggests that there may be something in the execution environment that is causing the problem.

I traced both runs, and am comparing the results. The first difference is where things get placed in the address space. Stuff that is at 0x5xxx on one is at 0x7xxx on the other. Same for 0xa and 0xfxxx. This doesn't seem to cause much of a problem because if I use a simple text substitution on the log files, the differences almost completely go away. At least that is true for the first 50,000 lines of the logs. Then, for some reason I haven't figured out yet, the two instances start executing different instructions inside the guest. I'll need to dig more to figure out what is going on when that happens.

Another test which might be interesting is to trade executables with someone that has a working build on an x86_64 system. The crash will either follow my executable to someone else's system or their perfectly good executables will start crashing on my system. I suspect it will be the latter, but it would be nice to confirm it.

Note that I've had this system running under stress for quite a while now, including a lot of runtime w/ qemu-arm, so I'm pretty certain it isn't something mundane like bad RAM.

Sigh... it saddens me to think of the improvements to the rest of linux-user that I could have finished in the amount of time I've spent on this one problem 8-(.

Stuart

Stuart R. Anderson [EMAIL PROTECTED]
Network Software Engineering http://www.netsweng.com/
1024D/37A79149: 0791 D3B8 9A4C 2CDC A31F BD03 0A62 E534 37A7 9149
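The "simple text substitution" used to compare the two traces can be sketched roughly as below. This is only an illustration under stated assumptions: the helper names mask_top_nibble and same_modulo_mapping are hypothetical and not part of QEMU, addresses are assumed to be printed as 8-digit 0x literals, and masking the leading digit of every such literal is cruder than a substitution targeted at specific mappings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of QEMU): overwrite the most significant
 * digit of every 8-digit "0x" hex literal in `line` with 'X', so that two
 * log lines whose addresses differ only in that digit compare equal.
 */
static void mask_top_nibble(char *line)
{
    assert(line != NULL);
    for (char *p = line; p[0] != '\0'; p++) {
        if (p[0] != '0' || p[1] != 'x')
            continue;
        size_t n = 0;
        while (isxdigit((unsigned char)p[2 + n]))
            n++;
        if (n == 8)
            p[2] = 'X';
        p += 1 + n;   /* step to the end of the literal; the loop adds one more */
    }
}

/* Returns 1 if the two log lines are identical once addresses are masked. */
static int same_modulo_mapping(const char *a, const char *b)
{
    char ma[256], mb[256];

    if (strlen(a) >= sizeof ma || strlen(b) >= sizeof mb)
        return 0;     /* line too long for this sketch */
    strcpy(ma, a);
    strcpy(mb, b);
    mask_top_nibble(ma);
    mask_top_nibble(mb);
    return strcmp(ma, mb) == 0;
}
```

Feeding both logs through such a comparison line by line would report the first line where the runs diverge for a reason other than the mapping base, which is the point of interest in the message above.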
Re: [Qemu-devel] linux-user target
On Tue, 17 Apr 2007, Stuart Anderson wrote:
> I've continued to work on this all week, and I still haven't managed to solve it. I've chased down a lot of paths, but none of them have led to a solution. Here is a summary of the situation now.
>
> * programs other than bash will run
> * bash --version will run
> * bash --noediting will run
> * occasionally, bash has run if I'm stracing it, but I can't always reproduce it.
> * when it runs, I occasionally see some odd behavior, but not always. The termios patch I just sent cleared up a lot of the oddness.
> * when it runs, it hangs on exit. Killing it logs me all the way out of the system (ssh connection).
> * when it crashes, gdb loses the user level thread, so I can't do any debugging
> * I don't see any of the TLS related system calls being called. I also don't see any concrete proof one way or another that it is used in the executable (i.e. no R_PPC_*TLS relocations). I've been digging in the kernel and glibc source, and I don't see a lot of special code to support TLS on ppc. It mostly seems to be just taking care to not step on R2. Glibc seems to be the only place where it knows something specific about TLS, which leads me to think that TLS is mostly contained within the userspace on PPC.
> * I've tried turning on most of the DEBUG_ defines under linux-user, but none of them has yielded anything useful, or noteworthy.

This morning, I went back and tried a 32-bit x86 host (instead of the x86_64 host), and discovered that everything works just fine. This makes me think it's a 64 bit issue, so I took a closer look at the build warnings that exist on x86_64 but not on x86. This pointed to PPC_OP(goto_tb0) and PPC_OP(goto_tb1) in target-ppc/op.c. It appears that x86_64 is using the generic portable code, but one of the fields that it is taking as a pointer (tb_next) is only an int. Changing it to a ulong didn't fix things, but it did eliminate the warning.

After more digging in the qemu.log, I noticed this difference that is related to those two functions (op_goto_tb0 and op_goto_tb1).

On x86:

    0ebf <op_goto_tb0>:
     ebf:  e9 fc ff ff ff       jmp    ec0 <op_goto_tb0+0x1>
     ec4:  c3                   ret

    0ec5 <op_goto_tb1>:
     ec5:  e9 fc ff ff ff       jmp    ec6 <op_goto_tb1+0x1>
     eca:  c3                   ret

On x86_64:

    154e <op_goto_tb0>:
    154e:  8b 05 00 00 00 00    mov    0(%rip),%eax
    1554:  ff e0                jmpq   *%rax
    1556:  f3 c3                repz retq

    1558 <op_goto_tb1>:
    1558:  8b 05 00 00 00 00    mov    0(%rip),%eax
    155e:  ff e0                jmpq   *%rax
    1560:  f3 c3                repz retq

Note the repz before retq, which is not in the x86 code or in any other x86_64 op.

In use the micro ops are:

    0x000d: goto_tb1 0x60233800
    0x000e: set_T1 0x100a4df8
    0x000f: b_T1

For which the generated code becomes:

    0x61a5998d:  mov    -25321811(%rip),%eax         # 0x60233840
    0x61a59993:  jmpq   *%eax
    0x61a59995:  repz lea -1369131941(%rip),%r12d    # 0x100a4df8
    0x61a5999d:  mov    %r12d,%eax
    0x61a599a0:  and    $0xfffc,%eax
    0x61a599a3:  mov    %eax,0xc7f4(%r14)
    0x61a599aa:  lea    -25321904(%rip),%r15d        # 0x60233801
    0x61a599b1:  retq

The repz is still there from the goto_tb1 op, but is now applied to the lea insn from the set_T1 op.

Is this correct? Would it cause any kind of a problem?

Stuart

Stuart R. Anderson [EMAIL PROTECTED]
Network Software Engineering http://www.netsweng.com/
1024D/37A79149: 0791 D3B8 9A4C 2CDC A31F BD03 0A62 E534 37A7 9149
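The build warning about a pointer being kept in an int-sized field can be illustrated with a minimal sketch. The struct and function names below are hypothetical stand-ins, not QEMU's actual definitions; the only assumption is that the field is 32 bits wide on a host with 64-bit pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a structure whose field is too narrow for a host pointer. */
struct narrow_tb {
    uint32_t tb_next;
};

/* Returns 1 if `addr` survives a round trip through the 32-bit field, 0 otherwise. */
static int fits_in_narrow_field(uint64_t addr)
{
    struct narrow_tb tb;

    tb.tb_next = (uint32_t)addr;          /* the upper 32 bits are discarded here */
    return (uint64_t)tb.tb_next == addr;
}
```

With this sketch, an address such as 0x60233840 (the value shown in the log above) round-trips intact, while any address above 4 GiB does not. That may be why widening the field removed the warning without changing the behaviour, although the message does not confirm this.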
Re: [Qemu-devel] linux-user target
On Wed, 2007-04-18 at 13:31 -0400, Stuart Anderson wrote:
> On Tue, 17 Apr 2007, Stuart Anderson wrote:
> > I've continued to work on this all week, and I still haven't managed to solve it. I've chased down a lot of paths, but none of them have led to a solution. Here is a summary of the situation now.
> >
> > * programs other than bash will run
> > * bash --version will run
> > * bash --noediting will run
> > * occasionally, bash has run if I'm stracing it, but I can't always reproduce it.
> > * when it runs, I occasionally see some odd behavior, but not always. The termios patch I just sent cleared up a lot of the oddness.
> > * when it runs, it hangs on exit. Killing it logs me all the way out of the system (ssh connection).
> > * when it crashes, gdb loses the user level thread, so I can't do any debugging
> > * I don't see any of the TLS related system calls being called. I also don't see any concrete proof one way or another that it is used in the executable (i.e. no R_PPC_*TLS relocations). I've been digging in the kernel and glibc source, and I don't see a lot of special code to support TLS on ppc. It mostly seems to be just taking care to not step on R2. Glibc seems to be the only place where it knows something specific about TLS, which leads me to think that TLS is mostly contained within the userspace on PPC.

You're right: I think all TLS specific code is located in the glibc.

> > * I've tried turning on most of the DEBUG_ defines under linux-user, but none of them has yielded anything useful, or noteworthy.
>
> This morning, I went back and tried a 32-bit x86 host (instead of the x86_64 host), and discovered that everything works just fine. This makes me think it's a 64 bit issue, so I took a closer look at the build warnings that exist on x86_64 but not on x86. This pointed to PPC_OP(goto_tb0) and PPC_OP(goto_tb1) in target-ppc/op.c. It appears that x86_64 is using the generic portable code, but one of the fields that it is taking as a pointer (tb_next) is only an int. Changing it to a ulong didn't fix things, but it did eliminate the warning.
>
> After more digging in the qemu.log, I noticed this difference that is related to those two functions (op_goto_tb0 and op_goto_tb1).
> [...]
> Is this correct? Would it cause any kind of a problem?

I don't think this is the problem.

- Those functions are also used for system emulation, so the bug would not be restricted to user mode emulation if this was the source of the problem.
- My development machine is an amd64 one and the tests I did in a previous mail were done on this machine, in 64-bit mode. So I'm quite sure linux-user is able to run on a 64-bit host. There may still be 64-bit related bugs in Qemu but it seems the major ones have already been fixed. It may be related to some of the library versions installed in your 64-bit environment that would not be the same as the ones used in the 32-bit environment.
- I'm not a specialist of the x86 architecture but the generated code seems correct to me and I don't think the repz instruction is so important in that particular case (but I may be wrong).

One important detail that may make a big difference: I always use gcc 3.4 to compile because I know of several gcc 4.x bugs (crashes while compiling ISO C compliant code and/or incorrect generated asm instructions), so I do not consider gcc 4.x usable for a production environment today. Maybe this is the reason why Qemu runs OK on my machine but not on yours.

For your information, my testing system is an up-to-date Gentoo Linux distribution. My 32-bit test environment is another Gentoo distribution in a chrooted environment, running on the same machine.

--
J. Mayer [EMAIL PROTECTED]
Never organized
Re: [Qemu-devel] linux-user target
On Wed, 18 Apr 2007, J. Mayer wrote:
> You're right: I think all TLS specific code is located in the glibc.

In my last tracing through qemu.log, I did check for r2 references, and there was one store near the beginning that looked like what glibc would do (r2 = ptr+0x700), and the rest of the accesses were reads of r2.

> It may be related to some of the library versions installed in your 64-bit environment that would not be the same as the ones used in the 32-bit environment.

Both are current Debian etch systems, a real x86_64, and a real x86. Both are running the same library versions.

    ii  libc6         2.3.6.ds1-13  GNU C Library: Shared libraries

> One important detail that may make a big difference: I always use gcc 3.4 to compile because I know of several gcc 4.x bugs (crashes while compiling ISO C compliant code and/or incorrect generated asm instructions), so I do not consider gcc 4.x usable for a production environment today.

I'm using gcc-3.4 as well.

    ii  gcc-3.4       3.4.6-5       The GNU C compiler
    ii  gcc-3.4-base  3.4.6-5       The GNU Compiler Collection (base package)

Stuart

Stuart R. Anderson [EMAIL PROTECTED]
Network Software Engineering http://www.netsweng.com/
1024D/37A79149: 0791 D3B8 9A4C 2CDC A31F BD03 0A62 E534 37A7 9149
Re: [Qemu-devel] linux-user target
On 4/18/07, Stuart Anderson [EMAIL PROTECTED] wrote:
> On Wed, 18 Apr 2007, J. Mayer wrote:
> > You're right: I think all TLS specific code is located in the glibc.
>
> In my last tracing through qemu.log, I did check for r2 references, and there was one store near the beginning that looked like what glibc would do (r2 = ptr+0x700), and the rest of the accesses were reads of r2.
>
> > It may be related to some of the library versions installed in your 64-bit environment that would not be the same as the ones used in the 32-bit environment.
>
> Both are current Debian etch systems, a real x86_64, and a real x86. Both are running the same library versions.
>
>     ii  libc6         2.3.6.ds1-13  GNU C Library: Shared libraries
>
> > One important detail that may make a big difference: I always use gcc 3.4 to compile because I know of several gcc 4.x bugs (crashes while compiling ISO C compliant code and/or incorrect generated asm instructions), so I do not consider gcc 4.x usable for a production environment today.
>
> I'm using gcc-3.4 as well.
>
>     ii  gcc-3.4       3.4.6-5       The GNU C compiler
>     ii  gcc-3.4-base  3.4.6-5       The GNU Compiler Collection (base package)
>
> Stuart
>
> Stuart R. Anderson [EMAIL PROTECTED]
> Network Software Engineering http://www.netsweng.com/
> 1024D/37A79149: 0791 D3B8 9A4C 2CDC A31F BD03 0A62 E534 37A7 9149

This should be solved for the x86_64 host with the -mtune=nocona patch posted a while ago. The problem is dyngen being confused by the repz retq sequence.

--
Kind regards,
Igor V. Kovalenko
Re: [Qemu-devel] linux-user target
On Wed, 18 Apr 2007, Igor Kovalenko wrote:
> This should be solved for the x86_64 host with the -mtune=nocona patch posted a while ago.

I'll go dig that up.

> The problem is dyngen being confused by the repz retq sequence.

That's what caught my attention earlier today. It was only showing up in two places in the generated code. I fixed it by hand by tweaking dyngen to skip the repz as well as the retq, but I still got the crash.

It is of course possible that I'm seeing multiple crashes and not recognizing when I have actually fixed something.

Stuart

Stuart R. Anderson [EMAIL PROTECTED]
Network Software Engineering http://www.netsweng.com/
1024D/37A79149: 0791 D3B8 9A4C 2CDC A31F BD03 0A62 E534 37A7 9149
Re: [Qemu-devel] linux-user target
On 4/19/07, Stuart Anderson [EMAIL PROTECTED] wrote:
> On Wed, 18 Apr 2007, Igor Kovalenko wrote:
> > This should be solved for the x86_64 host with the -mtune=nocona patch posted a while ago.
>
> I'll go dig that up.

Here, http://lists.gnu.org/archive/html/qemu-devel/2007-03/msg00198.html

> > The problem is dyngen being confused by the repz retq sequence.
>
> That's what caught my attention earlier today. It was only showing up in two places in the generated code. I fixed it by hand by tweaking dyngen to skip the repz as well as the retq, but I still got the crash.

As discussed before, to do this in dyngen you need to know the context better or you'll skip more than intended; as far as I understand, that amounts to moving a large part of the decoder there.

> It is of course possible that I'm seeing multiple crashes and not recognizing when I have actually fixed something.

--
Kind regards,
Igor V. Kovalenko
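The caveat about needing context can be made concrete with a small sketch. It assumes, as the thread suggests, that the code generator copies an op's bytes up to a trailing ret (0xc3); the two functions below are hypothetical illustrations and are not dyngen's actual code. The first drops only the final ret, which leaves a stray 0xf3 behind for "repz retq". The second also drops a 0xf3 byte immediately before the ret, without decoding the instruction it belongs to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define X86_RET  0xc3   /* ret */
#define X86_REPZ 0xf3   /* repz/rep prefix byte */

/*
 * Hypothetical: number of bytes to copy from an op body when only the final
 * ret is dropped. Returns 0 if the body is empty or does not end in ret.
 */
static size_t copy_len_strip_ret(const uint8_t *code, size_t len)
{
    assert(code != NULL);
    if (len == 0 || code[len - 1] != X86_RET)
        return 0;
    return len - 1;
}

/*
 * Naive variant: additionally drop a 0xf3 byte that sits right before the
 * ret. It cannot tell a repz prefix from the last byte of an operand.
 */
static size_t copy_len_strip_repz_ret(const uint8_t *code, size_t len)
{
    size_t n = copy_len_strip_ret(code, len);

    if (n > 0 && code[n - 1] == X86_REPZ)
        n--;
    return n;
}
```

For the op_goto_tb0 bytes shown earlier (8b 05 00 00 00 00 ff e0 f3 c3), the naive variant gives the intended 8 bytes. For a body such as b8 00 00 00 f3 c3 (mov $0xf3000000,%eax followed by ret), it returns 4 and cuts into the immediate, which is the "skip more than intended" case; avoiding it would require decoding instruction boundaries.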
Re: [Qemu-devel] linux-user target
On Thu, 19 Apr 2007, Igor Kovalenko wrote:
> As discussed before, to do this in dyngen you need to know the context better or you'll skip more than intended; as far as I understand, that amounts to moving a large part of the decoder there.

Yes, it was a quick hack along w/ visual inspection of the results. I have since tried the -mtune=nocona change, and it still crashes.

Stuart

Stuart R. Anderson [EMAIL PROTECTED]
Network Software Engineering http://www.netsweng.com/
1024D/37A79149: 0791 D3B8 9A4C 2CDC A31F BD03 0A62 E534 37A7 9149
Re: [Qemu-devel] linux-user target
On Tue, 10 Apr 2007, Jocelyn Mayer wrote:
> > PPC: I am unable to get any executable to run.
> >
> >     projects:~/upstream/qemu# ./ppc-linux-user/qemu-ppc -L /mirror0/chroots/ppc/ /mirror0/chroots/ppc/bin/bash
> >     init_ppc_proc: PVR 0008 mask = 0008
> >     Segmentation fault
> >     projects:~/upstream/qemu#
>
> Just checked, on an amd64 host with a random powerpc bash version I got on my hard disk drive:
>
>     ./ppc-linux-user/qemu-ppc -L /mnt/local/hdc/part3/PPC/linux/archives /mnt/local/hdc/part3/PPC/linux/archives/bin/bash --version
>     init_ppc_proc: PVR 0008 mask = 0008
>     GNU bash, version 2.05a.0(1)-release (powerpc-unknown-linux-gnu)
>     Copyright 2001 Free Software Foundation, Inc.
>
> I also tried to really launch the shell and use it and it worked.
>
> ... I have to admit there are some strange behaviors with some features... But I think recent builds using glibc with TLS/NPTL would not run.

I've continued to work on this all week, and I still haven't managed to solve it. I've chased down a lot of paths, but none of them have led to a solution. Here is a summary of the situation now.

* programs other than bash will run
* bash --version will run
* bash --noediting will run
* occasionally, bash has run if I'm stracing it, but I can't always reproduce it.
* when it runs, I occasionally see some odd behavior, but not always. The termios patch I just sent cleared up a lot of the oddness.
* when it runs, it hangs on exit. Killing it logs me all the way out of the system (ssh connection).
* when it crashes, gdb loses the user level thread, so I can't do any debugging
* I don't see any of the TLS related system calls being called. I also don't see any concrete proof one way or another that it is used in the executable (i.e. no R_PPC_*TLS relocations). I've been digging in the kernel and glibc source, and I don't see a lot of special code to support TLS on ppc. It mostly seems to be just taking care to not step on R2. Glibc seems to be the only place where it knows something specific about TLS, which leads me to think that TLS is mostly contained within the userspace on PPC.
* I've tried turning on most of the DEBUG_ defines under linux-user, but none of them has yielded anything useful, or noteworthy.

Whew.. I'm in need of a fresh idea or three.

Stuart

Stuart R. Anderson [EMAIL PROTECTED]
Network Software Engineering http://www.netsweng.com/
1024D/37A79149: 0791 D3B8 9A4C 2CDC A31F BD03 0A62 E534 37A7 9149
Re: [Qemu-devel] linux-user target
On Tue, 2007-04-10 at 09:34 -0400, Stuart Anderson wrote:
> I'm trying to test my fixes to the linux-user emulation on some additional architectures now, but I'm running into problems. I can debug these some, but any suggestions or guidance, especially from people more familiar with the architecture core code, would be appreciated.
>
> [...]
>
> PPC: I am unable to get any executable to run.
>
>     projects:~/upstream/qemu# ./ppc-linux-user/qemu-ppc -L /mirror0/chroots/ppc/ /mirror0/chroots/ppc/bin/bash
>     init_ppc_proc: PVR 0008 mask = 0008
>     Segmentation fault
>     projects:~/upstream/qemu#

Just checked, on an amd64 host with a random powerpc bash version I got on my hard disk drive:

    ./ppc-linux-user/qemu-ppc -L /mnt/local/hdc/part3/PPC/linux/archives /mnt/local/hdc/part3/PPC/linux/archives/bin/bash --version
    init_ppc_proc: PVR 0008 mask = 0008
    GNU bash, version 2.05a.0(1)-release (powerpc-unknown-linux-gnu)
    Copyright 2001 Free Software Foundation, Inc.

    ./ppc-linux-user/qemu-ppc -L /mnt/local/hdc/part3/PPC/linux/archives /mnt/local/hdc/part3/PPC/linux/archives/bin/bash.static --version
    init_ppc_proc: PVR 0008 mask = 0008
    GNU bash, version 2.05b.0(1)-release (powerpc-unknown-linux-gnu)
    Copyright (C) 2002 Free Software Foundation, Inc.

I also tried to really launch the shell and use it and it worked.

    [EMAIL PROTECTED]:qemu-merge\ ps ww
    [...]
    29026 pts/57   Ss+    0:00 bash -rcfile .bashrc
    29269 pts/33   S      0:00 ./ppc-linux-user/qemu-ppc -L /mnt/local/hdc/part3/PPC/linux/archives /mnt/local/hdc/part3/PPC/linux/archives/bin/bash
    29930 pts/33   R+     0:00 ps ww
    [EMAIL PROTECTED]:qemu-merge\ ps
      PID TTY          TIME CMD
    28941 pts/33   00:00:00 bash
    29269 pts/33   00:00:00 qemu-ppc
    30017 pts/33   00:00:00 ps
    [EMAIL PROTECTED]:qemu-merge\

... I have to admit there are some strange behaviors with some features... But I think recent builds using glibc with TLS/NPTL would not run.

Here is some information about the glibc used by this bash version:

    ./ppc-linux-user/qemu-ppc -L /mnt/local/hdc/part3/PPC/linux/archives /mnt/local/hdc/part3/PPC/linux/archives/usr/bin/objdump -f /mnt/local/hdc/part3/PPC/linux/archives/lib/libc-2.2.5.so
    init_ppc_proc: PVR 0008 mask = 0008

    /mnt/local/hdc/part3/PPC/linux/archives/lib/libc-2.2.5.so: file format elf32-powerpc
    architecture: powerpc:common, flags 0x0150:
    HAS_SYMS, DYNAMIC, D_PAGED
    start address 0x00025db4

    file /mnt/local/hdc/part3/PPC/linux/archives/bin/bash
    /mnt/local/hdc/part3/PPC/linux/archives/bin/bash: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV), for GNU/Linux 2.2.0, dynamically linked (uses shared libs), stripped

If this can help.

--
Jocelyn Mayer [EMAIL PROTECTED]
Re: [Qemu-devel] linux-user target
On Tue, 10 Apr 2007, Jocelyn Mayer wrote:
> Just checked, on an amd64 host with a random powerpc bash version I got on my hard disk drive:
> I also tried to really launch the shell and use it and it worked.

Interesting...

> But I think recent builds using glibc with TLS/NPTL would not run.

Ahh. That's probably it. The executables I'm using are built with libc-2.3.6.

> If this can help.

Indeed, it has probably narrowed the search space considerably. Thanks.

Stuart

Stuart R. Anderson [EMAIL PROTECTED]
Network Software Engineering http://www.netsweng.com/
1024D/37A79149: 0791 D3B8 9A4C 2CDC A31F BD03 0A62 E534 37A7 9149