Bug#758115: Disabled wait state X'32EE' on IPL of zIPL
Hello Stephen,

On Tue, 26 Aug 2014 22:38:58 -0400 (EDT) Stephen Powell zlinux...@wowway.com wrote:

[snip]

> I do worry, though, about this being a more general problem. What about the interrupt handler in the Linux kernel? Or any other portion of the kernel that needs to examine (or change) data in real page 0 for whatever reason?

At least the upstream kernel 3.16 also uses -fno-delete-null-pointer-checks. See the toplevel Makefile:

    KBUILD_CFLAGS += $(call cc-option,-fno-delete-null-pointer-checks,)

Michael

-- 
To UNSUBSCRIBE, email to debian-s390-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/20140901134919.66889874@holzheu
Bug#758115: Disabled wait state X'32EE' on IPL of zIPL
On Mon, Aug 25, 2014 at 10:52:47AM +0200, Michael Holzheu wrote:
> On Sun, 24 Aug 2014 20:43:30 +0200 Bastian Blank wa...@debian.org wrote:
>> And -fno-delete-null-pointer-checks seems to be the correct option.
>
> From the gcc man page:
>
>   -fdelete-null-pointer-checks
>     In some environments, this assumption is not true, and programs can safely dereference null pointers. Use -fno-delete-null-pointer-checks to disable this optimization for programs which depend on that behavior.
>
> So we should add this option to CFLAGS in zipl/boot/Makefile?

Yes.

> Why have we not seen this problem under RHEL and SLES up to now?

Can you show a RHEL or SLES that uses gcc 4.9? Or any Red Hat/Fedora/SUSE with support for s390* that uses it? RHEL 7 uses 4.8 (see https://git.centos.org/summary/?r=rpms/gcc.git).

Maybe you should ask the gcc people why they enable this check on the freestanding implementation (-ffreestanding).

Bastian

-- 
Without freedom of choice there is no creativity.
		-- Kirk, "The return of the Archons", stardate 3157.4
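The behaviour discussed above can be illustrated with a small C sketch. It is not code from s390-tools; the function names are hypothetical and chosen only for this example. As the gcc documentation describes it, -fdelete-null-pointer-checks lets the compiler assume that a pointer which has already been dereferenced cannot be null, so a later null test may be removed. That assumption does not hold in a boot loader that legitimately reads data at address 0.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Dereference first, test afterwards. With -fdelete-null-pointer-checks
 * the compiler may treat the test as dead code, because p was already
 * dereferenced and is therefore assumed to be non-null.
 */
int read_then_check(const int *p)
{
	int value = *p;

	if (p == NULL)
		return -1;
	return value;
}

/* Test first, dereference afterwards: this check cannot be dropped. */
int check_then_read(const int *p)
{
	if (p == NULL)
		return -1;
	return *p;
}

/* Small self-check that only uses well-defined inputs. */
void null_check_demo(void)
{
	int x = 7;

	assert(read_then_check(&x) == 7);
	assert(check_then_read(&x) == 7);
	assert(check_then_read(NULL) == -1);
}
```

Compiling this once with and once without -fno-delete-null-pointer-checks (for example with gcc -O2 -S) and comparing the output for read_then_check() is one way to see whether a given compiler version drops the test.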
Bug#758115: Disabled wait state X'32EE' on IPL of zIPL
On Fri, Aug 22, 2014 at 07:21:31PM -0400, Stephen Powell wrote:
>> static inline int wait(void)
>> {
>>     do {
>>         load_wait_psw(0x010200018000ULL, &S390_lowcore.external_new_psw);
>>     33d0: e3 20 d0 00 00 04    lg    %r2,0(%r13)
>>     33d6: a7 39 01 b0          lghi  %r3,432
>>     33da: c0 e5 ff ff fc f7    brasl %r14,2dc8 <load_wait_psw>
>>         if (S390_lowcore.ext_int_code == 0x1004)
>>     33e0: e3 10 00 86 00 91    llgh  %r1,134
>>     33e6: a7 1e 10 04          chi   %r1,4100
>>     33ea: a7 74 00 06          jne   33f6 <sclp_wait_for_int+0x9a>
>>     33ee: a7 28 00 02          lhi   %r2,2
>>     33f2: a7 f4 00 08          j     3402 <sclp_wait_for_int+0xa6>
>>             return ETIMEOUT;
>>     } while (S390_lowcore.ext_int_code != 0x2401);
>>     33f6: a7 1e 24 01          chi   %r1,9217
>>     33fa: a7 74 ff eb          jne   33d0 <sclp_wait_for_int+0x74>
>>     33fe: a7 28 00 00          lhi   %r2,0
>>
>> Would be interesting how the disassembly looks on your system.
>
> Indeed. Here is what I got:
>
> static inline int wait(void)
> {
>     do {
>         load_wait_psw(0x010200018000ULL, &S390_lowcore.external_new_psw);
>     32d6: a7 39 01 b0          lghi  %r3,432
>     32da: e3 20 d0 00 00 04    lg    %r2,0(%r13)
>     32e0: c0 e5 ff ff fb b8    brasl %r14,2a50 <load_wait_psw>
>         if (S390_lowcore.ext_int_code == 0x1004)
>     32e6: 48 10 00 86          lh    %r1,134
>     32ea: a7 f4 00 01          j     32ec <sclp_wait_for_int+0x84>
>     32ee: 07 07                nopr  %r7

With gcc-4.8:

static inline int wait(void)
{
    do {
        load_wait_psw(0x010200018000ULL, &S390_lowcore.external_new_psw);
    331e: e3 20 d0 00 00 04    lg    %r2,0(%r13)
    3324: a7 39 01 b0          lghi  %r3,432
    3328: c0 e5 ff ff fb ac    brasl %r14,2a80 <load_wait_psw>
        if (S390_lowcore.ext_int_code == 0x1004)
    332e: e3 10 00 86 00 91    llgh  %r1,134
    3334: a7 1e 10 04          chi   %r1,4100
    3338: a7 84 00 0a          je    334c <sclp_wait_for_int+0x9c>
            return ETIMEOUT;
    } while (S390_lowcore.ext_int_code != 0x2401);
    333c: a7 1e 24 01          chi   %r1,9217
    3340: a7 74 ff ef          jne   331e <sclp_wait_for_int+0x6e>
    return 0;
    3344: a7 28 00 00          lhi   %r2,0
    3348: a7 f4 00 04          j     3350 <sclp_wait_for_int+0xa0>

That does look much better for 3338, 3340, not really for 3348 (to 3350). It does fix the issue at hand, but it's a band-aid at most. I installed the package on wheezy (compiled on sid) and it booted...

Kind regards
Philipp Kern
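The loop being disassembled above can be modelled on an ordinary host as follows. This is a hedged sketch, not the zipl source: struct lowcore_sketch, fake_lowcore, wait_sketch() and the two deliver_*() helpers are hypothetical, the structure lives in a normal static buffer instead of at address 0, and the value 2 for the timeout return code is only inferred from the "lhi %r2,2" in the first listing. The offsets 134 and 432 are taken from the "llgh %r1,134" and "lghi %r3,432" instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETIMEOUT 2	/* assumption, inferred from "lhi %r2,2" */

struct psw_sketch {
	uint64_t mask;
	uint64_t addr;
};

/* Only the two fields that appear in the listings are modelled. */
struct lowcore_sketch {
	uint8_t  pad0[134];
	uint16_t ext_int_code;			/* offset 134 (0x86) */
	uint8_t  pad1[432 - 136];
	struct psw_sketch external_new_psw;	/* offset 432 (0x1b0) */
};

/* Stand-in for the real lowcore, which sits at absolute address 0. */
static struct lowcore_sketch fake_lowcore;

typedef void (*interrupt_source)(volatile struct lowcore_sketch *lc);

/* Same control flow as the wait() loop shown in the listings. */
int wait_sketch(interrupt_source wait_for_interrupt)
{
	volatile struct lowcore_sketch *lc = &fake_lowcore;

	do {
		wait_for_interrupt(lc);	/* stands in for load_wait_psw() */
		if (lc->ext_int_code == 0x1004)
			return SKETCH_ETIMEOUT;
	} while (lc->ext_int_code != 0x2401);
	return 0;
}

/* Hypothetical interrupt sources that just store an interruption code. */
void deliver_expected(volatile struct lowcore_sketch *lc)
{
	lc->ext_int_code = 0x2401;
}

void deliver_timeout(volatile struct lowcore_sketch *lc)
{
	lc->ext_int_code = 0x1004;
}

void wait_sketch_demo(void)
{
	assert(offsetof(struct lowcore_sketch, ext_int_code) == 134);
	assert(offsetof(struct lowcore_sketch, external_new_psw) == 432);
	assert(wait_sketch(deliver_expected) == 0);
	assert(wait_sketch(deliver_timeout) == SKETCH_ETIMEOUT);
}
```

The volatile pointer forces the compiler to reload ext_int_code on every test. It does not by itself make an access at address 0 well defined, which is why the thread settles on the compiler option rather than a source change.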
Bug#758115: Disabled wait state X'32EE' on IPL of zIPL
On Thu, 21 Aug 2014 19:26:44 -0400 (EDT) Stephen Powell zlinux...@wowway.com wrote:
> Here are the last few instructions prior to the failure on the failing version, thanks to the CP TRACE facility under z/VM on a real IBM z/890:
>
>    2A78 STG E310F0A80024 FEB0 CC 2
>    2A7E LG E320300401B0 CC 2
>    2A84 LG E3103008000401B8 CC 2
>    2A8A STG E3403024 01B0 CC 2
>    2A90 LA 4140F0A0 = FEA8 CC 2
>    2A94 LARLC05BCC 2
>    2A9A STG E35040080024 FEB0 CC 2
>    2AA0 STG E35030080024 01B8 CC 2
>    2AA6 LPSWE B2B2F0A0FEA8 CC 0
>    2AAA LMG EBDFF0B4 CC 0
>    2AB0 STG E3203024 01B0 CC 0
>    2AB6 STG E31030080024 01B8 CC 0
>    2ABC BR 07FE -> 32E6 CC 0
> -> 32E6 LH 481000860086 CC 0
>    32EA BRU A7F40001 -> 32EC CC 0
> -> 32EC 0001
> *** 32EC PROG0001 -> 39A8
>
> And here is what appears to be the equivalent code on the working version, compiled under wheezy:
>
>    2A38 STG E310F0A80024 FEA0 CC 2
>    2A3E LG E320300401B0 CC 2
>    2A44 LG E3103008000401B8 CC 2
>    2A4A STG E3403024 01B0 CC 2
>    2A50 LA 4140F0A0 = FE98 CC 2
>    2A54 LARLC05BCC 2
>    2A5A STG E35040080024 FEA0 CC 2
>    2A60 STG E35030080024 01B8 CC 2
>    2A66 LPSWE B2B2F0A0FE98 CC 0
>    2A6A LMG EBDFF0B4 CC 0
>    2A70 STG E3203024 01B0 CC 0
>    2A76 STG E31030080024 01B8 CC 0
>    2A7C BR 07FE -> 32C0 CC 0
> -> 32C0 LLGHE310008600910086 CC 0
>    32C6 CHI A71E1004CC 2
>    32CA BRZ A784000A32DE CC 2
>    ...
>
> And on we go from there. The BRU instruction in the first sequence is clearly bad. In assembler language format, the equivalent instruction would be BRU *+2. This is a bad branch. The instruction branches into the middle of itself, picking up 0001 as the next machine instruction, which causes an operation exception. Since the failing instruction starts at storage address 32EC, and is two bytes long, that means that the updated instruction address in the PSW at the time of the program interruption will be 32EE, which is the value used in the disabled wait PSW.
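The arithmetic in the last quoted paragraph can be checked with a few lines of C. This is only a sketch and the function names are hypothetical. It relies on two architecture rules: the halfword count of a relative branch is doubled and added to the address of the branch itself, and the two leftmost bits of the first opcode byte give the instruction length (00 means 2 bytes, 01 or 10 means 4 bytes, 11 means 6 bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Instruction length in bytes, derived from the first opcode byte. */
unsigned insn_length(uint8_t first_byte)
{
	switch (first_byte >> 6) {
	case 0:
		return 2;
	case 3:
		return 6;
	default:
		return 4;
	}
}

/* Target of a relative branch: own address plus 2 * signed halfword count. */
uint64_t rel_branch_target(uint64_t insn_addr, int16_t halfwords)
{
	return insn_addr + (uint64_t)((int64_t)halfwords * 2);
}

void bad_branch_demo(void)
{
	/* Failing build: BRU A7F40001 at 32EA. */
	uint64_t target = rel_branch_target(0x32EA, 0x0001);

	assert(target == 0x32EC);
	/* The target lies inside the 4-byte branch instruction itself. */
	assert(target < 0x32EA + insn_length(0xA7));
	/* The 0001 fetched at 32EC counts as a 2-byte instruction, so the
	 * updated instruction address after the exception is 32EE. */
	assert(target + insn_length(0x00) == 0x32EE);

	/* Working build: BRZ A784000A at 32CA goes to 32DE. */
	assert(rel_branch_target(0x32CA, 0x000A) == 0x32DE);
}
```

The same helper reproduces the backward branch in the gcc-4.8 listing elsewhere in this thread: a7 74 ff ef at 3340 has a halfword count of -17 and lands on 331e.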
Hi Stephen,

You can get a disassembly for the eckd boot loader code when you go to s390-tools/zipl/boot and:

1) make
2) objdump -S eckd2.exec > eckd2.list

I think the corresponding code in zipl is load_wait_psw() in libc.c:

__attribute__ ((noinline)) void load_wait_psw(uint64_t psw_mask, struct psw_t *psw)
{
    struct psw_t wait_psw = { .mask = psw_mask, .addr = 0 };
    2df6: e3 20 f0 a0 00 24    stg   %r2,160(%r15)
    struct psw_t old_psw, *wait_psw_ptr = &wait_psw;
    unsigned long addr;

    old_psw = *psw;
    psw->mask = 0x00018000ULL;
    2dfc: e3 10 30 00 00 24    stg   %r1,0(%r3)
    asm volatile(
    2e02: 41 20 f0 a0          la    %r2,160(%r15)
{
    struct psw_t wait_psw = { .mask = psw_mask, .addr = 0 };
    struct psw_t old_psw, *wait_psw_ptr = &wait_psw;
    unsigned long addr;

    old_psw = *psw;
    2e06: e3 10 30 08 00 04    lg    %r1,8(%r3)
    psw->mask = 0x00018000ULL;
    asm volatile(
    2e0c: c0 50 00 00 00 0b    larl  %r5,2e22 <load_wait_psw+0x5a>
    2e12: e3 50 20 08 00 24    stg   %r5,8(%r2)
    2e18: e3 50 30 08 00 24    stg   %r5,8(%r3)
    2e1e: b2 b2 f0 a0          lpswe 160(%r15)
        .Lwait:\n
        : [addr] =d (addr)
        : [wait_psw] Q (wait_psw), [wait_psw_ptr] a (wait_psw_ptr), [psw] a (psw)
        : cc, memory);
    *psw = old_psw;
    2e22: e3 40 30 00 00 24    stg   %r4,0(%r3)
    2e28: e3 10 30 08 00 24    stg   %r1,8(%r3)
}
    2e2e: eb df f0 b0 00 04    lmg   %r13,%r15,176(%r15)
    2e34: 07 fe                br    %r14

load_wait_psw() is called from
Bug#758115: Disabled wait state X'32EE' on IPL of zIPL
On Fri, 15 Aug 2014 19:12:24 -0400 (EDT), Stephen Powell wrote:
> The full PSW is as follows:
>
> 0002 8000 32EE

By the way, Hercules has an instruction tracing facility, similar to the CP TRACE command on z/VM. The T command, along with the T+ and T- commands, is documented in the Hercules User Reference Guide, available as a PDF file from

http://hercdoc.glanzmann.org

Scroll down to the section for the manuals for version 3.07, which is the version currently packaged for Debian. And, by the way, the current main Hercules web site is

http://www.hercules-390.eu

The 3.07 documentation which ships with the current hercules package in Debian points to the old web site, which is no longer valid.

-- 
  .''`.     Stephen Powell
 : :'  :
 `. `'`
   `-
Bug#758115: Disabled wait state X'32EE' on IPL of zIPL
On Thu, 14 Aug 2014 21:45:53 -0400 (EDT) Stephen Powell zlinux...@wowway.com wrote:
> On Thu, 14 Aug 2014 10:32:42 -0400 (EDT), Philipp Kern wrote:
>> Hrm. Odd. It shouldn't be because the brokenness relates to the C library, not to the C compiler itself and zipl does not use the C library.
>
> Again, we must distinguish between zipl, the Linux command which runs at a Linux shell prompt, and zIPL, the boot loader proper, a customized version of which is written out by zipl when zipl gets run. zipl, the command which runs at a Linux shell prompt, most certainly does use the C library. It is written in C, it is compiled by the C compiler, and, at execution time, it uses the C run-time library, just like any other C program.
>
> zIPL, which is written out by zipl, does not use the C library. Or does it? Well, not the regular C library, no. But it does use a minimalist run-time library. In the source package, look at zipl/boot/libc.c. Yes, even zIPL, the boot loader proper, does use a C library of sorts.

Just for confirmation: Stephen is right. The zipl tool is a normal C program that uses the glibc. The zipl boot loader code under the boot source directory does not use the glibc or any other external library.

Before s390-tools-1.24.0 it was written 100% in assembler. With s390-tools-1.24.0 we have rewritten the code in C and have added our own minimal libc.

Best Regards,
Michael
Re: Bug#758115: Disabled wait state X'32EE' on IPL of zIPL
On Fri, Aug 15, 2014 at 10:46:54AM +0200, Michael Holzheu wrote:
> On Thu, 14 Aug 2014 21:45:53 -0400 (EDT) Stephen Powell zlinux...@wowway.com wrote:
>> On Thu, 14 Aug 2014 10:32:42 -0400 (EDT), Philipp Kern wrote:
>>> Hrm. Odd. It shouldn't be because the brokenness relates to the C library, not to the C compiler itself and zipl does not use the C library.
>>
>> Again, we must distinguish between zipl, the Linux command which runs at a Linux shell prompt, and zIPL, the boot loader proper, a customized version of which is written out by zipl when zipl gets run. zipl, the command which runs at a Linux shell prompt, most certainly does use the C library. It is written in C, it is compiled by the C compiler, and, at execution time, it uses the C run-time library, just like any other C program.
>>
>> zIPL, which is written out by zipl, does not use the C library. Or does it? Well, not the regular C library, no. But it does use a minimalist run-time library. In the source package, look at zipl/boot/libc.c. Yes, even zIPL, the boot loader proper, does use a C library of sorts.
>
> Just for confirmation: Stephen is right. The zipl tool is a normal C program that uses the glibc. The zipl boot loader code under the boot source directory does not use the glibc or any other external library.
>
> Before s390-tools-1.24.0 it was written 100% in assembler. With s390-tools-1.24.0 we have rewritten the code in C and have added our own minimal libc.

I should've written (e)glibc instead of C library. It's what I meant. I tried to simplify things and failed.

The question "is this Hercules?" was also more about where the value is coming from, as CP might do things differently. So Hercules should log the whole PSW. I can also only see it logging that and the CPU address/ID, not a wait state code. Do you happen to have the PSW handy?

Kind regards
Philipp Kern
Bug#758115: Disabled wait state X'32EE' on IPL of zIPL
Package: s390-tools
Version: 1.24.1-1
Severity: critical
Justification: The entire system is unbootable

After installing s390-tools version 1.24.1-1 and re-running zipl, a reboot of the system causes a disabled wait PSW to be loaded during boot, with a wait state code of X'32EE', prior to the zipl menu being written out. The system is unbootable. This may be related to the general brokenness of C on s390x in jessie/sid.

-- 
  .''`.     Stephen Powell
 : :'  :
 `. `'`
   `-
Bug#758115: Disabled wait state X'32EE' on IPL of zIPL
On Thu, Aug 14, 2014 at 06:49:20AM -0400, Stephen Powell wrote:
> Justification: The entire system is unbootable
>
> After installing s390-tools version 1.24.1-1 and re-running zipl, a reboot of the system causes a disabled wait PSW to be loaded during boot, with a wait state code of X'32EE', prior to the zipl menu being written out. The system is unbootable. This may be related to the general brokenness of C on s390x in jessie/sid.

Hrm. Odd. It shouldn't be because the brokenness relates to the C library, not to the C compiler itself and zipl does not use the C library.

That being said, I had to recompile s390-tools on sid, and I do not run sid due to the C breakage. It worked before the recompilation, hence there might be a change in sid vs. wheezy that caused this.

You are talking about Hercules, right?

Kind regards
Philipp Kern
Bug#758115: Disabled wait state X'32EE' on IPL of zIPL
On Thu, 14 Aug 2014 10:32:42 -0400 (EDT), Philipp Kern wrote:
> Hrm. Odd. It shouldn't be because the brokenness relates to the C library, not to the C compiler itself and zipl does not use the C library.

Again, we must distinguish between zipl, the Linux command which runs at a Linux shell prompt, and zIPL, the boot loader proper, a customized version of which is written out by zipl when zipl gets run. zipl, the command which runs at a Linux shell prompt, most certainly does use the C library. It is written in C, it is compiled by the C compiler, and, at execution time, it uses the C run-time library, just like any other C program.

zIPL, which is written out by zipl, does not use the C library. Or does it? Well, not the regular C library, no. But it does use a minimalist run-time library. In the source package, look at zipl/boot/libc.c. Yes, even zIPL, the boot loader proper, does use a C library of sorts.

> That being said, I had to recompile s390-tools on sid,

Therein lies the problem.

> and I do not run sid due to the C breakage.

You should. You may not be able to directly install jessie or sid, but you can install wheezy and then do an upgrade to jessie or sid. Of course, you will likely experience problems during the upgrade, as I did, most likely with the upgrade of package perl-base. But there are posts to debian-s390 by me that explain how I worked around it. If you had a sid system to test with, you would have realized that this package is unusable and you never would have uploaded it.

> It worked before the recompilation, hence there might be a change in sid vs. wheezy that caused this.

Oh, absolutely. I downloaded the new source package, built it on a wheezy system, transferred the binary package to my jessie system, installed the binary package on my jessie system, ran zipl, shut down my system, and IPLed. It IPLed just fine. I then took the exact same source package, compiled it on a jessie system, installed the binary package, ran zipl, shut down, and IPLed. Kaboom! Disabled wait state code X'32EE'. The C compiler and run-time library used is the only difference. I think I've proven pretty conclusively that this is C breakage causing this problem.

> You are talking about Hercules, right?

It doesn't matter. I get the exact same results on Hercules as I do on a real mainframe, and vice versa. I have found Hercules to be a deadly accurate emulation of real mainframe hardware, when properly configured. In my opinion, everyone who maintains a package which is mainframe-specific, such as s390-tools, and anyone responsible, in whole or in part, for the s390x port needs their own mainframe system that they can play around with in a totally unrestricted manner, without fear of messing someone else up. And if you don't have access to a real mainframe, a Hercules emulation of one is the next best thing. It's slower than a real mainframe; but architecturally, it is virtually indistinguishable from a real mainframe to the software. And that's what an emulator is all about, right?

You need a jessie/sid system to play around with. I must say that the C breakage on s390x is the biggest mess that I have ever seen, and in the case of this package, has produced the worst error yet: a totally unbootable system.

By the way, when version 1.17.1 of the package is compiled on a jessie system, it runs fine. To me, the most significant difference between the two packages is that the zIPL portion of the 1.17.1 package, the boot loader proper, is all written in assembly language (zipl/boot/sclp.S, zipl/boot/menu.S, etc.), whereas the zIPL portion of the 1.24.1 package has been rewritten in C (zipl/boot/sclp.c, zipl/boot/menu.c, etc.). Since it's written in C, it needs that minimalist C run-time library (zipl/boot/libc.c), which the 1.17.1 version doesn't need. Yes, this bug has C breakage written all over it.

-- 
  .''`.     Stephen Powell
 : :'  :
 `. `'`
   `-