Re: Curious observation: lack of a simple optimization in a C program
In caajsdjgvpp5c-jwq5to-hmogrfxve-x3+nm0ej_xl6vsmm4...@mail.gmail.com, on 02/26/2014 at 08:56 AM, John McKown john.archie.mck...@gmail.com said:

> My bad. I couldn't do a cut and paste. So I had to type in by hand. Which I did not double check. My hands sometimes just automagically type in what _they_ think should be there instead of what my brain says to type.

Not nearly as bad as typing EXEC CMS ERASE when I meant to type ERASE CMS EXEC. BTDT,GTS.

-- 
Shmuel (Seymour J.) Metz, SysProg and JOAT
ISO position; see http://patriot.net/~shmuel/resume/brief.html
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)

-- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Curious observation: lack of a simple optimization in a C program
In ofe4016b7d.cd5a426d-on85257c89.0047d56f-85257c89.00480...@us.ibm.com, on 02/24/2014 at 08:06 AM, Peter Relson rel...@us.ibm.com said:

> SLR R0,R0
> IC R1,0(,R1)
> CHI R1,=H'13'
> BNE ...
> Ignoring that this is lousy code

Will it even assemble? Should the CHI not have an immediate value rather than a relocatable value?

-- Shmuel (Seymour J.) Metz
Re: Curious observation: lack of a simple optimization in a C program
My bad. I couldn't do a cut and paste. So I had to type in by hand. Which I did not double check. My hands sometimes just automagically type in what _they_ think should be there instead of what my brain says to type. I think they learned this from my mouth/vocal cords.

On Tue, Feb 25, 2014 at 12:12 PM, Shmuel Metz (Seymour J.) shmuel+ibm-m...@patriot.net wrote:

> Will it even assemble? Should the CHI not have an immediate value rather than a relocatable value?

-- 
Wasn't there something about a PASCAL programmer knowing the value of everything and the Wirth of nothing?
Maranatha!
John McKown
Re: Curious observation: lack of a simple optimization in a C program
> SLR R0,R0
> IC R1,0(,R1)
> CHI R1,=H'13'
> BNE ...

Ignoring that this is lousy code (unless, perhaps, the value is looked at again, in which case having it in a reg could be advantageous), I hope that this was a slightly incomplete snippet, or a typo; R1 needs to be zeroed for the IC/CHI, not R0.

Peter Relson
z/OS Core Technology Design
Re: Curious observation: lack of a simple optimization in a C program
On Mon, Feb 24, 2014 at 7:06 AM, Peter Relson rel...@us.ibm.com wrote:

> I hope that this was a slightly incomplete snippet, or a typo; R1 needs to be zeroed for the IC/CHI, not R0.

It was a typo on my part. The IC and CHI were for R0, not R1. I am guessing that the compiler has a template for this which can also be used with wchar characters, and so generates something that will work with them too (in general, just replacing the IC with an ICM for a wchar?). As I think I said in the original post, I failed compiler class in college.

John McKown
Re: Curious observation: lack of a simple optimization in a C program
john.archie.mck...@gmail.com (John McKown) writes:

> Wasn't there something about a PASCAL programmer knowing the value of everything and the Wirth of nothing?

two people from the Los Gatos VLSI lab originally did mainframe pascal for VLSI chip tools ... this goes on eventually to become the vs/pascal product. Among other things it was used to implement the original mainframe TCP/IP support. It originally had some performance issues ... getting around 44kbytes/sec throughput using 3090 processor. However, I did the RFC1044 changes and in some tuning tests at Cray Research got sustained channel media throughput between Cray and 4341 using only modest amount of the 4341 processor (possibly 500 times improvement in bytes moved per instruction executed).

past posts mentioning doing rfc 1044 support
http://www.garlic.com/~lynn/subnetwork.html#1044

One of the issues is the (pascal) implementation had none of the exploits that have been epidemic in c-language implementations ... observation it is about as hard for a programmer to *NOT* have such exploits in c-language as it is for a pascal programmer to have such problems.

past posts mentioning c-language exploits
http://www.garlic.com/~lynn/subintegrity.html#buffer

in the period that IBM had gone into the red and was re-organized into the 13 baby blues in preparation for breaking up the company (until the board brought in Gerstner who reversed the breakup and resurrected the company) ... there was big move for business operations to get off of proprietary tools platforms. Part of this was to transfer proprietary tools to standard industry tool vendors and get them running on industry standard platforms. I had to do one such pascal 50,000+ lines of code vlsi application. Problem was that pascal on some of these other platforms appeared to have been used for little else than introduction to programming classes (one such platform was in the local area, but they had outsourced their pascal support to someplace 12 time zones away, located near a space launch center).

-- 
virtualization experience starting Jan1968, online at home since Mar1970
Re: Curious observation: lack of a simple optimization in a C program
Again, my thanks to all for the help. The problem was my own fault: I had not realized that I had used the -o switch instead of the -O switch, and so my code was not being optimized.

On the off chance that anybody was wondering why I was doing this, it was just a test to try to determine the best way to see if a C string ended with a \r, or carriage return. Why? Because sometimes people here goof up and ftp a Windows file to the mainframe, but do it from a UNIX server. Well, they don't, but we share a NAS box between Windows and Linux servers. So the file on the z sometimes has an extraneous \r at the very end of the line.

One good thing this did for me was point out that I was really doing it wrong. I found the address of the end of the string using the code argv[i]+strlen(argv[i]). But the strlen() code generated actually found the end of the string using the SRST and then converted that to an integer (size_t), which I then converted right back to an address. So I searched and found the strchr() routine, which can find the terminating null and returns its address. Using this function resulted in a TRT in a loop. Which I didn't much care for. So I looked at the memchr() function. But it requires a maximum length. Now, since I'm looking for a \0, and since a proper C string is \0 delimited, I _ASSuMEd_ that the string was properly delimited. This allowed me to use an arbitrary length for the memchr() call. So I used 0x7fffffff, or one byte short of 2GiB. That seems more than large enough to me. The code is now:

    for (i = 0; i < argc; i++) {
        if ('\r' == *((char *) memchr(argv[i], '\0', 0x7fffffff) - 1))
            *((char *) memchr(argv[i], '\0', 0x7fffffff) - 1) = '\0';
    }

Of course, this would really be done after the fgets() call.

The code generated is lovely:

    *      if ('\r' == *(findend-1))
             L     r5,(*)uchar*(r2,r1,0)
             LR    r6,r5
             LA    r0,0
             AL    r6,=F'2147483647'
             SRST  r6,r5
             JO    *-4
             BL    @1L13
             LA    r6,0
    @1L13    DS    0H
             AHI   r6,H'-1'
             CLI   (*)uchar(r6,0),13
             BNE   @1L5
    *        *(findend-1)='\0';
             MVI   (*)uchar(r6,0),0
    @1L5     DS    0H

where findend is:

    #define findend (char *)memchr(argv[i],'\0',0x7fffffff) // for ease to change method

I need to cast the (void *) from memchr() to a (char *) in order to subtract 1 from it. Of course, this can S0C4 if the string is not \0 delimited. Or it could possibly corrupt a byte of innocent storage. But this should not happen in my planned use, since fgets() should return a pointer to a \0 delimited string or NULL. And I _will_ check for NULL.
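For the planned use after fgets(), the search for the terminator can be bounded by the real buffer size instead of an arbitrary maximum, and done only once. What follows is a minimal sketch under stated assumptions, not the code from the post: strip_trailing_cr and its cap parameter are hypothetical names, and handling a "\r\n" pair as well as a bare trailing "\r" is an assumption about how the lines arrive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): remove one trailing '\r'
 * from a NUL-terminated line. cap is the size of the buffer holding
 * the line, so memchr() never scans storage the caller does not own.
 * Returns 1 if a '\r' was removed, 0 otherwise. */
static int strip_trailing_cr(char *line, size_t cap)
{
    char *end;
    char *last;

    assert(line != NULL);

    end = memchr(line, '\0', cap);       /* address of the terminator */
    if (end == NULL || end == line)      /* unterminated or empty line */
        return 0;

    last = end - 1;                      /* last character of the line */

    /* Assumption: fgets() may have kept a '\n' after the '\r'. */
    if (*last == '\n' && last > line && last[-1] == '\r') {
        last[-1] = '\n';                 /* "\r\n" becomes "\n" */
        *last = '\0';
        return 1;
    }
    if (*last == '\r') {                 /* bare trailing '\r' */
        *last = '\0';
        return 1;
    }
    return 0;
}
```

A caller would pass the same buffer and size it gave to fgets(), for example strip_trailing_cr(buf, sizeof buf), after checking that fgets() did not return NULL.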
Re: Curious observation: lack of a simple optimization in a C program
On 24 February 2014 10:44, Anne & Lynn Wheeler l...@garlic.com wrote:

> However, I did the RFC1044 changes and in some tuning tests at Cray Research got sustained channel media throughput between Cray and 4341 using only modest amount of the 4341 processor (possibly 500 times improvement in bytes moved per instruction executed).

You've mentioned this a number of times, but I don't think you've explained what you did to the Pascal code to get a 500x improvement. Was the original code exceptionally bad, was your new code exceptionally brilliant, did you take advantage of some knowledge of the VS Pascal code generator or were your changes applicable to code in any language...?

Tony H.
Re: Curious observation: lack of a simple optimization in a C program
t...@harminc.net (Tony Harminc) writes:

> You've mentioned this a number of times, but I don't think you've explained what you did to the Pascal code to get a 500x improvement.

only a little of it was vs pascal specific ... most of it was doing fast pathing for the most common case. also the communication group had been doing a lot to try and make tcp/ip perform as badly as possible ... and as a result, there was little or no optimization. The other was they limited the channel attach box to a lan bridge ... so the translation from tcp/ip to LAN packets had to be done in the mainframe. I was able to channel attach a TCP/IP router box ... which eliminated a whole bunch of slow serialized processing in the mainframe code (and the channel attached router box was much higher performance than the communication group channel attached LAN bridge).

http://www.garlic.com/~lynn/subnetwork.html#1044

i've also told the story about later ... the communication group subcontracting tcp/ip support implemented in vtam. the initial implementation had TCP throughput much faster than approx. equivalent LU6.2. The communication group told the subcontractor that everybody knows that a *CORRECT* implementation of TCP is much slower than LU6.2 and they would only be paying for a *CORRECT* implementation.

-- 
virtualization experience starting Jan1968, online at home since Mar1970
Re: Curious observation: lack of a simple optimization in a C program
> Was the original code exceptionally bad, was your new code exceptionally brilliant,

When I worked on Strobe we saw this all of the time. One shop saw job runtime drop from 24 hours to eight minutes. We could only wonder what they had done in the first place.

Bob Shannon
Ex-Programart
Re: Curious observation: lack of a simple optimization in a C program
I bet if you timed it the strchr() TRT in a loop is faster.

On 25/02/2014, at 12:19 AM, John McKown john.archie.mck...@gmail.com wrote:

> So I searched and found the strchr() routine which can find the terminating null and returns its address. Using this function resulted in a TRT in a loop. Which I didn't much care for.
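A bet like the one about strchr() can only be settled by measuring on the machine in question. The following is a rough sketch of how the three end-of-string methods from this thread might be timed with the standard clock() function. The names end_finder, end_by_strlen, end_by_strchr, end_by_memchr and time_finder are all hypothetical, and clock() resolution may well be too coarse for short strings, so treat any numbers it produces with caution.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical common signature for the three methods discussed.
 * cap is only used by the memchr() variant. */
typedef const char *(*end_finder)(const char *s, size_t cap);

static const char *end_by_strlen(const char *s, size_t cap)
{
    (void)cap;
    return s + strlen(s);          /* length, then back to an address */
}

static const char *end_by_strchr(const char *s, size_t cap)
{
    (void)cap;
    return strchr(s, '\0');        /* the terminator counts as part of s */
}

static const char *end_by_memchr(const char *s, size_t cap)
{
    return memchr(s, '\0', cap);   /* NULL if no '\0' within cap bytes */
}

/* CPU seconds used by reps calls of f on s. The string must be
 * terminated within cap bytes. The volatile sink keeps the calls
 * from being optimized away. */
static double time_finder(end_finder f, const char *s, size_t cap, long reps)
{
    volatile size_t sink = 0;
    clock_t start;
    long i;

    assert(f != NULL && s != NULL && reps > 0);

    start = clock();
    for (i = 0; i < reps; i++)
        sink += (size_t)(f(s, cap) - s);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_finder() once per method with the same string and a large reps value gives three figures to compare. Different string lengths are worth trying, since the relative costs may change with length.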
Re: Curious observation: lack of a simple optimization in a C program
As I posted recently in another thread, I certainly appreciate the intellectual challenge etc., etc. of finding the absolute fastest way of performing some machine function. However, I hope you realize that in real life the exact speed of TRT versus SRST is going to be dwarfed by the cost of the I/O. Possibly dwarfed by the cost of the optimized compile ... <g>

The costs of TRT and SRST may vary with the length of the data read, and with other activity on the machine (due to cache sharing).

Integer arithmetic -- converting pointer + length to pointer -- should be almost free. I would not base any decision on whether or not one had to convert a length to an address.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of John McKown
Sent: Monday, February 24, 2014 8:19 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Curious observation: lack of a simple optimization in a C program

> But the strlen() code generated actually found the end of the string using the SRST and then converted that to an integer (size_t), which I then converted right back to an address. So I searched and found the strchr() routine which can find the terminating null and returns its address. Using this function resulted in a TRT in a loop. Which I didn't much care for.
Re: Curious observation: lack of a simple optimization in a C program
On Sun, Feb 23, 2014 at 12:44 AM, David Crayford dcrayf...@gmail.com wrote:

> Hmm, I'm not getting the same result! When I compile with ARCH(5) there is only one strlen() call (SRST) and the value is retained in a register. BTW, I removed #undef strlen. What is the purpose of that directive?
>
> *   int i;
> *   for(i=0;i<argc;i++) {
>          LTR   r1,r1
>          LA    r3,0
>          JNH   @1L6
>          LA    r5,0
>          LR    r8,r1
> @1L3     DS    0H
> *      if ('\r' == *(argv[i]+strlen(argv[i])) )
>          L     r9,(*)uchar*(r5,r2,0)
>          LR    r11,r9
>          NILH  r11,H'32767'
>          LA    r0,0
>          LR    r10,r11
>          SRST  r0,r10
>          JO    *-4
>          LR    r10,r0
>          SLR   r10,r11
>          LLGC  r0,(*)uchar(r10,r9,0)
>          CHI   r0,H'13'
>          JNE   @1L5
> *        *(argv[i]+strlen(argv[i]))='\0';
>          AL    r10,(*)uchar*(r5,r2,0)
>          MVI   (*)uchar(r10,0),0
> @1L5     DS    0H
>          LA    r5,#AMNESIA(,r5,4)
>          BRCT  r8,@1L3
> *   }
> *   return 0;
> * }
>
> If I compile using ARCH(9) for our z114 the code is even more succinct. I can't pick a hole in it.
>
> *   int i;
> *   for(i=0;i<argc;i++) {
>          LA    r3,0
>          LTR   r1,r1
>          JNH   @1L6
>          LA    r5,0
>          RISBHG r0,r1,H'0',H'159',H'32'
> @1L3     DS    0H
> *      if ('\r' == *(argv[i]+strlen(argv[i])) )
>          L     r10,(*)uchar*(r5,r2,0)
>          LA    r0,0
>          LR    r8,r10
>          NILH  r8,H'32767'
>          LR    r9,r8
>          SRST  r0,r9
>          JO    *-4
>          SLRK  r8,r0,r8
>          LLC   r0,(*)uchar(r8,r10,0)
>          CHI   r0,H'13'
>          JNE   @1L5
> *        *(argv[i]+strlen(argv[i]))='\0';
>          AL    r8,(*)uchar*(r5,r2,0)
>          MVI   (*)uchar(r8,0),0
> @1L5     DS    0H
>          LA    r5,#AMNESIA(,r5,4)
>          BRCTH r0,@1L3
> *   }
> *   return 0;

Hum, your results are what I was expecting. Perhaps I missed something in my compile parameters. Did you compile using JCL? I am using the UNIX xlc command from a UNIX prompt. I don't know why that would make a difference. Would you mind showing me your compile parameters? What version of z/OS are you running on? My compile came from a z/OS 1.13 system. I don't know the maintenance level.

I forgot to remove the #undef line. Without the #undef, you get the builtin code you saw (with the SRST). I then wondered whether, if I forced an actual CALL to the strlen() subroutine, the compiler would smarten up and save the result. In both cases, with and without the #undef, the compiler did not save the results as I had hoped.

What the #undef does is force the compiler to use the strlen() function call instead of the builtin. In the string.h include, there is a line like:

    #define strlen __strlen

John McKown
Re: Curious observation: lack of a simple optimization in a C program
I would strongly suggest not to #undef such #defines coming from the standard headers; the results may be unpredictable. Maybe the compiler's optimization and inlining strategies rely exactly on such #defines, that is: strlen needs to be __strlen, so that the compiler can recognize it and do the optimizations on it that it is supposed to do?

Kind regards
Bernd

On 23.02.2014 at 16:55, John McKown wrote:

> What the #undef does is force the compiler to use the strlen() function call instead of the builtin. In the string.h include, there is a line like: #define strlen __strlen
Fwd: Re: Curious observation: lack of a simple optimization in a C program
An explanation, which is just a little more paranoid: if you #undef strlen, the compiler cannot be sure that your strlen is not quite another function, one which does not give the same result on two subsequent calls.

-------- Original Message --------
Subject: Re: Curious observation: lack of a simple optimization in a C program
Date: Sun, 23 Feb 2014 19:48:44 +0100
From: Bernd Oppolzer bernd.oppol...@t-online.de
To: IBM Mainframe Discussion List IBM-MAIN@LISTSERV.UA.EDU

> I would strongly suggest not to #undef such #defines coming from the standard headers; the results may be unpredictable. Maybe the compiler's optimization and inlining strategies rely exactly on such #defines, that is: strlen needs to be __strlen, so that the compiler can recognize it and do the optimizations on it that it is supposed to do?
>
> Kind regards
> Bernd
Re: Curious observation: lack of a simple optimization in a C program
I guess that IBM #defines some well known ANSI functions to other strange names (__strlen for strlen, as an example), to be able to do some special optimizations on those functions after or before inlining them. Remember, the names of the ANSI functions are not reserved in C - they are not part of the language, as in PL/1 for example - so before doing optimization on such a function call, the compiler must be absolutely sure that it is really dealing with that function. By issuing #undef strlen, John McKown disabled the optimization of strlen by the compiler.

Kind regards
Bernd

On 23.02.2014 at 07:44, David Crayford wrote:

> Hmm, I'm not getting the same result! When I compile with ARCH(5) there is only one strlen() call (SRST) and the value is retained in a register. BTW, I removed #undef strlen. What is the purpose of that directive?
>
> On 23/02/2014 1:16 PM, John McKown wrote:
>
>> Just for fun, I wrote a very small C program. I compiled it on Linux/Intel using GCC 4.8.2. I then got it compiled on z/OS 1.13. The program is very small:
>>
>>     #include <stdlib.h>
>>     #include <stdio.h>
>>     #include <string.h>
>>     #undef strlen
>>     int main (int argc, char *argv[]) {
>>         int i;
>>         for(i=0;i<argc;i++) {
>>             if ('\r' == *(argv[i]+strlen(argv[i])) )
>>                 *(argv[i]+strlen(argv[i]))='\0';
>>         }
>>         return 0;
>>     }
>>
>> On the z/OS compiler, under UNIX, using the -O5 switch to optimize, the compiler generated in-line code for both calls to strlen, despite the fact that they had the identical arguments with no possibility of modification. The GCC compiler, on the other hand, retained the result of the first strlen() and used that value in the second statement. Actually, GCC retained the value of argv[i]+strlen(argv[i]), so that it is the equivalent of a CLI, JNE, MVI to change the \r to \0.
>>
>> Of course, I can help the compiler by changing my code slightly:
>>
>>     #include <stdlib.h>
>>     #include <stdio.h>
>>     #include <string.h>
>>     #undef strlen
>>     int main (int argc, char *argv[]) {
>>         int i;
>>         char *lastchar;
>>         for(i=0;i<argc;i++) {
>>             lastchar=argv[i]+strlen(argv[i]);
>>             if ('\r' == *lastchar)
>>                 *lastchar='\0';
>>         }
>>         return 0;
>>     }
>>
>> Much nicer. But, again curiously, instead of doing a CLI and MVI, the compiler using -O5 did:
>>
>>     SLR R0,R0
>>     IC  R1,0(,R1)
>>     CHI R1,=H'13'
>>     BNE ...
>>
>> On ARCH(7), it was a bit better, replacing the SLR/IC pair with an LLC instruction.
>>
>> I failed the compiler class in college, so for all I know there is a perfectly good reason why the z/OS compiler does it this way. But I just found it curious and thought that I'd throw it out onto the forum. Hopefully someone can explain it to me.
>>
>> BTW, I compiled the above using no optimization, -O2, -O3, -O3, and -O5. I got the identical assembler in all cases. This was true in the default ARCH(5) and ARCH(7).
>>
>> I know some will likely say that I'm grasping at straws. Perhaps so. But if the compiler misses such a simple optimization, what else might it miss?
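The "help the compiler" rewrite quoted above amounts to computing the end of the string once and reusing it. A minimal sketch of that idea as a reusable function follows; strip_cr_all is a hypothetical name, not something from the thread. Note that s + strlen(s) is the address of the terminating \0, so this sketch tests the byte before it, which is the same adjustment the later memchr() version in this thread makes with its -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: for each of count strings, replace a trailing
 * '\r' with '\0'. strlen() is called once per string and the result
 * is kept in a local, so nothing relies on the optimizer merging
 * two identical calls. Returns the number of strings changed. */
static int strip_cr_all(int count, char *strings[])
{
    int i;
    int changed = 0;

    assert(count >= 0);

    for (i = 0; i < count; i++) {
        size_t len = strlen(strings[i]);   /* single scan per string */
        char *last;

        if (len == 0)                      /* nothing before the '\0' */
            continue;

        last = strings[i] + len - 1;       /* last character, not the '\0' */
        if (*last == '\r') {
            *last = '\0';
            changed++;
        }
    }
    return changed;
}
```

In a main() like the one in the post, the call would be strip_cr_all(argc, argv).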
Re: Curious observation: lack of a simple optimization in a C program
Historically, the term 'builtin' is of course a PL/I one that has been adopted elsewhere too, e.g., by the HLASM. In PL/I it refers to implementation-supplied functions. Some of these, like SQRT and COSH, are always implemented by calls to library routines; some of them, like LENGTH and SIGN, are always implemented in line; and a very few are implemented sometimes in line and sometimes by a call to a library routine. ABS is the classical example of this mixed case. It is implemented in line, trivially, for real arithmetic values; but for complex ones, for, say, declare x complex decimal float(15) ; it is implemented by a library call. [The expression abs(a + bi) is defined/evaluated as +sqrt(a**2 + b**2) . Other languages have used different terminologies. In COBOL, for example, implementation-supplied functions/routines are called GENERIC ones. In C they have usually been called [standard] library functions.In PL/I such a declaration as, say, declare sqrt builtin ; followed by x = sqrt(y) ; forces the compiler or interpreter to use the implementation-supplied sqrt routine. If this declaration is omitted an attempt is first made to find and use another sqrt facility, and only if one is not found is the builtin facility used. (The search can be influenced by the presence of a programmer-supplied generic declaration.) Historically, compilers have found it easier to reuse values they generate in line, but they do not always elect to do so for various reasons. Values obtained by function calls can in PL/I and some other languages be characterized explicitly as reducible, in which case multiple subroutine calls or function references to the same exterrnal routine that are identical, specify the same unchanged arguments (actual parameters), can be reduced to a single one. (This can of course be highly problematic; the classic nightmare case is that of reducing calls to a pseudo-random number generator, a sequence-number generator, or the like.) 
John Gilmore, Ashland, MA 01721 - USA
Re: Curious observation: lack of a simple optimization in a C program
On Sun, Feb 23, 2014 at 12:48 PM, Bernd Oppolzer bernd.oppol...@t-online.de wrote:

I would strongly suggest not to #undef such #defines coming from the standard headers; the results may be unpredictable. Maybe the compiler optimization and inlining strategies rely exactly on such #defines, that is: strlen needs to be __strlen, so that the compiler can recognize it and do the optimizations on it that it is supposed to do ???

Kind regards
Bernd

I did the compile both with and without the #undef. In fact, I did it without the #undef first. But, in both cases, I did not get the common expression elimination that David did. I _must_ have something set up really strangely. Not that I set up this particular z/OS system. I just have some access to it.

-- Wasn't there something about a PASCAL programmer knowing the value of everything and the Wirth of nothing? Maranatha! John McKown
Re: Curious observation: lack of a simple optimization in a C program
Right. There is an XLC compiler option ANSIFUNCS or something like that. (Too lazy to look it up.) It tells the compiler that things that have the names of standard functions really ARE standard functions. Without that option turned on, strlen() could be a private random number generator.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Bernd Oppolzer
Sent: Sunday, February 23, 2014 10:53 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Fwd: Re: Curious observation: lack of a simple optimization in a C program

An explanation, which is just a little more paranoid: if you #undef strlen, the compiler cannot be sure that your strlen is not quite another function, which does not give the same result on two subsequent calls ... ???

-------- Original Message --------
Subject: Re: Curious observation: lack of a simple optimization in a C program
Date: Sun, 23 Feb 2014 19:48:44 +0100
From: Bernd Oppolzer bernd.oppol...@t-online.de
To: IBM Mainframe Discussion List IBM-MAIN@LISTSERV.UA.EDU

I would strongly suggest not to #undef such #defines coming from the standard headers; the results may be unpredictable. Maybe the compiler optimization and inlining strategies rely exactly on such #defines, that is: strlen needs to be __strlen, so that the compiler can recognize it and do the optimizations on it that it is supposed to do ???
Re: Curious observation: lack of a simple optimization in a C program
On 23/02/2014 11:55 PM, John McKown wrote:

I don't know why that would make a difference. Would you mind showing me your compile parameters? What version of z/OS are you running on? My compile came from a z/OS 1.13 system. I don't know the maintenance level.

z/OS 1.13

c99_x -O -Wc,'arch(9),list,source' opt.c
Re: Curious observation: lack of a simple optimization in a C program
I'm checking into the old folks home soon. Instead of using -O5, I used -o5. Which did not cause any problem because that means write the output object code to the file named 5. I guess I was sleepier than I thought yesterday and just didn't catch that. My error, and my thanks to all for your kind help.

-- Wasn't there something about a PASCAL programmer knowing the value of everything and the Wirth of nothing? Maranatha! John McKown
Re: Curious observation: lack of a simple optimization in a C program
Dang C case-sensitivity!

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of John McKown
Sent: Sunday, February 23, 2014 5:58 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Curious observation: lack of a simple optimization in a C program

I'm checking into the old folks home soon. Instead of using -O5, I used -o5. Which did not cause any problem because that means write the output object code to the file named 5. I guess I was sleepier than I thought yesterday and just didn't catch that. My error, and my thanks to all for your kind help.

John McKown
Re: Curious observation: lack of a simple optimization in a C program
On 24/02/2014 9:58 AM, John McKown wrote:

I'm checking into the old folks home soon. Instead of using -O5, I used -o5. Which did not cause any problem because that means write the output object code to the file named 5. I guess I was sleepier than I thought yesterday and just didn't catch that.

FYI, optimization only goes up to -O3, which is the most aggressive. It will perform more loop unrolling and the size of your object code will more than treble.

My error, and my thanks to all for your kind help.
Re: Curious observation: lack of a simple optimization in a C program
Hmm, I'm not getting the same result! When I compile with ARCH(5) there is only one strlen() call (SRST) and the value is retained in a register. BTW, I removed #undef strlen. What is the purpose of that directive?

*  int i;
*  for(i=0;i<argc;i++) {
         LTR   r1,r1
         LA    r3,0
         JNH   @1L6
         LA    r5,0
         LR    r8,r1
@1L3     DS    0H
*    if ('\r' == *(argv[i]+strlen(argv[i])) )
         L     r9,(*)uchar*(r5,r2,0)
         LR    r11,r9
         NILH  r11,H'32767'
         LA    r0,0
         LR    r10,r11
         SRST  r0,r10
         JO    *-4
         LR    r10,r0
         SLR   r10,r11
         LLGC  r0,(*)uchar(r10,r9,0)
         CHI   r0,H'13'
         JNE   @1L5
*      *(argv[i]+strlen(argv[i]))='\0';
         AL    r10,(*)uchar*(r5,r2,0)
         MVI   (*)uchar(r10,0),0
@1L5     DS    0H
         LA    r5,#AMNESIA(,r5,4)
         BRCT  r8,@1L3
*  }
*  return 0;
* }

If I compile using ARCH(9) for our z114 the code is even more succinct. I can't pick a hole in it.

*  int i;
*  for(i=0;i<argc;i++) {
         LA    r3,0
         LTR   r1,r1
         JNH   @1L6
         LA    r5,0
         RISBHG r0,r1,H'0',H'159',H'32'
@1L3     DS    0H
*    if ('\r' == *(argv[i]+strlen(argv[i])) )
         L     r10,(*)uchar*(r5,r2,0)
         LA    r0,0
         LR    r8,r10
         NILH  r8,H'32767'
         LR    r9,r8
         SRST  r0,r9
         JO    *-4
         SLRK  r8,r0,r8
         LLC   r0,(*)uchar(r8,r10,0)
         CHI   r0,H'13'
         JNE   @1L5
*      *(argv[i]+strlen(argv[i]))='\0';
         AL    r8,(*)uchar*(r5,r2,0)
         MVI   (*)uchar(r8,0),0
@1L5     DS    0H
         LA    r5,#AMNESIA(,r5,4)
         BRCTH r0,@1L3
*  }
*  return 0;

On 23/02/2014 1:16 PM, John McKown wrote:

Just for fun, I wrote a very small C program. I compiled it on Linux/Intel using GCC 4.8.2. I then got it compiled on z/OS 1.13. The program is very small:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#undef strlen
int main (int argc, char *argv[]) {
  int i;
  for(i=0;i<argc;i++) {
    if ('\r' == *(argv[i]+strlen(argv[i])) )
      *(argv[i]+strlen(argv[i]))='\0';
  }
  return 0;
}

On the z/OS compiler, under UNIX, using the -O5 switch to optimize, the compiler generated in-line code for both calls to strlen, despite the fact that they had the identical arguments with no possibility of modification. The GCC compiler, on the other hand, retained the result of the first strlen() and used that value in the second statement.
Actually, GCC retained the value of argv[i]+strlen(argv[i]), so that it generated the equivalent of a CLI, JNE, MVI to change the \r to \0. Of course, I can help the compiler by changing my code slightly:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#undef strlen
int main (int argc, char *argv[]) {
  int i;
  char *lastchar;
  for(i=0;i<argc;i++) {
    lastchar=argv[i]+strlen(argv[i]);
    if ('\r' == *lastchar) *lastchar='\0';
  }
  return 0;
}

Much nicer. But, again curiously, instead of doing a CLI and MVI, the compiler using -O5 did:

SLR R0,R0
IC R1,0(,R1)
CHI R1,=H'13'
BNE ...

On ARCH(7), it was a bit better, replacing the SLR/IC pair with an LLC instruction. I failed the compiler class in college, so for all I know there is a perfectly good reason why the z/OS compiler does it this way. But I just found it curious and thought that I'd throw it out onto the forum. Hopefully someone can explain it to me.

BTW, I compiled the above using no optimization, -O2, -O3, and -O5. I got the identical assembler in all cases. This was true in the default ARCH(5) and ARCH(7). I know some will likely say that I'm grasping at straws. Perhaps so. But if the compiler misses such a simple optimization, what else might it miss?