Re: Branch (was: Performance question - adding)
On Tue, Feb 18, 2014 at 5:20 PM, Shmuel Metz (Seymour J.) shmuel+ibm-m...@patriot.net wrote:

> In of1dbe6dec.23eaa84d-on85257c83.0046f9a1-85257c83.0047a...@us.ibm.com, on 02/18/2014 at 08:02 AM, Peter Relson rel...@us.ibm.com said:
>> So it's probably less about optimizing existing code (unless it's in a loop) than about understanding what is best for your new code, when the development and test costs of the choices are basically the same.
>
> Generally what's best is what's most maintainable and most readable.

Around here, that would likely translate into (1) convert it to COBOL or (2) rewrite it to run on Windows, using .NET. Both of those are more readable and maintainable __in this shop__. Yes, I'm joking a bit. Kind of.

--
Wasn't there something about a PASCAL programmer knowing the value of everything and the Wirth of nothing?
Maranatha!
John McKown

-- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Rewriting Assembler to COBOL was Re: Branch (was: Performance question - adding)
On 19 Feb 2014 03:48:24 -0800, in bit.listserv.ibm-main you wrote:

> On Tue, Feb 18, 2014 at 5:20 PM, Shmuel Metz (Seymour J.) shmuel+ibm-m...@patriot.net wrote:
>> Generally what's best is what's most maintainable and most readable.
>
> Around here, that would likely translate into (1) convert it to COBOL or (2) rewrite it to run on Windows, using .NET. Both of those are more readable and maintainable __in this shop__. Yes, I'm joking a bit. Kind of.

With any supported COBOL compiler you have nested programs and multiple levels of COPY, so you could write an entire program in a COPY book that itself contains COPY statements. This allows the program to be included in multiple programs, thus eliminating much of the inter-module instruction overhead. The compiler may even do more sophisticated code elimination and optimization.

In addition, if many of the assembler routines were written to get around the restrictions and difficulties of doing things in COBOL VS (the 1974 standard COBOL), those restrictions and difficulties may no longer exist. Reference modification and a number of other features in the newer compilers have made a major difference.

Further, look at the 2002 standard and the draft standards and see whether bit manipulation, BIT and the various FLOATING POINT usages (including DECIMAL FLOATING POINT), and the various types of rounding (including rounding to nearest even) would allow your shop to eliminate even more assembler programs. Code that can easily be moved inline is code that doesn't incur inter-module overhead. I suspect that linked lists and queues are relatively easy.

Also, with LOCAL-STORAGE, recursive routines can be written. I have used COBOL to manipulate the SMF 30 records, among others, so COBOL is more powerful than many here might realize. The 85 standard COBOL was a great leap forward. Many of the optimizations I did for a program using COBOL VS would have been counterproductive with VS COBOL V1.4 and the Enterprise COBOLs.

Going through a shop's Assembler inventory to see which programs are worth converting to COBOL would be enough fun to make me come out of retirement, assuming the financials could be worked out.

Clark Morris
Re: Branch (was: Performance question - adding)
> I have to ask: Why the big concern over a few instructions? Optimisation of a few is not worth the effort these days.

In my opinion, it's not concern, it's pride. Surely all of us programmers like our code to be the best it can be, within reason.

So it's probably less about optimizing existing code (unless it's in a loop) than about understanding what is best for your new code, when the development and test costs of the choices are basically the same.

Peter Relson
z/OS Core Technology Design
Re: Branch (was: Performance question - adding)
Pride? Maybe. But a few have misinterpreted my comments. I didn't say don't optimise. I said why worry about a few instructions? Even inside a loop, one instruction would have to be executed a great many times before it can/will impact MSU-based costs. Also, it would have to be executed consistently within a second for a 4-hour period to have any impact at all.

- -teD -

----- Original Message -----
From: Peter Relson
Sent: Tuesday, February 18, 2014 08:02
To: IBM-MAIN@LISTSERV.UA.EDU
Reply To: IBM Mainframe Discussion List
Subject: Re: Branch (was: Performance question - adding)

> I have to ask: Why the big concern over a few instructions? Optimisation of a few is not worth the effort these days.

In my opinion, it's not concern, it's pride. Surely all of us programmers like our code to be the best it can be, within reason.

So it's probably less about optimizing existing code (unless it's in a loop) than about understanding what is best for your new code, when the development and test costs of the choices are basically the same.

Peter Relson
z/OS Core Technology Design
Re: Branch (was: Performance question - adding)
Ted MacNeil wrote

| Pride? Maybe.

making it clear that he doesn't get it.

There is a very small book, A Mathematician's Apology, by G. H. Hardy, that I may well have mentioned here before. In it Hardy identifies three characteristics that all those who do good, at all memorable intellectual work share. They are

1) more or less disinterested intellectual curiosity, the itch to know how things work,
2) a sense of craftsmanship, pride in one's work, evidenced by a need to do it as well as one can, and
3) ambition, a desire for recognition, even money in Hardy's words.

Others in his view (and mine) will not and cannot be expected to do exceptional work. About those others? Well, paraphrasing Hardy again: Since they cannot do anything really well it does not much matter what they do.

John Gilmore, Ashland, MA 01721 - USA
Re: Branch (was: Performance question - adding)
On Tue, Feb 18, 2014 at 10:12 AM, John Gilmore jwgli...@gmail.com wrote:

> There is a very small book, A Mathematician's Apology, by G. H. Hardy, that I may well have mentioned here before. In it Hardy identifies three characteristics that all those who do good, at all memorable intellectual work share. They are 1) more or less disinterested intellectual curiosity, the itch to know how things work, 2) a sense of craftsmanship, pride in one's work, evidenced by a need to do it as well as one can, and 3) ambition, a desire for recognition, even money in Hardy's words.
>
> Others in his view (and mine) will not and cannot be expected to do exceptional work. About those others? Well, paraphrasing Hardy again: Since they cannot do anything really well it does not much matter what they do.

Nice post. I picked up this tendency in a series of college math courses (3) Analysis of Variance, all taught by the same professor. I learned to love elegant proofs. I transferred this to my programming in that I like elegant programs.

Well, another influence (for good or ill) was a group of us geeks who loved APL on MVT. We were all "if you can't do it in one line, you don't know enough APL!" (or you're just not too bright) people. That sometimes carries over to my professional programming. I like efficient code. OTOH, I am also willing to write junk code if I really think what I need is an ad hoc, single-shot program where getting an answer quickly is more important than doing it with CPU-efficient code. One-offs don't need to be efficient.

In the context of this discussion, I like participating because it is fun. And I learn some really interesting things that I wouldn't otherwise know, so that my normal programs just naturally become better. I.e. instead of only having a single hammer, I now have a tack hammer, a claw hammer, a sledge hammer, a jack hammer, a flathead screwdriver, a Phillips screwdriver, and a torque wrench. That is what _I_ get from this type of thread. Not an "oh, my, let me rewrite the world."

John McKown
Re: Branch (was: Performance question - adding)
Indeed. My favorite (which is, I suspect, where a lot of Big Numbers come from) is folks who have clearly extrapolated from a peak rate, like "We peaked at 20,000 transactions per minute over Black Friday, so we need to be able to support 10 billion per year."

But if you dig a bit more, you find out that their normal rate is more like 20 per minute, and that, guess what, when they hit that peak, everything in their infrastructure was queuing work. So no, they don't need 10B/year capability: they need three orders of magnitude less. Or maybe two, to be safe.

On Mon, Feb 17, 2014 at 5:54 PM, Chase, John jch...@ussco.com wrote:

> -----Original Message-----
> From: IBM Mainframe Discussion List On Behalf Of Ed Finnell
>
>> Benchmarks, features, tuning knobs, performance bonds all factor in to the mix. The ones that scare me are the 'theoretically we can run some gazillion transactions on a mainframe'!
>
> We can, over a long enough time span. Over a long enough time span, everybody's survival rate is zero, too.
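The extrapolation above is easy to check by hand. A minimal sketch in C, assuming a 365-day year and a rate sustained around the clock (the function name annual_volume is made up for illustration, not anything from the thread):

```c
#include <stdint.h>

/* Minutes in a non-leap year: 60 * 24 * 365. */
#define MINUTES_PER_YEAR 525600ULL

/* Annual volume implied by sustaining a per-minute rate all year.
 * Hypothetical helper: it simply scales the rate, which is exactly
 * the naive extrapolation being criticised. */
uint64_t annual_volume(uint64_t per_minute)
{
    return per_minute * MINUTES_PER_YEAR;
}
```

Calling annual_volume(20000) gives roughly 10.5 billion, which matches the "10 billion per year" claim, while annual_volume(20) gives roughly 10.5 million: a factor of 1,000, i.e. the three orders of magnitude mentioned.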
Re: Branch (was: Performance question - adding)
In of1dbe6dec.23eaa84d-on85257c83.0046f9a1-85257c83.0047a...@us.ibm.com, on 02/18/2014 at 08:02 AM, Peter Relson rel...@us.ibm.com said:

> So it's probably less about optimizing existing code (unless it's in a loop) than about understanding what is best for your new code, when the development and test costs of the choices are basically the same.

Generally what's best is what's most maintainable and most readable.

--
Shmuel (Seymour J.) Metz, SysProg and JOAT
ISO position; see http://patriot.net/~shmuel/resume/brief.html
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)
Re: Branch (was: Performance question - adding)
Combining the thoughts engendered from about three replies, I wonder if avoiding a branch as follows (on a processor which supports the instructions) would perform better than branching.

         LT    R0,CURRENT     #LOAD CURRENT AND SET CC
         IPM   R1             #SAVE CC FROM LT
         A     R0,SUM         #ADD SUM TO IT
         SPM   R1             #RESTORE CC FROM LT
         STOC  R0,SUM,NZ      #STORE SUM ONLY IF CC OF LT WAS NZ

Basically this loads CURRENT into R0, setting the CC based on its value. Then it saves the CC in R1 (IPM inserts the CC and program mask into the register), adds the SUM value into R0, and restores the CC from the LT (SPM sets it back from the register), because the Add destroyed it. Then it stores the result in SUM only if the CC is Not Zero, as set by the LT.

I don't know if this code avoids the cache thrashing mentioned by Ed. I don't know if the CPU needs to lock the cache line if the STOC is a NOP due to the CC being zero (from the LT) most of the time (per the OP).

I used R0 and R1 only because I tend to use them, along with R14 and R15, as junk temporary registers.

John McKown
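For readers who do not read assembler, the intent of the sequence above can be sketched in C. This is only an illustration of the logic, assuming SUM and CURRENT are plain int values; the function name add_if_nonzero is made up, and whether a given compiler turns it into a branch or into a conditional store is up to that compiler and its options.

```c
/* Add current to *sum, but write to *sum only when current is nonzero.
 * Hypothetical helper mirroring the LT / A / conditional-store idea:
 * when current is zero (the common case, per the OP) nothing is stored,
 * so the location holding the sum is not modified. */
void add_if_nonzero(int *sum, int current)
{
    if (current != 0) {
        *sum += current;
    }
}
```

With a sum of 5, add_if_nonzero(&sum, 0) leaves it at 5 and add_if_nonzero(&sum, 3) makes it 8.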
Re: Branch (was: Performance question - adding)
On 17/02/2014 10:25 PM, Paul Gilmartin wrote:

> Then you get to factor in how much readability is worth to you. Macros are your friend. But does providing readability at the programming interface level make such a macro unpleasantly verbose internally?

Unless you desperately need highly optimized code, readability is very important when writing assembler code. I suppose that's definitely the case for most vendors. The alternative is to use optimizing compilers, which go out of their way to remove branches.

The following code snippet is a simple routine to convert a flag byte into a string of binary 1s and 0s. The optimizer unrolled the loop and used those fancy new load on condition instructions to remove all branches. I compiled two versions, one with loop unrolling and one without, using the #pragma nounroll directive. The unrolled version was 3x faster! Now that's impressive.

*   static char buffer[CHAR_BIT + 1];
*   int i;
*   int numBits = CHAR_BIT;
*
*   for ( i = 0; numBits--; i++ )
         LR    r3,r1
*   {
*     buffer[i] = ( c & 0x80 ) ? '1' : '0';
         LA    r0,240
         NILF  r1,F'128'
         LA    r8,241
         LA    r9,241
         NILF  r3,F'255'
         LA    r10,241
         LTR   r1,r1
         SLLK  r1,r3,1
         LOCRE r8,r0
         LR    r3,r1
         NILF  r1,F'128'
         LA    r11,241
         NILF  r3,F'255'
         STC   r8,buffer[]0(,r5,9)
         LA    r2,241
         LTR   r1,r1
         SLL   r3,1
         LR    r1,r3
         LOCRE r9,r0
         NILF  r3,F'128'
         STC   r9,buffer[]0(,r5,10)
         NILF  r1,F'255'
         LTR   r3,r3
         SLL   r1,1
         LOCRE r10,r0
         STC   r10,buffer[]0(,r5,11)
*     c <<= 1;
         LR    r3,r1
         NILF  r1,F'128'
         NILF  r3,F'255'
         LTR   r1,r1
         SLLK  r1,r3,1
         LR    r3,r1
         LOCRE r11,r0
         NILF  r3,F'255'
         STC   r11,buffer[]0(,r5,12)
         LA    r8,241
         SLLK  r9,r3,1
         NILF  r1,F'128'
         LR    r10,r9
         LA    r11,241
         LTR   r1,r1
         LOCRE r8,r0
         NILF  r10,F'255'
         NILF  r9,F'128'
         STC   r8,buffer[]0(,r5,13)
         LTR   r9,r9
         SLLK  r8,r10,1
         LOCRE r11,r0
         LR    r9,r8
         NILF  r8,F'128'
         LA    r1,241
         NILF  r9,F'255'
         STC   r11,buffer[]0(,r5,14)
*   }
*
*   buffer[i] = '\0';
*
*   return buffer;
         LA    r3,buffer(,r5,9)
         LTR   r8,r8
         SLLK  r8,r9,1
         LOCRE r1,r0
         NILF  r8,F'128'
         STC   r1,buffer[]0(,r5,15)
         LTR   r8,r8
         LOCRE r2,r0
         STC   r2,buffer[]0(,r5,16)
         MVI   buffer[]0(r5,17),0
* }
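The C source interleaved in the listing above can be pulled together into a complete function. This is a sketch under assumptions: the listing does not show the function header, so the name byte_to_bits and its signature are guesses, and the 0x80 mask presumes CHAR_BIT is 8.

```c
#include <limits.h>

/* Convert a flag byte into a string of '1' and '0' characters,
 * most significant bit first. The function name and signature are
 * assumed; only the body appears in the compiler listing.
 * The result lives in a static buffer, so the routine is not
 * reentrant and each call overwrites the previous result. */
const char *byte_to_bits(unsigned char c)
{
    static char buffer[CHAR_BIT + 1];
    int i;
    int numBits = CHAR_BIT;

    for (i = 0; numBits--; i++) {
        /* Test the top bit, then shift the next bit into its place. */
        buffer[i] = (c & 0x80) ? '1' : '0';
        c <<= 1;
    }

    buffer[i] = '\0';

    return buffer;
}
```

For example, byte_to_bits(0xA5) yields "10100101". The ternary in the loop body is the part the optimizer replaced with load on condition instructions in the listing.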
Re: Branch (was: Performance question - adding)
Nice!

I got to thinking it would be nice to have a "store different" instruction (or make store behave this way automatically under the covers) which would invalidate the cache only if what it was storing was different from what was in memory already.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of John McKown
Sent: Monday, February 17, 2014 6:37 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Branch (was: Performance question - adding)

Combining the thoughts engendered from about three replies, I wonder if avoiding a branch as follows (on a processor which supports the instructions) would perform better than branching.

         LT    R0,CURRENT     #LOAD CURRENT AND SET CC
         IPM   R1             #SAVE CC FROM LT
         A     R0,SUM         #ADD SUM TO IT
         SPM   R1             #RESTORE CC FROM LT
         STOC  R0,SUM,NZ      #STORE SUM ONLY IF CC OF LT WAS NZ

Basically this loads CURRENT into R0, setting the CC based on its value. Then it saves the CC in R1, adds the SUM value into R0, and restores the CC from the LT, because the Add destroyed it. Then it stores the result in SUM only if the CC is Not Zero, as set by the LT.

I don't know if this code avoids the cache thrashing mentioned by Ed. I don't know if the CPU needs to lock the cache line if the STOC is a NOP due to the CC being zero (from the LT)
Re: Branch (was: Performance question - adding)
On Mon, 17 Feb 2014 08:02:40 -0800, Charles Mills wrote:

> I got to thinking it would be nice to have a "store different" instruction (or make store behave this way automatically under the covers) which would invalidate the cache only if what it was storing was different from what was in memory already.
>
> -----Original Message-----
> From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of John McKown
> Sent: Monday, February 17, 2014 6:37 AM
>
>> Combining the thoughts engendered from about three replies, I wonder if avoiding a branch as follows (on a processor which supports the instructions) would perform better than branching.
>>
>>          LT    R0,CURRENT     #LOAD CURRENT AND SET CC
>>          IPM   R1             #SAVE CC FROM LT
>>          A     R0,SUM         #ADD SUM TO IT
>>          SPM   R1             #RESTORE CC FROM LT
>>          STOC  R0,SUM,NZ      #STORE SUM ONLY IF CC OF LT WAS NZ

Doesn't one also want to avoid fetching the line into cache if it's not already there?

I once examined the circuit diagram of some 3rd-party add-on DRAM for a PDP-12 we had. The hardware compared the data to be stored with that already in memory and bypassed the write-back if identical.

-- gil
Re: Branch (was: Performance question - adding)
Another possibility which occurs to me, on newer hardware, is to try out the BPRP instruction. This also addresses Gil's thought about not fetching the cache line containing SUM unless it is necessary. Remember this assumes that CURRENT is almost always a zero, per the OP.

*
* SET UP BRANCH PREDICTION ON JZ
* INSTRUCTION TO NOADD LABEL
         BPRP  8,JZ,NOADD     PREDICT BRANCH IS TAKEN
         LT    R0,CURRENT
JZ       JZ    NOADD
         A     R0,SUM
         ST    R0,SUM
NOADD    DS    0H

On Mon, Feb 17, 2014 at 10:02 AM, Charles Mills charl...@mcn.org wrote:

> Nice!
>
> I got to thinking it would be nice to have a "store different" instruction (or make store behave this way automatically under the covers) which would invalidate the cache only if what it was storing was different from what was in memory already.
>
> Charles

John McKown
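A loose high-level analogue of hinting the expected branch direction, for comparison only: GCC and Clang provide the __builtin_expect extension, which tells the compiler which way a condition usually goes so it can lay out the code accordingly. It is a compile-time layout hint rather than a hardware preload like BPRP, and the names UNLIKELY and add_rarely below are made up for this sketch.

```c
/* __builtin_expect is a GCC/Clang extension, not standard C.
 * UNLIKELY is a hypothetical convenience macro for this sketch. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Same logic as the JZ / A / ST sequence: skip the add and the store
 * when current is zero, and tell the compiler that zero is the
 * expected case so the skip path is the straight-line one. */
void add_rarely(int *sum, int current)
{
    if (UNLIKELY(current != 0)) {
        *sum += current;
    }
}
```

The hint changes only how the compiler arranges the code; the result is the same as without it, and whether it helps at all is something to measure rather than assume.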
Re: Branch (was: Performance question - adding)
I have to ask: Why the big concern over a few instructions? Optimisation of a few is not worth the effort these days.

- -teD -

----- Original Message -----
From: John McKown
Sent: Monday, February 17, 2014 12:02
To: IBM-MAIN@LISTSERV.UA.EDU
Reply To: IBM Mainframe Discussion List
Subject: Re: Branch (was: Performance question - adding)

Another possibility which occurs to me, on newer hardware, is to try out the BPRP instruction. This also addresses Gil's thought about not fetching the cache line containing SUM unless it is necessary. Remember this assumes that CURRENT is almost always a zero, per the OP.

*
* SET UP BRANCH PREDICTION ON JZ
* INSTRUCTION TO NOADD LABEL
         BPRP  8,JZ,NOADD     PREDICT BRANCH IS TAKEN
         LT    R0,CURRENT
JZ       JZ    NOADD
         A     R0,SUM
         ST    R0,SUM
NOADD    DS    0H
Re: Branch (was: Performance question - adding)
On 2014-02-17, at 10:36, Ted MacNEIL wrote:

> I have to ask: Why the big concern over a few instructions? Optimisation of a few is not worth the effort these days.

Hmmm... No single instruction is worth optimizing. No single instruction among a million is worth optimizing. It's not worth optimising a million instructions because that would imply optimizing each, which is not worth it.

E.E. asked whether the code is in a loop.

-- gil
Re: Branch (was: Performance question - adding)
Ted MacNEIL wrote:

> I have to ask: Why the big concern over a few instructions?

Good question. This is why I asked that loop question earlier today. But I'm following this fun thread about the cache, fetch/modify by different CPs and execution prediction. Just curious of course.

> Optimisation of a few is not worth the effort these days.

After my question, someone posted to me off-line that only if the machine executes ONE instruction PER second is this optimisation work worth the trouble.

Groete / Greetings
Elardus Engelbrecht
Re: Branch (was: Performance question - adding)
On Mon, Feb 17, 2014 at 12:03 PM, Paul Gilmartin paulgboul...@aim.com wrote:

> On 2014-02-17, at 10:36, Ted MacNEIL wrote:
>> I have to ask: Why the big concern over a few instructions? Optimisation of a few is not worth the effort these days.
>
> Hmmm... No single instruction is worth optimizing. No single instruction among a million is worth optimizing. It's not worth optimising a million instructions because that would imply optimizing each, which is not worth it.
>
> E.E. asked whether the code is in a loop.
>
> -- gil

I guess that I ASSuMEd that the code was in a heavily used loop. If you remove 1 instruction from a loop which is executed a million times, assuming the instruction is expensive, then it may well be worth the effort. Or maybe even replacing it with two simpler instructions (such as my thought on using IPM and SPM with an STOC instead of a JZ and ST).

John McKown
Re: Branch (was: Performance question - adding)
On Mon, Feb 17, 2014 at 12:06 PM, Elardus Engelbrecht elardus.engelbre...@sita.co.za wrote:

> Ted MacNEIL wrote:
>> I have to ask: Why the big concern over a few instructions?
>
> Good question. This is why I asked that loop question earlier today. But I'm following this fun thread about the cache, fetch/modify by different CPs and execution prediction. Just curious of course.

Of course, IBM is trying to make this discussion moot by getting people off of using assembler at all, and implementing a code generation back end which will produce better code than the average HLASM programmer for C/C++, Java, and COBOL (COBOL code generation, pre-V5.1 at least, really stinks IMO). I don't know if IBM worries as much about FORTRAN and PL/I these days.

John McKown
Re: Branch (was: Performance question - adding)
Or if you are writing a compiler (or similar code generator, such as a sort compare generator, or a SQL implementation). One instruction saved X a million compiles = a million instructions saved. Some of us here do things of that type.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of John McKown
Sent: Monday, February 17, 2014 10:09 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Branch (was: Performance question - adding)

I guess that I ASSuMEd that the code was in a heavily used loop. If you remove 1 instruction from a loop which is executed a million times, assuming the instruction is expensive, then it may well be worth the effort. Or maybe even replacing it with two simpler instructions (such as my thought on using IPM and SPM with an STOC instead of a JZ and ST).
Re: Branch (was: Performance question - adding)
On a 600 MIPS single engine (z990 class) 1,000,000 instructions is 0.17% of a CP. These days?

- -teD -

----- Original Message -----
From: John McKown
Sent: Monday, February 17, 2014 13:15
To: IBM-MAIN@LISTSERV.UA.EDU
Reply To: IBM Mainframe Discussion List
Subject: Re: Branch (was: Performance question - adding)

I guess that I ASSuMEd that the code was in a heavily used loop. If you remove 1 instruction from a loop which is executed a million times, assuming the instruction is expensive, then it may well be worth the effort. Or maybe even replacing it with two simpler instructions (such as my thought on using IPM and SPM with an STOC instead of a JZ and ST).
Re: Branch (was: Performance question - adding)
What does the statement

| 1,000,000 instructions is 0.17% of a CP

mean? What are the dimensions of %?

John Gilmore, Ashland, MA 01721 - USA
Re: Branch (was: Performance question - adding)
On Mon, 17 Feb 2014 14:04:56 -0500, John Gilmore wrote:

> What does the statement
>
> | 1,000,000 instructions is 0.17% of a CP
>
> mean? What are the dimensions of %?

I don't know, but it would appear to be a gross oversimplification. Ted should know as well as anyone here that MIPS is meaningless, and that a snippet of code taken out of context doesn't tell much.

We don't know why the OP was concerned, but that doesn't mean that his concern isn't valid.

--
Tom Marchant
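One plausible reading of the 0.17% figure, offered as an assumption rather than as the poster's stated method: at a nominal 600 MIPS, a million instructions take 1/600 of a second, which is about 0.17% of one CP for one second. A minimal sketch in C (the function name cp_seconds is made up, and it inherits all the caveats about MIPS raised above):

```c
/* Seconds of CP time needed for `instructions` at a nominal `mips`
 * rating (millions of instructions per second). Hypothetical helper;
 * it treats MIPS as a fixed rate, which is itself a simplification. */
double cp_seconds(double instructions, double mips)
{
    return instructions / (mips * 1e6);
}
```

Here cp_seconds(1e6, 600.0) is 1/600 of a second, about 1.67 milliseconds. Read that way, the percentage is a share of one engine over a one-second interval, so the implied unit is "CP-seconds per second".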
Re: Branch (was: Performance question - adding)
I develop vendor code. Customers always ask about CPU time. If I answered "oh, we don't worry about that anymore," do you think they would buy? Do you think I would have a job?

Charles
Composed on a mobile: please excuse my brevity

Ted MacNEIL eamacn...@yahoo.ca wrote:

> On a 600 MIPS single engine (z990 class) 1,000,000 instructions is 0.17% of a CP. These days?
>
> - -teD -
Re: Branch (was: Performance question - adding)
Benchmarks, features, tuning knobs, performance bonds all factor in to the mix. The ones that scare me are the 'theoretically we can run some gazillion transactions on a mainframe'!

In a message dated 2/17/2014 2:18:47 P.M. Central Standard Time, charl...@mcn.org writes:

I develop vendor code. Customers always ask about CPU time. If I answered oh we don't worry about that anymore do you think they would buy? Do you think I would have a job?
Re: Branch (was: Performance question - adding)
On Mon, Feb 17, 2014 at 11:36 AM, Ted MacNEIL eamacn...@yahoo.ca wrote:

I have to ask: Why the big concern over a few instructions? Optimisation of a few is not worth the effort these days.

-teD

OK, this then causes me to wonder why IBM has bothered to create instructions such as Load On Condition and Store On Condition. The manual entry for STOC says:

quote

STORE ON CONDITION provides a function similar to that of a separate BRANCH ON CONDITION instruction followed by a STORE instruction, except that STORE ON CONDITION does not provide an index register. For example, the following two instruction sequences are equivalent.

         STOCG 15,256(7),8

         BC    7,SKIP
         STG   15,256(7)
SKIP     DS    0H

On models that implement predictive branching, the combination of the BRANCH ON CONDITION and STORE instructions may perform somewhat better than the STORE ON CONDITION instruction when the CPU is able to successfully predict the branch condition. However, on models where the CPU is not able to successfully predict the branch condition, such as when the condition is more random, the STORE ON CONDITION instruction may provide significant performance improvement.

/quote

The above makes me wonder whether, in my example, I should use the BPRP instruction (does anyone else read that as burper?) instead of the STOC, since I _know_ at that point that the branch _will be_ taken.

--
Wasn't there something about a PASCAL programmer knowing the value of everything and the Wirth of nothing?
Maranatha!
John McKown
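To see why the manual calls the two sequences equivalent, it may help to model the condition-code mask. The sketch below is a toy Python model, not real z/Architecture code: the function names and the dictionary standing in for storage are invented for illustration. It relies only on the standard mask convention, where mask bit values 8, 4, 2 and 1 select condition codes 0, 1, 2 and 3.

```python
def mask_selects(mask, cc):
    """True when the 4-bit mask selects condition code cc (0..3)."""
    return bool(mask & (8 >> cc))


def stocg(storage, addr, value, mask, cc):
    """Toy model of STOCG: store only when the mask selects the current CC."""
    if mask_selects(mask, cc):
        storage[addr] = value


def bc_then_stg(storage, addr, value, branch_mask, cc):
    """Toy model of BC mask,SKIP followed by STG: a taken branch skips the store."""
    if mask_selects(branch_mask, cc):
        return                      # branch to SKIP, nothing stored
    storage[addr] = value           # fall through into the STG


def compare_all_ccs():
    """Run both sequences from the manual for every CC and collect the results."""
    results = {}
    for cc in range(4):
        via_stoc = {256: 0}
        via_branch = {256: 0}
        stocg(via_stoc, 256, 99, 8, cc)          # STOCG 15,256(7),8
        bc_then_stg(via_branch, 256, 99, 7, cc)  # BC 7,SKIP / STG 15,256(7)
        results[cc] = (via_stoc[256], via_branch[256])
    return results


for cc, (a, b) in compare_all_ccs().items():
    print(f"CC={cc}: STOCG stored {a}, BC+STG stored {b}")
```

Mask 8 stores only on CC 0, while mask 7 branches around the store on CC 1, 2 or 3, so both forms store in exactly the same case. The model says nothing about timing, which is the part of the manual's note that depends on branch prediction.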
Re: Branch (was: Performance question - adding)
On 17 February 2014 09:37, John McKown john.archie.mck...@gmail.com wrote:

         LT    R0,CURRENT      #LOAD CURRENT AND SET CC
         SPM   R1              #SAVE CC FROM LT
         A     R0,SUM          #ADD SUM TO IT
         IPM   R1              #RESTORE CC FROM LT
         STOC  R0,SUM,NZ       #STORE SUM ONLY IF CC OF LT WAS NZ

Basically this loads CURRENT into R0, setting the CC based on its value. Then saves the CC in R1. Adds the SUM value into R0. Restores the CC from the LT, because the Add destroyed it. Then only stores the result in SUM if the CC is Not Zero, as set by the LT.

Not that it affects your proposal, but I think your SPM and IPM are reversed there... It's perhaps interesting that IPM appeared only in 370/XA; on 24-bit systems BALR was expected to do.

Tony H.
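The point about the order can be illustrated with a toy model in which IPM is the instruction that saves the condition code into a register and SPM is the one that restores it. The Python below is a sketch under stated assumptions, not a simulator: it uses unbounded Python integers, ignores fixed-point overflow (CC 3) and keeps the program mask at zero, and all function names are invented for illustration.

```python
def cc_of(value):
    """CC as set by LT or A (overflow ignored): 0 zero, 1 negative, 2 positive."""
    return 0 if value == 0 else (1 if value < 0 else 2)


def ipm(reg, cc, program_mask=0):
    """Toy IPM: put CC and program mask in the top byte of a 32-bit register."""
    return (reg & 0x00FFFFFF) | (((cc << 4) | program_mask) << 24)


def spm(reg):
    """Toy SPM: recover (cc, program_mask) from the top byte of the register."""
    byte = (reg >> 24) & 0x3F
    return byte >> 4, byte & 0x0F


def add_if_nonzero(current, total, preserve_cc=True):
    """Model of the proposed sequence; returns the new value of SUM."""
    r0 = current
    cc = cc_of(r0)                  # LT   R0,CURRENT
    r1 = ipm(0, cc)                 # IPM  R1        save CC from LT
    r0 = r0 + total
    cc = cc_of(r0)                  # A    R0,SUM    CC now reflects the sum
    if preserve_cc:
        cc, _ = spm(r1)             # SPM  R1        restore CC from LT
    if cc != 0:                     # STOC R0,SUM,NZ
        total = r0
    return total


print(add_if_nonzero(3, 4))                       # CURRENT nonzero: SUM updated
print(add_if_nonzero(0, 7))                       # CURRENT zero: SUM left alone
print(add_if_nonzero(5, -5, preserve_cc=False))   # CC from A wrongly suppresses the store
```

The last call shows why the save and restore are needed at all: when CURRENT is 5 and SUM is -5, the Add leaves CC 0, so without the restore the store is skipped even though CURRENT was nonzero.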
Re: Branch (was: Performance question - adding)
On 2014-02-17, at 10:36, Ted MacNEIL wrote:

I have to ask: Why the big concern over a few instructions? Optimisation of a few is not worth the effort these days.

LOL. If Binyamin's question wasn't worth asking, then IBM would never have recently introduced the STOC instruction that John McKown so kindly reminded us about. (Wish we could use instructions like that in other than our JIT-compiled Java code...)

If most simple instructions run in roughly one cycle (with the wind at their back, i.e., no interlocks or other delays), an L1 memory access is zero cycles, and an uncached memory access is roughly 1000 cycles, then it makes perfect sense for professional programmers to want to understand how they might avoid having even a small code fragment run three orders of magnitude slower. Add enough of them up and it can make a big difference.

If professionals never ask questions, they'll never know the answers when they need them.

--
Edward E Jaffe
Phoenix Software International, Inc
831 Parkview Drive North
El Segundo, CA 90245
http://www.phoenixsoftware.com/
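The "add enough of them up" argument can be put into numbers with the round figures from the post: about one cycle per simple instruction, nothing extra for an L1 hit, and roughly 1000 cycles for an uncached access. The sketch below is only that arithmetic; the miss counts are hypothetical inputs chosen for illustration, not measurements.

```python
BASE_CYCLES = 1            # simple instruction with an L1 hit, per the post
UNCACHED_CYCLES = 1000     # rough cost of an uncached access, per the post


def total_cycles(instructions, uncached_accesses):
    """Cycles for a run in which some instructions also pay an uncached access."""
    return instructions * BASE_CYCLES + uncached_accesses * UNCACHED_CYCLES


def average_cycles_per_instruction(instructions, uncached_accesses):
    """Average cost per instruction for the same run."""
    return total_cycles(instructions, uncached_accesses) / instructions


# Hypothetical miss counts per 1000 instructions (assumptions for illustration).
for misses in (0, 1, 10):
    avg = average_cycles_per_instruction(1000, misses)
    print(f"{misses} uncached accesses per 1000 instructions: {avg} cycles each on average")
```

With these figures a single uncached access per thousand instructions already doubles the average cost, which is why a small fragment that keeps missing the cache can matter more than its instruction count suggests.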