Re: Branch (was: Performance question - adding)

2014-02-19 Thread John McKown
On Tue, Feb 18, 2014 at 5:20 PM, Shmuel Metz (Seymour J.) 
shmuel+ibm-m...@patriot.net wrote:

 In
 of1dbe6dec.23eaa84d-on85257c83.0046f9a1-85257c83.0047a...@us.ibm.com,
 on 02/18/2014
at 08:02 AM, Peter Relson rel...@us.ibm.com said:

 So it's probably less about optimizing existing code (unless it's in
 a loop) than about understanding what is best for your new code,
 when the development and test costs of the choices are basically
 the same.

 Generally what's best is what's most maintainable and most readable.


Around here, that would likely translate into (1) convert it to COBOL or
(2) rewrite it to run on Windows, using .NET . Both of those are more
readable and maintainable __in this shop__. Yes, I'm joking a bit. Kind of.


-- 
Wasn't there something about a PASCAL programmer knowing the value of
everything and the Wirth of nothing?

Maranatha! 
John McKown

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Rewriting Assembler to COBOL was Re: Branch (was: Performance question - adding)

2014-02-19 Thread Clark Morris
On 19 Feb 2014 03:48:24 -0800, in bit.listserv.ibm-main you wrote:

On Tue, Feb 18, 2014 at 5:20 PM, Shmuel Metz (Seymour J.) 
shmuel+ibm-m...@patriot.net wrote:

 In
 of1dbe6dec.23eaa84d-on85257c83.0046f9a1-85257c83.0047a...@us.ibm.com,
 on 02/18/2014
at 08:02 AM, Peter Relson rel...@us.ibm.com said:

 So it's probably less about optimizing existing code (unless it's in
 a loop) than about understanding what is best for your new code,
 when the development and test costs of the choices are basically
 the same.

 Generally what's best is what's most maintainable and most readable.


Around here, that would likely translate into (1) convert it to COBOL or
(2) rewrite it to run on Windows, using .NET . Both of those are more
readable and maintainable __in this shop__. Yes, I'm joking a bit. Kind of.

With any supported COBOL compiler you have nested programs and
multiple levels of COPY, so you could write an entire program in a
COPY book that itself contains COPY statements.  This allows the
program to be included in multiple programs, thus eliminating much of the
inter-module instruction overhead.  The compiler may even do more
sophisticated code elimination and optimization.  In addition, if many
of the assembler routines were written to get around the restrictions and
difficulty of doing things in COBOL VS (the 1974 standard COBOL),
those restrictions and difficulties may no longer exist.  Reference
modification and a number of other features in the newer compilers
have made a major difference.

Further, look at the 2002 standard and draft standards and see if the
abilities to have bit manipulation, BIT and various FLOATING POINT
usages including DECIMAL FLOATING POINT and various types of rounding
including rounding to nearest even would allow your shop to eliminate
even more assembler programs.  Code that can easily be moved inline is
code that doesn't incur inter-module overhead.  I suspect that linked
lists and queues are relatively easy.  Also with LOCAL-STORAGE
recursive routines can be written.  

I have used COBOL to manipulate the SMF 30 records, among others, so
COBOL is more powerful than many here might realize.  The 85 standard
COBOL was a great leap forward.  Many of the optimizations I did for
a program using COBOL VS would have been counterproductive with
VS COBOL V1.4 and the Enterprise COBOLs.

Going through a shop's Assembler inventory to see which programs are worth
converting to COBOL would be enough fun to make me come out of retirement,
assuming financials could be worked out.

Clark Morris   



Re: Branch (was: Performance question - adding)

2014-02-18 Thread Peter Relson
I have to ask: Why the big concern over a few instructions?
 Optimisation of a few is not worth the effort these days.

In my opinion, it's not concern, it's pride. Surely all of us 
programmers like our code to be the best it can be, within reason.

So it's probably less about optimizing existing code (unless it's in a 
loop) than about understanding what is best for your new code, when the 
development and test costs of the choices are basically the same.

Peter Relson
z/OS Core Technology Design



Re: Branch (was: Performance question - adding)

2014-02-18 Thread Ted MacNEIL
Pride? Maybe.

But, a few have misinterpreted my comments.
I didn't say don't optimise.
I said why worry about a few instructions?
Even inside a loop, one instruction would have to be executed a great
many times before it can/will impact MSU-based costs.
Also, it would have to be executed consistently within a second for a 4-hour period
to have any impact at all.
-
-teD
-
  Original Message  
From: Peter Relson
Sent: Tuesday, February 18, 2014 08:02
To: IBM-MAIN@LISTSERV.UA.EDU
Reply To: IBM Mainframe Discussion List
Subject: Re: Branch (was: Performance question - adding)

I have to ask: Why the big concern over a few instructions?
 Optimisation of a few is not worth the effort these days.

In my opinion, it's not concern, it's pride. Surely all of us 
programmers like our code to be the best it can be, within reason.

So it's probably less about optimizing existing code (unless it's in a 
loop) than about understanding what is best for your new code, when the 
development and test costs of the choices are basically the same.

Peter Relson
z/OS Core Technology Design



Re: Branch (was: Performance question - adding)

2014-02-18 Thread John Gilmore
Ted MacNeil wrote

| Pride?  Maybe.

making it clear that he doesn't get it.

There is a very small book, A Mathematician's Apology, by G. H. Hardy
that I may well have mentioned here before.  In it Hardy identifies
three characteristics that all those who do good, at all memorable
intellectual work share.  They are

1) more or less disinterested intellectual curiosity, the itch to know
how things work,

2) a sense of craftsmanship, pride in one's work, evidenced by a need
to do it as well as one can, and

3) ambition, a desire for recognition, even money in Hardy's words.

Others, in his view (and mine), will not and cannot be expected to do
exceptional work.

About those others?  Well, paraphrasing Hardy again: Since they cannot
do anything really well it does not much matter what they do.

John Gilmore, Ashland, MA 01721 - USA



Re: Branch (was: Performance question - adding)

2014-02-18 Thread John McKown
On Tue, Feb 18, 2014 at 10:12 AM, John Gilmore jwgli...@gmail.com wrote:

 Ted MacNeil wrote

 | Pride?  Maybe.

 making it clear that he doesn't get it.

 There is a very small book,  A mathematician's apology, by G. H. Hardy
 that I may well have mentioned here before.  In it Hardy identifies
 three characteristics that all those who do good, sat all memorable
 intellectual work share.  They are

 1) more or less disinterested intellectual curiosity, the itch to know
 how things work,

 2) a sense of craftsmanship, pride in one's work, evidenced by a need
 to do it as well as one can, and

 3) ambition, a desire for recognition, even money in Hardy's words.

 Others, in his view (and mine), will not and cannot be expected to do
 exceptional work.

 About those others?  Well, paraphrasing Hardy again: Since they cannot
 do anything really well it does not much matter what they do.

 John Gilmore, Ashland, MA 01721 - USA


Nice post. I picked up this tendency in a series of college math courses
(3) Analysis of Variance, all taught by the same professor. I learned to
love elegant proofs. I transferred this to my programming in that I like
elegant programs. Well, another influence (for good or ill) was a group
of us geeks who loved APL on MVT. We were all "if you can't do it in one
line, you don't know enough APL!" (or you're just not too bright) people.
That sometimes carries over to my professional programming. I like
efficient code. OTOH, I am also willing to write junk code if I really
think what I need is an ad hoc, single-shot program where getting an
answer quickly is more important than doing it with CPU-efficient code.
One-offs don't need to be efficient.

In the context of this discussion, I like participating because it is fun.
And I learn some really interesting things that I wouldn't otherwise know.
So that my normal programs just naturally become better. I.e. instead
of only having a single hammer, I now have a tack hammer, a claw hammer, a
sledge hammer, a jack hammer, a flathead screwdriver, a Phillips
screwdriver, and a torque wrench. That is what _I_ get from this type of
thread. Not an "oh my, let me rewrite the world."



-- 
Wasn't there something about a PASCAL programmer knowing the value of
everything and the Wirth of nothing?

Maranatha! 
John McKown



Re: Branch (was: Performance question - adding)

2014-02-18 Thread zMan
Indeed. My favorite (which is, I suspect, where a lot of Big Numbers come
from) is folks who have clearly extrapolated from a peak rate, like "We
peaked at 20,000 transactions per minute over Black Friday, so we need to
be able to support 10 billion per year." But if you dig a bit more, you
find out that their normal rate is more like 20 per minute, and that, guess
what, when they hit that peak, everything in their infrastructure was
queuing work. So no, they don't need 10B/year capability: they need three
orders of magnitude less. Or maybe two, to be safe.
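To make the extrapolation above concrete, here is a small C sketch of the arithmetic. The function name and the 365-day year are assumptions made for illustration; they are not from the post.

```c
#include <stdint.h>

/* Assumption: a 365-day year with the rate sustained around the clock. */
#define MINUTES_PER_YEAR (60ULL * 24ULL * 365ULL)

/* Hypothetical helper: naive yearly volume implied by a per-minute rate. */
uint64_t per_year_from_per_minute(uint64_t per_minute)
{
    return per_minute * MINUTES_PER_YEAR;
}
```

Under those assumptions, the 20,000-per-minute peak extrapolates to roughly 10.5 billion per year, while the normal 20 per minute comes to roughly 10.5 million, which is the three-orders-of-magnitude gap described above.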

On Mon, Feb 17, 2014 at 5:54 PM, Chase, John jch...@ussco.com wrote:

  -Original Message-
  From: IBM Mainframe Discussion List On Behalf Of Ed Finnell
 
  Benchmarks, features, tuning knobs, performance bonds all factor in to
 the mix. The ones that scare me
  are the 'theoretically we can run some gazillion  transactions on a
 mainframe'!

 We can, over a long enough time span.  Over a long enough time span,
 everybody's survival rate is zero, too.



Re: Branch (was: Performance question - adding)

2014-02-18 Thread Shmuel Metz (Seymour J.)
In
of1dbe6dec.23eaa84d-on85257c83.0046f9a1-85257c83.0047a...@us.ibm.com,
on 02/18/2014
   at 08:02 AM, Peter Relson rel...@us.ibm.com said:

So it's probably less about optimizing existing code (unless it's in
a loop) than about understanding what is best for your new code,
when the development and test costs of the choices are basically 
the same.

Generally what's best is what's most maintainable and most readable.
 
-- 
 Shmuel (Seymour J.) Metz, SysProg and JOAT
 ISO position; see http://patriot.net/~shmuel/resume/brief.html 
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)



Re: Branch (was: Performance question - adding)

2014-02-17 Thread John McKown
Combining the thoughts engendered from about three replies, I wonder if
avoiding a branch as follows (on a processor which supports the
instructions) would perform better than branching.

         LT    R0,CURRENT     #LOAD CURRENT AND SET CC
         IPM   R1             #SAVE CC FROM LT
         A     R0,SUM         #ADD SUM TO IT
         SPM   R1             #RESTORE CC FROM LT
         STOC  R0,SUM,NZ      #STORE SUM ONLY IF CC OF LT WAS NZ

Basically, this loads CURRENT into R0, setting the CC based on its value.
It then saves the CC in R1, adds the SUM value into R0, and restores the CC
from the LT, because the Add destroyed it. Finally, it stores the result in SUM
only if the CC is Not Zero, as set by the LT. I don't know if this code avoids the
cache thrashing mentioned by Ed. I don't know if the CPU needs to lock
the cache line if the STOC is a NOP due to the CC being zero (from the LT)
most of the time (per the OP). I used R0 and R1 only because I tend to use
them, along with R14 and R15, as junk temporary registers.
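For readers who think in a high-level language, the two shapes being compared can be sketched in C roughly as follows. This is only an illustration: the function names are made up, and whether a compiler turns the second form into a store-on-condition (or back into a branch) is entirely up to the compiler and the target.

```c
#include <stdint.h>

/* Branching form: test first, touch *sum only when current is nonzero. */
void add_if_nonzero_branch(int32_t *sum, int32_t current)
{
    if (current != 0) {
        *sum += current;
    }
}

/* Select form: always read *sum and compute the new total, then pick
 * which value to keep. As written in C, this always writes *sum; an
 * optimizer may or may not emit a conditional store for it. */
void add_if_nonzero_select(int32_t *sum, int32_t current)
{
    int32_t updated = *sum + current;
    *sum = (current != 0) ? updated : *sum;
}
```

Both functions leave the same value in `*sum`. The difference is only in how the work is expressed, so any performance claim would have to be measured on the actual machine.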



-- 
Wasn't there something about a PASCAL programmer knowing the value of
everything and the Wirth of nothing?

Maranatha! 
John McKown



Re: Branch (was: Performance question - adding)

2014-02-17 Thread David Crayford

On 17/02/2014 10:25 PM, Paul Gilmartin wrote:

Then you get to factor in how much readability is worth to you.
  

Macros are your friend.  But does providing readability at the
programming interface level make such a macro unpleasantly
verbose internally?
Unless you desperately need highly optimized code, readability is very
important when writing assembler code. I suppose that's definitely the
case for most vendors.


The alternative is to use an optimizing compiler, which will go out of its
way to remove branches. The following code snippet is a simple routine
to convert a flag byte into a string of binary 1s and 0s. The optimizer
unrolled the loop and used those fancy new load-on-condition
instructions to remove all branches.


I compiled two versions, one with loop unrolling and one without, using
the #pragma nounroll directive. The unrolled version was 3x faster! Now
that's impressive.


 *  static char buffer[CHAR_BIT + 1];
 *  int i;
 *  int numBits = CHAR_BIT;
 *
 *  for ( i = 0; numBits--; i++ )
   LR   r3,r1
 *  {
 *  buffer[i] = ( c & 0x80 ) ? '1' : '0';
   LA   r0,240
   NILF r1,F'128'
   LA   r8,241
   LA   r9,241
   NILF r3,F'255'
   LA   r10,241
   LTR  r1,r1
   SLLK r1,r3,1
   LOCRE r8,r0
   LR   r3,r1
   NILF r1,F'128'
   LA   r11,241
   NILF r3,F'255'
   STC  r8,buffer[]0(,r5,9)
   LA   r2,241
   LTR  r1,r1
   SLL  r3,1
   LR   r1,r3
   LOCRE r9,r0
   NILF r3,F'128'
   STC  r9,buffer[]0(,r5,10)
   NILF r1,F'255'
   LTR  r3,r3
   SLL  r1,1
   LOCRE r10,r0
   STC  r10,buffer[]0(,r5,11)
 *  c <<= 1;
   LR   r3,r1
   NILF r1,F'128'
   NILF r3,F'255'
   LTR  r1,r1
   SLLK r1,r3,1
   LR   r3,r1
   LOCRE r11,r0
   NILF r3,F'255'
   STC  r11,buffer[]0(,r5,12)
   LA   r8,241
   SLLK r9,r3,1
   NILF r1,F'128'
   LR   r10,r9
   LA   r11,241
   LTR  r1,r1
   LOCRE r8,r0
   NILF r10,F'255'
   NILF r9,F'128'
   STC  r8,buffer[]0(,r5,13)
   LTR  r9,r9
   SLLK r8,r10,1
   LOCRE r11,r0
   LR   r9,r8
   NILF r8,F'128'
   LA   r1,241
   NILF r9,F'255'
   STC  r11,buffer[]0(,r5,14)
 *  }
 *
 *  buffer[i] = '\0';
 *
 *  return buffer;
   LA   r3,buffer(,r5,9)
   LTR  r8,r8
   SLLK r8,r9,1
   LOCRE r1,r0
   NILF r8,F'128'
   STC  r1,buffer[]0(,r5,15)
   LTR  r8,r8
   LOCRE r2,r0
   STC  r2,buffer[]0(,r5,16)
   MVI  buffer[]0(r5,17),0
 *  }
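The C source is interleaved with the generated code in the listing above, so here it is gathered into one compilable function. The function name and signature are assumptions, since the listing does not show them, and the 0x80 mask assumes CHAR_BIT is 8.

```c
#include <limits.h>

/* Hypothetical name and signature: converts a flag byte into a string of
 * '1' and '0' characters, most significant bit first. The result lives
 * in a static buffer, so each call overwrites the previous result. */
const char *byte_to_bits(unsigned char c)
{
    static char buffer[CHAR_BIT + 1];
    int i;
    int numBits = CHAR_BIT;

    for (i = 0; numBits--; i++) {
        buffer[i] = (c & 0x80) ? '1' : '0';  /* test the top bit */
        c <<= 1;                             /* move the next bit up */
    }

    buffer[i] = '\0';
    return buffer;
}
```

For example, `byte_to_bits(0xA5)` yields the string `10100101`. The conditional expression in the loop body is what gives the optimizer the chance to use a load-on-condition instead of a branch.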





Re: Branch (was: Performance question - adding)

2014-02-17 Thread Charles Mills
Nice!

I got to thinking it would be nice to have a "store different" instruction (or
make store behave this way automatically under the covers) which would
invalidate the cache only if what it was storing differed from what was
already in memory.
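A software approximation of the "store different" idea is to compare before storing. The sketch below uses a made-up function name, and it is not a claim about what any hardware does: the compare still has to read the target, so it can only avoid the write, not the fetch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: write value to *target only when it differs from
 * what is already there. Returns true if a store was performed. */
bool store_if_different(int32_t *target, int32_t value)
{
    if (*target == value) {
        return false;   /* same value: skip the store entirely */
    }
    *target = value;
    return true;
}
```

Whether skipping the store is a win depends on how often the value really is unchanged and on how costly it is for the processor to take the cache line exclusively, which would need measuring.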

Charles

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of John McKown
Sent: Monday, February 17, 2014 6:37 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Branch (was: Performance question - adding)

Combining the thoughts engendered from about three replies, I wonder if
avoiding a branch as follows (on a processor which supports the
instructions) would perform better than branching.

LT  R0,CURRENT #LOAD CURRENT AND SET CC
IPM R1 #SAVE CC FROM LT
A R0,SUM #ADD SUM TO IT
SPM R1 #RESTORE CC FROM LT
STOC R0,SUM,NZ #STORE SUM ONLY IF CC OF LT WAS NZ

Basically this loads CURRENT into R0, setting the CC based on its value.
Then saves the CC in R1. Adds the SUM value into R0. Restores the CC from
the LT, because the Add destroyed it. Then only stores the result in SUM if
the CC is Not Zero, as set by the LT. I don't know if this code avoid the
cache thrashing mention by Ed. I don't know if the CPU needs to lock
the cache line if the STOC is a NOP due to the CC being zero (from the LT)



Re: Branch (was: Performance question - adding)

2014-02-17 Thread Paul Gilmartin
On Mon, 17 Feb 2014 08:02:40 -0800, Charles Mills wrote:

I got to thinking it would be nice to have a store different instruction (or 
make store behave this way automatically under the covers) which would 
invalidate the cache only if what it were storing were different from what was 
in memory already.

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On 
Behalf Of John McKown
Sent: Monday, February 17, 2014 6:37 AM

Combining the thoughts engendered from about three replies, I wonder if
avoiding a branch as follows (on a processor which supports the
instructions) would perform better than branching.

LT  R0,CURRENT #LOAD CURRENT AND SET CC
IPM R1 #SAVE CC FROM LT
A R0,SUM #ADD SUM TO IT
SPM R1 #RESTORE CC FROM LT
STOC R0,SUM,NZ #STORE SUM ONLY IF CC OF LT WAS NZ
 
Doesn't one also want to avoid fetching the line into cache if it's not
already there?

I once examined the circuit diagram of some 3rd-party add-on DRAM
for a PDP-12 we had.  The hardware compared the data to be stored
with that already in memory and bypassed the write-back if identical.

-- gil



Re: Branch (was: Performance question - adding)

2014-02-17 Thread John McKown
Another possibility which occurs to me, on newer hardware, is to try out
the BPRP instruction. This also addresses Gil's thought about not fetching
the cache line containing SUM unless it is necessary. Remember this assumes
that CURRENT is almost always a zero, per the OP.

*
* SET UP BRANCH PREDICTION ON JZ
* INSTRUCTION TO NOADD LABEL
         BPRP  8,JZ,NOADD     PREDICT BRANCH IS TAKEN
         LT    R0,CURRENT
JZ       JZ    NOADD
         A     R0,SUM
         ST    R0,SUM
NOADD    DS    0H


On Mon, Feb 17, 2014 at 10:02 AM, Charles Mills charl...@mcn.org wrote:

 Nice!

 I got to thinking it would be nice to have a store different instruction
 (or make store behave this way automatically under the covers) which would
 invalidate the cache only if what it were storing were different from what
 was in memory already.

 Charles

 -Original Message-
 From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
 Behalf Of John McKown
 Sent: Monday, February 17, 2014 6:37 AM
 To: IBM-MAIN@LISTSERV.UA.EDU
 Subject: Re: Branch (was: Performance question - adding)

 Combining the thoughts engendered from about three replies, I wonder if
 avoiding a branch as follows (on a processor which supports the
 instructions) would perform better than branching.

 LT  R0,CURRENT #LOAD CURRENT AND SET CC
 IPM R1 #SAVE CC FROM LT
 A R0,SUM #ADD SUM TO IT
 SPM R1 #RESTORE CC FROM LT
 STOC R0,SUM,NZ #STORE SUM ONLY IF CC OF LT WAS NZ

 Basically this loads CURRENT into R0, setting the CC based on its value.
 Then saves the CC in R1. Adds the SUM value into R0. Restores the CC from
 the LT, because the Add destroyed it. Then only stores the result in SUM if
 the CC is Not Zero, as set by the LT. I don't know if this code avoid the
 cache thrashing mention by Ed. I don't know if the CPU needs to lock
 the cache line if the STOC is a NOP due to the CC being zero (from the LT)





-- 
Wasn't there something about a PASCAL programmer knowing the value of
everything and the Wirth of nothing?

Maranatha! 
John McKown



Re: Branch (was: Performance question - adding)

2014-02-17 Thread Ted MacNEIL
I have to ask: Why the big concern over a few instructions?
Optimisation of a few is not worth the effort these days.


-
-teD
-
  Original Message  
From: John McKown
Sent: Monday, February 17, 2014 12:02
To: IBM-MAIN@LISTSERV.UA.EDU
Reply To: IBM Mainframe Discussion List
Subject: Re: Branch (was: Performance question - adding)

Another possibility which occurs to me, on newer hardware, is to try out
the BPRP instruction. This also addresses Gil's thought about not fetching
the cache line containing SUM unless it is necessary. Remember this assumes
that CURRENT is almost always a zero, per the OP.

*
* SET UP BRANCH PREDICTION ON JZ
* INSTRUCTION TO NOADD LABEL
BPRP 8,JZ,NOADD PREDICT BRANCH IS TAKEN
LT R0,CURRENT
JZ JZ NOADD
A R0,SUM
ST R0,SUM
NOADD DS 0H


On Mon, Feb 17, 2014 at 10:02 AM, Charles Mills charl...@mcn.org wrote:

 Nice!

 I got to thinking it would be nice to have a store different instruction
 (or make store behave this way automatically under the covers) which would
 invalidate the cache only if what it were storing were different from what
 was in memory already.

 Charles

 -Original Message-
 From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
 Behalf Of John McKown
 Sent: Monday, February 17, 2014 6:37 AM
 To: IBM-MAIN@LISTSERV.UA.EDU
 Subject: Re: Branch (was: Performance question - adding)

 Combining the thoughts engendered from about three replies, I wonder if
 avoiding a branch as follows (on a processor which supports the
 instructions) would perform better than branching.

 LT R0,CURRENT #LOAD CURRENT AND SET CC
 IPM R1 #SAVE CC FROM LT
 A R0,SUM #ADD SUM TO IT
 SPM R1 #RESTORE CC FROM LT
 STOC R0,SUM,NZ #STORE SUM ONLY IF CC OF LT WAS NZ

 Basically this loads CURRENT into R0, setting the CC based on its value.
 Then saves the CC in R1. Adds the SUM value into R0. Restores the CC from
 the LT, because the Add destroyed it. Then only stores the result in SUM if
 the CC is Not Zero, as set by the LT. I don't know if this code avoid the
 cache thrashing mention by Ed. I don't know if the CPU needs to lock
 the cache line if the STOC is a NOP due to the CC being zero (from the LT)





-- 
Wasn't there something about a PASCAL programmer knowing the value of
everything and the Wirth of nothing?

Maranatha! 
John McKown



Re: Branch (was: Performance question - adding)

2014-02-17 Thread Paul Gilmartin
On 2014-02-17, at 10:36, Ted MacNEIL wrote:

 I have to ask: Why the big concern over a few instructions?
 Optimisation of a few is not worth the effort these days.
  
Hmmm...  No single instruction is worth optimizing.

No single instruction among a million is worth optimizing.

It's not worth optimising a million instructions because
that would imply optimizing each, which is not worth it.

E.E. asked whether the code is in a loop.

-- gil



Re: Branch (was: Performance question - adding)

2014-02-17 Thread Elardus Engelbrecht
Ted MacNEIL wrote:

I have to ask: Why the big concern over a few instructions?

Good question. This is why I asked that loop question earlier today. But I'm 
following this fun thread about the cache, fetch/modify by different CPs and 
execution prediction. Just curious of course.

Optimisation of a few is not worth the effort these days.

After my question, someone told me off-line that this optimisation work is
worth the trouble only if the machine executes ONE instruction PER second.

Groete / Greetings
Elardus Engelbrecht



Re: Branch (was: Performance question - adding)

2014-02-17 Thread John McKown
On Mon, Feb 17, 2014 at 12:03 PM, Paul Gilmartin paulgboul...@aim.com wrote:

 On 2014-02-17, at 10:36, Ted MacNEIL wrote:

  I have to ask: Why they big concern over a few instructions?
 Optimisation of a few is not worth the
 effort these days.
 
 Hmmm...  No single instruction is worth optimizing.

 No single instruction among a million is worth optimizing.

 It's not worth optimising a million instructions because
 that would imply optimizing each, which is not worth it.

 E.E. asked whether the code is in a loop.

 -- gil


I guess that I ASSuMEd that the code was in a heavily used loop. If you
remove 1 instruction from a loop which is executed a million times,
assuming the instruction is expensive, then it may well be worth the
effort. Or maybe even replacing it with two simpler instructions (such as
my thought on using IPM and SPM with an STOC instead of a JZ and ST).



-- 
Wasn't there something about a PASCAL programmer knowing the value of
everything and the Wirth of nothing?

Maranatha! 
John McKown



Re: Branch (was: Performance question - adding)

2014-02-17 Thread John McKown
On Mon, Feb 17, 2014 at 12:06 PM, Elardus Engelbrecht 
elardus.engelbre...@sita.co.za wrote:

 Ted MacNEIL wrote:

 I have to ask: Why they big concern over a few instructions?

 Good question. This is why I asked that loop question earlier today. But
 I'm following this fun thread about the cache, fetch/modify by different
 CPs and execution prediction. Just curious of course.

 Optimisation of a few is not worth the effort these days.

 After my question, someone posted me off-line that if the machine only
 execute ONE instruction PER second, then only, then this optimisation work
 is worth the trouble.

 Groete / Greetings
 Elardus Engelbrecht


Of course, IBM is trying to make this discussion moot by getting people off
of using assembler at all, and implementing a code generation back end
which will produce better code than the average HLASM programmer does, for
C/C++, Java, and COBOL (COBOL code generation, pre-V5.1 at least, really
stinks IMO). I don't know if IBM worries as much about FORTRAN and PL/I
these days.

-- 
Wasn't there something about a PASCAL programmer knowing the value of
everything and the Wirth of nothing?

Maranatha! 
John McKown



Re: Branch (was: Performance question - adding)

2014-02-17 Thread Charles Mills
Or if you are writing a compiler (or similar code generator, such as a sort 
compare generator, or a SQL implementation). One instruction saved X a million 
compiles = a million instructions saved. Some of us here do things of that type.

Charles

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of John McKown
Sent: Monday, February 17, 2014 10:09 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Branch (was: Performance question - adding)

On Mon, Feb 17, 2014 at 12:03 PM, Paul Gilmartin paulgboul...@aim.com wrote:

 On 2014-02-17, at 10:36, Ted MacNEIL wrote:

  I have to ask: Why they big concern over a few instructions?
 Optimisation of a few is not worth 
  the
 effort these days.
 
 Hmmm...  No single instruction is worth optimizing.

 No single instruction among a million is worth optimizing.

 It's not worth optimising a million instructions because that would 
 imply optimizing each, which is not worth it.

 E.E. asked whether the code is in a loop.

 -- gil


I guess that I ASSuMEd that the code was in a heavily used loop. If you remove 
1 instruction from a loop which is executed a million times, assuming the 
instruction is expensive, then it may well be worth the effort. Or maybe even 
replacing it with two simpler instructions (such as my thought on using IPM 
and SPM with an STOC instead of a JZ and ST).



Re: Branch (was: Performance question - adding)

2014-02-17 Thread Ted MacNEIL
On a 600 MIPS single engine (z990 class), 1,000,000 instructions is 0.17% of a
CP. These days?
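The 0.17% figure appears to come from dividing the instruction count by the number of instructions a 600 MIPS engine would nominally execute in one second. A minimal C sketch of that reading follows; the function name is made up, and treating MIPS as an exact per-second rate is itself an assumption.

```c
/* Hypothetical helper: share of one CP-second, in percent, consumed by
 * `instructions` on an engine nominally rated at `mips` million
 * instructions per second. */
double percent_of_cp_second(double instructions, double mips)
{
    return instructions / (mips * 1e6) * 100.0;
}
```

With 1,000,000 instructions and 600 MIPS, this gives about 0.167, which matches the quoted number when read as a percentage of one second of one engine.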

-
-teD
-
  Original Message  
From: John McKown
Sent: Monday, February 17, 2014 13:15
To: IBM-MAIN@LISTSERV.UA.EDU
Reply To: IBM Mainframe Discussion List
Subject: Re: Branch (was: Performance question - adding)

On Mon, Feb 17, 2014 at 12:03 PM, Paul Gilmartin paulgboul...@aim.com wrote:

 On 2014-02-17, at 10:36, Ted MacNEIL wrote:

  I have to ask: Why they big concern over a few instructions?
  Optimisation of a few is not worth the
 effort these days.
 
 Hmmm... No single instruction is worth optimizing.

 No single instruction among a million is worth optimizing.

 It's not worth optimising a million instructions because
 that would imply optimizing each, which is not worth it.

 E.E. asked whether the code is in a loop.

 -- gil


I guess that I ASSuMEd that the code was in a heavily used loop. If you
remove 1 instruction from a loop which is executed a million times,
assuming the instruction is expensive, then it may well be worth the
effort. Or maybe even replacing it with two simpler instructions (such as
my thought on using IPM and SPM with an STOC instead of a JZ and ST).



-- 
Wasn't there something about a PASCAL programmer knowing the value of
everything and the Wirth of nothing?

Maranatha! 
John McKown



Re: Branch (was: Performance question - adding)

2014-02-17 Thread John Gilmore
What does the statement

| 1,000,000 instructions is 0.17% of a CP

mean?  What are the dimensions of %?

John Gilmore, Ashland, MA 01721 - USA



Re: Branch (was: Performance question - adding)

2014-02-17 Thread Tom Marchant
On Mon, 17 Feb 2014 14:04:56 -0500, John Gilmore wrote:

What does the statement

| 1,000,000 instructions is 0.17% of a CP

mean?  What are the dimensions of %

I don't know, but it would appear to be a gross oversimplification. Ted should 
know as well as anyone here that MIPS is meaningless, and that a snippet of 
code taken out of context doesn't tell much. We don't know why the OP was 
concerned, but that doesn't mean that his concern isn't valid.

-- 
Tom Marchant



Re: Branch (was: Performance question - adding)

2014-02-17 Thread Charles Mills
I develop vendor code. Customers always ask about CPU time. If I answered "oh, 
we don't worry about that anymore," do you think they would buy? Do you think I 
would have a job?

Charles
Composed on a mobile: please excuse my brevity 

Ted MacNEIL eamacn...@yahoo.ca wrote:

On a 600 MIPS single engine (z/990 class) 1,000,000 instructions is 0.17% of a 
CP. These days?

-
-teD
-
  Original Message  
From: John McKown
Sent: Monday, February 17, 2014 13:15
To: IBM-MAIN@LISTSERV.UA.EDU
Reply To: IBM Mainframe Discussion List
Subject: Re: Branch (was: Performance question - adding)

On Mon, Feb 17, 2014 at 12:03 PM, Paul Gilmartin paulgboul...@aim.com wrote:

 On 2014-02-17, at 10:36, Ted MacNEIL wrote:

  I have to ask: Why the big concern over a few instructions?
  Optimisation of a few is not worth the effort these days.
 
 Hmmm... No single instruction is worth optimizing.

 No single instruction among a million is worth optimizing.

 It's not worth optimising a million instructions because
 that would imply optimizing each, which is not worth it.

 E.E. asked whether the code is in a loop.

 -- gil


I guess that I ASSuMEd that the code was in a heavily used loop. If you
remove 1 instruction from a loop which is executed a million times,
assuming the instruction is expensive, then it may well be worth the
effort. Or maybe even replacing it with two simpler instructions (such as
my thought on using IPM and SPM with an STOC instead of a JZ and ST).



-- 
Wasn't there something about a PASCAL programmer knowing the value of
everything and the Wirth of nothing?

Maranatha! 
John McKown

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Branch (was: Performance question - adding)

2014-02-17 Thread Ed Finnell
Benchmarks, features, tuning knobs, performance bonds all factor into the
mix. The ones that scare me are the 'theoretically we can run some gazillion
transactions on a mainframe'!
 
 
In a message dated 2/17/2014 2:18:47 P.M. Central Standard Time,
charl...@mcn.org writes:

I develop vendor code. Customers always ask about CPU time. If I
answered "oh, we don't worry about that anymore," do you think they would
buy? Do you think I would have a job?



--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Branch (was: Performance question - adding)

2014-02-17 Thread John McKown
On Mon, Feb 17, 2014 at 11:36 AM, Ted MacNEIL eamacn...@yahoo.ca wrote:

 I have to ask: Why the big concern over a few instructions?
 Optimisation of a few is not worth the effort these days.


 -
 -teD
 -


OK, this then causes me to wonder why IBM has bothered to create
instructions such as Load On Condition and Store On Condition. The
manual entry for STOC says:

quote
STORE ON CONDITION provides a function
similar to that of a separate BRANCH ON CONDITION
instruction followed by a STORE instruction,
except that STORE ON CONDITION does
not provide an index register. For example, the
following two instruction sequences are equivalent.

STOCG 15,256(7),8        BC   7,SKIP
                         STG  15,256(7)
                    SKIP DS   0H


On models that implement predictive branching,
the combination of the BRANCH ON CONDITION
and STORE instructions may perform
somewhat better than the STORE ON CONDITION
instruction when the CPU is able to successfully
predict the branch condition. However,
on models where the CPU is not able to successfully
predict the branch condition, such as when
the condition is more random, the STORE ON
CONDITION instruction may provide significant
performance improvement.
/quote
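
The equivalence the manual describes can be checked with a small Python model. This is only a sketch of the condition-code logic, not of the hardware: storage is a plain dict, and the function names are made up for illustration. The one architectural fact it relies on is that mask bit 8 selects CC 0, 4 selects CC 1, 2 selects CC 2 and 1 selects CC 3.

```python
def mask_selects(mask, cc):
    """True if the 4-bit mask selects condition code cc (0..3)."""
    return bool(mask & (8 >> cc))


def store_on_condition(storage, addr, value, mask, cc):
    """Model of STOCG: store only when the mask selects the current CC."""
    if mask_selects(mask, cc):
        storage[addr] = value


def branch_then_store(storage, addr, value, branch_mask, cc):
    """Model of BC branch_mask,SKIP followed by STG and the SKIP label."""
    if mask_selects(branch_mask, cc):
        return  # branch taken: the store is skipped
    storage[addr] = value


# STOCG ...,8 against BC 7,SKIP / STG for every possible condition code
for cc in range(4):
    a, b = {256: "old"}, {256: "old"}
    store_on_condition(a, 256, "new", 8, cc)
    branch_then_store(b, 256, "new", 7, cc)
    print(cc, a == b, a[256])
```

The two masks are complements (8 and 7), which is why the sequences agree: STOC names the condition under which to store, while BC names the condition under which to skip the store.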

The above makes me wonder whether my example should use the BPRP (does anyone
else read that as burper?) instruction instead of the STOC, since I _know_ at
that point that the branch _will be_ taken.




-- 
Wasn't there something about a PASCAL programmer knowing the value of
everything and the Wirth of nothing?

Maranatha! 
John McKown

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Branch (was: Performance question - adding)

2014-02-17 Thread Tony Harminc
On 17 February 2014 09:37, John McKown john.archie.mck...@gmail.com wrote:
 LT  R0,CURRENT #LOAD CURRENT AND SET CC
 SPM R1 #SAVE CC FROM LT
 A R0,SUM #ADD SUM TO IT
 IPM R1 #RESTORE CC FROM LT
 STOC R0,SUM,NZ #STORE SUM ONLY IF CC OF LT WAS NZ

 Basically this loads CURRENT into R0, setting the CC based on its value.
 Then saves the CC in R1. Adds the SUM value into R0. Restores the CC from
 the LT, because the Add destroyed it. Then only stores the result in SUM if
 the CC is Not Zero, as set by the LT.

Not that it affects your proposal, but I think your SPM and IPM are
reversed there...
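
A rough Python model of the sequence with the two instructions swapped, so that IPM saves the condition code and SPM restores it. This is a sketch under simplifying assumptions: registers are plain integers, the Add ignores overflow (CC 3), and all function names are made up for illustration.

```python
def cc_of(value):
    """Condition code as set by LT (and by A when there is no overflow)."""
    return 0 if value == 0 else (1 if value < 0 else 2)


def ipm(reg, cc, program_mask=0):
    """INSERT PROGRAM MASK: put CC and program mask in the leftmost byte."""
    return (reg & 0x00FFFFFF) | (cc << 28) | (program_mask << 24)


def spm(reg):
    """SET PROGRAM MASK: return the CC held in bits 2-3 of the register."""
    return (reg >> 28) & 0b11


def add_if_nonzero(current, total):
    r0 = current
    cc = cc_of(r0)          # LT   R0,CURRENT  load and set CC
    r1 = ipm(0, cc)         # IPM  R1          save CC from LT
    r0 = r0 + total
    cc = cc_of(r0)          # A    R0,SUM      add, which changes the CC
    cc = spm(r1)            # SPM  R1          restore CC from LT
    if cc != 0:             # STOC R0,SUM,NZ   store only if LT gave NZ
        total = r0
    return total


print(add_if_nonzero(5, 10))    # 15
print(add_if_nonzero(0, 10))    # 10, store skipped
print(add_if_nonzero(-10, 10))  # 0, stored even though the Add set CC 0
```

The last case is the one that shows why the condition code has to be saved and restored: the Add produces zero, but the store must still happen because CURRENT itself was not zero.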

It's perhaps interesting that IPM appeared only in 370/XA; on 24-bit
systems BALR was expected to do.

Tony H.

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Branch (was: Performance question - adding)

2014-02-17 Thread Ed Jaffe

On 2014-02-17, at 10:36, Ted MacNEIL wrote:

I have to ask: Why the big concern over a few instructions?
Optimisation of a few is not worth the effort 
these days.


LOL. If Binyamin's question wasn't worth asking, then IBM would never 
have recently introduced the STOC instruction that John McKown so kindly 
reminded us about. (Wish we could use instructions like that in other 
than our JIT-compiled Java code...)


If most simple instructions run in roughly one cycle (with the wind at 
their back, i.e., no interlocks or other delays), an L1 memory access is 
zero cycles, and an uncached memory access is roughly 1000 cycles, then 
it makes perfect sense for professional programmers to want to 
understand how they might avoid having even a small code fragment run 
three orders of magnitude slower. Add enough of them up and it can make 
a big difference.
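
To put rough numbers on that, here is a small Python sketch using only the round figures above (about 1 cycle per simple instruction, about 1000 cycles for an uncached access). The miss rates are invented for illustration, and the function name is hypothetical.

```python
def average_cycles(miss_rate, base_cycles=1, miss_penalty=1000):
    """Average cycles per instruction when a fraction of them miss cache.

    Uses the round figures from the paragraph above; real machines have
    several cache levels with very different penalties.
    """
    return base_cycles + miss_rate * miss_penalty


for rate in (0.0, 0.001, 0.01, 0.1):
    print(f"miss rate {rate:>5}: about {average_cycles(rate):.1f} cycles")
```

Even one uncached access per thousand instructions doubles the average cost in this model, which is the sense in which adding enough of them up makes a big difference.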


If professionals never ask questions, they'll never know the answers 
when they need them.


--
Edward E Jaffe
Phoenix Software International, Inc
831 Parkview Drive North
El Segundo, CA 90245
http://www.phoenixsoftware.com/

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN