I'm no assembler expert, but how about
L   R15,COUNTER
AFI R15,1
ST  R15,COUNTER
> Date: Fri, 22 Jan 2016 14:14:49 -0500
> From: i...@panix.com
> Subject: Re: Is there a source for detailed, instruction-level performance
> info?
> To: IBM-MAIN@LISTSERV.UA.EDU
> Date: Fri, 22 Jan 2016 14:30:02 -0700
> From: frank.swarbr...@outlook.com
> Subject: Re: Is there a source for detailed, instruction-level performance
> info?
> To: IBM-MAIN@LISTSERV.UA.EDU
>
> I'm no assembler expert, but how about
>
Or dare I suggest:
ASI COUNTER,1
--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
In article <000401d140df$6f05a2a0$4d10e7e0$@att.net> Skip wrote:
> As a newbie, I got curious about the relative speed of these strategies:
>
> 1. L  R15,COUNTER
> 2. A  R15,=F'1'
> 3. ST R15,COUNTER
>
> 1. L  R15,COUNTER
> 2. LA R15,1(,R15)
> 3. ST R15,COUNTER
>
> I asked my manager, who
Works on mine! :-)
> Date: Fri, 22 Jan 2016 13:33:49 -0800
> From: charl...@mcn.org
> Subject: Re: Is there a source for detailed, instruction-level performance
> info?
> To: IBM-MAIN@LISTSERV.UA.EDU
>
> Depending on the newness of your hardware.
>
> Charles
Jerry Callen wrote:
Jack J. Woehr wrote:
Not sure how relevant this is to mainframe programming, but years ago
when I designed and executed with a team of nine a data-heavy server in
Unix optimized for multiple cores, what we found was that reroutable queuing
of data from one simplistic
Jerry Callen wrote:
I'm really looking to make this core as fast as possible.
Mike Cowlishaw tells some funny stories about optimization initiatives on software projects. Is it certain that the place
you are optimizing is your real bottleneck?
--
Jack J. Woehr # Science is more than a body
On Mon, 28 Dec 2015 11:02:15 -0600, Jerry Callen wrote:
>I'm not really after detailed timing. I'm looking for implementation details
>of the same sort used by compiler writers to guide selection of instruction
>sequences, where the guiding principle is, "How can I avoid pipeline stalls?"
On Wed, 30 Dec 2015 17:08:26 -0600, Jerry Callen wrote:
>Bob Rogers (no longer at IBM...)
Bob rejoined IBM a while back, working in z/VM.
Alan Altmark
IBM
On 30 December 2015 at 18:42, Charles Mills wrote:
>On 30 December 2015 at 18:08, Jerry Callen wrote:
>> How about it, IBM? Surely there must be someone in Poughkeepsie who wants to
>> visit San Antonio in
>> March? :-)
> Or possibly in Toronto!
I'll have
Charles Mills wrote:
> I would assume there is some sort of a compiler/hardware architecture
> liaison group within IBM. I would bet that if someone from that group were
> to put together a SHARE presentation called "Write Machine Code Like a
> Compiler -- How to Write the Fastest Code Possible
Or possibly in Toronto!
Charles
-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf
Of Jerry Callen
Sent: Wednesday, December 30, 2015 3:08 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a source for detailed, instruction-level
(or whatever)" that it would be a big hit.
Charles
-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Jim Mulder
Sent: Monday, December 28, 2015 9:57 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a source for detailed, instruction-level p
As Tom has noted, the most dramatic performance enhancements typically
come from a change in strategy or algorithm used. In my experience you
get better results by looking for ways to accomplish the end result by
having the program do fewer actions rather than concentrating on
micro-optimizing
In <3361710c8fdd49d9a5ba76a61a993...@su806104.ad.ing.net>, on
12/28/2015
at 06:15 AM, "Windt, W.K.F. van der (Fred)"
said:
>And on newer machines (with the general-instructions-extension) you
>can simply do:
> ASI COUNTER,1
>I assume it is faster
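ASI folds the whole load/add/store into one instruction. In C terms the compiler does that folding for you on a plain increment, and a counter shared between threads needs the interlocked flavor. A hedged sketch (function names are invented for illustration):

```c
#include <stdatomic.h>

/* Plain counter bump: on hardware with the general-instructions-
   extension a compiler can emit this as a single ASI COUNTER,1
   instead of the classic load/add/store sequence. */
static long bump(long counter) {
    return counter + 1;
}

/* If the counter is shared between threads, the read-modify-write must
   be interlocked; C11 atomic_fetch_add is the portable spelling (on
   newer z machines it can map to interlocked-access instructions). */
static long bump_shared(atomic_long *counter) {
    return atomic_fetch_add(counter, 1) + 1;  /* returns the new value */
}
```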
Back from Christmas break to a lot of responses - thanks to all. Summarizing
the responses thus far:
* Select and/or tune your algorithm first.
Check.
* Tuning for a specific machine is a bad idea because you'll have to retune for
every new machine.
Check. This code is sufficiently critical
> An example: which of these code sequences do you suppose runs faster?
>
> la   rX,0(rI,rBase)    rX -> array[i]
> lg   rY,0(,rX)         rY = array[i]
> agsi 0(rX),1           ++array[i]
> * Now do something with rY
>
> vs:
> lg   rY,0(rI,rBase)    rY =
> Sent: Sunday, December 27, 2015 12:14 PM
> To: IBM-MAIN@LISTSERV.UA.EDU
> Subject: [Bulk] Re: Is there a source for detailed, instruction-level
> performance
> info?
>
> On 27 December 2015 at 14:47, Skip Robinson <jo.skip.robin...@att.net>
> wrote:
> >
On Sun, 27 Dec 2015 13:21:23 -0800, Anne & Lynn Wheeler wrote:
>later, newer memory for 370/168 was less expensive ... and started to
>see four mbytes as much more common ... aka four mbytes as 370/165 would
>have meant that typical MVT customer could have gotten 16 regions ... w/o
>having to
shmuel+ibm-m...@patriot.net (Shmuel Metz , Seymour J.) writes:
> We ran more than that, plus TSO, on a 2 MiB machine.
IBM executives were looking at 370/165 ... where typical customer had
1mbyte ... in part because 165 real memory was very expensive ... and
typical regions were such that they
In <87bn9fwuo0@garlic.com>, on 12/24/2015
at 10:47 AM, Anne & Lynn Wheeler said:
>As a result, a typical 1mbyte 370/165 would only have four regions.
We ran more than that, plus TSO, on a 2 MiB machine.
--
Shmuel (Seymour J.) Metz, SysProg and JOAT
ISO
On 12/24/2015 at 02:31 PM, Mike Schwab said:
>https://en.wikipedia.org/wiki/IBM_7030_Stretch
>First computer to implement: Multiprogramming, memory protection,
>generalized interrupts, the
In <013c01d13e5a$89687c80$9c397580$@mcn.org>, on 12/24/2015
at 06:51 AM, Charles Mills said:
>This is true so much that the z13 processors implement a kind of
>"internal multiprogramming"
IBM calls it Simultaneous Multi-threading, except in PoOps where it is
just
On 12/25/2015
at 01:23 AM, "Robert A. Rosenberg" said:
>This story (and the others) reminds me of an incident that occurred
>early in my programming life.
The classic example is Multics. Early on they redesigned the file
system,
In <567b4a30.8050...@yahoo.com>, on 12/23/2015
at 08:28 PM, Thomas Kern
<0041d919e708-dmarc-requ...@listserv.ua.edu> said:
>Perhaps what might be useful would be an assembler program to run
>loops of individual instructions and output some timing
>information.
That would work on a
On 27 December 2015 at 14:47, Skip Robinson wrote:
> As a newbie, I got curious about the relative speed of these strategies:
>
> 1. L  R15,COUNTER
> 2. A  R15,=F'1'
> 3. ST R15,COUNTER
>
> 1. L  R15,COUNTER
> 2. LA R15,1(,R15)
> 3. ST R15,COUNTER
>
> I asked my
> To: IBM-MAIN@LISTSERV.UA.EDU
> Subject: [Bulk] Re: Is there a source for detailed, instruction-level
performance
> info?
>
> In <567b4a30.8050...@yahoo.com>, on 12/23/2015
>at 08:28 PM, Thomas Kern
> <0041d919e708-dmarc-requ...@listserv.ua.edu> said:
>
> >Perhaps what migh
On 12/27/2015 at 03:14 PM, Tony Harminc said:
>There's a third model for this very common operation:
>LA R15,1
>A  R15,COUNTER
>ST R15,COUNTER
If you're that concerned about speed:
>      LA   R11,1
> LOOP GET  foo
>      logic to determine type
>      L    R1,COUNTER
>      AR   R1,R11
>      ST   R1,COUNTER
>      B    LOOP
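The point of loading the 1 into R11 once, before the loop, generalizes: hoist loop-invariant work out of the body so it is done once rather than per iteration. A hedged C sketch of the same idea (the function name and the `scale * 2 + 1` arithmetic are invented for illustration):

```c
/* The invariant `factor` is computed once before the loop, the same way
   the constant 1 is loaded into R11 once above. Compilers often do this
   hoisting themselves, but on hot paths it is worth writing explicitly. */
static long sum_scaled(const long *a, long n, long scale) {
    long factor = scale * 2 + 1;   /* loop-invariant: hoisted out of the body */
    long total = 0;
    for (long i = 0; i < n; i++)
        total += a[i] * factor;
    return total;
}
```

The win is largest when the invariant expression is expensive (a storage reference, a divide) rather than a small constant.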
mike.a.sch...@gmail.com (Mike Schwab) writes:
> If branch predicting is a big hang up, the obvious solution is to
> start processing all possible outcomes then keep the one that is
> actually taken, i.e., B OUTCOME(R15) where R15 is a return code of
> 0,4,8,12,16.
aka, speculative execution ...
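The same "compute all possible outcomes, then keep the one that applies" trick can be written at the source level as a branchless select, so there is no conditional branch for the predictor to miss. A hedged C sketch (the threshold 100 and the two arm computations are invented for illustration):

```c
/* Evaluate both arms up front, then keep one with a mask. The mask is
   all ones when x < 100 and all zeros otherwise, so exactly one arm
   survives the AND/OR combination -- no conditional branch executed. */
static long pick_outcome(long x) {
    long small_case = x * 2;          /* outcome if x < 100  */
    long large_case = x - 100;        /* outcome if x >= 100 */
    long mask = -(long)(x < 100);     /* -1 when x < 100, else 0 */
    return (small_case & mask) | (large_case & ~mask);
}
```

This pays off only when the condition is genuinely unpredictable; a well-predicted branch is usually cheaper than always doing both computations.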
On 12/24/2015 12:52 PM, Tom Brennan wrote:
> Farley, Peter x23353 wrote:
>> So what is an ordinary programmer to do?
>
> Years ago I guess I had nothing to do so I wrote a program that hooked
> into various LINK/LOAD SVC's and recorded the load module name (like
> Isogon and TADz do). That huge
to a sequential
search.
These simple coding techniques can also reduce CPU time.
--- jcew...@acm.org wrote:
From: "Joel C. Ewing" <jcew...@acm.org>
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a source for detailed, instruction-level performance info?
Date:
Interesting article. Do you have a link to the article it appears to be a
response to?
> Date: Thu, 24 Dec 2015 14:42:19 -0500
> From: charl...@mcn.org
> Subject: Re: Is there a source for detailed, instruction-level performance
> info?
> To: IBM-MAIN@LISTSERV.UA.EDU
>
>
On Dec 24, 2015, at 4:06 PM, Richard Pinion wrote:
Don't use zoned decimal for subscripts or counters, rather use
indexes for
subscripts and binary for counter type variables. And when using
conditional
branching, try to code so as to make the branch the exception
rather than the
rule.
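The "make the branch the exception rather than the rule" advice can be sketched in C: keep the common case on the straight-line fall-through path and branch out only for the rare one. Function and variable names here are invented for illustration:

```c
/* Common case (non-negative value) falls straight through; the rare
   case (negative) is the only taken branch, which keeps the branch
   predictor warm on the hot path. */
static long sum_valid(const long *vals, long n, long *bad_count) {
    long sum = 0;
    *bad_count = 0;
    for (long i = 0; i < n; i++) {
        if (vals[i] < 0) {   /* exceptional path: rarely taken */
            (*bad_count)++;
            continue;
        }
        sum += vals[i];      /* common path: straight-line code */
    }
    return sum;
}
```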
Farley, Peter x23353 wrote:
So what is an ordinary programmer to do?
Years ago I guess I had nothing to do so I wrote a program that hooked
into various LINK/LOAD SVC's and recorded the load module name (like
Isogon and TADz do). That huge pile of data ended up on a tape and I
wrote some
On 23 December 2015 at 10:46, Jerry Callen wrote:
> I'm in the process of hand-tuning a small, performance critical algorithm on
> a Z13, and I'm hampered by the lack of detailed information on the
> instruction-level performance of the machine.
Just to add two thoughts to
so such accounting measuring CPU time (elapsed instruction time) is
analogous to early accounting which measured by elapsed wall clock time.
cache miss/memory access latency ... when measured in count of processor
cy
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a source for detailed, instruction-level performance
info?
This is in no way a personal comment on Tom's experience.
'What a programmer is supposed to do' is avoid stupid code. We were once
tasked with finding the bottleneck in a fairly mundane
On Thu, Dec 24, 2015 at 12:47 PM, Anne & Lynn Wheeler wrote:
>
> risc has been doing cache miss compensation for decades, out-of-order
> execution, branch prediction, speculative execution, hyperthreading ...
> can be viewed as hardware analogy to 60s multitasking ... given the
Of Blaicher, Christopher Y.
Sent: Thursday, December 24, 2015 12:22 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a source for detailed, instruction-level performance info?
I have looked at the public documentation on the z13 and had the privilege to
speak to some of the people behind parts
-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf
Of Charles Mills
Sent: Thursday, December 24, 2015 9:51 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a source for detailed, instruction-level performance info?
Not so simple anymore.
"How
On 12/23/2015 7:46 AM, Jerry Callen wrote:
I'm in the process of hand-tuning a small, performance critical algorithm on a Z13, and
I'm hampered by the lack of detailed information on the instruction-level performance of
the machine. Back in the day, IBM used to publish a "Functional Characteristics"
charl...@mcn.org (Charles Mills) writes:
> Not so simple anymore.
>
> "How long does a store halfword take?" used to be a question that had an
> answer. It no longer does.
>
> My working rule of thumb (admittedly grossly oversimplified) is
> "instructions take no time, storage references take
much less processor "wait" (for
instruction and data fetch, not ECB type wait) time.
Charles
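Charles's rule of thumb ("instructions take no time, storage references take the time") is easiest to see in traversal order: walking memory sequentially uses every byte of each fetched cache line, while striding across lines pays a fetch per element for the same arithmetic. A hedged C sketch, with the matrix size chosen only for illustration:

```c
enum { N = 64 };  /* matrix dimension; size chosen only for illustration */

/* Row-major walk: consecutive addresses, so each fetched cache line
   is fully used before moving on. */
static long sum_row_major(long m[N][N]) {
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

/* Column-major walk: a stride of N*sizeof(long) bytes per step, so far
   more cache lines are touched for exactly the same arithmetic. */
static long sum_col_major(long m[N][N]) {
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += m[i][j];
    return s;
}

/* Both walks compute the same sum; only the memory traffic differs. */
static int same_sum(void) {
    static long m[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            m[i][j] = i + j;
    return sum_row_major(m) == sum_col_major(m);
}
```

The instruction counts are identical; any measured difference between the two is pure storage-reference cost.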
-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Thomas Kern
Sent: Wednesday, December 23, 2015 5:28 PM
To: IBM-MAIN@LISTSERV.UA.EDU
BTW - I applaud IBM's provision of the non-privileged ECAG instruction for
obtaining cache characteristics. Here's the output of a little program I wrote
to format the information it provides (on a Z13):
level 0: private
data: line size=256, set associativity=8, total size=128K
instruction:
Perhaps what might be useful would be an assembler program to run loops
of individual instructions and output some timing information.
/Tom
On 12/23/2015 11:20, Shmuel Metz (Seymour J.) wrote:
In <8970116796168447.wa.jcallennarsil@listserv.ua.edu>, on
12/23/2015
at 09:46 AM, Jerry