Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-24 Thread Shmuel Metz (Seymour J.)
In 4fbd69b6.5080...@t-online.de, on 05/24/2012
   at 12:50 AM, Bernd Oppolzer bernd.oppol...@t-online.de said:

This limit is too high, in our opinion, because some tests showed,

On what processor? I assume that IBM wants to optimize for z196 and
later.
 
-- 
 Shmuel (Seymour J.) Metz, SysProg and JOAT
 ISO position; see http://patriot.net/~shmuel/resume/brief.html 
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: INFO IBM-MAIN


Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-24 Thread Bernd Oppolzer

The test runs were on a z196 with current z/OS release.

I didn't run the tests myself; I was only told about the results,
but the co-worker is normally very reliable.

Regards

Bernd



On 24.05.2012 12:54, Shmuel Metz (Seymour J.) wrote:

In 4fbd69b6.5080...@t-online.de, on 05/24/2012
at 12:50 AM, Bernd Oppolzer bernd.oppol...@t-online.de said:

This limit is too high, in our opinion, because some tests showed,

On what processor? I assume that IBM wants to optimize for z196 and
later.





Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-23 Thread Bernd Oppolzer

Update on the topic "CALL PLIMOVE generates a loop of MVCs":

We did some further research and found that for CALL PLIMOVE with a
length known at compile time, even EP PL/1 V3.9 generates a code
sequence that uses MVCL if the length is greater than or equal to
16384. For lengths below 16384, PL/1 generates a loop of MVCs.

This limit is too high, in our opinion, because some tests showed
that MVCL is faster than MVCs starting from a length of about 768 bytes,
that is, 3 MVCs. A length of 16384 needs 64 MVCs (and maybe loop control
instructions).
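
The arithmetic behind these figures follows from the fact that a single MVC moves at most 256 bytes. A minimal Python sketch, assuming the compiler's choice depends only on the compile-time length; the function names and the `threshold` parameter are illustrative and are not compiler options:

```python
MVC_MAX_BYTES = 256  # architectural maximum moved by a single MVC


def mvc_count(length: int) -> int:
    """Number of MVC instructions needed to move `length` bytes."""
    return (length + MVC_MAX_BYTES - 1) // MVC_MAX_BYTES


def chosen_sequence(length: int, threshold: int = 16384) -> str:
    """Model of the reported behaviour: MVCL at or above the threshold, MVCs below."""
    return "MVCL" if length >= threshold else f"{mvc_count(length)} x MVC"


for length in (768, 8000, 16384):
    # Second column: reported V3.9 limit; third column: the limit the tests suggest.
    print(length, chosen_sequence(length), chosen_sequence(length, threshold=768))
```

With the reported limit of 16384, an 8000-byte move falls into the MVC branch (32 MVCs); with a limit of 768 it would use MVCL.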

We will ask IBM to change this.

Furthermore, I think that MVCL should not only be generated with
CALL PLIMOVE, but also with normal assignments, if the length is
in the same range. But we did no research on this so far; our focus
is on CALL PLIMOVE at the moment.

Kind regards

Bernd



On 17.05.2012 15:42, Robert AH Prins wrote:

But here we have a simple instruction of the HLL (PLIMOVE) which I expect
to be implemented using the best instructions the machine provides. If
this turns out not to be the case, this is IMHO simply a bug, not only
a flaw of the optimizer. The programmer already did some kind of optimization
him- or herself, when he or she decided to use PLIMOVE. He or she may well
expect that the compiler generates the best available machine instruction
for this HLL instruction.

You hit the nail right on the head!
But I do remember that there was an APAR that explains
why the MVCL was removed again.
I can't point you to it as the link to the PL/I APARs has gone 404.

...

Robert




Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-18 Thread Robert AH Prins

On 2012-05-17 11:42, Robert AH Prins wrote:

On 2012-05-17 11:14, Bernd Oppolzer wrote:

I would like to add:

with the previous compiler, CALL PLIMOVE enabled us to force the generation of
MVCL.

Using, for example

CALL PLIMOVE (ADDR (target), ADDR (source), length);

the compiler generated MVCL,

but coding

target = source;

(if applicable), or BY-NAME-assignments, the compiler generated MVCs etc.

Now, with V3.9, the compiler generates the same in the two cases,
that is MVCs or MVC loops, so we have no possibility to force the
generation of MVCL. AFAIK, my co-workers didn't play with the ARCH
options, so far. TUNE is TUNE(2), again AFAIK.


The TUNE option has been removed from V4.1 anyway.


I have other projects at the moment, so I had not much time so far to
investigate this. But remember: the problem showed up by a Strobe Report,
so it seems to be significant.

But: if PLIMOVE does no better than a simple assignment, using PLIMOVE
seems to make no sense to me.

In a certain way, this problem is somewhat different from the first
problem in this thread.

Robert complained about the optimizer doing a bad job, that is: some
instructions are generated that are useless, and others are questionable.

But here we have a simple instruction of the HLL (PLIMOVE) which I expect
to be implemented using the best instructions the machine provides. If
this turns out not to be the case, this is IMHO simply a bug, not only
a flaw of the optimizer. The programmer already did some kind of optimization
him- or herself, when he or she decided to use PLIMOVE. He or she may well
expect that the compiler generates the best available machine instruction
for this HLL instruction.


You hit the nail right on the head! But I do remember that there was an APAR that
explains why the MVCL was removed again. I can't point you to it as the link to
the PL/I APARs has gone 404.


It's back and I have to correct myself: I've only managed to find an APAR stating
that the generation of MVCLEs was removed:
http://www-01.ibm.com/support/docview.wss?rs=619&context=SSY2V3&q1=%2b5655H3100+%2br370&uid=swg1PK79325&loc=en_US&cs=utf-8&cc=us&lang=all


Robert
--
Robert AH Prins
robert(a)prino(d)org



Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-18 Thread Mike Schwab
On Fri, May 18, 2012 at 3:54 AM, Robert AH Prins
robert.ah.pr...@gmail.com wrote:
 On 2012-05-17 11:42, Robert AH Prins wrote:

 On 2012-05-17 11:14, Bernd Oppolzer wrote:

 I would like to add:

 with the previous compiler, CALL PLIMOVE enabled us to force the
 generation of MVCL.

 Using, for example

 CALL PLIMOVE (ADDR (target), ADDR (source), length);

 the compiler generated MVCL,

 but coding

 target = source;

 (if applicable), or BY-NAME-assignments, the compiler generated MVCs etc.

 Now, with V3.9, the compiler generates the same in the two cases,
 that is MVCs or MVC loops, so we have no possibility to force the
 generation of MVCL. AFAIK, my co-workers didn't play with the ARCH
 options, so far. TUNE is TUNE(2), again AFAIK.


 The TUNE option has been removed from V4.1 anyway.

 I have other projects at the moment, so I had not much time so far to
 investigate this. But remember: the problem showed up by a Strobe Report,
 so it seems to be significant.

 But: if PLIMOVE does no better than a simple assignment, using PLIMOVE
 seems to make no sense to me.

 In a certain way, this problem is somewhat different from the first
 problem in this thread.

 Robert complained about the optimizer doing a bad job, that is: some
 instructions are generated that are useless, and others are questionable.

 But here we have a simple instruction of the HLL (PLIMOVE) which I expect
 to be implemented using the best instructions the machine provides. If
 this turns out not to be the case, this is IMHO simply a bug, not only
 a flaw of the optimizer. The programmer already did some kind of
 optimization him- or herself, when he or she decided to use PLIMOVE. He
 or she may well expect that the compiler generates the best available
 machine instruction for this HLL instruction.


 You hit the nail right on the head! But I do remember that there was an
 APAR that explains why the MVCL was removed again. I can't point you to
 it as the link to the PL/I APARs has gone 404.


 It's back and I have to correct myself: I've only managed to find an APAR
 stating that the generation of MVCLEs was removed:
 http://www-01.ibm.com/support/docview.wss?rs=619&context=SSY2V3&q1=%2b5655H3100+%2br370&uid=swg1PK79325&loc=en_US&cs=utf-8&cc=us&lang=all

 Robert
 --
 Robert AH Prins
 robert(a)prino(d)org

From the APAR, dated 2009-02-02:

Problem summary

USERS AFFECTED: Enterprise PL/I users with code that has assignments
to NONVARYING CHAR strings where the source and/or the target has a
length not known at compile time

PROBLEM DESCRIPTION: The 3.6 and later releases of Enterprise PL/I
generated a MVCLE instruction to perform such assignments.
Unfortunately, while this led to a much shorter set of instructions,
it also led to significantly worse performance.

RECOMMENDATION:

The compiler has been changed so that it no longer generates
MVCLE's

-- 
Mike A Schwab, Springfield IL USA
Where do Forest Rangers go to get away from it all?



Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-17 Thread David Crayford

On 17/05/2012 2:06 AM, Tom Marchant wrote:

On Tue, 15 May 2012 20:07:52 +, Robert Prins wrote:


maybe a 16-byte
three-instruction sequence like

003FC0  E310  DF10  0158  003120 | LY   r1,a1:d7952:l4(,r13,7952)
003FC6  E300  1047  0015  003120 | LGH  r0,_shadow20(,r1,71)
003FCC  4000  E064        003120 | STH  r0,_shadow20(,r14,100)

is really faster than the simple 6-byte one-instruction sequence

0026D4  D2 01 7 064 6 047  MVC   REPT_LINE.DATE.MONTH(2),REPT_LIST.DATE.MONTH

Not likely.

Address Generation Interlock (AGI) will cause the second instruction
to stall until the address is available in R1.


Tom,

I'm not sure if that's still true on z10/z196 processors which implement 
AGI bypass.


http://www.ibmsystemsmag.com/CMSTemplates/IBMSystemsMag/Print.aspx?path=/mainframe/administrator/performance/cpu_pipeline

Apparently the worst case scenario is a load in 1 cycle. Load address 
has been mitigated.



In addition, instruction cracking will, under some circumstances, cause
a z196 processor to execute a load and a store when a MVC instruction
is executed.





Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-17 Thread Bernd Oppolzer
I would like to add: 

with the previous compiler, CALL PLIMOVE enabled us to force the generation of 
MVCL. 

Using, for example 

   CALL PLIMOVE (ADDR (target), ADDR (source), length); 

the compiler generated MVCL, 

but coding 

   target = source;   

(if applicable), or BY-NAME-assignments, the compiler generated MVCs etc.

Now, with V3.9, the compiler generates the same in the two cases, 
that is MVCs or MVC loops, so we have no possibility to force the 
generation of MVCL. AFAIK, my co-workers didn't play with the ARCH 
options, so far. TUNE is TUNE(2), again AFAIK. 

I have other projects at the moment, so I had not much time so far to 
investigate this. But remember: the problem showed up by a Strobe Report, 
so it seems to be significant. 

But: if PLIMOVE does no better than a simple assignment, using PLIMOVE
seems to make no sense to me. 

In a certain way, this problem is somewhat different from the first 
problem in this thread. 

Robert complained about the optimizer doing a bad job, that is: some 
instructions are generated that are useless, and others are questionable. 

But here we have a simple instruction of the HLL (PLIMOVE) which I expect
to be implemented using the best instructions the machine provides. If 
this turns out not to be the case, this is IMHO simply a bug, not only 
a flaw of the optimizer. The programmer already did some kind of optimization 
him- or herself, when he or she decided to use PLIMOVE. He or she may well 
expect that the compiler generates the best available machine instruction 
for this HLL instruction. 

Kind regards

Bernd




On Wednesday, 16 May 2012 21:41, Bernd Oppolzer wrote:
 First, I would like to thank you for starting this thread.

 I posted it to the performance people of my customer, and they told me
 that they just found a similar problem with EP PL/1 3.9, that is: the
 PLIMOVE calls don't generate MVCLs any more, as in previous releases, but
 series of MVCs and loops. Even when the length of PLIMOVE is - for
 example - 8000 bytes. They discovered it because one of the PLIMOVE
 locations showed up in a Strobe report.

 I asked them to test, using an ASSEMBLER program, whether the MVC loop is
 faster, but they told me that even with lengths around 500 or 600, the
 MVCL solution is faster - this is on a z196. I have still to confirm this.

 If this turns out to be true, this sounds like a bug, and we will try to
 convince IBM to go back to the previous solution. If we compile our
 modules during normal service using EP PL/1 3.9, our system will get
 slower and slower, because PLIMOVE is widely used. This is not acceptable.

 Because the PLIMOVEs are generated by a site-specific macro called PLICOPY,
 I already thought about calling a short ASSEMBLER routine (with minimal
 linkage conventions) doing the transfer using MVCL instead of CALL PLIMOVE.
 The applications need not be changed, because the PLICOPY syntax stays the
 same. Maybe this could still be faster than doing the MVC loop.

 Kind regards

 Bernd




Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-17 Thread Robert AH Prins

On 2012-05-17 11:14, Bernd Oppolzer wrote:

I would like to add:

with the previous compiler, CALL PLIMOVE enabled us to force the generation of
MVCL.

Using, for example

CALL PLIMOVE (ADDR (target), ADDR (source), length);

the compiler generated MVCL,

but coding

target = source;

(if applicable), or BY-NAME-assignments, the compiler generated MVCs etc.

Now, with V3.9, the compiler generates the same in the two cases,
that is MVCs or MVC loops, so we have no possibility to force the
generation of MVCL. AFAIK, my co-workers didn't play with the ARCH
options, so far. TUNE is TUNE(2), again AFAIK.


The TUNE option has been removed from V4.1 anyway.


I have other projects at the moment, so I had not much time so far to
investigate this. But remember: the problem showed up by a Strobe Report,
so it seems to be significant.

But: if PLIMOVE does no better than a simple assignment, using PLIMOVE
seems to make no sense to me.

In a certain way, this problem is somewhat different from the first
problem in this thread.

Robert complained about the optimizer doing a bad job, that is: some
instructions are generated that are useless, and others are questionable.

But here we have a simple instruction of the HLL (PLIMOVE) which I expect
to be implemented using the best instructions the machine provides. If
this turns out not to be the case, this is IMHO simply a bug, not only
a flaw of the optimizer. The programmer already did some kind of optimization
him- or herself, when he or she decided to use PLIMOVE. He or she may well
expect that the compiler generates the best available machine instruction
for this HLL instruction.


You hit the nail right on the head! But I do remember that there was an APAR that
explains why the MVCL was removed again. I can't point you to it as the link to
the PL/I APARs has gone 404.

Finally, a note to those following this thread: due to the closure of the gateway
between 'bit.listserv.ibm-main' and the list, it is now available in two
diverging versions, one here on the list, the other on news://comp.lang.pl1.
Very regrettable.


Robert
--
Robert AH Prins
robert(a)prino(d)org



Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-16 Thread Elardus Engelbrecht
Robert Prins wrote:

Can anyone skilled in the art tell me why a compiler that probably dates back
to the late 1970s or early 1980s generates the following short and sweet
code for a PL/I BY NAME assignment, while the not completely new (but still
fairly recent) version of Enterprise PL/I (V3R9) generates the very, very,
very long-winded code below it?

I'm not skilled in this art, but is your Enterprise PL/I (v3r9) also using 
Language Environment or not? 


Then again, I always thought that the fastest instructions are those ones that 
are never executed...

Those instructions don't need to be optimized... :-)

Groete / Greetings
Elardus Engelbrecht



Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-16 Thread David Crayford

Robert,

I'm no expert but I have read that newer hardware models (z10 and above)
are essentially RISC processors that run complex instructions in
millicode. In the case of an MVC instruction it would have to do that in
a loop which would require branching, the enemy of pipelined execution
units. It's also possible to run simple instructions in parallel. It's
plausible an MVC instruction can be executed more efficiently as a
sequence of LG/STG instructions. The OOO decode units do this for you
with instruction cracking on a z196; it seems that on a z10 the
optimizer is doing the same thing.
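
One caveat when replacing an MVC with load/store pairs is that MVC is defined to move bytes one at a time from left to right, so the two forms agree only when the operands do not overlap destructively. A minimal Python sketch of that difference, assuming a simplified byte-array memory model; `mvc` and `load_store_8` are illustrative helpers, not real instructions or compiler output:

```python
def mvc(mem: bytearray, dst: int, src: int, length: int) -> None:
    """Model of MVC: bytes are moved one at a time, left to right."""
    for i in range(length):
        mem[dst + i] = mem[src + i]


def load_store_8(mem: bytearray, dst: int, src: int, length: int) -> None:
    """Model of an LG/STG sequence: each 8-byte chunk is loaded whole, then stored."""
    assert length % 8 == 0
    for off in range(0, length, 8):
        reg = bytes(mem[src + off: src + off + 8])   # LG: read 8 bytes into a register
        mem[dst + off: dst + off + 8] = reg          # STG: write the register back


# Disjoint operands: both forms give the same result.
a = bytearray(b"ABCDEFGH" + bytes(8))
b = bytearray(a)
mvc(a, 8, 0, 8)
load_store_8(b, 8, 0, 8)
same_when_disjoint = (a == b)

# Target starts one byte after the source: MVC propagates the first byte.
c = bytearray(b"X" + bytes(15))
d = bytearray(c)
mvc(c, 1, 0, 8)
load_store_8(d, 1, 0, 8)
same_when_overlapping = (c == d)

print(same_when_disjoint, same_when_overlapping)
```

In the sketch the disjoint case matches and the overlapping case does not, which is why a code generator would need to know the operands are distinct before making the substitution.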


See this document - page 21 
http://www-01.ibm.com/software/htp/tpf/tpfug/tgf11/How_do_you_do_when_youre_a_z196_CPU.pdf


Optimizers create arcane code. It's almost impossible to verify without
understanding the secret sauce. A lot of the code the optimizers spit
out is intractable, and it's almost a paradox that a longer code path
produces faster code.

If you don't like it you can always compile at a different ARCH() level
and ask IBM.


On 16/05/2012 4:07 AM, Robert Prins wrote:

Can anyone skilled in the art tell me why a compiler that probably
dates back to the late 1970s or early 1980s generates the
following short and sweet code for a PL/I BY NAME assignment, while
the not completely new (but still fairly recent) version of Enterprise
PL/I (V3R9) generates the very, very, very long-winded code below it?
Or is this (V3R9) code (that predates the OOO z196 architecture)
really faster?

OS PL/I V2.3.0 - OPT(2)
  343   1  2  REPT_LINE= REPT_LIST, BY NAME;

* STATEMENT NUMBER  343
002664  58 70 8 268        L     7,REPT_WORK.LINE_PTR
002668  58 60 8 030        L     6,REPT_WORK.REPT_PTR
00266C  58 F0 3 600        L     15,1536(0,3)
002670  D2 03 7 003 F B54  MVC   REPT_LINE.TR(4),2900(15)
002676  DE 03 7 003 6 00C  ED    REPT_LINE.TR(4),REPT_LIST.TR
00267C  D2 03 7 00A F B54  MVC   REPT_LINE.RE(4),2900(15)
002682  DE 03 7 00A 6 00E  ED    REPT_LINE.RI(4),REPT_LIST.RI
002688  D2 02 7 011 6 010  MVC   REPT_LINE.DA(3),REPT_LIST.DA
00268E  58 E0 3 608        L     14,1544(0,3)
002692  D2 06 4 158 E 5D4  MVC   344(7,4),1492(14)
002698  DE 06 4 158 6 014  ED    344(7,4),REPT_LIST.K+1
00269E  D2 05 7 017 4 159  MVC   REPT_LINE.K(6),345(4)
0026A4  D2 06 4 158 E 5D4  MVC   344(7,4),1492(14)
0026AA  DE 06 4 158 6 01B  ED    344(7,4),REPT_LIST.V
0026B0  D2 04 7 028 4 15A  MVC   REPT_LINE.V(5),346(4)
0026B6  D2 03 7 030 6 026  MVC   REPT_LINE.NA(4),REPT_LIST.NA
0026BC  D2 03 7 036 6 02A  MVC   REPT_LINE.TY(4),REPT_LIST.TY
0026C2  D2 03 7 03D 6 02E  MVC   REPT_LINE.CO(4),REPT_LIST.CO
0026C8  D2 00 7 04B 6 036  MVC   REPT_LINE.SP(1),REPT_LIST.SP
0026CE  D2 03 7 05F 6 043  MVC   REPT_LINE.DATE.YEAR(4),REPT_LIST.DATE.YEAR
0026D4  D2 01 7 064 6 047  MVC   REPT_LINE.DATE.MONTH(2),REPT_LIST.DATE.MONTH
0026DA  D2 01 7 067 6 049  MVC   REPT_LINE.DATE.DAY(2),REPT_LIST.DATE.DAY

Enterprise PL/I for z/OS   V3.R9.M0 (Built:20100923) - OPT(3)
 3120.0 368  1  2   rept_line= rept_list, by name;

003E40  E350  D340  0624  003120 | STG  r5,#SPILL33(,r13,25408)
003E46  E320  D270  0624  003120 | STG  r2,#SPILL7(,r13,25200)
003E4C  E350  D8FD  0571  003120 | LAY  r5,_temp9(,r13,22781)
003E52  E300  D368  0604  003120 | LG   r0,#SPILL38(,r13,25448)
003E58  E340  D308  0624  003120 | STG  r4,#SPILL26(,r13,25352)
003E5E  E310  D4B4  0271  003119 | LAY  r1,LINE(,r13,9396)
003E64  E300  D8FC  0550  003120 | STY  r0,_temp9(,r13,22780)
003E6A  E300  D148  0214  003120 | LGF  r0,a1:d8520:l4(,r13,8520)
003E70  D278  1000  4D33  003119 | MVC  LINE(121,r1,0),REPT_INIT(r4,3379)
003E76  4110  E00C        003120 | LA   r1,_shadow21(,r14,12)
003E7A  E3E0  D8FC  0571  003120 | LAY  r14,_temp9(,r13,22780)
003E80  DE03  E000  1000  003120 | ED   _temp9(4,r14,0),_shadow21(r1,0)
003E86  B914  00E0        003120 | LGFR r14,r0
003E8A  E300  D368  0604  003120 | LG   r0,#SPILL38(,r13,25448)
003E90  4110  E003        003120 | LA   r1,#AddressShadow(,r14,3)
003E94  41F0  E00A        003120 | LA   r15,#AddressShadow(,r14,10)
003E98  D202  1001  5000  003120 | MVC  _shadow21(3,r1,1),_temp9(r5,0)
003E9E  9240  E003        003120 | MVI  _shadow21(r14,3),64
003EA2  E310  DF10  0158  003120 | LY   r1,a1:d7952:l4(,r13,7952)
003EA8  E300  D984  0550  003120 | STY  r0,_temp8(,r13,22916)
003EAE  E350  D984  0571  003120 | LAY  r5,_temp8(,r13,22916)
003EB4  4120  E017        003120 | LA   r2,#AddressShadow(,r14,23)
003EB8  4110  100E        003120 | LA   r1,_shadow21(,r1,14)
003EBC  DE03  5000  1000  003120 | ED   _temp8(4,r5,0),_shadow21(r1,0)
003EC2  E310  D985  0571  003120 | LAY  r1,_temp8(,r13,22917)
003EC8  4140  E028        003120 | LA   r4,#AddressShadow(,r14,40)
003ECC  D202  F001  1000  003120 | MVC  _shadow21(3,r15,1),_temp8(r1,0)
003ED2  9240  E00A        003120 | MVI  _shadow21(r14,10),64
003ED6  E310  DF10  0158  003120 | LY   r1,a1:d7952:l4(,r13,7952)
003EDC  E3F0  D974  0571  003120 | LAY  r15,_temp19(,r13,22900)

Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-16 Thread Robert Prins

On 2012-05-16 07:26, Elardus Engelbrecht wrote:

Robert Prins wrote:


Can anyone skilled in the art tell me why a compiler that probably dates back to the late
1970s or early 1980s generates the following short and sweet code for a PL/I
BY NAME assignment, while the not completely new (but still fairly recent)
version of Enterprise PL/I (V3R9) generates the very, very, very long-winded code below
it?


I'm not skilled in this art, but is your Enterprise PL/I (v3r9) also using 
Language Environment or not?


Yes, it is.


Then again, I always thought that the fastest instructions are those ones that 
are never executed...


Those instructions don't need to be optimized... :-)


Exactly!

Robert
--
Robert AH Prins
robert(a)prino(d)org



Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-16 Thread Steve Comstock

On 5/16/2012 1:26 AM, Elardus Engelbrecht wrote:

Robert Prins wrote:


Can anyone skilled in the art tell me why a compiler that probably dates back
to the late 1970s or early 1980s generates the following short and sweet
code for a PL/I BY NAME assignment, while the not completely new (but still
fairly recent) version of Enterprise PL/I (V3R9) generates the very, very, very
long-winded code below it?


I'm not skilled in this art, but is your Enterprise PL/I (v3r9) also using 
Language Environment or not?


He has no choice on this: all the new compilers _must_ use LE.





Then again, I always thought that the fastest instructions are those ones that 
are never executed...


Those instructions don't need to be optimized... :-)

Groete / Greetings
Elardus Engelbrecht


--

Kind regards,

-Steve Comstock
The Trainer's Friend, Inc.

303-355-2752
http://www.trainersfriend.com

* To get a good Return on your Investment, first make an investment!
  + Training your people is an excellent investment

* Try our tool for calculating your Return On Investment
for training dollars at
  http://www.trainersfriend.com/ROI/roi.html



Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-16 Thread Paul Gilmartin
On Wed, 16 May 2012 06:41:27 -0600, Steve Comstock wrote:

He has no choice on this: all the new compilers _must_ use LE.

Even Metal C?

-- gil



Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-16 Thread Lloyd Fuller
Metal C does NOT use LE.  And, of course, with HLASM you have the choice to not 
use LE or to make your program LE compatible.

Lloyd



----- Original Message -----
From: Paul Gilmartin paulgboul...@aim.com
To: IBM-MAIN@bama.ua.edu
Sent: Wed, May 16, 2012 9:12:24 AM
Subject: Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

On Wed, 16 May 2012 06:41:27 -0600, Steve Comstock wrote:

He has no choice on this: all the new compilers _must_ use LE.

Even Metal C?

-- gil



Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-16 Thread Steve Comstock

On 5/16/2012 7:11 AM, Paul Gilmartin wrote:

On Wed, 16 May 2012 06:41:27 -0600, Steve Comstock wrote:


He has no choice on this: all the new compilers _must_ use LE.


Even Metal C?

-- gil


Well, I knew someone would raise that exception. No,
Metal C does not use LE. Not sure if SP C (Systems
Programmer C) is still around and it would be an
exception too.


--

Kind regards,

-Steve Comstock
The Trainer's Friend, Inc.

303-355-2752
http://www.trainersfriend.com

* To get a good Return on your Investment, first make an investment!
  + Training your people is an excellent investment

* Try our tool for calculating your Return On Investment
for training dollars at
  http://www.trainersfriend.com/ROI/roi.html



Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-16 Thread Paul Gilmartin
On Wed, 16 May 2012 07:55:48 -0600, Steve Comstock wrote:

Well, I knew someone would raise that exception. No,
Metal C does not use LE. Not sure if SP C (Systems
Programmer C) is still around and it would be an
exception too.
 
I believe it's been discussed in these fora that C and PL/I
share an optimizer/code generator.  I hope this includes
Metal C.  It's a long leap of logic, but that might weaken the
argument for LE entanglement.  Is MOVE, BY NAME
plausibly dependent on LE?

-- gil



Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-16 Thread Robert Prins

David,

On 2012-05-16 08:23, David Crayford wrote:
 Robert,

 I'm no expert but I have read that newer hardware models (Z10 and above) are
 essentially RISC processors that run complex instructions in millicode. In the

I may be wrong, but I think the z196 is the first OOO machine and Enterprise PL/I V3R9 pre-dates it 
by two years.


 case of a MVC instruction it would have to do that in a loop which would require
 branching, the enemy of pipelined execution units. It's also possible to run
 simple instructions in parallel. It's plausible an MVC instruction can be
 executed more efficiently as a sequence of LG/STG instructions.

Given that moves are the most executed instructions, at least on x86, (see, among many others 
www.ijpg.org/index.php/IJACSci/article/download/118/29) and I have little doubt that the same 
holds true for about any other architecture and that there is special x86 circuitry to optimize MOVS 
instructions, it would be highly surprising if IBM did not make MVC as fast as possible, millicoded 
or not.


 The OOO decode units do this for you with instruction cracking on a z196, it
 seems that on a z10 the optimizer is doing the same thing.

Possibly, but that does not explain the 10 superfluous reloads of r1.

 See this document - page 21
 
http://www-01.ibm.com/software/htp/tpf/tpfug/tgf11/How_do_you_do_when_youre_a_z196_CPU.pdf

 Optimizers create arcane code. It's almost impossible to verify without
 understanding the secret sauce. A lot of the code the optimizers spit out is
 intractable,

I don't know much about z/OS assembler, but at least I sort of managed to understand the code 
generated by the OS PL/I compiler. The code generated by Enterprise PL/I is completely unreadable, 
even some (or more than some) on this list might have trouble figuring out why it does what it does.


 and it's almost a paradox that a longer code path produces faster code.

 If you don't like it you can always compile at a different ARCH() level and
 ask IBM.

Going back to ARCH(5) doesn't produce anything that seems much shorter: there is still the
ridiculous reloading of the same register, and oodles and oodles of instructions which would
run and take time on a definitely not-OOO CPU:


003A58  E300  8238  0014  003119 | LGF   r0,LINE_PTR(,r8,568)
003A5E  4110  E00C        003119 | LA    r1,_shadow21(,r14,12)
003A62  B914  00E0        003119 | LGFR  r14,r0
003A66  D278  B38E  6D33  003118 | MVC   LINE(121,r11,910),REPT_INIT(r6,3379)
003A6C  E3B0  DC20  0004  003119 | LG    r11,#SPILL17(,r13,3104)
003A72  50B0  D25C        003119 | ST    r11,_temp9(,r13,604)
003A76  DE03  D25C  1000  003119 | ED    _temp9(4,r13,604),_shadow21(r1,0)
003A7C  4110  E003        003119 | LA    r1,#AddressShadow(,r14,3)
003A80  41F0  E00A        003119 | LA    r15,#AddressShadow(,r14,10)
003A84  D202  1001  D25D  003119 | MVC   _shadow21(3,r1,1),_temp9(r13,605)
003A8A  9240  E003        003119 | MVI   _shadow21(r14,3),64
003A8E  5810  8000        003119 | L     r1,REPT_PTR(,r8,0)
003A92  50B0  D2E4        003119 | ST    r11,_temp8(,r13,740)
003A96  41B0  E017        003119 | LA    r11,#AddressShadow(,r14,23)
003A9A  4110  100E        003119 | LA    r1,_shadow21(,r1,14)
003A9E  DE03  D2E4  1000  003119 | ED    _temp8(4,r13,740),_shadow21(r1,0)
003AA4  D202  F001  D2E5  003119 | MVC   _shadow21(3,r15,1),_temp8(r13,741)
003AAA  9240  E00A        003119 | MVI   _shadow21(r14,10),64
003AAE  5810  8000        003119 | L     r1,REPT_PTR(,r8,0)
003AB2  E3F0  DB98  0004  003119 | LG    r15,#SPILL0(,r13,2968)
003AB8  D202  E011  1010  003119 | MVC   _shadow21(3,r14,17),_shadow21(r1,16)
003ABE  5810  8000        003119 | L     r1,REPT_PTR(,r8,0)
003AC2  D206  D2D4  F4A4  003119 | MVC   _temp19(7,r13,724),' ..'(r15,1188)
003AC8  D203  D26C  1013  003119 | MVC   _temp15(4,r13,620),_shadow18(r1,19)
003ACE  4110  D26C        003119 | LA    r1,_temp15(,r13,620)
003AD2  D202  D24C  1001  003119 | MVC   _temp11(3,r13,588),_shadow12(r1,1)
003AD8  4110  D24C        003119 | LA    r1,_temp11(,r13,588)
003ADC  DE06  D2D4  1000  003119 | ED    _temp19(7,r13,724),_temp11(r1,0)
003AE2  D205  B000  D2D5  003119 | MVC   _shadow21(6,r11,0),_temp19(r13,725)
003AE8  5810  8000        003119 | L     r1,REPT_PTR(,r8,0)
003AEC  D206  D2CC  F4A4  003119 | MVC   _temp21(7,r13,716),' ..'(r15,1188)
003AF2  D202  D249  101B  003119 | MVC   _temp18(3,r13,585),_shadow12(r1,27)
003AF8  D202  D246  D249  003119 | MVC   _temp20(3,r13,582),_temp18(r13,585)
003AFE  4110  E028        003119 | LA    r1,#AddressShadow(,r14,40)
003B02  E300  D246  0090  003119 | LLGC  r0,a1:d582:l1(,r13,582)
003B08  E300  3114  0080  003119 | NG    r0,=X' 000F'
003B0E  41B0  D246        003119 | LA    r11,_temp20(,r13,582)
003B12  4200  D246        003119 | STC   r0,a1:d582:l1(,r13,582)
003B16  DE06  D2CC  B000  003119 | ED    _temp21(7,r13,716),_temp20(r11,0)
003B1C  D204  1000  D2CE  003119 | MVC   _shadow21(5,r1,0),_temp21(r13,718)
003B22  5810  8000        003119 | L

Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-16 Thread Robert Prins

On 2012-05-16 14:59, Paul Gilmartin wrote:

On Wed, 16 May 2012 07:55:48 -0600, Steve Comstock wrote:


Well, I knew someone would raise that exception. No,
Metal C does not use LE. Not sure if SP C (Systems
Programmer C) is still around and it would be an
exception too.


I believe it's been discussed in these fora that C and PL/I
share an optimizer/code generator.  I hope this includes
Metal C.  It's a long leap of logic, but that might weaken the
argument for LE entanglement.  Is MOVE, BY NAME
plausibly dependent on LE?


For PL/I it is most definitely not; it's just a shortcut for lazy people, and I've worked at sites
that explicitly forbade its use, considering it just as bad as a SELECT * in SQL.
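
For readers who have not met it, the effect of a BY NAME assignment can be sketched roughly as below. This is an illustration in Python rather than PL/I: the helper `assign_by_name` and the sample field values are made up, structures are modelled as flat dictionaries, and nested structures and data conversions are ignored.

```python
def assign_by_name(target, source):
    """Copy only the members whose names occur in both structures.

    Hypothetical helper: models a flat PL/I structure as a dict.
    """
    for name in target:
        if name in source:
            target[name] = source[name]
    return target


# Made-up sample data: only the matching names are copied.
rept_line = {"year": "", "month": "", "filler": "*"}
rept_list = {"year": "2012", "month": "05", "count": 7}
assign_by_name(rept_line, rept_list)
```

The comparison with SELECT * fits this picture: which members get copied depends on the declarations of both structures, not on anything visible in the statement itself.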


Robert
--
Robert AH Prins
robert(a)prino(d)org



Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-16 Thread Miklos Szigetvari
Hi

Do you have a chance to compare the speed of the two code sequences?


 David,

 On 2012-05-16 08:23, David Crayford wrote:
   Robert,
  
   I'm no expert but I have read that newer hardware models (Z10 and above)
   are essentially RISC processors that run complex instructions in
   millicode. In the

 I may be wrong, but I think the z196 is the first OOO machine and
 Enterprise PL/I V3R9 pre-dates it by two years.

   case of a MVC instruction it would have to do that in a loop which would
   require branching, the enemy of pipelined execution units. It's also
   possible to run simple instructions in parallel. It's plausible an MVC
   instruction can be executed more efficiently as a sequence of LG/STG
   instructions.

 Given that moves are the most executed instructions, at least on x86,
 (see, among many others
 www.ijpg.org/index.php/IJACSci/article/download/118/29) and I have little
 doubt that the same holds true for about any other architecture and that
 there is special x86 circuitry to optimize MOVS instructions, it would be
 highly surprising if IBM did not make MVC as fast as possible, millicoded
 or not.

   The OOO decode units do this for you with instruction cracking on a z196,
   it seems that on a z10 the optimizer is doing the same thing.

 Possibly, but that does not explain the 10 superfluous reloads of r1.

   See this document - page 21
   http://www-01.ibm.com/software/htp/tpf/tpfug/tgf11/How_do_you_do_when_youre_a_z196_CPU.pdf

   Optimizers create arcane code. It's almost impossible to verify without
   understanding the secret sauce. A lot of the code the optimizers spit out
   is intractable,

 I don't know much about z/OS assembler, but at least I sort of managed to
 understand the code generated by the OS PL/I compiler. The code generated
 by Enterprise PL/I is completely unreadable, even some (or more than some)
 on this list might have trouble figuring out why it does what it does.

   and it's almost a paradox that a longer code path produces faster code.

   If you don't like it you can always compile at a different ARCH() level
   and ask IBM.

 Going back to ARCH(5) doesn't produce anything that seems much shorter,
 still the ridiculous reloading of the same register, and oodles and oodles
 of instructions which would run and take time on a definitely not-OOO CPU:

 003A58  E300  8238  0014  003119 | LGF   r0,LINE_PTR(,r8,568)
 003A5E  4110  E00C        003119 | LA    r1,_shadow21(,r14,12)
 003A62  B914  00E0        003119 | LGFR  r14,r0
 003A66  D278  B38E  6D33  003118 | MVC   LINE(121,r11,910),REPT_INIT(r6,3379)
 003A6C  E3B0  DC20  0004  003119 | LG    r11,#SPILL17(,r13,3104)
 003A72  50B0  D25C        003119 | ST    r11,_temp9(,r13,604)
 003A76  DE03  D25C  1000  003119 | ED    _temp9(4,r13,604),_shadow21(r1,0)
 003A7C  4110  E003        003119 | LA    r1,#AddressShadow(,r14,3)
 003A80  41F0  E00A        003119 | LA    r15,#AddressShadow(,r14,10)
 003A84  D202  1001  D25D  003119 | MVC   _shadow21(3,r1,1),_temp9(r13,605)
 003A8A  9240  E003        003119 | MVI   _shadow21(r14,3),64
 003A8E  5810  8000        003119 | L     r1,REPT_PTR(,r8,0)
 003A92  50B0  D2E4        003119 | ST    r11,_temp8(,r13,740)
 003A96  41B0  E017        003119 | LA    r11,#AddressShadow(,r14,23)
 003A9A  4110  100E        003119 | LA    r1,_shadow21(,r1,14)
 003A9E  DE03  D2E4  1000  003119 | ED    _temp8(4,r13,740),_shadow21(r1,0)
 003AA4  D202  F001  D2E5  003119 | MVC   _shadow21(3,r15,1),_temp8(r13,741)
 003AAA  9240  E00A        003119 | MVI   _shadow21(r14,10),64
 003AAE  5810  8000        003119 | L     r1,REPT_PTR(,r8,0)
 003AB2  E3F0  DB98  0004  003119 | LG    r15,#SPILL0(,r13,2968)
 003AB8  D202  E011  1010  003119 | MVC   _shadow21(3,r14,17),_shadow21(r1,16)
 003ABE  5810  8000        003119 | L     r1,REPT_PTR(,r8,0)
 003AC2  D206  D2D4  F4A4  003119 | MVC   _temp19(7,r13,724),' ..'(r15,1188)
 003AC8  D203  D26C  1013  003119 | MVC   _temp15(4,r13,620),_shadow18(r1,19)
 003ACE  4110  D26C        003119 | LA    r1,_temp15(,r13,620)
 003AD2  D202  D24C  1001  003119 | MVC   _temp11(3,r13,588),_shadow12(r1,1)
 003AD8  4110  D24C        003119 | LA    r1,_temp11(,r13,588)
 003ADC  DE06  D2D4  1000  003119 | ED    _temp19(7,r13,724),_temp11(r1,0)
 003AE2  D205  B000  D2D5  003119 | MVC   _shadow21(6,r11,0),_temp19(r13,725)
 003AE8  5810  8000        003119 | L     r1,REPT_PTR(,r8,0)
 003AEC  D206  D2CC  F4A4  003119 | MVC   _temp21(7,r13,716),' ..'(r15,1188)
 003AF2  D202  D249  101B  003119 | MVC   _temp18(3,r13,585),_shadow12(r1,27)
 003AF8  D202  D246  D249  003119 | MVC   _temp20(3,r13,582),_temp18(r13,585)
 003AFE  4110  E028        003119 | LA    r1,#AddressShadow(,r14,40)
 003B02  E300  D246  0090  003119 | LLGC  r0,a1:d582:l1(,r13,582)
 003B08  E300  3114  0080  003119 | NG    r0,=X' 000F'
 003B0E  41B0  D246        003119 | LA    r11,_temp20(,r13,582)
 003B12  4200  D246        003119 | STC   r0,a1:d582:l1(,r13,582)
 003B16  DE06  D2CC  B000  

Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-16 Thread Paul Gilmartin
On Wed, 16 May 2012 17:21:25 +0200, Miklos Szigetvari wrote:

Do you have a chance to compare the speed of the two code sequences?
 
Does execution speed always trump code size?  Where should the
tradeoff be?  For example, any loop with a fixed number of
iterations (even a million) could be flattened to linear code; fewer
instructions executed; no test and branch.  (But it might yet be
slower because of instruction cache faults.)
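
The tradeoff can be put into numbers with a toy model. The sketch below is only an assumption-laden illustration: it counts instructions, charges a loop two overhead instructions per iteration (one test, one branch), and says nothing about cache behaviour or real timings. The function names are made up.

```python
def rolled(iterations, body_size):
    """Loop kept as a loop: body plus a test and a branch per iteration."""
    loop_overhead = 2  # assumed: one test and one branch instruction
    code_size = body_size + loop_overhead
    executed = iterations * (body_size + loop_overhead)
    return code_size, executed


def flattened(iterations, body_size):
    """Loop flattened to linear code: the body is repeated, no test or branch."""
    code_size = iterations * body_size
    executed = iterations * body_size
    return code_size, executed
```

With a three-instruction body and a million iterations, `rolled(1_000_000, 3)` gives 5 instructions of code and 5,000,000 executed, while `flattened(1_000_000, 3)` gives 3,000,000 of each. The model shows the saving in executed instructions and the cost in code size, but not the instruction cache faults that might make the flattened form slower.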

-- gil



Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-16 Thread Tom Marchant
On Tue, 15 May 2012 20:07:52 +, Robert Prins wrote:

maybe a 16-byte
three-instruction sequence like

003FC0  E310  DF10  0158  003120 | LY   r1,a1:d7952:l4(,r13,7952)
003FC6  E300  1047  0015  003120 | LGH  r0,_shadow20(,r1,71)
003FCC  4000  E064        003120 | STH  r0,_shadow20(,r14,100)

is really faster than the simple 6-byte one-instruction sequence

0026D4  D2 01 7 064 6 047  MVC   REPT_LINE.DATE.MONTH(2),REPT_LIST.DATE.MONTH

Not likely.

Address Generation Interlock (AGI) will cause the second instruction 
to stall until the address is available in R1.

In addition, instruction cracking will, under some circumstances, cause 
a z196 processor to execute a load and a store when a MVC instruction 
is executed.

-- 
Tom Marchant



Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-16 Thread Bernd Oppolzer

First, I would like to thank you for starting this thread.

I forwarded it to my customer's performance people, and they told me that
they had just found a similar problem with EP PL/1 3.9: the PLIMOVE calls
no longer generate MVCLs, as in previous releases, but series of MVCs
and loops, even when the length of the PLIMOVE is, for example, 8000 bytes.
They discovered it because one of the PLIMOVE locations showed up in a
Strobe report.

I asked them to test with an ASSEMBLER program whether the MVC loop is faster,
but they told me that even with lengths around 500 or 600 the MVCL
solution is faster. This is on a z196; I still have to confirm it.

If this turns out to be true, it sounds like a bug, and we will try to
convince IBM to go back to the previous solution. If we compile our modules
during normal service using EP PL/1 3.9, our system will get slower and
slower, because PLIMOVE is widely used. This is not acceptable.

Because the PLIMOVEs are generated by a site-specific macro called PLICOPY,
I have already thought about calling a short ASSEMBLER routine (with minimal
linkage conventions) that does the transfer using MVCL instead of CALL PLIMOVE.
The applications would not need to be changed, because the PLICOPY syntax
stays the same. Maybe this could still be faster than doing the MVC loop.
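
To give an idea of what a move of 8000 bytes means in terms of MVCs, here is a rough sketch in Python. It relies only on the fact that a single MVC moves at most 256 bytes; the function name `mvc_chunks` is made up, and the code the compiler really generates may be organised differently.

```python
MVC_MAX = 256  # a single MVC instruction moves at most 256 bytes


def mvc_chunks(length):
    """Split a move of `length` bytes into (offset, size) pieces of at most 256 bytes."""
    chunks = []
    offset = 0
    while offset < length:
        size = min(MVC_MAX, length - offset)
        chunks.append((offset, size))
        offset += size
    return chunks
```

For 8000 bytes, `mvc_chunks(8000)` yields 32 pieces: 31 of 256 bytes and a final one of 64 bytes at offset 7936, where an MVCL would be a single instruction.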

Kind regards

Bernd





Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-16 Thread Mike Schwab
The Hercules group did some testing comparing MVCL to MVC.
If both source and destination had the same alignment to a doubleword
boundary, you could move 8 bytes, then increment the 4 registers to
reflect this, before being interruptible.  If they were aligned differently
between boundaries, each 8 bytes would do this twice.

Whereas an MVC would do 256 bytes or less without interrupting or
touching registers, and was much faster.

Of course, emulation is much different from hardware, e.g., in updating all
4 registers at once.
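
As a rough illustration of why that matters in an emulator, the sketch below counts the steps implied by the description above. It is a model of the description only, not Hercules code; the function names are made up, and it counts steps rather than time.

```python
import math


def mvcl_units(length, same_alignment):
    """Register-update steps for an emulated MVCL as described above:
    one per 8 bytes when source and destination share doubleword
    alignment, two per 8 bytes otherwise."""
    steps_per_8 = 1 if same_alignment else 2
    return math.ceil(length / 8) * steps_per_8


def mvc_units(length):
    """Number of MVC instructions needed, each moving up to 256 bytes."""
    return math.ceil(length / 256)
```

For an 8000-byte move this gives 1000 steps for the aligned MVCL, 2000 for the unaligned one, and 32 MVCs.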

-- 
Mike A Schwab, Springfield IL USA
Where do Forest Rangers go to get away from it all?



Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-16 Thread Scott Ford
Guys,

I'm a little lost on the point of this thread. I'm not trying to be rude or
flippant, but I see a lot here about performance: is it still valid with
today's hardware and software? Yes, there are time-sensitive events, software,
and hardware.

Scott ford
www.identityforge.com



Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-16 Thread Scott Ford
Oh, now I see what Bernd was talking about. Sorry guys, old age.

Scott ford
www.identityforge.com



Comparison of compiler generated code AD 1980(ish) v 2010(ish)

2012-05-15 Thread Robert Prins
Can anyone skilled in the art tell me why a compiler that probably
dates back to the late 1970s or early 1980s generates the
following short and sweet code for a PL/I BY NAME assignment, while
the not completely new (but still fairly recent) version of Enterprise
PL/I (V3R9) generates the very, very, very long-winded code below it?
Or is this (V3R9) code (that predates the OOO z196 architecture)
really faster?

OS PL/I V2.3.0 - OPT(2)
 343   1  2  REPT_LINE= REPT_LIST, BY NAME;

* STATEMENT NUMBER  343
002664  58 70 8 268        L     7,REPT_WORK.LINE_PTR
002668  58 60 8 030        L     6,REPT_WORK.REPT_PTR
00266C  58 F0 3 600        L     15,1536(0,3)
002670  D2 03 7 003 F B54  MVC   REPT_LINE.TR(4),2900(15)
002676  DE 03 7 003 6 00C  ED    REPT_LINE.TR(4),REPT_LIST.TR
00267C  D2 03 7 00A F B54  MVC   REPT_LINE.RE(4),2900(15)
002682  DE 03 7 00A 6 00E  ED    REPT_LINE.RI(4),REPT_LIST.RI
002688  D2 02 7 011 6 010  MVC   REPT_LINE.DA(3),REPT_LIST.DA
00268E  58 E0 3 608        L     14,1544(0,3)
002692  D2 06 4 158 E 5D4  MVC   344(7,4),1492(14)
002698  DE 06 4 158 6 014  ED    344(7,4),REPT_LIST.K+1
00269E  D2 05 7 017 4 159  MVC   REPT_LINE.K(6),345(4)
0026A4  D2 06 4 158 E 5D4  MVC   344(7,4),1492(14)
0026AA  DE 06 4 158 6 01B  ED    344(7,4),REPT_LIST.V
0026B0  D2 04 7 028 4 15A  MVC   REPT_LINE.V(5),346(4)
0026B6  D2 03 7 030 6 026  MVC   REPT_LINE.NA(4),REPT_LIST.NA
0026BC  D2 03 7 036 6 02A  MVC   REPT_LINE.TY(4),REPT_LIST.TY
0026C2  D2 03 7 03D 6 02E  MVC   REPT_LINE.CO(4),REPT_LIST.CO
0026C8  D2 00 7 04B 6 036  MVC   REPT_LINE.SP(1),REPT_LIST.SP
0026CE  D2 03 7 05F 6 043  MVC   REPT_LINE.DATE.YEAR(4),REPT_LIST.DATE.YEAR
0026D4  D2 01 7 064 6 047  MVC   REPT_LINE.DATE.MONTH(2),REPT_LIST.DATE.MONTH
0026DA  D2 01 7 067 6 049  MVC   REPT_LINE.DATE.DAY(2),REPT_LIST.DATE.DAY

Enterprise PL/I for z/OS   V3.R9.M0 (Built:20100923) - OPT(3)
3120.0 368  1  2   rept_line= rept_list, by name;

003E40  E350  D340  0624  003120 | STG  r5,#SPILL33(,r13,25408)
003E46  E320  D270  0624  003120 | STG  r2,#SPILL7(,r13,25200)
003E4C  E350  D8FD  0571  003120 | LAY  r5,_temp9(,r13,22781)
003E52  E300  D368  0604  003120 | LG   r0,#SPILL38(,r13,25448)
003E58  E340  D308  0624  003120 | STG  r4,#SPILL26(,r13,25352)
003E5E  E310  D4B4  0271  003119 | LAY  r1,LINE(,r13,9396)
003E64  E300  D8FC  0550  003120 | STY  r0,_temp9(,r13,22780)
003E6A  E300  D148  0214  003120 | LGF  r0,a1:d8520:l4(,r13,8520)
003E70  D278  1000  4D33  003119 | MVC  LINE(121,r1,0),REPT_INIT(r4,3379)
003E76  4110  E00C        003120 | LA   r1,_shadow21(,r14,12)
003E7A  E3E0  D8FC  0571  003120 | LAY  r14,_temp9(,r13,22780)
003E80  DE03  E000  1000  003120 | ED   _temp9(4,r14,0),_shadow21(r1,0)
003E86  B914  00E0        003120 | LGFR r14,r0
003E8A  E300  D368  0604  003120 | LG   r0,#SPILL38(,r13,25448)
003E90  4110  E003        003120 | LA   r1,#AddressShadow(,r14,3)
003E94  41F0  E00A        003120 | LA   r15,#AddressShadow(,r14,10)
003E98  D202  1001  5000  003120 | MVC  _shadow21(3,r1,1),_temp9(r5,0)
003E9E  9240  E003        003120 | MVI  _shadow21(r14,3),64
003EA2  E310  DF10  0158  003120 | LY   r1,a1:d7952:l4(,r13,7952)
003EA8  E300  D984  0550  003120 | STY  r0,_temp8(,r13,22916)
003EAE  E350  D984  0571  003120 | LAY  r5,_temp8(,r13,22916)
003EB4  4120  E017        003120 | LA   r2,#AddressShadow(,r14,23)
003EB8  4110  100E        003120 | LA   r1,_shadow21(,r1,14)
003EBC  DE03  5000  1000  003120 | ED   _temp8(4,r5,0),_shadow21(r1,0)
003EC2  E310  D985  0571  003120 | LAY  r1,_temp8(,r13,22917)
003EC8  4140  E028        003120 | LA   r4,#AddressShadow(,r14,40)
003ECC  D202  F001  1000  003120 | MVC  _shadow21(3,r15,1),_temp8(r1,0)
003ED2  9240  E00A        003120 | MVI  _shadow21(r14,10),64
003ED6  E310  DF10  0158  003120 | LY   r1,a1:d7952:l4(,r13,7952)
003EDC  E3F0  D974  0571  003120 | LAY  r15,_temp19(,r13,22900)
003EE2  D202  E011  1010  003120 | MVC  _shadow21(3,r14,17),_shadow21(r1,16)
003EE8  E310  D238  0604  003120 | LG   r1,#SPILL0(,r13,25144)
003EEE  D206  F000  14A4  003120 | MVC  _temp19(7,r15,0),' ..'(r1,1188)
003EF4  E310  DF10  0158  003120 | LY   r1,a1:d7952:l4(,r13,7952)
003EFA  D203  B95C  1013  003120 | MVC  _temp15(4,r11,2396),_shadow18(r1,19)
003F00  E310  D90C  0571  003120 | LAY  r1,_temp15(,r13,22796)
003F06  D202  B93C  1001  003120 | MVC  _temp11(3,r11,2364),_shadow12(r1,1)
003F0C  E310  D8EC  0571  003120 | LAY  r1,_temp11(,r13,22764)
003F12  DE06  F000  1000  003120 | ED   _temp19(7,r15,0),_temp11(r1,0)
003F18  E310  D975  0571  003120 | LAY  r1,_temp19(,r13,22901)
003F1E  D205  2000  1000  003120 | MVC  _shadow21(6,r2,0),_temp19(r1,0)
003F24  E310  D238  0604  003120 | LG   r1,#SPILL0(,r13,25144)
003F2A  E320  D96C  0571  003120 | LAY  r2,_temp21(,r13,22892)
003F30  D206  2000  14A4  003120 | MVC  _temp21(7,r2,0),' ..'(r1,1188)
003F36  E310  DF10  0158  003120 | LY   r1,a1:d7952:l4(,r13,7952)
003F3C  D202  B939  101B  003120 | MVC