date:20201020

Re: Conditional MVCL macro?

2020-10-20 Thread Brian Westerman

Can you share your $MVC macro?

Brian

On Tue, 20 Oct 2020 17:42:55 +, Christopher Y. Blaicher wrote:

>We just got a z15 and I have not yet tested MVCL against an MVC loop on it,
>but on all prior machines an MVC loop beat MVCL up to about 32K.  Over 32K,
>MVCL is the way to go.  In our environment we rarely move more than 32K.  We
>built a $MVC macro with three parameters (destination, source and length) and
>use that.
>
>FYI - MVCL is a micro-coded (milli-coded, call it what you want) instruction.
>There is a hefty startup and end cost to micro-coded instructions.  MVCL only
>really gets going when it can use the internal move-page function.  That
>requires moving whole pages, and they have to be page aligned.  CLCL and
>similar instructions suffer, or at least used to suffer, the same kind of
>startup cost.
>
>Chris Blaicher
>Technical Architect
>Precisely.com
>
>
>-Original Message-
>From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
>Behalf Of Mike Hochee
>Sent: Tuesday, October 20, 2020 12:40 PM
>To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
>Subject: Re: Conditional MVCL macro?
>
>Really interesting thread to start the day with!
>
>Our experience has been that MVC loops are typically faster up to a point,
>that point being about 30-40 instructions in the pipeline, and, as mentioned,
>this seemed very processor dependent. However, when the source and target
>operands both happen to be aligned on a page boundary, the opportunity exists
>for the async data mover to kick in if a move long is being used.  I think
>this applies to both MVCL and MVCLE, but I am not sure. So ideally a macro
>would use both MVCs and MVCL/E.
>
>More grist for the mill!
>
>-Original Message-
>From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
>Behalf Of baron_car...@technologist.com
>Sent: Tuesday, October 20, 2020 12:12 PM
>To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
>Subject: Re: Conditional MVCL macro?
>
>For a 4000-byte move between fields of the same length, the COBOL compiler
>with OPT(2) generates:
>
>LAY R10,5072(,R9)   FROM
>LA  R7,1072(,R9)  TO
>MVC 0(256,R7),0(R10)
>MVC 256(256,R7),256(R10)
>MVC 512(256,R7),512(R10)
>MVC 768(256,R7),768(R10)
>MVC 1024(256,R7),1024(R10)
>MVC 1280(256,R7),1280(R10)
>MVC 1536(256,R7),1536(R10)
>MVC 1792(256,R7),1792(R10)
>MVC 2048(256,R7),2048(R10)
>MVC 2304(256,R7),2304(R10)
>MVC 2560(256,R7),2560(R10)
>MVC 2816(256,R7),2816(R10)
>MVC 3072(256,R7),3072(R10)
>MVC 3328(256,R7),3328(R10)
>MVC 3584(256,R7),3584(R10)
>MVC 3840(160,R7),3840(R10)
>
>However, for 5000 bytes it generates:
>
>LAY R7,6072(,R9)
>LA  R10,0(,R7)
>LA  R7,1072(,R9)
>LHI R11,0x13
>L0128 EQU *
>MVC 0(256,R7),0(R10)
>LA  R10,256(,R10)
>LA  R7,256(,R7)
>BRCT R11,L0128
>MVC 0(136,R7),0(R10)
>
>And yes, the change occurred at 4097 bytes.
>
>
>
>-Original Message-
>From: IBM Mainframe Assembler List  On Behalf 
>Of Charles Mills
>Sent: Tuesday, October 20, 2020 10:54
>To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
>Subject: Re: Conditional MVCL macro?
>
>@Ed, can you elaborate a little on your reasoning? (Not doubting it; just
>curious.) Is it that the interruptibility provides a significant improvement 
>over MVCL? Or the support for lengths greater than 16M? Or ... ?
>
>When I asked Dr. Shum about move strategies he seemed to indicate that for 
>data that was already or would soon anyway be in cache an MVC loop was 
>generally faster than MVCL. (I did not ask about MVCLE at the time; not sure 
>why. He did not suggest it.)
>
>Charles
>
>
>-Original Message-
>From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
>On Behalf Of Ed Jaffe
>Sent: Tuesday, October 20, 2020 6:52 AM
>To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
>Subject: Re: Conditional MVCL macro?
>
>We've switched almost exclusively to MVCLE except for short, fixed-length 
>moves.

Re: Conditional MVCL macro?

2020-10-20 Thread Dan Greiner

One of the questions that I always ask in response to concerns about whether 
one instruction is better than another is, "Do you ever expect to recoup the 
number of instructions needed to reassemble, rebind, and retest the code?" 

Assuming that the answer is "Yes, this code is executed a gazillion times a 
second," then I would ask whether the data being moved are (a) going to be 
manipulated by the CPU in the near future, or (b) on its way to an output 
buffer that the CPU won't touch again. 

If (a), then how much data is being moved becomes the question. 
If you're moving a small amount of data (say, less than 4K), then one or more 
MVCs is probably a good choice. If you're moving a gazillion bytes of data 
(i.e., more than the CPU's cache size), then (by default) you're assured that 
whatever was last moved is what's in the cache when the instruction completes 
(which may or may not be what you intended) ... so the answer implicitly looks 
more like (b).

To assert control over whether MVCL wipes out the cache, check out the 
discussion of the special pad characters X'B0' and X'B8' used by the 
instruction on page 7-291 (RHC) of the "z/Architecture Principles of Operation" 
(SA22-7821-12); similarly for MVCLE, see page 7-296 (LHC). Additionally, both 
instructions provide a special pad character (X'B1'), determining whether the 
instruction can perform multiple access references to the data (which is really 
only interesting if other CPUs are simultaneously observing the same locations 
in memory). 
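
As a minimal sketch of how a pad character is handed to MVCL: the pad byte
sits in the leftmost byte of the odd register of the second-operand pair.
The names SRCADDR, TGTADDR and MOVELEN are hypothetical, and the fragment
assumes AMODE 31 and equal source and target lengths. Which special value to
pick, and whether a given model honors it, is for the Principles of Operation
pages cited above to say; the code only shows where the byte goes.

```hlasm
* Sketch only. SRCADDR, TGTADDR and MOVELEN are hypothetical names.
* Assumes AMODE 31 and a length below 16M (MVCL lengths are 24 bits).
         L     R4,TGTADDR         First operand (target) address
         L     R5,MOVELEN         Target length
         L     R2,SRCADDR         Second operand (source) address
         LR    R3,R5              Source length = target length
         ICM   R3,B'1000',=X'B0'  Pad byte into the leftmost byte of R3
         MVCL  R4,R2              Move, passing X'B0' as the pad byte
```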

Kevin Shum's seminal work on processor optimization, "IBM z Systems Processor 
Optimization Primer" (which discusses MVCL and MVCLE), can be found at 
https://community.ibm.com/HigherLogic/System/DownloadDocumentFile.ashx?DocumentFileKey=d1cdb394-0159-464c-92a3-3f74f8c545c4=0.

Re: Conditional MVCL macro?

2020-10-20 Thread Seymour J Metz

Likewise CLCL.

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

From: IBM Mainframe Assembler List [ASSEMBLER-LIST@LISTSERV.UGA.EDU] on behalf 
of Charles Mills [charl...@mcn.org]
Sent: Tuesday, October 20, 2020 6:35 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

Unless I am thinking fuzzily, an interrupted MVCL leaves the PSW pointing to
the MVCL (not past it) and the relevant registers incremented and
decremented appropriately, so the supervisor may dispatch other tasks on the
affected CPU, let them run as they will, and then resume the interrupted
task when appropriate. The task will take off with the MVCL continuing from
where it left off.

Charles

-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Paul Gilmartin
Sent: Tuesday, October 20, 2020 3:02 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

On 2020-10-20, at 14:58:52, Steve Smith wrote:
>
> There's actually a big difference between MVCL being interruptible, and
> MVCLE stopping periodically before it's finished.  The latter is not
> interruptible, it just stops before completion periodically for the
> program to do something else if it wants to.  ...
>
I thought it was so that supervisor could dispatch another task.

-- gil

Re: Conditional MVCL macro?

2020-10-20 Thread Seymour J Metz

I'm pretty sure that testing for pending interrupts didn't slow down CLCL or 
MVCL on the 370/165 or 370/168. But the microinstruction was 108 bits; longer 
if you had an emulator feature.

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

From: IBM Mainframe Assembler List [ASSEMBLER-LIST@LISTSERV.UGA.EDU] on behalf 
of Steve Smith [sasd...@gmail.com]
Sent: Tuesday, October 20, 2020 6:36 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

Interrupts can only be handled between instructions (don't ask me how
pipelining figures)  except, MVCL has the potential to delay that too
long, so it (and a handful of others) were made to be interruptible.
Probably, that just means the micro/milli-code program gets interrupted
between micro/milli-code instructions.  Possibly, that affects its
performance negatively.  Anyway, for MVCLE, the 4K limit means it can't run
very long anyway, so yeah, it does help keep interrupts flowing.

sas

On Tue, Oct 20, 2020 at 6:16 PM Keven  wrote:

> I’d say you were both correct.
> Keven
>
> On Tue, Oct 20, 2020 at 5:01 PM -0500, "Paul Gilmartin" <
> 0014e0e4a59b-dmarc-requ...@listserv.uga.edu> wrote:
>
> On 2020-10-20, at 14:58:52, Steve Smith wrote:
> >
> > There's actually a big difference between MVCL being interruptible, and
> > MVCLE stopping periodically before it's finished.  The latter is not
> > interruptible, it just stops before completion periodically for the
> > program to do something else if it wants to.  ...
> >
> I thought it was so that supervisor could dispatch another task.
>
> -- gil
>

Re: Another Macro question

2020-10-20 Thread graeme

umm, another meme looking for idle minds to prosper in?

I use eyecatchers A LOT, and I want every instance of the string to be easily 
find-able.

Cheers from Fort Lockdown,
The Masked Marauder

> On 21 Oct 2020, at 9:12 am, Keven  wrote:
> 
> For the 0.75 seconds it would take to differentiate an eye-catching
> string in program storage from an instance of it in use to nominate a
> control block, I have to wonder if this is a solution looking for a problem
> to solve.
> Keven
>
> On Tue, Oct 20, 2020 at 4:22 PM -0500, "Steve Smith" wrote:
> 
> It's not complicated if you know how to manipulate strings in Conditional
> Assembly. There's not much way around the fact it's vanity programming
> though.
> 
> Besides, the technique is defeated by palindromes :-).
> 
> sas
> 
> 
> On Tue, Oct 20, 2020 at 4:16 PM Tom Harper 
> wrote:
> 
>> I already have such a macro. I’ll post it later.
>> 
>>

Re: Conditional MVCL macro?

2020-10-20 Thread Steve Smith

So it was written, and it is so done.
sas

On Tue, Oct 20, 2020 at 6:35 PM Charles Mills  wrote:

> Unless I am thinking fuzzily, an interrupted MVCL leaves the PSW pointing
> to
> the MVCL (not past it) and the relevant registers incremented and
> decremented appropriately, so the supervisor may dispatch other tasks on
> the
> affected CPU, let them run as they will, and then resume the interrupted
> task when appropriate. The task will take off with the MVCL continuing from
> where it left off.
>
>

Re: Conditional MVCL macro?

2020-10-20 Thread Steve Smith

Interrupts can only be handled between instructions (don't ask me how
pipelining figures)  except, MVCL has the potential to delay that too
long, so it (and a handful of others) were made to be interruptible.
Probably, that just means the micro/milli-code program gets interrupted
between micro/milli-code instructions.  Possibly, that affects its
performance negatively.  Anyway, for MVCLE, the 4K limit means it can't run
very long anyway, so yeah, it does help keep interrupts flowing.

sas

On Tue, Oct 20, 2020 at 6:16 PM Keven  wrote:

> I’d say you were both correct.
> Keven
>
> On Tue, Oct 20, 2020 at 5:01 PM -0500, "Paul Gilmartin" <
> 0014e0e4a59b-dmarc-requ...@listserv.uga.edu> wrote:
>
> On 2020-10-20, at 14:58:52, Steve Smith wrote:
> >
> > There's actually a big difference between MVCL being interruptible, and
> > MVCLE stopping periodically before it's finished.  The latter is not
> > interruptible, it just stops before completion periodically for the
> > program to do something else if it wants to.  ...
> >
> I thought it was so that supervisor could dispatch another task.
>
> -- gil
>

Re: Conditional MVCL macro?

2020-10-20 Thread Charles Mills

Unless I am thinking fuzzily, an interrupted MVCL leaves the PSW pointing to
the MVCL (not past it) and the relevant registers incremented and
decremented appropriately, so the supervisor may dispatch other tasks on the
affected CPU, let them run as they will, and then resume the interrupted
task when appropriate. The task will take off with the MVCL continuing from
where it left off.

Charles

-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Paul Gilmartin
Sent: Tuesday, October 20, 2020 3:02 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

On 2020-10-20, at 14:58:52, Steve Smith wrote:
> 
> There's actually a big difference between MVCL being interruptible, and
> MVCLE stopping periodically before it's finished.  The latter is not
> interruptible, it just stops before completion periodically for the
> program to do something else if it wants to.  ...
>  
I thought it was so that supervisor could dispatch another task.

-- gil

Re: Another Macro question

2020-10-20 Thread Charles Mills

A simpler solution would be

MVC Real_Eyecatcher(8),=C'MY BLOCK-not really, just the literal'

Charles


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Keven
Sent: Tuesday, October 20, 2020 3:12 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Another Macro question

  
  
  

For the 0.75 seconds it would take to differentiate an eye-catching 
string in program storage from an instance of it in use to nominate a 
control block, I have to wonder if this is a solution looking for a problem to 
solve.
Keven

Re: Conditional MVCL macro?

2020-10-20 Thread Keven

  
  
  

I’d say you were both correct.
Keven

On Tue, Oct 20, 2020 at 5:01 PM -0500, "Paul Gilmartin" 
<0014e0e4a59b-dmarc-requ...@listserv.uga.edu> wrote:

On 2020-10-20, at 14:58:52, Steve Smith wrote:
> 
> There's actually a big difference between MVCL being interruptible, and
> MVCLE stopping periodically before it's finished.  The latter is not
> interruptible, it just stops before completion periodically for the program
> to do something else if it wants to.  ...
>  
I thought it was so that supervisor could dispatch another task.

-- gil

Re: Another Macro question

2020-10-20 Thread Keven

For the 0.75 seconds it would take to differentiate an eye-catching 
string in program storage from an instance of it in use to nominate a 
control block, I have to wonder if this is a solution looking for a problem to 
solve.
Keven

On Tue, Oct 20, 2020 at 4:22 PM -0500, "Steve Smith"  wrote:

It's not complicated if you know how to manipulate strings in Conditional
Assembly. There's not much way around the fact it's vanity programming
though.

Besides, the technique is defeated by palindromes :-).

sas

On Tue, Oct 20, 2020 at 4:16 PM Tom Harper 
wrote:

> I already have such a macro. I’ll post it later.
>
>

Re: Conditional MVCL macro?

2020-10-20 Thread Paul Gilmartin

On 2020-10-20, at 14:58:52, Steve Smith wrote:
> 
> There's actually a big difference between MVCL being interruptible, and
> MVCLE stopping periodically before it's finished.  The latter is not
> interruptible, it just stops before completion periodically for the program
> to do something else if it wants to.  ...
>  
I thought it was so that supervisor could dispatch another task.

-- gil

Re: Another Macro question

2020-10-20 Thread Steve Smith

It's not complicated if you know how to manipulate strings in Conditional
Assembly. There's not much way around the fact it's vanity programming
though.

Besides, the technique is defeated by palindromes :-).

sas

On Tue, Oct 20, 2020 at 4:16 PM Tom Harper 
wrote:

> I already have such a macro. I’ll post it later.
>
>

Re: Conditional MVCL macro?

2020-10-20 Thread Christopher Y. Blaicher

The first, base code, is just the following to get the overhead of loop control:
         TIMEUSED STORADR=STARTIME,CPU=TOD,LINKAGE=SYSTEM
LOOPB    DS    0H
         L     R3,POOLADDR        GET FROM ADDRESS
         L     R4,TOADDR          GET TO ADDR
         BCT   R9,LOOPB           LOOP THE NEEDED NUMBER OF TIMES
         TIMEUSED STORADR=ENDTIME,CPU=TOD,LINKAGE=SYSTEM
         SPACE 3
         LA    R1,=CL12'BASE CODE'
         BAL   R14,TIMEOUT

The second case was just a move of 1K using four MVC instructions in a row, 
which is the fastest.

All the others are just $MVC macro vs MVCL instruction.
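
The $MVC macro source is not shown in this thread. Purely as an illustration
of the general technique (a loop of 256-byte MVCs followed by an executed MVC
for the remainder), here is a minimal sketch. It is not the $MVC macro. It
assumes AMODE 31, non-overlapping operands, and the register usage of the
test code above (R4 target, R3 source, R5 length). The labels are
hypothetical, and EXRL assumes a machine with the execute-extensions facility.

```hlasm
* Sketch only; this is NOT the $MVC macro from the thread.
* On entry: R4 = target address, R3 = source address, R5 = byte count.
* All three registers are modified. Labels are hypothetical.
MVLOOP   CHI   R5,256             Is a full 256-byte chunk left?
         JL    MVTAIL             No, go handle the remainder
         MVC   0(256,R4),0(R3)    Move one full chunk
         LA    R4,256(,R4)        Advance target
         LA    R3,256(,R3)        Advance source
         AHI   R5,-256            Count the chunk
         J     MVLOOP             Try for another chunk
MVTAIL   LTR   R5,R5              Remainder of 1 to 255 bytes?
         JNP   MVDONE             No (zero or negative), finished
         BCTR  R5,0               Executed MVC wants length minus one
         EXRL  R5,MVEXEC          Move the remainder
         J     MVDONE             Skip over the EXRL target
MVEXEC   MVC   0(0,R4),0(R3)      Target of EXRL only
MVDONE   DS    0H
```

A production macro would more likely unroll short fixed lengths into
straight-line MVCs, as in the "1K 4 MVC" case, and keep a loop like this for
variable lengths.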

Chris Blaicher
Technical Architect
Precisely.com



Re: Conditional MVCL macro?

2020-10-20 Thread Steve Smith

There's actually a big difference between MVCL being interruptible, and
MVCLE stopping periodically before it's finished.  The latter is not
interruptible, it just stops before completion periodically for the program
to do something else if it wants to.  Checking a flag is a possibility, but
to what end I cannot figure.  Anyway, that is not possible with MVCL
(without OS assistance that does not exist).

I've heard contrary information on the relative performance of MVCL &
MVCLE.  I had no idea that MVCLE generally only moved a maximum of 4K per
iteration, which on the face of it would seem to imply it could be very
slow for large moves (particularly if your program fools around much before
re-driving it).
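
The re-drive mentioned above can be sketched as follows. This is a minimal
illustration, assuming AMODE 31 and two even-odd register pairs already
loaded; the register choice and label are hypothetical. Condition code 3
means the instruction ended after a CPU-determined number of bytes with the
registers updated, so branching back resumes the move where it stopped.

```hlasm
* Sketch only. Assumes AMODE 31; R2/R3 and R4/R5 are even-odd pairs.
*   R2 = source address,  R3 = source length
*   R4 = target address,  R5 = target length
REDRIVE  MVCLE R4,R2,0            Move; operand 3 supplies pad byte X'00'
         JO    REDRIVE            CC3: not finished yet, re-drive
```

A program that wants to do something between iterations, such as polling a
flag, would test for CC3 first and do that work before branching back.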

As for MVCL, I've heard consistently that it is considerably slower than
MVCs galore, which still puzzles me.  The explanations I've heard sound to
me like they could possibly amount to a trivial extra cost, i.e. much less
than what is commonly observed.

And for something completely different... sometimes I use MVCK for a
variable-length move instead of EX/MVC or MVCL.  I haven't done any
performance tests, because I haven't used it in performance-critical code
(and it does have a warning that it is slow).  But for programming
convenience, getting & setting the key is (at least slightly) less of an
annoyance than setting up EX.  I vaguely recall a rumor that there is an
MVCX milli-code instruction that works the same without the key
specification.  Sure would be nice if that appeared in PoOp.
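
To make the comparison concrete, here is a hedged sketch of both forms of
variable-length move. It assumes a length of 1 to 256 in R5, the target
address in R4 and the source address in R3. It also assumes that VARMVC is
addressable through a base register, and that IPK and the key it returns are
permitted for the program. The labels are hypothetical.

```hlasm
* Sketch only. R4 = target address, R3 = source address,
* R5 = length (1 to 256). Labels are hypothetical.
*
* (a) EX of an MVC: needs length minus one and a remote target.
         LR    R1,R5              Work copy of the length
         BCTR  R1,0               Length minus one for the MVC
         EX    R1,VARMVC          Execute the MVC with that length
*
* (b) MVCK: takes the true length, but also needs an access key.
*     IPK places the current PSW key in bits 56-59 of R2.
         IPK
         MVCK  0(R5,R4),0(R3),R2  Move R5 bytes; source fetched with key
         J     VARDONE            Do not fall into the EX target
VARMVC   MVC   0(0,R4),0(R3)      Executed via EX only
VARDONE  DS    0H
```

MVCK moves at most 256 bytes per execution and sets condition code 3 when the
length in the register is larger than that.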

sas

On Tue, Oct 20, 2020 at 3:09 PM Christopher Y. Blaicher <
cblaic...@precisely.com> wrote:

> There may be a hint to the reason for the jump in the explanation of
> MVCLE, programming note 3.
>
> "The function of not processing more than approximately 4K bytes of
> either operand is intended to permit software polling of a flag that may be
> set by a program on another CPU during long operations."
>
> If a similar process happens with MVCL at the 2K boundary, that could be
> the explanation.  I'm not a hardware guy, so just guessing.
>
> Chris Blaicher
> Technical Architect
> Precisely.com

Re: Conditional MVCL macro?

2020-10-20 Thread Mike Hochee

Thanks for sharing your test results, although I had trouble explaining the 
results of the first two tests, and maybe this is related to how the $MVC macro 
does its thing. 

Anyway, if you throw out the first two tests, the MVCL technique uses roughly 
250-300% more CPU time than the $MVC technique for lengths between 4K and 64K. 
But at a length of 128K, the $MVC advantage drops to only about 60%.  My guess 
is that MVCL will eventually prove to be more efficient than $MVC with move 
lengths in excess of 256K.

I don't know if moving to/from the same storage locations makes any difference 
for this test, but I assume it was intentional, to control this as a variable.  
There are already enough unknowns!   

Again, thanks for sharing! 


Re: Another Macro question

2020-10-20 Thread Tom Harper

I already have such a macro. I’ll post it later. 

> On Oct 20, 2020, at 4:06 PM, Tony Thigpen  wrote:
> 
> While we are talking about macros, a while back, someone posted they liked to 
> fill eye-catchers using MVCIN so that scans of the dump for a tag only found 
> the real eye-catcher, not the literal used to fill the eye-catcher.
> 
> So, instead of:
>  MVC   EYE1,=C'(BTABLE>'
> use:
>  MVCIN EYE1,=C'>ELBATB('+L'EYE1-1
> 
> This would seem to be a good place for a macro:
>  MVCEYE EYE1,'(BTABLE>'
> that would generate the correct MVCIN.
> 
> Anyone want to try their hand at writing this macro?
> 
> 
> Tony Thigpen

Another Macro question

2020-10-20 Thread Tony Thigpen

While we are talking about macros, a while back, someone posted they 
liked to fill eye-catchers using MVCIN so that scans of the dump for a 
tag only found the real eye-catcher, not the literal used to fill the 
eye-catcher.


So, instead of:
  MVC   EYE1,=C'(BTABLE>'
use:
  MVCIN EYE1,=C'>ELBATB('+L'EYE1-1

This would seem to be a good place for a macro:
  MVCEYE EYE1,'(BTABLE>'
that would generate the correct MVCIN.

Anyone want to try their hand at writing this macro?


Tony Thigpen
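
One possible shape for such a macro, offered only as a sketch: it reverses
the quoted text with conditional assembly and generates the MVCIN. MVCIN's
first-operand address designates the leftmost byte of the target and its
second-operand address the rightmost byte of the source, so the sketch puts
the offset on the literal. It assumes the text is a quoted string with no
embedded apostrophes or ampersands, that the target is at least as long as
the text, and that the assembler accepts a literal as a term in an address
expression. The local names (&I, &LEN, &REV) are hypothetical.

```hlasm
         MACRO
&LABEL   MVCEYE &TARGET,&TEXT
.* Sketch only. &TEXT is a quoted string such as '(BTABLE>'.
.* Assumes no embedded apostrophes or ampersands in &TEXT.
         LCLA  &I,&LEN
         LCLC  &REV
&LEN     SETA  K'&TEXT-2          Length without the enclosing quotes
&I       SETA  &LEN+1             Index of the last text character
.NEXT    AIF   (&I LT 2).GEN      Stop at the opening quote
&REV     SETC  '&REV'.'&TEXT'(&I,1)   Append characters back to front
&I       SETA  &I-1
         AGO   .NEXT
.GEN     ANOP
&LABEL   MVCIN &TARGET.(&LEN),=C'&REV'+&LEN-1
         MEND
```

With this sketch, `MVCEYE EYE1,'(BTABLE>'` would generate
`MVCIN EYE1(8),=C'>ELBATB('+8-1`.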

Re: Conditional MVCL macro?

2020-10-20 Thread Christopher Y. Blaicher

There may be a hint to the reason for the jump in the explanation of MVCLE, 
programming note 3.

"The function of not processing more than approximately 4K bytes of either 
operand is intended to permit software polling of a flag that may be set by a 
program on another CPU during long operations."

If a similar process happens with MVCL at the 2K boundary, that could be the 
explanation.  I'm not a hardware guy, so just guessing.

Chris Blaicher
Technical Architect
Precisely.com


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Christopher Y. Blaicher
Sent: Tuesday, October 20, 2020 2:47 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

I just re-ran a test on our z15 machine and got interesting numbers.  The $MVC 
was reasonably linear from start to finish.  The MVCL has a big jump from 2K to 
4K, but was also reasonably linear outside of that jump.  It never caught up to 
the $MVC implementation.

TEST TYPE =  BASE CODE
CPU TIME USED=  0.003873
TEST TYPE =  1K 4 MVC
CPU TIME USED=  0.171274
TEST TYPE =  1K $MVC
CPU TIME USED=  0.183642
TEST TYPE =  1K MVCL
CPU TIME USED=  0.345227
TEST TYPE =  2K $MVC
CPU TIME USED=  0.357314
TEST TYPE =  2K MVCL
CPU TIME USED=  0.509385
TEST TYPE =  4K $MVC
CPU TIME USED=  0.704173
TEST TYPE =  4K MVCL
CPU TIME USED=  2.790247
TEST TYPE =  8K $MVC
CPU TIME USED=  1.426892
TEST TYPE =  8K MVCL
CPU TIME USED=  5.480536
TEST TYPE =  32K $MVC
CPU TIME USED=  5.835773
TEST TYPE =  32K MVCL
CPU TIME USED= 21.734112
TEST TYPE =  64K $MVC
CPU TIME USED= 12.278130
TEST TYPE =  64K MVCL
CPU TIME USED= 43.380435
TEST TYPE =  128K $MVC
CPU TIME USED= 54.570900
TEST TYPE =  128K MVCL
CPU TIME USED= 86.739562

All the iterations used this basic set of instructions.
*
*        TEST 1K $MVC
*
         SPACE ,
         L     R9,REPEATCOUNT     DO IT 100,000 TIMES
         TIMEUSED STORADR=STARTIME,CPU=TOD,LINKAGE=SYSTEM
LOOP1A   DS    0H
         L     R3,POOLADDR        GET FROM ADDRESS
         L     R4,TOADDR          GET TO ADDR
         L     R5,=A(1024)        MOVE 1K  BYTES
         $MVC  (R4),(R3),(R5)     MOVE IT
         AHI   R3,1024
         AHI   R4,1024
         BCT   R9,LOOP1A          LOOP THE NEEDED NUMBER OF TIMES
         TIMEUSED STORADR=ENDTIME,CPU=TOD,LINKAGE=SYSTEM
         SPACE 3
         LA    R1,=CL12'1K $MVC'
         BAL   R14,TIMEOUT
*
*        TEST 1K MVCL
*
         SPACE ,
         L     R9,REPEATCOUNT     DO IT 100,000 TIMES
         TIMEUSED STORADR=STARTIME,CPU=TOD,LINKAGE=SYSTEM
LOOP2    DS    0H
         L     R2,POOLADDR        GET FROM ADDRESS
         L     R3,=F'1024'
         L     R4,TOADDR          GET TO ADDR
         L     R5,=F'1024'
         MVCL  R4,R2              MOVE IT
         BCT   R9,LOOP2           LOOP THE NEEDED NUMBER OF TIMES
         TIMEUSED STORADR=ENDTIME,CPU=TOD,LINKAGE=SYSTEM
         SPACE 3
         LA    R1,=CL12'1K MVCL'
         BAL   R14,TIMEOUT

The REPEATCOUNT value is 10,000,000.
Both POOLADDR and TOADDR areas are 256K in size, so they both should be on page 
boundaries.

Chris Blaicher
Technical Architect
Precisely.com


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Charles Mills
Sent: Tuesday, October 20, 2020 1:57 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?


Right.

I should have said "an interruptibility that is visible to the surrounding 
assembler instructions via the CC."

Charles


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Seymour J Metz
Sent: Tuesday, October 20, 2020 10:52 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

MVCL is, and always has been, interruptible.

Re: Conditional MVCL macro?

2020-10-20 Thread Christopher Y. Blaicher

I just re-ran a test on our z15 machine and got interesting numbers.  The $MVC 
was reasonably linear from start to finish.  The MVCL has a big jump from 2K to 
4K, but was also reasonably linear outside of that jump.  It never caught up to 
the $MVC implementation.
 
TEST TYPE      CPU TIME USED
BASE CODE           0.003873
1K 4 MVC            0.171274
1K $MVC             0.183642
1K MVCL             0.345227
2K $MVC             0.357314
2K MVCL             0.509385
4K $MVC             0.704173
4K MVCL             2.790247
8K $MVC             1.426892
8K MVCL             5.480536
32K $MVC            5.835773
32K MVCL           21.734112
64K $MVC           12.278130
64K MVCL           43.380435
128K $MVC          54.570900
128K MVCL          86.739562

All the iterations used this basic set of instructions.
*
* TEST 1K $MVC
*
         SPACE ,
         L     R9,REPEATCOUNT      DO IT 100,000 TIMES
         TIMEUSED STORADR=STARTIME,CPU=TOD,LINKAGE=SYSTEM
LOOP1A   DS    0H
         L     R3,POOLADDR         GET FROM ADDRESS
         L     R4,TOADDR           GET TO ADDR
         L     R5,=A(1024)         MOVE 1K  BYTES
         $MVC  (R4),(R3),(R5)      MOVE IT
         AHI   R3,1024
         AHI   R4,1024
         BCT   R9,LOOP1A           LOOP THE NEEDED NUMBER OF TIMES
         TIMEUSED STORADR=ENDTIME,CPU=TOD,LINKAGE=SYSTEM
         SPACE 3
         LA    R1,=CL12'1K $MVC'
         BAL   R14,TIMEOUT
*
* TEST 1K MVCL
*
         SPACE ,
         L     R9,REPEATCOUNT      DO IT 100,000 TIMES
         TIMEUSED STORADR=STARTIME,CPU=TOD,LINKAGE=SYSTEM
LOOP2    DS    0H
         L     R2,POOLADDR         GET FROM ADDRESS
         L     R3,=F'1024'
         L     R4,TOADDR           GET TO ADDR
         L     R5,=F'1024'
         MVCL  R4,R2               MOVE IT
         BCT   R9,LOOP2            LOOP THE NEEDED NUMBER OF TIMES
         TIMEUSED STORADR=ENDTIME,CPU=TOD,LINKAGE=SYSTEM
         SPACE 3
         LA    R1,=CL12'1K MVCL'
         BAL   R14,TIMEOUT

The REPEATCOUNT value is 10,000,000
Both POOLADDR and TOADDR areas are 256K in size, so they both should be on page 
boundaries.

Chris Blaicher
Technical Architect
Precisely.com


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Charles Mills
Sent: Tuesday, October 20, 2020 1:57 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?


Right.

I should have said "an interruptibility that is visible to the surrounding 
assembler instructions via the CC."

Charles


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Seymour J Metz
Sent: Tuesday, October 20, 2020 10:52 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

MVCL is, and always has been, interruptible.

Re: Conditional MVCL macro?

2020-10-20 Thread Charles Mills

Right.

I should have said "an interruptibility that is visible to the surrounding
assembler instructions via the CC."

Charles


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Seymour J Metz
Sent: Tuesday, October 20, 2020 10:52 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

MVCL is, and always has been, interruptible.

Re: Conditional MVCL macro?

2020-10-20 Thread baron_carter

COBOL version was 6.3 using ARCH(13) OPT(2)

-Original Message-
From: IBM Mainframe Assembler List  On
Behalf Of John Melcher
Sent: Tuesday, October 20, 2020 12:09
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

JES2 has had a $MVCL macro since SP2.2.0.
What version of COBOL, I wonder?

-Original Message-
From: IBM Mainframe Assembler List  On
Behalf Of Charles Mills
Sent: Tuesday, October 20, 2020 12:05 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

Note that in neither case does it use MVCL or MVCLE!

The 4097 boundary may simply be a reasonableness thing, not a performance
thing. For a 150K move, 600-or-so MVC's in-line might be faster than a loop,
but does it really seem reasonable?

Slightly OT to 'move' but I find it interesting that in the second case it
uses LA Rx,256(,Rx) to increment the registers. I was told that AHI is
sometimes faster.

Charles

-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Mike Hochee
Sent: Tuesday, October 20, 2020 9:40 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

Really interesting thread to start the day with!

Our experience has been that the MVC loops are typically faster, up to a
point, that being about 30-40 instructions in the pipeline and as mentioned,
and this seemed very processor dependent. However when source and target
operands happen to both be aligned on a page boundary, then the opportunity
exists for the async data mover to kick in if a move long is being used.  I
think this applied to both MVCL and MVCLE, but not sure. So ideally a macro
would want to utilize both MVCs and MVCL/E.

More grist for the mill!

-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of baron_car...@technologist.com
Sent: Tuesday, October 20, 2020 12:12 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

The COBOL compiler for a 4000 byte move, from to the same with OPT(2)
generates

LAY R10,5072(,R9)   FROM
LA  R7,1072(,R9)  TO
MVC 0(256,R7),0(R10)
MVC 256(256,R7),256(R10)
MVC 512(256,R7),512(R10)
MVC 768(256,R7),768(R10)
MVC 1024(256,R7),1024(R10)
MVC 1280(256,R7),1280(R10)
MVC 1536(256,R7),1536(R10)
MVC 1792(256,R7),1792(R10)
MVC 2048(256,R7),2048(R10)
MVC 2304(256,R7),2304(R10)
MVC 2560(256,R7),2560(R10)
MVC 2816(256,R7),2816(R10)
MVC 3072(256,R7),3072(R10)
MVC 3328(256,R7),3328(R10)
MVC 3584(256,R7),3584(R10)
MVC 3840(160,R7),3840(R10)

However for 5000 bytes it generates:

LAY R7,6072(,R9)
LA  R10,0(,R7)
LA  R7,1072(,R9)
LHI R11,0x13
L0128 EQU *
MVC 0(256,R7),0(R10)
LA  R10,256(,R10)
LA  R7,256(,R7)
BRCT R11,L0128
MVC 0(136,R7),0(R10)

And yes the change occurred at 4097  bytes.

Re: Conditional MVCL macro?

2020-10-20 Thread Charles Mills

> What version of COBOL, I wonder?

Presumably 6. 

COBOL 4 does not have OPT(2) and COBOL 5 was mostly a non-starter.

Charles


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of John Melcher
Sent: Tuesday, October 20, 2020 10:09 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

JES2 has had a $MVCL macro since SP2.2.0. 
What version of COBOL, I wonder?




The COBOL compiler for a 4000 byte move, from to the same with OPT(2)
generates

LAY R10,5072(,R9)   FROM
LA  R7,1072(,R9)  TO
...

Re: Conditional MVCL macro?

2020-10-20 Thread Seymour J Metz

MVCL is, and always has been, interruptible.


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3


From: IBM Mainframe Assembler List [ASSEMBLER-LIST@LISTSERV.UGA.EDU] on behalf 
of Charles Mills [charl...@mcn.org]
Sent: Tuesday, October 20, 2020 11:54 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

@Ed, can you elaborate a little on your reasoning? (Not doubting it; just
curious.) Is it that the interruptibility provides a significant improvement
over MVCL? Or the support for lengths greater than 16M? Or ... ?

When I asked Dr. Shum about move strategies he seemed to indicate that for
data that was already or would soon anyway be in cache an MVC loop was
generally faster than MVCL. (I did not ask about MVCLE at the time; not sure
why. He did not suggest it.)

Charles


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Ed Jaffe
Sent: Tuesday, October 20, 2020 6:52 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

We've switched almost exclusively to MVCLE except for short,
fixed-length moves.

Re: Conditional MVCL macro?

2020-10-20 Thread Christopher Y. Blaicher

We just got a z15 and I have not tested MVCL vs MVC loop, but on all prior 
machines a MVC loop beat a MVCL up to about 32K.  Over 32K MVCL is the way to 
go.  In our environment we rarely are moving more than 32K.  We built a $MVC 
macro with 3 parameters, destination, source and length and use that.

FYI - MVCL is a micro-code (milli-code, call it what you want) instruction.  
There is a hefty startup and end cost to micro-code instructions.  MVCL only 
really gets going when it can use the internal move page function.  That has to 
be moving whole pages and they have to be page aligned.  CLCL and similar 
instructions, at least used to, suffer the same type of startup costs.

Chris Blaicher
Technical Architect
Precisely.com


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Mike Hochee
Sent: Tuesday, October 20, 2020 12:40 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?


Really interesting thread to start the day with!

Our experience has been that the MVC loops are typically faster, up to a point, 
that being about 30-40 instructions in the pipeline and as mentioned,  and this 
seemed very processor dependent. However when source and target operands happen 
to both be aligned on a page boundary, then the opportunity exists for the 
async data mover to kick in if a move long is being used.  I think this applied 
to both MVCL and MVCLE, but not sure. So ideally a macro would want to utilize 
both MVCs and MVCL/E.

More grist for the mill!

-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of baron_car...@technologist.com
Sent: Tuesday, October 20, 2020 12:12 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

The COBOL compiler for a 4000 byte move, from to the same with OPT(2) generates

LAY R10,5072(,R9)   FROM
LA  R7,1072(,R9)  TO
MVC 0(256,R7),0(R10)
MVC 256(256,R7),256(R10)
MVC 512(256,R7),512(R10)
MVC 768(256,R7),768(R10)
MVC 1024(256,R7),1024(R10)
MVC 1280(256,R7),1280(R10)
MVC 1536(256,R7),1536(R10)
MVC 1792(256,R7),1792(R10)
MVC 2048(256,R7),2048(R10)
MVC 2304(256,R7),2304(R10)
MVC 2560(256,R7),2560(R10)
MVC 2816(256,R7),2816(R10)
MVC 3072(256,R7),3072(R10)
MVC 3328(256,R7),3328(R10)
MVC 3584(256,R7),3584(R10)
MVC 3840(160,R7),3840(R10)

However for 5000 bytes it generates:

LAY R7,6072(,R9)
LA  R10,0(,R7)
LA  R7,1072(,R9)
LHI R11,0x13
L0128 EQU *
MVC 0(256,R7),0(R10)
LA  R10,256(,R10)
LA  R7,256(,R7)
BRCT R11,L0128
MVC 0(136,R7),0(R10)

And yes the change occurred at 4097  bytes.



-Original Message-
From: IBM Mainframe Assembler List  On Behalf 
Of Charles Mills
Sent: Tuesday, October 20, 2020 10:54
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

@Ed, can you elaborate a little on your reasoning? (Not doubting it; just
curious.) Is it that the interruptibility provides a significant improvement 
over MVCL? Or the support for lengths greater than 16M? Or ... ?

When I asked Dr. Shum about move strategies he seemed to indicate that for data 
that was already or would soon anyway be in cache an MVC loop was generally 
faster than MVCL. (I did not ask about MVCLE at the time; not sure why. He did 
not suggest it.)

Charles


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Ed Jaffe
Sent: Tuesday, October 20, 2020 6:52 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

We've switched almost exclusively to MVCLE except for short, fixed-length moves.

Re: Conditional MVCL macro?

2020-10-20 Thread Gary Weinhold


In our testing on a z14 (MVS under VM), MVCL was considerably slower than a
256-byte MVC loop plus an executed MVC for various unaligned data
lengths from 40 bytes to 32K.

For zeroing memory up to 1G, XC in a loop was about the same as MVCL up
to 256 bytes; beyond that MVCL was faster (MVCLE was slightly slower, even
when the MVCL had to be looped).  MVCL was also faster than MVPG, DSPSERV
RELEASE, and PGSER in general, except against MVPG when the data was page
aligned.
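
As a rough illustration of the "256-byte MVC loop plus an executed MVC"
technique mentioned above, here is a minimal sketch. It is not the code that
was measured. The register assignments and the labels MOVEVAR, MOVETAIL,
TAILMVC and MOVEDONE are assumptions made for the example, as are the usual
R0-R15 equates and base-register addressability for the EX target.

```hlasm
* Assumed on entry: R2 -> target, R3 -> source, R4 = byte count.
* Operands are assumed not to overlap destructively.
         LTR   R4,R4               anything to move?
         JNP   MOVEDONE            no, skip a zero or negative count
MOVEVAR  CHI   R4,256              more than one MVC's worth left?
         JNH   MOVETAIL            no, 1 to 256 bytes remain
         MVC   0(256,R2),0(R3)     move a full 256-byte chunk
         LA    R2,256(,R2)         advance target
         LA    R3,256(,R3)         advance source
         AHI   R4,-256             count down the bytes moved
         J     MOVEVAR             check again
MOVETAIL BCTR  R4,0                MVC length code is length minus 1
         EX    R4,TAILMVC          move the last 1 to 256 bytes
         J     MOVEDONE
TAILMVC  MVC   0(0,R2),0(R3)       EX target only, length from R4
MOVEDONE DS    0H
```

The zero-count test matters: without it, BCTR would turn a count of 0 into
a length code of X'FF' and the executed MVC would move 256 bytes.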

On 2020-10-20 12:39 p.m., Mike Hochee wrote:

Really interesting thread to start the day with!

Our experience has been that the MVC loops are typically faster, up to a point, 
that being about 30-40 instructions in the pipeline and as mentioned,  and this 
seemed very processor dependent. However when source and target operands happen 
to both be aligned on a page boundary, then the opportunity exists for the 
async data mover to kick in if a move long is being used.  I think this applied 
to both MVCL and MVCLE, but not sure. So ideally a macro would want to utilize 
both MVCs and MVCL/E.

More grist for the mill!



Gary Weinhold
Senior Application Architect
DATAKINETICS | Data Performance & Optimization
Phone:+1.613.523.5500 x216
Email: weinh...@dkl.com
Visit us online at www.DKL.com


-Original Message-

From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of baron_car...@technologist.com
Sent: Tuesday, October 20, 2020 12:12 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

The COBOL compiler for a 4000 byte move, from to the same with OPT(2) generates

LAY R10,5072(,R9)   FROM
LA  R7,1072(,R9)  TO
MVC 0(256,R7),0(R10)
MVC 256(256,R7),256(R10)
MVC 512(256,R7),512(R10)
MVC 768(256,R7),768(R10)
MVC 1024(256,R7),1024(R10)
MVC 1280(256,R7),1280(R10)
MVC 1536(256,R7),1536(R10)
MVC 1792(256,R7),1792(R10)
MVC 2048(256,R7),2048(R10)
MVC 2304(256,R7),2304(R10)
MVC 2560(256,R7),2560(R10)
MVC 2816(256,R7),2816(R10)
MVC 3072(256,R7),3072(R10)
MVC 3328(256,R7),3328(R10)
MVC 3584(256,R7),3584(R10)
MVC 3840(160,R7),3840(R10)

However for 5000 bytes it generates:

LAY R7,6072(,R9)
LA  R10,0(,R7)
LA  R7,1072(,R9)
LHI R11,0x13
L0128 EQU *
MVC 0(256,R7),0(R10)
LA  R10,256(,R10)
LA  R7,256(,R7)
BRCT R11,L0128
MVC 0(136,R7),0(R10)

And yes the change occurred at 4097  bytes.



-Original Message-
From: IBM Mainframe Assembler List  On Behalf 
Of Charles Mills
Sent: Tuesday, October 20, 2020 10:54
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

@Ed, can you elaborate a little on your reasoning? (Not doubting it; just
curious.) Is it that the interruptibility provides a significant improvement 
over MVCL? Or the support for lengths greater than 16M? Or ... ?

When I asked Dr. Shum about move strategies he seemed to indicate that for data 
that was already or would soon anyway be in cache an MVC loop was generally 
faster than MVCL. (I did not ask about MVCLE at the time; not sure why. He did 
not suggest it.)

Charles


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Ed Jaffe
Sent: Tuesday, October 20, 2020 6:52 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

We've switched almost exclusively to MVCLE except for short, fixed-length moves.

Re: Conditional MVCL macro?

2020-10-20 Thread Ed Jaffe


On 10/20/2020 8:54 AM, Charles Mills wrote:

> @Ed, can you elaborate a little on your reasoning? (Not doubting it; just
> curious.) Is it that the interruptibility provides a significant improvement
> over MVCL? Or the support for lengths greater than 16M? Or ... ?

MVCL with anything other than zero pad requires an extra instruction and
we've been burned more than once with >16M lengths not being handled right.

> When I asked Dr. Shum about move strategies he seemed to indicate that for
> data that was already or would soon anyway be in cache an MVC loop was
> generally faster than MVCL. (I did not ask about MVCLE at the time; not sure
> why. He did not suggest it.)

Oh yes, we have code in a performance path that moves 4K blocks on a 4K 
boundary using 16 256-byte MVCs.


I did say "almost" exclusively... ;-)

What I meant to say is we've replaced nearly all MVCLs with MVCLEs. The 
biggest exception is the x'B0' and x'B8' stuff for non-padded moves we 
use in a few places. To be honest, we didn't even research those to see 
if there were MVCLE equivalents. We just left that working code alone...
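
To make the pad-byte point concrete, here is a small sketch of the two
alternatives side by side. It is an illustration only, not code from any
product. The register choices and the label MOVEMORE are assumptions; the
source is assumed to be shorter than the target so that padding actually
happens. MVCL takes its pad byte from the high-order byte of the odd source
register and has 24-bit lengths; MVCLE takes the pad byte from the low-order
byte of its third-operand address and can end with CC=3, which the program
must handle by branching back.

```hlasm
* Assumed for both alternatives: R4 -> target, R5 = target length,
* R2 -> source, R3 = source length (shorter than the target).
*
* Alternative 1, MVCL: lengths must fit in 24 bits (less than 16M).
         ICM   R3,B'1000',=C' '    extra instruction: insert blank pad
         MVCL  R4,R2               move, then pad target with blanks
*
* Alternative 2, MVCLE: pad is part of the instruction, and the
* lengths are full 32-bit values in 24-bit or 31-bit mode.
MOVEMORE MVCLE R4,R2,C' '          move, then pad target with blanks
         JO    MOVEMORE            CC=3: not finished yet, resume
```

The JO loop is the program-visible form of interruptibility discussed
elsewhere in this thread: MVCLE may stop early with CC=3 after updating the
registers, whereas an interrupted MVCL is resumed without the surrounding
code seeing it.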


--
Phoenix Software International
Edward E. Jaffe
831 Parkview Drive North
El Segundo, CA 90245
https://www.phoenixsoftware.com/




Re: Conditional MVCL macro?

2020-10-20 Thread John Melcher

JES2 has had a $MVCL macro since SP2.2.0. 
What version of COBOL, I wonder?

-Original Message-
From: IBM Mainframe Assembler List  On Behalf 
Of Charles Mills
Sent: Tuesday, October 20, 2020 12:05 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

Note that in neither case does it use MVCL or MVCLE!

The 4097 boundary may simply be a reasonableness thing, not a performance 
thing. For a 150K move, 600-or-so MVC's in-line might be faster than a loop, 
but does it really seem reasonable?

Slightly OT to 'move' but I find it interesting that in the second case it uses 
LA Rx,256(,Rx) to increment the registers. I was told that AHI is sometimes 
faster.

Charles

-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Mike Hochee
Sent: Tuesday, October 20, 2020 9:40 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

Really interesting thread to start the day with!

Our experience has been that the MVC loops are typically faster, up to a point, 
that being about 30-40 instructions in the pipeline and as mentioned, and this 
seemed very processor dependent. However when source and target operands happen 
to both be aligned on a page boundary, then the opportunity exists for the 
async data mover to kick in if a move long is being used.  I think this applied 
to both MVCL and MVCLE, but not sure. So ideally a macro would want to utilize 
both MVCs and MVCL/E.

More grist for the mill!

-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of baron_car...@technologist.com
Sent: Tuesday, October 20, 2020 12:12 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

The COBOL compiler for a 4000 byte move, from to the same with OPT(2) generates

LAY R10,5072(,R9)   FROM
LA  R7,1072(,R9)  TO
MVC 0(256,R7),0(R10)
MVC 256(256,R7),256(R10)
MVC 512(256,R7),512(R10)
MVC 768(256,R7),768(R10)
MVC 1024(256,R7),1024(R10)
MVC 1280(256,R7),1280(R10)
MVC 1536(256,R7),1536(R10)
MVC 1792(256,R7),1792(R10)
MVC 2048(256,R7),2048(R10)
MVC 2304(256,R7),2304(R10)
MVC 2560(256,R7),2560(R10)
MVC 2816(256,R7),2816(R10)
MVC 3072(256,R7),3072(R10)
MVC 3328(256,R7),3328(R10)
MVC 3584(256,R7),3584(R10)
MVC 3840(160,R7),3840(R10)

However for 5000 bytes it generates:

LAY R7,6072(,R9)
LA  R10,0(,R7)
LA  R7,1072(,R9)
LHI R11,0x13
L0128 EQU *
MVC 0(256,R7),0(R10)
LA  R10,256(,R10)
LA  R7,256(,R7)
BRCT R11,L0128
MVC 0(136,R7),0(R10)

And yes the change occurred at 4097  bytes.

Re: Conditional MVCL macro?

2020-10-20 Thread Charles Mills

Note that in neither case does it use MVCL or MVCLE!

The 4097 boundary may simply be a reasonableness thing, not a performance
thing. For a 150K move, 600-or-so MVC's in-line might be faster than a loop,
but does it really seem reasonable?

Slightly OT to 'move' but I find it interesting that in the second case it
uses LA Rx,256(,Rx) to increment the registers. I was told that AHI is
sometimes faster.

Charles


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Mike Hochee
Sent: Tuesday, October 20, 2020 9:40 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

Really interesting thread to start the day with! 

Our experience has been that the MVC loops are typically faster, up to a
point, that being about 30-40 instructions in the pipeline and as mentioned,
and this seemed very processor dependent. However when source and target
operands happen to both be aligned on a page boundary, then the opportunity
exists for the async data mover to kick in if a move long is being used.  I
think this applied to both MVCL and MVCLE, but not sure. So ideally a macro
would want to utilize both MVCs and MVCL/E. 

More grist for the mill!   

-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of baron_car...@technologist.com
Sent: Tuesday, October 20, 2020 12:12 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

The COBOL compiler for a 4000 byte move, from to the same with OPT(2)
generates

LAY R10,5072(,R9)   FROM
LA  R7,1072(,R9)  TO
MVC 0(256,R7),0(R10)
MVC 256(256,R7),256(R10)
MVC 512(256,R7),512(R10)
MVC 768(256,R7),768(R10)
MVC 1024(256,R7),1024(R10)
MVC 1280(256,R7),1280(R10)
MVC 1536(256,R7),1536(R10)
MVC 1792(256,R7),1792(R10)
MVC 2048(256,R7),2048(R10)
MVC 2304(256,R7),2304(R10)
MVC 2560(256,R7),2560(R10)
MVC 2816(256,R7),2816(R10)
MVC 3072(256,R7),3072(R10)
MVC 3328(256,R7),3328(R10)
MVC 3584(256,R7),3584(R10)
MVC 3840(160,R7),3840(R10)

However for 5000 bytes it generates:

LAY R7,6072(,R9)
LA  R10,0(,R7)
LA  R7,1072(,R9)
LHI R11,0x13
L0128 EQU *
MVC 0(256,R7),0(R10)
LA  R10,256(,R10)
LA  R7,256(,R7)
BRCT R11,L0128
MVC 0(136,R7),0(R10)

And yes the change occurred at 4097  bytes.

Re: Conditional MVCL macro?

2020-10-20 Thread Mike Hochee

Really interesting thread to start the day with! 

Our experience has been that MVC loops are typically faster up to a point,
that point being about 30-40 instructions in the pipeline, and, as mentioned,
this seemed very processor dependent. However, when the source and target
operands both happen to be aligned on a page boundary, the opportunity exists
for the async data mover to kick in if a move long is used.  I think this
applied to both MVCL and MVCLE, but I am not sure. So ideally a macro would
use both MVCs and MVCL/E.

More grist for the mill!   

-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of baron_car...@technologist.com
Sent: Tuesday, October 20, 2020 12:12 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

The COBOL compiler for a 4000 byte move, from to the same with OPT(2) generates

LAY R10,5072(,R9)   FROM
LA  R7,1072(,R9)  TO
MVC 0(256,R7),0(R10)
MVC 256(256,R7),256(R10)
MVC 512(256,R7),512(R10)
MVC 768(256,R7),768(R10)
MVC 1024(256,R7),1024(R10)
MVC 1280(256,R7),1280(R10)
MVC 1536(256,R7),1536(R10)
MVC 1792(256,R7),1792(R10)
MVC 2048(256,R7),2048(R10)
MVC 2304(256,R7),2304(R10)
MVC 2560(256,R7),2560(R10)
MVC 2816(256,R7),2816(R10)
MVC 3072(256,R7),3072(R10)
MVC 3328(256,R7),3328(R10)
MVC 3584(256,R7),3584(R10)
MVC 3840(160,R7),3840(R10)

However for 5000 bytes it generates:

LAY R7,6072(,R9)
LA  R10,0(,R7)
LA  R7,1072(,R9)
LHI R11,0x13
L0128 EQU *
MVC 0(256,R7),0(R10)
LA  R10,256(,R10)
LA  R7,256(,R7)
BRCT R11,L0128
MVC 0(136,R7),0(R10)

And yes the change occurred at 4097  bytes.



-Original Message-
From: IBM Mainframe Assembler List  On Behalf 
Of Charles Mills
Sent: Tuesday, October 20, 2020 10:54
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

@Ed, can you elaborate a little on your reasoning? (Not doubting it; just
curious.) Is it that the interruptibility provides a significant improvement 
over MVCL? Or the support for lengths greater than 16M? Or ... ?

When I asked Dr. Shum about move strategies he seemed to indicate that for data 
that was already or would soon anyway be in cache an MVC loop was generally 
faster than MVCL. (I did not ask about MVCLE at the time; not sure why. He did 
not suggest it.)

Charles


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Ed Jaffe
Sent: Tuesday, October 20, 2020 6:52 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

We've switched almost exclusively to MVCLE except for short, fixed-length moves.

Re: Conditional MVCL macro?

2020-10-20 Thread baron_carter

The COBOL compiler for a 4000 byte move, from to the same with OPT(2)
generates

LAY R10,5072(,R9)   FROM
LA  R7,1072(,R9)  TO
MVC 0(256,R7),0(R10)
MVC 256(256,R7),256(R10)
MVC 512(256,R7),512(R10)
MVC 768(256,R7),768(R10)
MVC 1024(256,R7),1024(R10)
MVC 1280(256,R7),1280(R10)
MVC 1536(256,R7),1536(R10)
MVC 1792(256,R7),1792(R10)
MVC 2048(256,R7),2048(R10)
MVC 2304(256,R7),2304(R10)
MVC 2560(256,R7),2560(R10)
MVC 2816(256,R7),2816(R10)
MVC 3072(256,R7),3072(R10)
MVC 3328(256,R7),3328(R10)
MVC 3584(256,R7),3584(R10)
MVC 3840(160,R7),3840(R10)

However for 5000 bytes it generates:

LAY R7,6072(,R9)
LA  R10,0(,R7)
LA  R7,1072(,R9)
LHI R11,0x13
L0128 EQU *
MVC 0(256,R7),0(R10)
LA  R10,256(,R10)
LA  R7,256(,R7)
BRCT R11,L0128
MVC 0(136,R7),0(R10)

And yes the change occurred at 4097  bytes.



-Original Message-
From: IBM Mainframe Assembler List  On
Behalf Of Charles Mills
Sent: Tuesday, October 20, 2020 10:54
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

@Ed, can you elaborate a little on your reasoning? (Not doubting it; just
curious.) Is it that the interruptibility provides a significant improvement
over MVCL? Or the support for lengths greater than 16M? Or ... ?

When I asked Dr. Shum about move strategies he seemed to indicate that for
data that was already or would soon anyway be in cache an MVC loop was
generally faster than MVCL. (I did not ask about MVCLE at the time; not sure
why. He did not suggest it.)

Charles


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Ed Jaffe
Sent: Tuesday, October 20, 2020 6:52 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

We've switched almost exclusively to MVCLE except for short, fixed-length
moves.

Re: Conditional MVCL macro?

2020-10-20 Thread Farley, Peter x23353

I don't know how today's machines (z13 and up) perform, but back when I had 
access to Strobe it regularly pointed out long MVCL / CLCL instructions 
generated by COBOL 4.2 (in the specific application case I was working on these 
were usually around 8K bytes) as relatively large "hot spots" of CPU usage.  
Mitigating how often those moves and compares were actually needed (as opposed 
to blind usage) saved us something on the order of 3-5% average CPU time.

Our current performance analyzer is useless, so I can't tell you what happens 
now that we are on reasonably current generation z and using COBOL 6.2.

I like Dave's suggestion, it seems a reasonable compromise when you have the 
option (or need) of coding in assembler.

Peter

-Original Message-
From: IBM Mainframe Assembler List  On Behalf 
Of Thomas David Rivers
Sent: Tuesday, October 20, 2020 9:06 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

> 
> What is the effect of the conditional branch and the EX on the pipeline? Are 
> the performance tradeoffs the same on all supported processors? Also, tuning 
> code for a current processor may slow it down on a new one.
> 
> 
> --
> Shmuel (Seymour J.) Metz
> http://mason.gmu.edu/~smetz3

In *very* casual tests we and some customers did, we determined that this
general scenario seems to be a good approach for moving bytes with a
constant length:

 sizes less than 1024:
   generate up to 4 MVCs in a row

 sizes greater than or equal to 1024:
   if MVCLE is allowed (there is a compiler option for this)
   then use MVCLE

   otherwise:
     generate a loop of MVCs, updating the src/target
     addresses and lengths as needed (you don't need an EX
     for this).  Basically divide the length by 256
     and loop moving 256 bytes at a time by that count;
     then get the modulus of the length by 256 and
     move those remaining bytes (since the length is constant,
     the division and mod operations provide constants).

That seems to be a good balance between code size and speed.
And the loop is small enough that it probably fits in the machine's
instruction cache, so hopefully the branch back (a BCTR back to the MVC)
isn't that painful.
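
The constant-length MVC loop described above could look roughly like the
following sketch for a 5000-byte move. It is a hand-written illustration,
not compiler output. The names SRC, DST, LEN, CHUNKS, TAIL and MVCLOOP are
assumptions, as are the R0-R15 equates and addressability to both areas.
The assembler does the division and modulus, so no EX is needed.

```hlasm
* Assumed names: SRC and DST are non-overlapping 5000-byte areas.
LEN      EQU   5000                constant length known at assembly
CHUNKS   EQU   LEN/256             19 full 256-byte moves
TAIL     EQU   LEN-CHUNKS*256      136 bytes left over
         LA    R2,DST              R2 -> target
         LA    R3,SRC              R3 -> source
         LHI   R4,CHUNKS           loop count
MVCLOOP  MVC   0(256,R2),0(R3)     move one 256-byte chunk
         LA    R2,256(,R2)         advance target
         LA    R3,256(,R3)         advance source
         BRCT  R4,MVCLOOP          repeat CHUNKS times
         MVC   0(TAIL,R2),0(R3)    move the remainder
```

If the length were an exact multiple of 256, the final MVC would have to be
left out (for example with conditional assembly), because an MVC coded with
a length of 0 still moves one byte.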

 Just some thoughts...

- Dave R. -

--


Re: Conditional MVCL macro?

2020-10-20 Thread Charles Mills

@Ed, can you elaborate a little on your reasoning? (Not doubting it; just
curious.) Is it that the interruptibility provides a significant improvement
over MVCL? Or the support for lengths greater than 16M? Or ... ?

When I asked Dr. Shum about move strategies he seemed to indicate that for
data that was already or would soon anyway be in cache an MVC loop was
generally faster than MVCL. (I did not ask about MVCLE at the time; not sure
why. He did not suggest it.)

Charles


-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Ed Jaffe
Sent: Tuesday, October 20, 2020 6:52 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

We've switched almost exclusively to MVCLE except for short, 
fixed-length moves.

Re: Conditional MVCL macro?

2020-10-20 Thread Ed Jaffe

We've switched almost exclusively to MVCLE except for short, 
fixed-length moves.


On 10/20/2020 5:42 AM, Tony Thigpen wrote:
I have several programs that work with buffers and moving random 
length data around using MVCLs. I am considering writing a 
'conditional MVCL' macro that, at runtime, looks at the lengths and 
either executes the MVCL or bypasses it and uses a MVC via EX.


I know this would generate a longer code segment due to the dual-path.

1) With new machines, I wonder if the micro-code/milli-code already 
optimizes the MVCL making this a null-issue?


2) Is anyone else willing to share an existing macro that performs 
this function?







SV: Conditional MVCL macro?

2020-10-20 Thread Willy Jensen

Extract from a larger macro. And no, I was not overly concerned with 
performance.

.* r15 :  length
.* r14 -> source
.* r0  -> target
.* r1 used when short copy
.* select method
         clfi  r15,255            if source length
         jh                       gt 255 then use movelong
         bctr  r15,0
         lr    r1,r0              copy target address
         ex    r15,
         j
         mvc   0(*-*,r1),0(r14)   short copy
         lr    r1,r15             copy length
         mvcl  r0,r14             long copy
         ds    0h
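
A small Python model may help when reading the short-copy path above. It is only a sketch of the arithmetic, not part of the macro: the function names `ex_mvc_bytes_moved` and `choose_path` are hypothetical. It assumes the usual behaviour that EX ORs the low-order byte of the register into the MVC length field, and that MVC moves one byte more than that field holds. It also assumes that no zero-length guard runs before this extract; the larger macro may well have one.

```python
def choose_path(length):
    """Mirror the CLFI/JH test: lengths above 255 take the MVCL path."""
    return "MVCL" if length > 255 else "EX MVC"


def ex_mvc_bytes_moved(length):
    """Bytes moved by the executed MVC when r15 held `length` before BCTR."""
    r15 = (length - 1) & 0xFFFFFFFF  # BCTR r15,0 decrements the 32-bit register
    l_field = r15 & 0xFF             # EX ORs the low byte into the zero length field
    return l_field + 1               # MVC moves (length field + 1) bytes


if __name__ == "__main__":
    for n in (0, 1, 100, 255, 256):
        path = choose_path(n)
        moved = ex_mvc_bytes_moved(n) if path == "EX MVC" else n
        print(n, path, moved)
```

Under these assumptions, lengths 1 through 255 move exactly the requested number of bytes. A length of 0 on the short path would wrap to a length field of 255 and move 256 bytes, so a caller that can pass zero would want to test for it first.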

Willy

-----Original Message-----
From: IBM Mainframe Assembler List  On Behalf Of 
Tony Thigpen
Sent: 20 October 2020 14:43
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Conditional MVCL macro?

I have several programs that work with buffers and moving random length data 
around using MVCLs. I am considering writing a 'conditional MVCL' 
macro that, at runtime, looks at the lengths and either executes the MVCL or 
bypasses it and uses a MVC via EX.

I know this would generate a longer code segment due to the dual-path.

1) With new machines, I wonder if the micro-code/milli-code already optimizes 
the MVCL making this a null-issue?

2) Is anyone else willing to share an existing macro that performs this 
function?



Tony Thigpen

Re: Conditional MVCL macro?

2020-10-20 Thread Thomas David Rivers

> 
> What is the effect of the conditional branch and the EX on the pipeline? Are 
> the performance tradeoffs the same on all supported processors? Also, tuning 
> code for a current processor may slow it down on a new one.
> 
> 
> --
> Shmuel (Seymour J.) Metz
> http://mason.gmu.edu/~smetz3

 In *very* casual tests we and some customers did, we determined 
 that this general scenario seems to be a good approach for
 moving bytes with a constant length:

 sizes less than 1024:
   generate up to 4 MVCs in a row
   
 sizes greater than or equal to 1024:
   if MVCLE is allowed (there is a compiler option for this)
   then use MVCLE
 
   otherwise:
     generate a loop of MVCs updating the src/target
     address and lengths as needed (you don't need an EX
     for this.)   Basically divide the length by 256
     and loop moving 256 bytes at a time by that count;
     then get the modulus of the length by 256 and
     move those remaining bytes (since the length is constant,
     the division and mod operations provide constants.)

 That seems to be a good balance between code-size and speed. 
 And, the loop is small enough that it probably fits in the 
 machine's instruction-cache, so hopefully the branch back
 (a BCTR back to the MVC) isn't that painful.
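
The constant-length strategy described above can be sketched as a small planner. This is a Python model of the decision only, not what any compiler actually emits. The names `plan_constant_move`, `MVC_MAX` and `INLINE_LIMIT` are hypothetical, and returning an empty plan for a zero length is an assumption, since the message does not cover that case. Each step is an (operation, bytes moved) pair, so the byte counts in a plan always add up to the requested length.

```python
MVC_MAX = 256        # one MVC moves at most 256 bytes
INLINE_LIMIT = 1024  # threshold quoted in the message above


def plan_constant_move(length, mvcle_allowed=False):
    """Return a list of (operation, bytes_moved) steps for a constant length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length == 0:
        return []  # assumption: nothing is generated for an empty move
    full, rest = divmod(length, MVC_MAX)
    if length < INLINE_LIMIT:
        # Below 1024 bytes this never needs more than 4 MVCs in a row.
        steps = [("MVC", MVC_MAX)] * full
    elif mvcle_allowed:
        return [("MVCLE", length)]
    else:
        # One loop of `full` iterations, each moving 256 bytes.
        steps = [("MVC_LOOP", full * MVC_MAX)]
    if rest:
        steps.append(("MVC", rest))  # trailing MVC for the remainder
    return steps


if __name__ == "__main__":
    for n in (100, 1023, 1500):
        print(n, plan_constant_move(n))
    print(4096, plan_constant_move(4096, mvcle_allowed=True))
```

For example, `plan_constant_move(1500)` gives a loop covering 1280 bytes (five iterations of 256) followed by one 220-byte MVC. Both numbers are known when the plan is built, which is the point made above about the division and modulus being constants.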

 Just some thoughts...

- Dave R. -

--
riv...@dignus.com    Work: (919) 676-0847
Get your mainframe programming tools at http://www.dignus.com

Re: Conditional MVCL macro?

2020-10-20 Thread Seymour J Metz

What is the effect of the conditional branch and the EX on the pipeline? Are 
the performance tradeoffs the same on all supported processors? Also, tuning 
code for a current processor may slow it down on a new one.


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3


From: IBM Mainframe Assembler List [ASSEMBLER-LIST@LISTSERV.UGA.EDU] on behalf 
of Tony Thigpen [t...@vse2pdf.com]
Sent: Tuesday, October 20, 2020 8:42 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Conditional MVCL macro?

I have several programs that work with buffers and moving random length
data around using MVCLs. I am considering writing a 'conditional MVCL'
macro that, at runtime, looks at the lengths and either executes the
MVCL or bypasses it and uses a MVC via EX.

I know this would generate a longer code segment due to the dual-path.

1) With new machines, I wonder if the micro-code/milli-code already
optimizes the MVCL making this a null-issue?

2) Is anyone else willing to share an existing macro that performs this
function?



Tony Thigpen

Conditional MVCL macro?

2020-10-20 Thread Tony Thigpen

I have several programs that work with buffers and moving random length 
data around using MVCLs. I am considering writing a 'conditional MVCL' 
macro that, at runtime, looks at the lengths and either executes the 
MVCL or bypasses it and uses a MVC via EX.


I know this would generate a longer code segment due to the dual-path.

1) With new machines, I wonder if the micro-code/milli-code already 
optimizes the MVCL making this a null-issue?


2) Is anyone else willing to share an existing macro that performs this 
function?




Tony Thigpen
