Re: Conditional MVCL macro?
Can you share your $MVC macro?

Brian

On Tue, 20 Oct 2020 17:42:55 +, Christopher Y. Blaicher wrote:

>We just got a z15 and I have not tested MVCL vs MVC loop, but on all prior
>machines a MVC loop beat a MVCL up to about 32K. Over 32K MVCL is the way to
>go. In our environment we rarely are moving more than 32K. We built a $MVC
>macro with 3 parameters, destination, source and length and use that.
>
>FYI - MVCL is a micro-code (milli-code, call it what you want) instruction.
>There is a hefty startup and end cost to micro-code instructions. MVCL only
>really gets going when it can use the internal move page function. That has
>to be moving whole pages and they have to be page aligned. CLCL and similar
>instructions, at least used to, suffer the same type of startup costs.
>
>Chris Blaicher
>Technical Architect
>Precisely.com
>
>-----Original Message-----
>From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On
>Behalf Of Mike Hochee
>Sent: Tuesday, October 20, 2020 12:40 PM
>To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
>Subject: Re: Conditional MVCL macro?
>
>Really interesting thread to start the day with!
>
>Our experience has been that the MVC loops are typically faster, up to a
>point, that being about 30-40 instructions in the pipeline and as mentioned,
>and this seemed very processor dependent. However when source and target
>operands happen to both be aligned on a page boundary, then the opportunity
>exists for the async data mover to kick in if a move long is being used. I
>think this applied to both MVCL and MVCLE, but not sure. So ideally a macro
>would want to utilize both MVCs and MVCL/E.
>
>More grist for the mill!
>
>-----Original Message-----
>From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On
>Behalf Of baron_car...@technologist.com
>Sent: Tuesday, October 20, 2020 12:12 PM
>To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
>Subject: Re: Conditional MVCL macro?
>
>The COBOL compiler for a 4000 byte move, from to the same with OPT(2) generates
>
>      LAY   R10,5072(,R9)        FROM
>      LA    R7,1072(,R9)         TO
>      MVC   0(256,R7),0(R10)
>      MVC   256(256,R7),256(R10)
>      MVC   512(256,R7),512(R10)
>      MVC   768(256,R7),768(R10)
>      MVC   1024(256,R7),1024(R10)
>      MVC   1280(256,R7),1280(R10)
>      MVC   1536(256,R7),1536(R10)
>      MVC   1792(256,R7),1792(R10)
>      MVC   2048(256,R7),2048(R10)
>      MVC   2304(256,R7),2304(R10)
>      MVC   2560(256,R7),2560(R10)
>      MVC   2816(256,R7),2816(R10)
>      MVC   3072(256,R7),3072(R10)
>      MVC   3328(256,R7),3328(R10)
>      MVC   3584(256,R7),3584(R10)
>      MVC   3840(160,R7),3840(R10)
>
>However for 5000 bytes it generates:
>
>      LAY   R7,6072(,R9)
>      LA    R10,0(,R7)
>      LA    R7,1072(,R9)
>      LHI   R11,0x13
>L0128 EQU   *
>      MVC   0(256,R7),0(R10)
>      LA    R10,256(,R10)
>      LA    R7,256(,R7)
>      BRCT  R11,L0128
>      MVC   0(136,R7),0(R10)
>
>And yes the change occurred at 4097 bytes.
>
>-----Original Message-----
>From: IBM Mainframe Assembler List On Behalf
>Of Charles Mills
>Sent: Tuesday, October 20, 2020 10:54
>To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
>Subject: Re: Conditional MVCL macro?
>
>@Ed, can you elaborate a little on your reasoning? (Not doubting it; just
>curious.) Is it that the interruptibility provides a significant improvement
>over MVCL? Or the support for lengths greater than 16M? Or ... ?
>
>When I asked Dr. Shum about move strategies he seemed to indicate that for
>data that was already or would soon anyway be in cache an MVC loop was
>generally faster than MVCL. (I did not ask about MVCLE at the time; not sure
>why. He did not suggest it.)
>
>Charles
>
>-----Original Message-----
>From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
>On Behalf Of Ed Jaffe
>Sent: Tuesday, October 20, 2020 6:52 AM
>To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
>Subject: Re: Conditional MVCL macro?
>
>We've switched almost exclusively to MVCLE except for short, fixed-length
>moves.
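
For readers who have not written one, the kind of variable-length MVC loop being compared with MVCL in this thread can be sketched as below. This is only an illustration and is not the $MVC macro mentioned above, which was not posted. The labels, the use of R6 as a work register, the register equates R3-R6, and AMODE 31 are all assumptions made for the example.

```hlasm
* Hypothetical sketch of a variable-length move built from MVCs.
* On entry: R4 = destination, R3 = source, R5 = length in bytes.
* R6 is a work register.  AMODE 31 and equates R3-R6 are assumed.
VARMOVE  LTR   R5,R5               Anything to move?
         JNP   DONE                No: nothing to do
         BCTR  R5,0                R5 = length - 1
         LR    R6,R5               Copy it
         SRL   R6,8                R6 = number of full 256-byte chunks
         LTR   R6,R6               Any full chunks?
         JZ    TAIL                No: only the tail remains
CHUNK    MVC   0(256,R4),0(R3)     Move one 256-byte chunk
         LA    R4,256(,R4)         Advance destination
         LA    R3,256(,R3)         Advance source
         BRCT  R6,CHUNK            Repeat for every full chunk
TAIL     EXRL  R5,MOVE             Low byte of R5 = tail length - 1
         J     DONE
MOVE     MVC   0(0,R4),0(R3)       Executed only through EXRL
DONE     DS    0H
```

Because the chunk count is taken from length minus one, the final executed MVC always moves between 1 and 256 bytes, so an exact multiple of 256 needs no special case.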
Re: Conditional MVCL macro?
One of the questions that I always ask in response to concerns about whether one instruction is better than another is, "Do you ever expect to recoup the number of instructions needed to reassemble, rebind, and retest the code?"

Assuming that the answer is "Yes, this code is executed a gazillion times a second," then I would ask whether the data being moved are (a) going to be manipulated by the CPU in the near future, or (b) on their way to an output buffer that the CPU won't touch again.

If (a), then how much data is being moved becomes the question. If you're moving a small amount of data (say, less than 4K), then one or more MVCs is probably a good choice. If you're moving a gazillion bytes of data (i.e., more than the CPU's cache size), then (by default) you're assured that whatever was last moved is what's in the cache when the instruction completes (which may or may not be what you intended) ... so the answer implicitly looks more like (b).

To assert control over whether MVCL wipes out the cache, check out the discussion of the special pad characters X'B0' and X'B8' used by the instruction on page 7-291 (RHC) of the "z/Architecture Principles of Operation" (SA22-7821-12); similarly for MVCLE, see page 7-296 (LHC). Additionally, both instructions provide a special pad character (X'B1'), determining whether the instruction can perform multiple access references to the data (which is really only interesting if other CPUs are simultaneously observing the same locations in memory).

Kevin Shum's seminal work on processor optimization, "IBM z Systems Processor Optimization Primer" (which discusses MVCL and MVCLE), can be found at https://community.ibm.com/HigherLogic/System/DownloadDocumentFile.ashx?DocumentFileKey=d1cdb394-0159-464c-92a3-3f74f8c545c4=0.
Re: Conditional MVCL macro?
Likewise CLCL.

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

From: IBM Mainframe Assembler List [ASSEMBLER-LIST@LISTSERV.UGA.EDU] on behalf of Charles Mills [charl...@mcn.org]
Sent: Tuesday, October 20, 2020 6:35 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

Unless I am thinking fuzzily, an interrupted MVCL leaves the PSW pointing to the MVCL (not past it) and the relevant registers incremented and decremented appropriately, so the supervisor may dispatch other tasks on the affected CPU, let them run as they will, and then resume the interrupted task when appropriate. The task will take off with the MVCL continuing from where it left off.

Charles

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Paul Gilmartin
Sent: Tuesday, October 20, 2020 3:02 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

On 2020-10-20, at 14:58:52, Steve Smith wrote:
>
> There's actually a big difference between MVCL being interruptible, and
> MVCLE stopping periodically before it's finished. The latter is not
> interruptible, it just stops before completion periodically for the program
> to do something else if it wants to. ...
>
I thought it was so that supervisor could dispatch another task.

-- gil
Re: Conditional MVCL macro?
I'm pretty sure that testing for pending interrupts didn't slow down CLCL or MVCL on the 370/165 or 370/168. But the microinstruction was 108 bits; longer if you had an emulator feature.

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

From: IBM Mainframe Assembler List [ASSEMBLER-LIST@LISTSERV.UGA.EDU] on behalf of Steve Smith [sasd...@gmail.com]
Sent: Tuesday, October 20, 2020 6:36 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

Interrupts can only be handled between instructions (don't ask me how pipelining figures) except, MVCL has the potential to delay that too long, so it (and a handful of others) were made to be interruptible. Probably, that just means the micro/milli-code program gets interrupted between micro/milli-code instructions. Possibly, that affects its performance negatively.

Anyway, for MVCLE, the 4K limit means it can't run very long anyway, so yeah, it does help keep interrupts flowing.

sas

On Tue, Oct 20, 2020 at 6:16 PM Keven wrote:

> I’d say you were both correct.
> Keven
>
> On Tue, Oct 20, 2020 at 5:01 PM -0500, "Paul Gilmartin" <
> 0014e0e4a59b-dmarc-requ...@listserv.uga.edu> wrote:
>
> On 2020-10-20, at 14:58:52, Steve Smith wrote:
> >
> > There's actually a big difference between MVCL being interruptible, and
> > MVCLE stopping periodically before it's finished. The latter is not
> > interruptible, it just stops before completion periodically for the program
> > to do something else if it wants to. ...
> >
> I thought it was so that supervisor could dispatch another task.
>
> -- gil
Re: Another Macro question
umm, another meme looking for idle minds to prosper in?

I use eyecatchers A LOT, and I want every instance of the string to be easily find-able.

Cheers from Fort Lockdown,
The Masked Marauder

> On 21 Oct 2020, at 9:12 am, Keven wrote:
>
> For the 0.75 seconds it would take to differentiate an eye catching
> string in program storage from an instance of it in use to nominate a
> control block, I have to wonder if this is a solution looking for a problem
> to solve.
> Keven
>
> On Tue, Oct 20, 2020 at 4:22 PM -0500, "Steve Smith" wrote:
>
> It's not complicated if you know how to manipulate strings in Conditional
> Assembly. There's not much way around the fact it's vanity programming
> though.
>
> Besides, the technique is defeated by palindromes :-).
>
> sas
>
> On Tue, Oct 20, 2020 at 4:16 PM Tom Harper wrote:
>
>> I already have such a macro. I’ll post it later.
Re: Conditional MVCL macro?
So it was written, and it is so done. sas On Tue, Oct 20, 2020 at 6:35 PM Charles Mills wrote: > Unless I am thinking fuzzily, an interrupted MVCL leaves the PSW pointing > to > the MVCL (not past it) and the relevant registers incremented and > decremented appropriately, so the supervisor may dispatch other tasks on > the > affected CPU, let them run as they will, and then resume the interrupted > task when appropriate. The task will take off with the MVCL continuing from > where it left off. > >
Re: Conditional MVCL macro?
Interrupts can only be handled between instructions (don't ask me how pipelining figures) except, MVCL has the potential to delay that too long, so it (and a handful of others) were made to be interruptible. Probably, that just means the micro/milli-code program gets interrupted between micro/milli-code instructions. Possibly, that affects its performance negatively.

Anyway, for MVCLE, the 4K limit means it can't run very long anyway, so yeah, it does help keep interrupts flowing.

sas

On Tue, Oct 20, 2020 at 6:16 PM Keven wrote:

> I’d say you were both correct.
> Keven
>
> On Tue, Oct 20, 2020 at 5:01 PM -0500, "Paul Gilmartin" <
> 0014e0e4a59b-dmarc-requ...@listserv.uga.edu> wrote:
>
> On 2020-10-20, at 14:58:52, Steve Smith wrote:
> >
> > There's actually a big difference between MVCL being interruptible, and
> > MVCLE stopping periodically before it's finished. The latter is not
> > interruptible, it just stops before completion periodically for the program
> > to do something else if it wants to. ...
> >
> I thought it was so that supervisor could dispatch another task.
>
> -- gil
Re: Conditional MVCL macro?
Unless I am thinking fuzzily, an interrupted MVCL leaves the PSW pointing to the MVCL (not past it) and the relevant registers incremented and decremented appropriately, so the supervisor may dispatch other tasks on the affected CPU, let them run as they will, and then resume the interrupted task when appropriate. The task will take off with the MVCL continuing from where it left off.

Charles

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Paul Gilmartin
Sent: Tuesday, October 20, 2020 3:02 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

On 2020-10-20, at 14:58:52, Steve Smith wrote:
>
> There's actually a big difference between MVCL being interruptible, and
> MVCLE stopping periodically before it's finished. The latter is not
> interruptible, it just stops before completion periodically for the program
> to do something else if it wants to. ...
>
I thought it was so that supervisor could dispatch another task.

-- gil
Re: Another Macro question
A simpler solution would be

         MVC   Real_Eyecatcher(8),=C'MY BLOCK-not really, just the literal'

Charles

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Keven
Sent: Tuesday, October 20, 2020 3:12 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Another Macro question

For the 0.75 seconds it would take to differentiate an eye catching string in program storage from an instance of it in use to nominate a control block, I have to wonder if this is a solution looking for a problem to solve.

Keven
Re: Conditional MVCL macro?
I’d say you were both correct. Keven On Tue, Oct 20, 2020 at 5:01 PM -0500, "Paul Gilmartin" <0014e0e4a59b-dmarc-requ...@listserv.uga.edu> wrote: On 2020-10-20, at 14:58:52, Steve Smith wrote: > > There's actually a big difference between MVCL being interruptible, and > MVCLE stopping periodically before it's finished. The latter is not > interruptible, it just stops before completion periodically for the program > to do something else if it wants to. ... > I thought it was so that supervisor could dispatch another task. -- gil
Re: Another Macro question
For the 0.75 seconds it would take to differentiate an eye catching string in program storage from an instance of it in use to nominate a control block, I have to wonder if this is a solution looking for a problem to solve.

Keven

On Tue, Oct 20, 2020 at 4:22 PM -0500, "Steve Smith" wrote:

It's not complicated if you know how to manipulate strings in Conditional Assembly. There's not much way around the fact it's vanity programming though.

Besides, the technique is defeated by palindromes :-).

sas

On Tue, Oct 20, 2020 at 4:16 PM Tom Harper wrote:

> I already have such a macro. I’ll post it later.
Re: Conditional MVCL macro?
On 2020-10-20, at 14:58:52, Steve Smith wrote: > > There's actually a big difference between MVCL being interruptible, and > MVCLE stopping periodically before it's finished. The latter is not > interruptible, it just stops before completion periodically for the program > to do something else if it wants to. ... > I thought it was so that supervisor could dispatch another task. -- gil
Re: Another Macro question
It's not complicated if you know how to manipulate strings in Conditional Assembly. There's not much way around the fact it's vanity programming though.

Besides, the technique is defeated by palindromes :-).

sas

On Tue, Oct 20, 2020 at 4:16 PM Tom Harper wrote:

> I already have such a macro. I’ll post it later.
Re: Conditional MVCL macro?
The first, base code, is just the following to get the overhead of loop control:

         TIMEUSED STORADR=STARTIME,CPU=TOD,LINKAGE=SYSTEM
LOOPB    DS    0H
         L     R3,POOLADDR         GET FROM ADDRESS
         L     R4,TOADDR           GET TO ADDR
         BCT   R9,LOOPB            LOOP THE NEEDED NUMBER OF TIMES
         TIMEUSED STORADR=ENDTIME,CPU=TOD,LINKAGE=SYSTEM
         SPACE 3
         LA    R1,=CL12'BASE CODE'
         BAL   R14,TIMEOUT

The second case was just a move of 1K using four MVC instructions in a row, which is the fastest. All the others are just $MVC macro vs MVCL instruction.

Chris Blaicher
Technical Architect
Precisely.com

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Mike Hochee
Sent: Tuesday, October 20, 2020 4:40 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

Thanks for sharing your test results, although I had trouble explaining the results of the first two tests, and maybe this is related to how the $MVC macro does its thing. Anyway, If you throw out the first two tests, the $MVC technique appears to be 250-300% more efficient than the MVCL technique with lengths between 4K-64K. But with a length of 128K, $MVC efficiency drops down to only 60%. My guess is that MVCL will eventually prove to be more efficient than $MVC with move lengths in excess of 256K. I don't know if moving to/from the same storage locations makes any difference for this test, but assuming intentional for the purpose of controlling this as a variable. There's already enough unknowns!

Again, thanks for sharing!
Re: Conditional MVCL macro?
There's actually a big difference between MVCL being interruptible, and MVCLE stopping periodically before it's finished. The latter is not interruptible, it just stops before completion periodically for the program to do something else if it wants to. Checking a flag is a possibility, but to what end I cannot figure. Anyway, that is not possible with MVCL (without OS assistance that does not exist).

I've heard contrary information on the relative performance of MVCL & MVCLE. I had no idea that MVCLE generally only moved a maximum of 4K per iteration, which on the face of it would seem to imply it could be very slow for large moves (particularly if your program fools around much before re-driving it).

As for MVCL, I've heard consistently that it is considerably slower than MVCs galore, which still puzzles me. The explanations I've heard sound to me like they could possibly amount to a trivial extra cost, i.e. much less than what is commonly observed.

And for something completely different... sometimes I use MVCK for a variable-length move instead of EX/MVC or MVCL. I haven't done any performance tests, because I haven't used it in performance-critical code (and it does have a warning that it is slow). But for programming convenience, getting & setting the key is (at least slightly) less of an annoyance than setting up EX. I vaguely recall a rumor that there is an MVCX milli-code instruction that works the same without the key specification. Sure would be nice if that appeared in PoOp.

sas

On Tue, Oct 20, 2020 at 3:09 PM Christopher Y. Blaicher <
cblaic...@precisely.com> wrote:

> There may be a hint to the reason for the jump in the explanation of
> MVCLE, programming note 3.
>
> "The function of not processing more than approximately 4K bytes of
> either operand is intended to permit software polling of a flag that may be
> set by a program on another CPU during long operations."
>
> If a similar process happens with MVCL at the 2K boundary, that could be
> the explanation. I'm not a hardware guy, so just guessing.
>
> Chris Blaicher
> Technical Architect
> Precisely.com
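
The re-driving of MVCLE discussed in this message is normally a two-instruction loop: MVCLE sets condition code 3 when it stops after a CPU-determined number of bytes, and the program simply branches back to it. A minimal sketch follows, assuming AMODE 31, register equates R2-R5, addressability to the data areas, and the hypothetical fields TARGET and SOURCE, none of which come from the thread.

```hlasm
* Sketch only: TARGET and SOURCE are hypothetical 4K fields, and
* AMODE 31, register equates and a base register are assumed.
         LA    R2,TARGET           R2 = destination address
         LHI   R3,L'TARGET         R3 = destination length
         LA    R4,SOURCE           R4 = source address
         LHI   R5,L'SOURCE         R5 = source length
REDRIVE  MVCLE R2,R4,0             Move; pad with X'00' if source short
         JO    REDRIVE             CC 3: not finished, re-drive
*        ... execution continues here once the move has completed
TARGET   DS    CL4096              Data areas, placed out of line
SOURCE   DS    CL4096
```

The registers are updated as the move progresses, so branching back resumes where the instruction stopped. Anything done between the MVCLE and the branch, such as polling a flag, runs once per partial completion.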
Re: Conditional MVCL macro?
Thanks for sharing your test results, although I had trouble explaining the results of the first two tests, and maybe this is related to how the $MVC macro does its thing. Anyway, If you throw out the first two tests, the $MVC technique appears to be 250-300% more efficient than the MVCL technique with lengths between 4K-64K. But with a length of 128K, $MVC efficiency drops down to only 60%. My guess is that MVCL will eventually prove to be more efficient than $MVC with move lengths in excess of 256K. I don't know if moving to/from the same storage locations makes any difference for this test, but assuming intentional for the purpose of controlling this as a variable. There's already enough unknowns!

Again, thanks for sharing!

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Christopher Y. Blaicher
Sent: Tuesday, October 20, 2020 3:09 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

There may be a hint to the reason for the jump in the explanation of MVCLE, programming note 3.

"The function of not processing more than approximately 4K bytes of either operand is intended to permit software polling of a flag that may be set by a program on another CPU during long operations."

If a similar process happens with MVCL at the 2K boundary, that could be the explanation. I'm not a hardware guy, so just guessing.

Chris Blaicher
Technical Architect
Precisely.com
Re: Another Macro question
I already have such a macro. I’ll post it later.

> On Oct 20, 2020, at 4:06 PM, Tony Thigpen wrote:
>
> While we are talking about macros, a while back, someone posted they liked to
> fill eye-catchers using MVCIN so that scans of the dump for a tag only found
> the real eye-catcher, not the literal used to fill the eye-catcher.
>
> So, instead of:
>          MVC   EYE1,=C'(BTABLE>'
> use:
>          MVCIN EYE1,=C'>ELBATB('+L'EYE1-1
>
> This would seem to be a good place for a macro:
>          MVCEYE EYE1,'(BTABLE>'
> that would generate the correct MVCIN.
>
> Anyone want to try their hand at writing this macro?
>
> Tony Thigpen
Another Macro question
While we are talking about macros, a while back, someone posted they liked to fill eye-catchers using MVCIN so that scans of the dump for a tag only found the real eye-catcher, not the literal used to fill the eye-catcher.

So, instead of:
         MVC   EYE1,=C'(BTABLE>'
use:
         MVCIN EYE1,=C'>ELBATB('+L'EYE1-1

This would seem to be a good place for a macro:
         MVCEYE EYE1,'(BTABLE>'
that would generate the correct MVCIN.

Anyone want to try their hand at writing this macro?

Tony Thigpen
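
One way such a macro might look is sketched below. It is an illustration only, not the macro Tom Harper offered to post. It assumes the second operand is a quoted string with no embedded apostrophes or ampersands, and it takes the move length from the string rather than from the target field. MVCIN addresses its second operand by the rightmost byte, so the length minus one is added to the literal.

```hlasm
         MACRO
&LABEL   MVCEYE &TARGET,&STRING
.* Hypothetical sketch.  &STRING is a quoted string, for example
.* '(BTABLE>'.  Embedded apostrophes/ampersands are not handled.
         LCLA  &LEN,&I
         LCLC  &REV
&LEN     SETA  K'&STRING-2         Length without the enclosing quotes
&I       SETA  &LEN+1              Position of the last character
.NEXT    AIF   (&I LT 2).GEN       Stop at the opening quote
&REV     SETC  '&REV'.'&STRING'(&I,1)  Append characters back to front
&I       SETA  &I-1
         AGO   .NEXT
.GEN     ANOP
&LABEL   MVCIN &TARGET.(&LEN),=C'&REV'+&LEN-1
         MEND
```

With this sketch, `MVCEYE EYE1,'(BTABLE>'` would generate `MVCIN EYE1(8),=C'>ELBATB('+8-1`. As noted elsewhere in the thread, a palindromic eye-catcher gains nothing from the reversal.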
Re: Conditional MVCL macro?
There may be a hint to the reason for the jump in the explanation of MVCLE, programming note 3.

"The function of not processing more than approximately 4K bytes of either operand is intended to permit software polling of a flag that may be set by a program on another CPU during long operations."

If a similar process happens with MVCL at the 2K boundary, that could be the explanation. I'm not a hardware guy, so just guessing.

Chris Blaicher
Technical Architect
Precisely.com

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Christopher Y. Blaicher
Sent: Tuesday, October 20, 2020 2:47 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

I just re-ran a test on our z15 machine and got interesting numbers. The $MVC was reasonably linear from start to finish. The MVCL has a big jump from 2K to 4K, but was also reasonably linear outside of that jump. It never caught up to the $MVC implementation.
Re: Conditional MVCL macro?
I just re-ran a test on our z15 machine and got interesting numbers. The $MVC was reasonably linear from start to finish. The MVCL has a big jump from 2K to 4K, but was also reasonably linear outside of that jump. It never caught up to the $MVC implementation. TEST TYPE = BASE CODE CPU TIME USED= 0.003873 TEST TYPE = 1K 4 MVC CPU TIME USED= 0.171274 TEST TYPE = 1K $MVC CPU TIME USED= 0.183642 TEST TYPE = 1K MVCL CPU TIME USED= 0.345227 TEST TYPE = 2K $MVC CPU TIME USED= 0.357314 TEST TYPE = 2K MVCL CPU TIME USED= 0.509385 TEST TYPE = 4K $MVC CPU TIME USED= 0.704173 TEST TYPE = 4K MVCL CPU TIME USED= 2.790247 TEST TYPE = 8K $MVC CPU TIME USED= 1.426892 TEST TYPE = 8K MVCL CPU TIME USED= 5.480536 TEST TYPE = 32K $MVC CPU TIME USED= 5.835773 TEST TYPE = 32K MVCL CPU TIME USED= 21.734112 TEST TYPE = 64K $MVC CPU TIME USED= 12.278130 TEST TYPE = 64K MVCL CPU TIME USED= 43.380435 TEST TYPE = 128K $MVC CPU TIME USED= 54.570900 TEST TYPE = 128K MVCL CPU TIME USED= 86.739562 All the iterations used this basic set of instructions. * *TEST 1K $MVC * SPACE , L R9,REPEATCOUNT DO IT 100,000 TIMES TIMEUSED STORADR=STARTIME,CPU=TOD,LINKAGE=SYSTEM LOOP1A DS0H L R3,POOLADDRGET FROM ADDRESS L R4,TOADDR GET TO ADDR L R5,=A(1024)MOVE 1K BYTES $MVC (R4),(R3),(R5) MOVE IT AHI R3,1024 AHI R4,1024 BCT R9,LOOP1A LOOP THE NEEDED NUMBER OF TIMES TIMEUSED STORADR=ENDTIME,CPU=TOD,LINKAGE=SYSTEM SPACE 3 LAR1,=CL12'1K $MVC' BAL R14,TIMEOUT * *TEST 1K MVCL * SPACE , L R9,REPEATCOUNT DO IT 100,000 TIMES TIMEUSED STORADR=STARTIME,CPU=TOD,LINKAGE=SYSTEM LOOP2DS0H L R2,POOLADDRGET FROM ADDRESS L R3,=F'1024' L R4,TOADDR GET TO ADDR L R5,=F'1024' MVCL R4,R2 MOVE IT BCT R9,LOOP2 LOOP THE NEEDED NUMBER OF TIMES TIMEUSED STORADR=ENDTIME,CPU=TOD,LINKAGE=SYSTEM SPACE 3 LAR1,=CL12'1K MVCL' BAL R14,TIMEOUT The REPEATCOUNT value is 10,000,000 Both POOLADDR and TOADDR areas are 256K in size, so they both should be on page boundaries. 
Chris Blaicher
Technical Architect
Precisely.com

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Charles Mills
Sent: Tuesday, October 20, 2020 1:57 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

Right. I should have said "an interruptibility that is visible to the surrounding assembler instructions via the CC."

Charles
Re: Conditional MVCL macro?
Right. I should have said "an interruptibility that is visible to the surrounding assembler instructions via the CC."

Charles

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Seymour J Metz
Sent: Tuesday, October 20, 2020 10:52 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

MVCL is, and always has been, interruptible.
Re: Conditional MVCL macro?
COBOL version was 6.3 using ARCH(13) OPT(2)

-----Original Message-----
From: IBM Mainframe Assembler List On Behalf Of John Melcher
Sent: Tuesday, October 20, 2020 12:09
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

JES2 has had a $MVCL macro since SP2.2.0.

What version of COBOL, I wonder?
Re: Conditional MVCL macro?
> What version of COBOL, I wonder?

Presumably 6. COBOL 4 does not have OPT(2) and COBOL 5 was mostly a non-starter.

Charles

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of John Melcher
Sent: Tuesday, October 20, 2020 10:09 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

JES2 has had a $MVCL macro since SP2.2.0.

What version of COBOL, I wonder?

The COBOL compiler for a 4000 byte move, from to the same with OPT(2) generates

LAY   R10,5072(,R9)        FROM
LA    R7,1072(,R9)         TO
...
Re: Conditional MVCL macro?
MVCL is, and always has been, interruptible.

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

From: IBM Mainframe Assembler List [ASSEMBLER-LIST@LISTSERV.UGA.EDU] on behalf of Charles Mills [charl...@mcn.org]
Sent: Tuesday, October 20, 2020 11:54 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

@Ed, can you elaborate a little on your reasoning? (Not doubting it; just curious.) Is it that the interruptibility provides a significant improvement over MVCL? Or the support for lengths greater than 16M? Or ... ?

When I asked Dr. Shum about move strategies he seemed to indicate that for data that was already or would soon anyway be in cache an MVC loop was generally faster than MVCL. (I did not ask about MVCLE at the time; not sure why. He did not suggest it.)

Charles
Re: Conditional MVCL macro?
We just got a z15 and I have not tested MVCL vs MVC loop, but on all prior machines a MVC loop beat a MVCL up to about 32K. Over 32K MVCL is the way to go. In our environment we rarely are moving more than 32K. We built a $MVC macro with 3 parameters, destination, source and length and use that.

FYI - MVCL is a micro-code (milli-code, call it what you want) instruction. There is a hefty startup and end cost to micro-code instructions. MVCL only really gets going when it can use the internal move page function. That has to be moving whole pages and they have to be page aligned. CLCL and similar instructions, at least used to, suffer the same type of startup costs.

Chris Blaicher
Technical Architect
Precisely.com

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Mike Hochee
Sent: Tuesday, October 20, 2020 12:40 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

Really interesting thread to start the day with!

Our experience has been that the MVC loops are typically faster, up to a point, that being about 30-40 instructions in the pipeline, and, as mentioned, this seemed very processor dependent. However, when source and target operands happen to both be aligned on a page boundary, then the opportunity exists for the async data mover to kick in if a move long is being used. I think this applied to both MVCL and MVCLE, but not sure. So ideally a macro would want to utilize both MVCs and MVCL/E.

More grist for the mill!
Re: Conditional MVCL macro?
In our testing on a z14 (MVS under VM), MVCL was considerably slower than a 256-byte MVC loop plus an executed MVC for various unaligned data lengths from 40 bytes to 32K.

For zeroing memory up to 1G, XC in a loop was about the same as MVCL up to 256 bytes, then MVCL was faster (MVCLE was slightly slower even when the MVCL had to be looped). MVCL was also faster than MVPG, DSPSERV RELEASE and PGSER in general, except when page aligned for MVPG.

On 2020-10-20 12:39 p.m., Mike Hochee wrote:
> Really interesting thread to start the day with!
>
> Our experience has been that the MVC loops are typically faster, up to a point, that being about 30-40 instructions in the pipeline, and, as mentioned, this seemed very processor dependent. However, when source and target operands happen to both be aligned on a page boundary, then the opportunity exists for the async data mover to kick in if a move long is being used. I think this applied to both MVCL and MVCLE, but not sure. So ideally a macro would want to utilize both MVCs and MVCL/E.
>
> More grist for the mill!

Gary Weinhold
Senior Application Architect
DATAKINETICS | Data Performance & Optimization
Phone: +1.613.523.5500 x216
Email: weinh...@dkl.com
Visit us online at www.DKL.com

E-mail Notification: The information contained in this email and any attachments is confidential and may be subject to copyright or other intellectual property protection. If you are not the intended recipient, you are not authorized to use or disclose this information, and we request that you notify us by reply mail or telephone and delete the original message from your mail system.
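For readers who have not written one, the "256-byte MVC loop plus an executed MVC" technique mentioned in this message can be sketched as follows. This is only an illustration under assumptions, not the code that was benchmarked and not the $MVC macro discussed elsewhere in the thread: the register assignments, the labels (MVBLOCK, MVREST, MVEXEC, MVDONE) and the use of NILF and the relative-branch mnemonics are choices made for this example, the usual R0-R15 register equates are assumed, and the two fields are assumed not to overlap.

```hlasm
* Hypothetical sketch: variable-length move without MVCL.
* On entry:  R2 -> target, R3 -> source, R4 = length in bytes.
* R5 is a work register.  R2, R3 and R4 are modified.
         LTR   R5,R4               Copy the length and test it
         JNP   MVDONE              Zero or negative: nothing to move
         SRL   R5,8                R5 = number of full 256-byte blocks
         LTR   R5,R5               Any full blocks?
         JZ    MVREST              No, only a short remainder
MVBLOCK  MVC   0(256,R2),0(R3)     Move one 256-byte block
         LA    R2,256(,R2)         Advance target address
         LA    R3,256(,R3)         Advance source address
         BRCT  R5,MVBLOCK          Repeat for every full block
MVREST   NILF  R4,X'000000FF'      R4 = length mod 256
         JZ    MVDONE              Length was a multiple of 256
         BCTR  R4,0                MVC length code is length minus 1
         EX    R4,MVEXEC           Move the remaining 1-255 bytes
         J     MVDONE              Skip over the executed MVC
MVEXEC   MVC   0(0,R2),0(R3)       Target of EX only
MVDONE   DS    0H
```

The executed MVC is kept out of the fall-through path on purpose: EX supplies only the length, so the MVC itself must never be reached by straight-line execution.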
Re: Conditional MVCL macro?
On 10/20/2020 8:54 AM, Charles Mills wrote:
> @Ed, can you elaborate a little on your reasoning? (Not doubting it; just curious.) Is it that the interruptibility provides a significant improvement over MVCL? Or the support for lengths greater than 16M? Or ... ?

MVCL with anything other than zero pad requires an extra instruction and we've been burned more than once with >16M lengths not being handled right.

> When I asked Dr. Shum about move strategies he seemed to indicate that for data that was already or would soon anyway be in cache an MVC loop was generally faster than MVCL. (I did not ask about MVCLE at the time; not sure why. He did not suggest it.)

Oh yes, we have code in a performance path that moves 4K blocks on a 4K boundary using 16 256-byte MVCs. I did say "almost" exclusively... ;-)

What I meant to say is we've replaced nearly all MVCLs with MVCLEs. The biggest exception is the x'B0' and x'B8' stuff for non-padded moves we use in a few places. To be honest, we didn't even research those to see if there were MVCLE equivalents. We just left that working code alone...

--
Phoenix Software International
Edward E. Jaffe
831 Parkview Drive North
El Segundo, CA 90245
https://www.phoenixsoftware.com/

This e-mail message, including any attachments, appended messages and the information contained therein, is for the sole use of the intended recipient(s). If you are not an intended recipient or have otherwise received this email message in error, any use, dissemination, distribution, review, storage or copying of this e-mail message and the information contained therein is strictly prohibited. If you are not an intended recipient, please contact the sender by reply e-mail and destroy all copies of this email message and do not otherwise utilize or retain this email message or any or all of the information contained therein.
Although this email message and any attachments or appended messages are believed to be free of any virus or other defect that might affect any computer system into which it is received and opened, it is the responsibility of the recipient to ensure that it is virus free and no responsibility is accepted by the sender for any loss or damage arising in any way from its opening or use.
Re: Conditional MVCL macro?
JES2 has had a $MVCL macro since SP2.2.0.

What version of COBOL, I wonder?

-----Original Message-----
From: IBM Mainframe Assembler List On Behalf Of Charles Mills
Sent: Tuesday, October 20, 2020 12:05 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

Note that in neither case does it use MVCL or MVCLE!

The 4097 boundary may simply be a reasonableness thing, not a performance thing. For a 150K move, 600-or-so MVC's in-line might be faster than a loop, but does it really seem reasonable?

Slightly OT to 'move' but I find it interesting that in the second case it uses LA Rx,256(,Rx) to increment the registers. I was told that AHI is sometimes faster.

Charles
Re: Conditional MVCL macro?
Note that in neither case does it use MVCL or MVCLE!

The 4097 boundary may simply be a reasonableness thing, not a performance thing. For a 150K move, 600-or-so MVC's in-line might be faster than a loop, but does it really seem reasonable?

Slightly OT to 'move' but I find it interesting that in the second case it uses LA Rx,256(,Rx) to increment the registers. I was told that AHI is sometimes faster.

Charles

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Mike Hochee
Sent: Tuesday, October 20, 2020 9:40 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

Really interesting thread to start the day with!

Our experience has been that the MVC loops are typically faster, up to a point, that being about 30-40 instructions in the pipeline, and, as mentioned, this seemed very processor dependent. However, when source and target operands happen to both be aligned on a page boundary, then the opportunity exists for the async data mover to kick in if a move long is being used. I think this applied to both MVCL and MVCLE, but not sure. So ideally a macro would want to utilize both MVCs and MVCL/E.

More grist for the mill!
Re: Conditional MVCL macro?
Really interesting thread to start the day with!

Our experience has been that the MVC loops are typically faster, up to a point, that being about 30-40 instructions in the pipeline, and, as mentioned, this seemed very processor dependent. However, when source and target operands happen to both be aligned on a page boundary, then the opportunity exists for the async data mover to kick in if a move long is being used. I think this applied to both MVCL and MVCLE, but not sure. So ideally a macro would want to utilize both MVCs and MVCL/E.

More grist for the mill!

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of baron_car...@technologist.com
Sent: Tuesday, October 20, 2020 12:12 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

The COBOL compiler for a 4000 byte move, from to the same with OPT(2) generates

LAY   R10,5072(,R9)        FROM
LA    R7,1072(,R9)         TO
MVC   0(256,R7),0(R10)
MVC   256(256,R7),256(R10)
MVC   512(256,R7),512(R10)
MVC   768(256,R7),768(R10)
MVC   1024(256,R7),1024(R10)
MVC   1280(256,R7),1280(R10)
MVC   1536(256,R7),1536(R10)
MVC   1792(256,R7),1792(R10)
MVC   2048(256,R7),2048(R10)
MVC   2304(256,R7),2304(R10)
MVC   2560(256,R7),2560(R10)
MVC   2816(256,R7),2816(R10)
MVC   3072(256,R7),3072(R10)
MVC   3328(256,R7),3328(R10)
MVC   3584(256,R7),3584(R10)
MVC   3840(160,R7),3840(R10)

However for 5000 bytes it generates:

         LAY   R7,6072(,R9)
         LA    R10,0(,R7)
         LA    R7,1072(,R9)
         LHI   R11,0x13
L0128    EQU   *
         MVC   0(256,R7),0(R10)
         LA    R10,256(,R10)
         LA    R7,256(,R7)
         BRCT  R11,L0128
         MVC   0(136,R7),0(R10)

And yes the change occurred at 4097 bytes.
Re: Conditional MVCL macro?
The COBOL compiler for a 4000 byte move, from to the same with OPT(2) generates

LAY   R10,5072(,R9)        FROM
LA    R7,1072(,R9)         TO
MVC   0(256,R7),0(R10)
MVC   256(256,R7),256(R10)
MVC   512(256,R7),512(R10)
MVC   768(256,R7),768(R10)
MVC   1024(256,R7),1024(R10)
MVC   1280(256,R7),1280(R10)
MVC   1536(256,R7),1536(R10)
MVC   1792(256,R7),1792(R10)
MVC   2048(256,R7),2048(R10)
MVC   2304(256,R7),2304(R10)
MVC   2560(256,R7),2560(R10)
MVC   2816(256,R7),2816(R10)
MVC   3072(256,R7),3072(R10)
MVC   3328(256,R7),3328(R10)
MVC   3584(256,R7),3584(R10)
MVC   3840(160,R7),3840(R10)

However for 5000 bytes it generates:

         LAY   R7,6072(,R9)
         LA    R10,0(,R7)
         LA    R7,1072(,R9)
         LHI   R11,0x13
L0128    EQU   *
         MVC   0(256,R7),0(R10)
         LA    R10,256(,R10)
         LA    R7,256(,R7)
         BRCT  R11,L0128
         MVC   0(136,R7),0(R10)

And yes the change occurred at 4097 bytes.

-----Original Message-----
From: IBM Mainframe Assembler List On Behalf Of Charles Mills
Sent: Tuesday, October 20, 2020 10:54
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

@Ed, can you elaborate a little on your reasoning? (Not doubting it; just curious.) Is it that the interruptibility provides a significant improvement over MVCL? Or the support for lengths greater than 16M? Or ... ?

When I asked Dr. Shum about move strategies he seemed to indicate that for data that was already or would soon anyway be in cache an MVC loop was generally faster than MVCL. (I did not ask about MVCLE at the time; not sure why. He did not suggest it.)

Charles
Re: Conditional MVCL macro?
I don't know how today's machines (z13 and up) perform, but back when I had access to Strobe it regularly pointed out long MVCL / CLCL instructions generated by COBOL 4.2 (in the specific application case I was working on these were usually around 8K bytes) as relatively large "hot spots" of CPU usage. Mitigating how often those moves and compares were actually needed (as opposed to blind usage) saved us something on the order of 3-5% average CPU time.

Our current performance analyzer is useless, so I can't tell you what happens now that we are on reasonably current generation z and using COBOL 6.2.

I like Dave's suggestion; it seems a reasonable compromise when you have the option (or need) of coding in assembler.

Peter

-----Original Message-----
From: IBM Mainframe Assembler List On Behalf Of Thomas David Rivers
Sent: Tuesday, October 20, 2020 9:06 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

>
> What is the effect of the conditional branch and the EX on the pipeline? Are
> the performance tradeoffs the same on all supported processors? Also, tuning
> code for a current processor may slow it down on a new one.
>
>
> --
> Shmuel (Seymour J.) Metz
> http://mason.gmu.edu/~smetz3

In *very* casual tests we and some customers did, we determined that this general scenario seems to be a good approach for moving bytes with a constant length:

   sizes less than 1024:
      generate up to 4 MVCs in a row

   sizes greater than or equal to 1024:
      if MVCLE is allowed (there is a compiler option for this) then
         use MVCLE
      otherwise:
         generate a loop of MVCs updating the src/target address and lengths as needed (you don't need an EX for this.) Basically divide the length by 256 and loop moving 256 bytes at a time by that count; then get the modulus of the length by 256 and move those remaining bytes (since the length is constant, the division and mod operations provide constants.)

That seems to be a good balance between code-size and speed. And, the loop is small enough that it probably fits in the machine's instruction-cache, so hopefully the branch back (a BCTR back to the MVC) isn't that painful.

Just some thoughts...

- Dave R. -

--
This message and any attachments are intended only for the use of the addressee and may contain information that is privileged and confidential. If the reader of the message is not the intended recipient or an authorized representative of the intended recipient, you are hereby notified that any dissemination of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by e-mail and delete the message and any attachments from your system.
Re: Conditional MVCL macro?
@Ed, can you elaborate a little on your reasoning? (Not doubting it; just curious.) Is it that the interruptibility provides a significant improvement over MVCL? Or the support for lengths greater than 16M? Or ... ?

When I asked Dr. Shum about move strategies he seemed to indicate that for data that was already or would soon anyway be in cache an MVC loop was generally faster than MVCL. (I did not ask about MVCLE at the time; not sure why. He did not suggest it.)

Charles

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Ed Jaffe
Sent: Tuesday, October 20, 2020 6:52 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Conditional MVCL macro?

We've switched almost exclusively to MVCLE except for short, fixed-length moves.
Re: Conditional MVCL macro?
We've switched almost exclusively to MVCLE except for short, fixed-length moves.

On 10/20/2020 5:42 AM, Tony Thigpen wrote:
> I have several programs that work with buffers and moving random length data around using MVCLs. I am considering writing a 'conditional MVCL' macro that, at runtime, looks at the lengths and either executes the MVCL or bypasses it and uses a MVC via EX. I know this would generate a longer code segment due to the dual-path.
>
> 1) With new machines, I wonder if the micro-code/milli-code already optimizes the MVCL making this a null-issue?
>
> 2) Is anyone else willing to share an existing macro that performs this function?
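As a point of reference for the MVCLE discussion, a move with MVCLE is typically coded as a two-instruction loop. The sketch below is an example under assumptions, not code from any of the posters: the registers and the MVLELOOP label are invented for illustration, and the usual R0-R15 register equates are assumed. MVCLE takes its pad byte from the low-order byte of the third operand rather than from the high-order byte of the source-length register, and it sets condition code 3 when it stops before the move is complete, so the caller simply branches back to resume.

```hlasm
* Hypothetical sketch: variable-length move using MVCLE.
* On entry:  R2 -> target,  R3 = target length,
*            R4 -> source,  R5 = source length.
* R2/R3 and R4/R5 are even-odd pairs and are updated by MVCLE.
MVLELOOP MVCLE R2,R4,0             Move, padding with X'00' if needed
         JO    MVLELOOP            CC 3: move incomplete, resume it
```

Coding the third operand as X'40' instead of 0 would pad a longer target with EBCDIC blanks, with no extra instruction needed to set up the pad byte.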
SV: Conditional MVCL macro?
Extract from a larger macro. And no, I was not overly concerned with performance.

.* r15 : length
.* r14 -> source
.* r0  -> target
.* r1  used when short copy
.* note: the label names were lost in this archive copy; s&sysndx,
.*       l&sysndx and x&sysndx below are placeholders
.* select method
         clfi  r15,255             if source length
         jh    l&sysndx             gt 255 then use movelong
         bctr  r15,0
         lr    r1,r0               copy target address
         ex    r15,s&sysndx
         j     x&sysndx
s&sysndx mvc   0(*-*,r1),0(r14)    short copy
l&sysndx lr    r1,r15              copy length
         mvcl  r0,r14              long copy
x&sysndx ds    0h

Willy

-----Original Message-----
From: IBM Mainframe Assembler List On Behalf Of Tony Thigpen
Sent: 20 October 2020 14:43
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Conditional MVCL macro?

I have several programs that work with buffers and moving random length data around using MVCLs. I am considering writing a 'conditional MVCL' macro that, at runtime, looks at the lengths and either executes the MVCL or bypasses it and uses a MVC via EX. I know this would generate a longer code segment due to the dual-path.

1) With new machines, I wonder if the micro-code/milli-code already optimizes the MVCL making this a null-issue?

2) Is anyone else willing to share an existing macro that performs this function?

Tony Thigpen
Re: Conditional MVCL macro?
>
> What is the effect of the conditional branch and the EX on the pipeline? Are
> the performance tradeoffs the same on all supported processors? Also, tuning
> code for a current processor may slow it down on a new one.
>
>
> --
> Shmuel (Seymour J.) Metz
> http://mason.gmu.edu/~smetz3

In *very* casual tests we and some customers did, we determined that this general scenario seems to be a good approach for moving bytes with a constant length:

   sizes less than 1024:
      generate up to 4 MVCs in a row

   sizes greater than or equal to 1024:
      if MVCLE is allowed (there is a compiler option for this) then
         use MVCLE
      otherwise:
         generate a loop of MVCs updating the src/target address and lengths as needed (you don't need an EX for this.) Basically divide the length by 256 and loop moving 256 bytes at a time by that count; then get the modulus of the length by 256 and move those remaining bytes (since the length is constant, the division and mod operations provide constants.)

That seems to be a good balance between code-size and speed. And, the loop is small enough that it probably fits in the machine's instruction-cache, so hopefully the branch back (a BCTR back to the MVC) isn't that painful.

Just some thoughts...

- Dave R. -

--
riv...@dignus.com                     Work: (919) 676-0847
Get your mainframe programming tools at http://www.dignus.com
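The constant-length MVC loop described above could be sketched like this. It is an illustration only, not output from any compiler: MOVELEN, MOVEBLKS, MOVEREST, CMLOOP and the register choices are invented for the example, and the usual R0-R15 register equates are assumed. It also assumes the length is at least 256 and not an exact multiple of 256; a real macro would use conditional assembly to omit the loop or the trailing MVC when the block count or the remainder is zero.

```hlasm
* Hypothetical sketch: constant-length move as an MVC loop.
* On entry:  R2 -> target, R3 -> source.  R5 is a work register.
MOVELEN  EQU   5000                Length known at assembly time
MOVEBLKS EQU   MOVELEN/256         Full 256-byte blocks (19 here)
MOVEREST EQU   MOVELEN-MOVEBLKS*256  Leftover bytes (136 here)
         LHI   R5,MOVEBLKS         Loop count is an assembly constant
CMLOOP   MVC   0(256,R2),0(R3)     Move one 256-byte block
         LA    R2,256(,R2)         Advance target address
         LA    R3,256(,R3)         Advance source address
         BRCT  R5,CMLOOP           Repeat for every full block
         MVC   0(MOVEREST,R2),0(R3)  Move the leftover bytes
```

Because the assembler truncates integer division, a MOVELEN of 5000 yields 19 full blocks and a 136-byte remainder, the same split seen in the COBOL-generated code posted elsewhere in this thread.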
Re: Conditional MVCL macro?
What is the effect of the conditional branch and the EX on the pipeline? Are the performance tradeoffs the same on all supported processors? Also, tuning code for a current processor may slow it down on a new one.

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

From: IBM Mainframe Assembler List [ASSEMBLER-LIST@LISTSERV.UGA.EDU] on behalf of Tony Thigpen [t...@vse2pdf.com]
Sent: Tuesday, October 20, 2020 8:42 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Conditional MVCL macro?

I have several programs that work with buffers and moving random length data around using MVCLs. I am considering writing a 'conditional MVCL' macro that, at runtime, looks at the lengths and either executes the MVCL or bypasses it and uses a MVC via EX. I know this would generate a longer code segment due to the dual-path.

1) With new machines, I wonder if the micro-code/milli-code already optimizes the MVCL making this a null-issue?

2) Is anyone else willing to share an existing macro that performs this function?

Tony Thigpen
Conditional MVCL macro?
I have several programs that work with buffers and moving random length data around using MVCLs. I am considering writing a 'conditional MVCL' macro that, at runtime, looks at the lengths and either executes the MVCL or bypasses it and uses a MVC via EX. I know this would generate a longer code segment due to the dual-path.

1) With new machines, I wonder if the micro-code/milli-code already optimizes the MVCL making this a null-issue?

2) Is anyone else willing to share an existing macro that performs this function?

Tony Thigpen