Re: Long translate (TR) instruction?
William:

Thank you for taking the time to give this a try. I had heard some horror stories about TR performance being disappointing on some earlier z/Arch machines, and I was wondering if it was pervasive. Obviously not.

Not to be a nit-picker, but the OP (Kirk Wolf) said, "I'm looking for the fastest way in assembler to translate data in one buffer to another using a 256-byte translate table," which is part of what prompted me to suggest the open-code solution that I did, since it includes a move from one buffer to another as part of the process. I'm convinced that TRE and TR are faster, but it seems that a truly fair comparison of solutions to the stated problem should have included equivalent moves in the TRE and TR solutions.

--Art C.

At 05:33 AM 3/26/2008, William H. Blair wrote:
[...]
Re: Long translate (TR) instruction?
Art Celestini wrote:

> I'm convinced that TRE and TR are faster but it seems that a truly fair comparison of solutions to the stated problem should have included equivalent moves in the TRE and TR solutions.

I did write and run versions with the code like that. And, I said so:

| I've got code that needs to translate stuff
| in a buffer and it does not need it moved. And I have other
| code that first moves it and then translates it, because it
| doesn't want to clobber what it's translating. But, I did it
| both ways, just to find out for sure if it made a difference.
| It does not.

But since you asked, I added those into the mix, so you can see and judge for yourself:

TIME (IN SECONDS) FOR 001,000,000 REPETITIONS OF:

--BYTES-   NO TR(E)   TRE INPL    TRE MVC    TR INPL     TR MVC
--------  ---------  ---------  ---------  ---------  ---------
    0800  14.939655   1.245189   1.642310   1.082476   1.236875
    0400   7.162529   0.731567   0.971124   0.487941   0.580783
    0200   3.593004   0.461754   0.673962   0.206117   0.268123
    0100   1.802772   0.253433   0.342846   0.032038   0.050725
    00C0   1.355390   0.240958   0.311724   0.031969   0.048488
    0080   0.909253   0.210573   0.276103   0.031942   0.046119
    0040   0.463195   0.150320   0.164585   0.032047   0.043604
    0020   0.238923   0.101492   0.113927   0.032032   0.042417
    001E   0.225827   0.111231   0.122245   0.032019   0.042544
    001C   0.210944   0.110432   0.122021   0.031966   0.042432
    001A   0.197080   0.110823   0.122119   0.031953   0.042508
    0018   0.183400   0.104318   0.116599   0.031982   0.042673
    0016   0.169207   0.099349   0.110853   0.031980   0.042465
    0014   0.155477   0.100393   0.109962   0.032081   0.042704
    0012   0.141733   0.099860   0.111362   0.031961   0.042495
    0010   0.127308   0.070471   0.083389   0.031962   0.041866
    000E   0.113336   0.074843   0.086993   0.031981   0.041867
    000C   0.099318   0.073958   0.086677   0.031962   0.041833
    000A   0.085462   0.074848   0.086733   0.032057   0.041985
    0008   0.071609   0.069932   0.081476   0.030228   0.038990
    0007   0.064623   0.058755   0.068647   0.030245   0.039025
    0006   0.057541   0.058729   0.068720   0.030278   0.038971
    0005   0.050582   0.058701   0.068568   0.030230   0.038931
    0004   0.043603   0.058764   0.068620   0.030246   0.039029
    0003   0.036664   0.058748   0.068683   0.030220   0.038934
    0002   0.029665   0.058824   0.068732   0.030386   0.039100
    0001   0.022716   0.059113   0.069109   0.029829   0.038662
           0.005250   0.016894   0.005825   0.005239   0.005835

TESTNAME  DESCRIPTION
NO TR(E)  Basic move and translate, one byte at a time
TRE INPL  TRE loop in-place
TRE MVC   TRE loop buffer-to-buffer move first
TR INPL   TR loop in-place
TR MVC    TR loop buffer-to-buffer move first

TR is always faster than TRE. Having to move the data from an input buffer to a separate output buffer for translation increases the CPU time required by ~15%. That is still way less than the overhead of the basic move and translate, which is the fastest technique only for 0, 1, 2, or 3 bytes (for more than 3 bytes, the basic TR loop, or even the TR loop with the data to be translated having to be moved to the output buffer first, is fastest).

The above figures include the equivalent moves to make it a 'truly fair comparison of solutions to the stated problem'. It reflects what I have always observed about such tests: a well-coded, basic, tight MVC loop (or an MVCL) is pretty fast compared to almost anything else that involves a half-dozen or so instructions that do virtually anything. Thus, counting the CPU time that is required to move the data to a separate buffer as part of the overhead doesn't actually add that much to the CPU time required to get the whole job done. I suspect that this is simply due to the fact that MVC and MVCL are already pretty well-optimized for the job they do.

Even a basic, tight loop will be limited by some performance constraint, probably by the rate at which instructions whose execution cannot be overlapped can be pumped through the machine (in contrast to blobs of data MVCing and TR[T]ing through the wires all as part of one instruction).

Today, for all intents and purposes, the time required to execute any given standard instruction is the same as any other. This is because the work to be done can be done in the available time, before another instruction is fetched and shoved through the internal machinery. The instructions which process more than a word or two of data take longer, of course. But some of those are very highly optimized in hardware. For example, the LM and STM instructions are no longer pigs. They are, in fact, fairly effective substitutes for MVC, except that you toast the contents of several registers when you use enough to make it worthwhile. Thus, optimization in our [...]
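As a rough check on the "~15%" figure, the move overhead can be recomputed from the TR INPL and TR MVC columns of the table. The sketch below is Python, not assembler, and the helper names are made up for illustration; only the numbers come from the post. With these four rows the overhead works out to roughly 14% at 0x800 bytes and is larger for the three shorter lengths.

```python
# Rows copied from the timing table in the post:
# length in bytes -> (TR in place, TR with move first), seconds per 1,000,000 calls.
TR_TIMES = {
    0x800: (1.082476, 1.236875),
    0x400: (0.487941, 0.580783),
    0x200: (0.206117, 0.268123),
    0x100: (0.032038, 0.050725),
}


def move_overhead_percent(in_place, with_move):
    """Extra CPU time of the move-first variant, as a percentage of the in-place time."""
    return (with_move - in_place) / in_place * 100.0


def overhead_by_length(times=TR_TIMES):
    """Map each buffer length to the move overhead in percent."""
    return {n: move_overhead_percent(a, b) for n, (a, b) in times.items()}


if __name__ == "__main__":
    for n, pct in sorted(overhead_by_length().items(), reverse=True):
        print(f"{n:#06x} bytes: move adds {pct:.1f}%")
```

The percentage shrinks as the buffer grows in these rows, which fits the observation that the move is cheap relative to the translate for long buffers.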
Re: Long translate (TR) instruction?
William:

Thanks (again). I found these results most interesting.

Art

At 11:18 PM 3/27/2008, William H. Blair wrote:
[...]
Re: Long translate (TR) instruction?
Edward Jaffe wrote:

> The following fragment should work if you prefer looping TRE over traditional TR. TRE requires you to manually translate the so-called stop character with an MVC. But, at least there's no EXecute for the final segment.
>
>          LM    R14,R15,xx          Load string ptr and its length
>          LA    R1,xx               Ptr to translation table
>          XR    R0,R0               Set stop char = x'00'
>          DO INF                    Do for translate
>            TRE   R14,R1            Translate the string
>            DOEXIT Z                Exit if no more data
>            IF O                    If iterate needed
>              ITERATE ,             Process another segment
>            ENDIF ,                 EndIf
>            MVC   0(1,R14),0(R1)    Translate x'00' to whatever
>            LA    R14,1(,R14)       Advance past stop character
>            AHI   R15,-1            Decrement length remaining
>            DOEXIT NP               Exit if no more data
>          ENDDO ,                   EndDo for translate

Art Celestini wrote:

> It seems that the TRE instruction has been in z/Arch for at least a few years. If anyone is inclined to try this:
>
>          XR    R1,R1               Clear for insert
>          L     R15,Length          Load string length
> Loop     IC    R1,Input-1(R15)     Get input byte
>          IC    R0,XlatTab(R1)      Get translated character ...
>          STC   R0,Output-1(R15)    ... and store it in output
>          BCT   R15,Loop            Decrement length, loop until done
>
> it would be interesting to see how it fares against Ed Jaffe's code.

I did this, since I had a program I could just plug these code segments into without doing a lot of work. Results are below.

> I believe the OP said that the data to be translated had to first be moved from one buffer to another. The above does that, but a move of some type needs to be added to Ed's code to make it a true comparison.

Maybe, maybe not. I've got code that needs to translate stuff in a buffer and it does not need it moved. And I have other code that first moves it and then translates it, because it doesn't want to clobber what it's translating. But, I did it both ways, just to find out for sure if it made a difference. It does not.

The TRE loop is so much faster than the byte-at-a-time loop for any substantial number of bytes (which I define as more than 256, since that number or less can be handled directly, inline, simply by using the TR instruction) that the overhead of even a MVCL does not even begin to eat into the gain by using a TRE loop. So, the fact that with a TRE loop subroutine or macro you might whip up you first have to move the data to be translated if you do not want the original data clobbered is simply not relevant from a performance perspective.

Since there is no use for the non-TRE loop subroutine (because its performance is horrible for any substantial number of bytes), we are left with the TRE or TR subroutines, which translate the data directly in the buffer provided, which is what most programmers would want to have available to call most of the time anyway, IMHO. If not, then they would first have had to move the data to some other buffer before TR'ing it anyway.

As you will see below, the TRE loop was faster for me when I gave it more than 7 to 19 bytes. I'd never give it that few since for anything <= 256 I'd just code a TR inline. But if I didn't know how many bytes, then you can see that there is plenty of CPU time left to test for 256 or less and do a TR inline if so, or else call the TR[E] subroutine if I had more than 256.

Regardless, an ordinary TR loop is still faster than using TRE. But this is what you would expect. The TR loop code is not any more complicated than the TRE loop code in the first place. It's just different. TRE does not replace TR. It's for another purpose, basically, not for performance.

I revised the code above to suit my own personal taste and needs. I made an improvement in the TRE subroutine proposed by Edward Jaffe to allow the caller to specify the test character, so that performance will not suffer if the data to be translated contains a lot of null bytes (as Ed's would). That meant that the MVC had to become an IC + STC.

Here is the code for the subroutines I called repeatedly to gather the timing figures:

**--
**
** NOTE: ENTER VIA BAS R8,NOTR WITH REGS SET AS FOLLOWS:
**       R14 = INPUT BUFFER ADDRESS
**       R15 = OUTPUT BUFFER ADDRESS
**       R0  = LENGTH OF BOTH INPUT AND OUTPUT BUFFER (MAY BE ZERO)
**       R1  = 256-BYTE TRANSLATE TABLE ADDRESS
**
**--
NOTR     LTR   R2,R0               COPY LENGTH AND TEST FOR ZERO
Re: Long translate (TR) instruction?
In [EMAIL PROTECTED], on 03/24/2008 at 12:30 PM, Kirk Wolf [EMAIL PROTECTED] said:

> I'm looking for the fastest way in assembler to translate data in one buffer to another using a 256-byte translate table.

The fastest way on one model may not be the fastest way on another model.

> Any advice on the fastest instruction path to do this would be appreciated.

Time several approaches on the specific box you're targeting.

--
Shmuel (Seymour J.) Metz, SysProg and JOAT
ISO position; see http://patriot.net/~shmuel/resume/brief.html
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)

--
For IBM-MAIN subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
Re: Long translate (TR) instruction?
Kirk Wolf said:

> I'm looking for the fastest way in assembler to translate data in one buffer to another using a 256-byte translate table.

Want my test program to help you decide? Let me know. But don't waste your time. I already know the answer. Look at my TR subroutine in a previous post as a place to get started (if you need that).

Shmuel Metz (Seymour J.) said:

> The fastest way on one model may not be the fastest way on another model.

True. But -- I just knew you were expecting a "but" -- I have been looking at this off and on for about 8 years, and have had access to most (if not all) models of zXXX hardware (currently I have access to a 2094, 2096, 2086 and a 2066). I have NEVER found an instruction sequence that would run faster than a simple old-fashioned TR[T] (or MVC or CLC) loop on ANY z model machine, except an MVCL or CLCL for a very large number of bytes.

Since very little code like this is on a performance-critical path, I mostly just use whatever is convenient; in such a case it does not really matter. If I believe the code is on a performance-critical path I'll use a subroutine that does it the old-fashioned way (TR/TRT/CLC/MVC loop or whatever), unless I have special knowledge that lots of bytes (more than 4KB) need to be MVCed/CLCed.

Thus, if Mr. Wolf currently has a z box (Duh!) I can tell him that the answer to that question -- TODAY -- is just do an old-fashioned MVC loop (or an MVCL) to move the data to the buffer where one will need it after translation, and then use an old-fashioned TR loop to actually do it in that (output) buffer. On any z box that exists today that is the fastest way. And I bet it stays that way in the future, probably forever.

Why? There is very little that microcode/millicode can do faster than the current raw, basic machine can do with these fundamental S/360-era instructions. The same basic internal operations to get the job done have to be done in each instance, so it does not matter whether the orders are coming from code or millicode/microcode.

Now, if the machine offered the TR[T]L instructions, then probably -- just as it is the case for MVCL and CLCL -- those would run just a little faster than an old-fashioned basic TR[T] loop, but only for large numbers of bytes. But we don't have TR[T]L, so the System/360 instructions are still the fastest way.

-- WB
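The recommended shape (move the data first, then translate it in the output buffer one segment at a time) can be modelled outside assembler. The following is a minimal Python sketch of that control flow only; it says nothing about z hardware timing. The function name is hypothetical, and `bytearray.translate` stands in for a single TR over a segment of at most 256 bytes.

```python
SEGMENT = 256  # TR handles at most 256 bytes per execution


def move_then_translate(src, table):
    """Copy src to a new buffer, then translate the copy in place,
    one segment of up to 256 bytes at a time.

    Models an MVC loop or MVCL followed by a TR loop; src is not modified.
    """
    if len(table) != 256:
        raise ValueError("translate table must be exactly 256 bytes")
    out = bytearray(src)  # the "move" step
    for start in range(0, len(out), SEGMENT):
        segment = out[start:start + SEGMENT]
        # One TR per segment; the last segment may be shorter than 256.
        out[start:start + SEGMENT] = segment.translate(table)
    return bytes(out)
```

For example, `move_then_translate(data, table)` gives the same bytes as `data.translate(table)` for any length, including zero; the segmenting only mirrors the 256-byte limit of TR.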
Re: Long translate (TR) instruction?
In a message dated 3/24/2008 2:10:15 P.M. Central Daylight Time, [EMAIL PROTECTED] writes:

> Even if the z10 offered a Translate Extended instruction, the OP couldn't count on it being there on every Customer's machine for quite a while.

The OP can use dual paths. If executing on machines without the newer instruction, then use TR; if the newer instruction is available, then use it. But don't put the test inside the loop.

Nor is there any guarantee that IBM won't redesign the internals of whatever today is the fastest way to do something so that on a future processor it is slower, as in changing microcode into millicode.

Bill Fairchild
Rocket Software
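The dual-path advice amounts to choosing the routine once and then calling it for every buffer. Here is a small Python sketch of that structure, under the assumption that a flag (`newer_available`, a hypothetical name) already says whether the newer instruction exists; how that flag is obtained on a real machine is outside the sketch. Both routines are stand-ins, not models of any particular instruction's performance.

```python
SEGMENT = 256


def translate_segments(buf, table):
    """Baseline path: translate in pieces of at most 256 bytes (models a TR loop)."""
    out = bytearray(buf)
    for start in range(0, len(out), SEGMENT):
        out[start:start + SEGMENT] = out[start:start + SEGMENT].translate(table)
    return bytes(out)


def translate_whole(buf, table):
    """Alternate path: one call for the whole buffer (stands in for a newer instruction)."""
    return bytes(buf).translate(table)


def pick_translator(newer_available):
    """Make the availability test once and return the routine to use."""
    return translate_whole if newer_available else translate_segments


def translate_all(buffers, table, newer_available):
    """Translate every buffer with the routine selected before the loop starts."""
    translate = pick_translator(newer_available)  # decided once, outside the loop
    return [translate(b, table) for b in buffers]
```

The point of the design is only that the availability check is hoisted out of the per-buffer loop; both paths must produce identical output.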
Re: Long translate (TR) instruction?
It seems that the TRE instruction has been in z/Arch for at least a few years. If anyone is inclined to try this, it would be interesting to see how it fares against Ed Jaffe's code:

         XR    R1,R1               Clear for insert
         L     R15,Length          Load string length
Loop     IC    R1,Input-1(R15)     Get input byte
         IC    R0,XlatTab(R1)      Get translated character ...
         STC   R0,Output-1(R15)    ... and store it in output
         BCT   R15,Loop            Decrement length, loop until done

I believe the OP said that the data to be translated had to first be moved from one buffer to another. The above does that, but a move of some type needs to be added to Ed's code to make it a true comparison.

--Art C.

At 03:42 PM 3/24/2008, Edward Jaffe wrote:
[...]

Art Celestini
Celestini Development Services
Wyckoff, NJ
Phone: 201-670-1674
http://celestini.com

Mail sent to the From address used in this post will be rejected by our server. Please send off-list email to: ibmmainat-signcelestinidotcom.
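For readers who do not read assembler, the byte-at-a-time loop in this post can be modelled as follows. This is a Python sketch with a hypothetical function name, not the posted code. It walks the buffer from the last byte to the first, as the BCT-controlled loop does, and writes to a separate output buffer. One deliberate difference: the model returns immediately for a zero length, whereas the posted fragment enters the loop without testing the length first, and BCT only stops when the decremented count reaches zero.

```python
def translate_copy_backwards(src, table):
    """Translate src into a new buffer one byte at a time, last byte first.

    Mirrors IC / IC / STC / BCT: index i runs from len(src) down to 1 and
    addresses byte i-1, like Input-1(R15) and Output-1(R15).
    """
    out = bytearray(len(src))
    i = len(src)
    while i > 0:                       # guard added here; see the note above
        out[i - 1] = table[src[i - 1]]  # IC input byte, IC table entry, STC to output
        i -= 1                          # BCT: decrement, loop while non-zero
    return bytes(out)
```

Because the move and the translate happen in the same pass, no separate copy step is needed, which is the property the post points to when asking for a like-for-like comparison.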
Re: Long translate (TR) instruction?
Art Celestini wrote:

> It seems that the TRE instruction has been in z/Arch for at least a few years. If anyone is inclined to try this, it would be interesting to see how it fares against Ed Jaffe's code:
>
>          XR    R1,R1               Clear for insert
>          L     R15,Length          Load string length
> Loop     IC    R1,Input-1(R15)     Get input byte
>          IC    R0,XlatTab(R1)      Get translated character ...
>          STC   R0,Output-1(R15)    ... and store it in output
>          BCT   R15,Loop            Decrement length, loop until done
>
> I believe the OP said that the data to be translated had to first be moved from one buffer to another. The above does that, but a move of some type needs to be added to Ed's code to make it a true comparison.

Some years ago, on our z800 processor, we measured the performance of (in-place) TR against a software-coded loop. We found that the loop was faster than TR for strings shorter than nine (9) bytes in length. When we spoke to IBM about this, we learned that TR had been partially moved into millicode for the z900/z800. It ran slower for short strings because of the millicode start/stop (aka subroutine linkage) costs. For strings longer than nine bytes, TR was faster because it had access to a hardware facility that could translate two bytes per cycle.

The code fragments we compared were:

|CASE1    DC    0H
|         LA    R2,9
|         LA    R3,DATA
|         XR    R4,R4
|CASE1L1  DS    0H
|         IC    R4,0(,R3)
|         IC    R4,EBCDIC(R4)
|         STC   R4,0(,R3)
|         AHI   R4,1
|         AHI   R3,1
|         JCT   R2,CASE1L1
|CASE1L   EQU   *-CASE1
|CASE2    DC    0H
|         TR    DATA(9),EBCDIC
|CASE2L   EQU   *-CASE2

We later unrolled the loop, interleaving the use of three different registers, and found it was now faster than TR for strings of 24 bytes or fewer!

|Stride   EQU   3
|CASE1    DC    0H
|         LA    R0,9/Stride
|         LA    R3,DATA
|         XR    R4,R4
|         XR    R5,R5
|         XR    R6,R6
|CASE1L1  DS    0H
|         IC    R4,0(,R3)
|         IC    R5,1(,R3)
|         IC    R6,2(,R3)
|         IC    R4,EBCDIC(R4)
|         IC    R5,EBCDIC(R5)
|         IC    R6,EBCDIC(R6)
|         STC   R4,0(,R3)
|         STC   R5,1(,R3)
|         STC   R6,2(,R3)
|         AHI   R3,Stride
|         JCT   R0,CASE1L1
|CASE1L   EQU   *-CASE1

The results of the above experiments suggest that your loop has an excellent chance of being faster than *any* sequence involving TR or TRE, for strings shorter than some number of bytes 'n', on any given hardware generation supporting z/Architecture.

--
Edward E Jaffe
Phoenix Software International, Inc
5200 W Century Blvd, Suite 800
Los Angeles, CA 90045
310-338-0400 x318
[EMAIL PROTECTED]
http://www.phoenixsoftware.com/
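The unrolled fragment groups the work as three loads, three table lookups, and three stores per iteration, so that no lookup has to wait on the load immediately before it. The Python sketch below shows only that grouping; Python gains nothing from it, and the function name is hypothetical. The tail loop for lengths that are not a multiple of the stride is an addition for completeness, since the posted fragment uses a fixed count of 9/Stride iterations.

```python
STRIDE = 3


def translate_unrolled(data, table):
    """Translate a copy of data, three bytes per iteration."""
    buf = bytearray(data)
    full = len(buf) - len(buf) % STRIDE
    for i in range(0, full, STRIDE):
        a, b, c = buf[i], buf[i + 1], buf[i + 2]      # three independent loads
        a, b, c = table[a], table[b], table[c]        # three independent lookups
        buf[i], buf[i + 1], buf[i + 2] = a, b, c      # three stores
    for i in range(full, len(buf)):                   # leftover 0, 1, or 2 bytes
        buf[i] = table[buf[i]]
    return bytes(buf)
```

Whether a stride of three (or any other) pays off depends on the machine, which is the post's own caveat about "some number of bytes 'n'".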
Long translate (TR) instruction?
Hi,

I'm looking for the fastest way in assembler to translate data in one buffer to another using a 256-byte translate table. The TR instruction is only up to 256 bytes, and I can't figure out if one of the newer instructions is a replacement for arbitrary length translations, or if the best approach is just to loop for 256 byte chunks. The average length transaction is almost certainly less than 256 bytes.

Any advice on the fastest instruction path to do this would be appreciated.

Thanks,
Kirk Wolf
Dovetailed Technologies
Re: Long translate (TR) instruction?
Look at the TRTE, TRanslate and Test Extended, instruction on pp. 7-231ff of the current Principles of Operation. Looping is still required when a condition-code value of 3 is set, but only a branch back to the same, already executed TRTE is required to accomplish it. In particular, there is no requirement for a running count of the number of bytes that remain to be translated.

John Gilmore
Ashland, MA 01721-1817
USA

Date: Mon, 24 Mar 2008 12:30:51 -0500
From: [EMAIL PROTECTED]
Subject: Long translate (TR) instruction?
To: IBM-MAIN@bama.ua.edu

[...]
Re: Long translate (TR) instruction?
TRT[E] is not the same as TR. It *stops* if the translated-to byte is non-zero.

It's hard to say whether a TR loop or just an open code loop would be better on the current in-use crop of hardware. If CPU performance is extremely important, I'd do some experimenting. Even if the z10 offered a Translate Extended instruction, the OP couldn't count on it being there on every Customer's machine for quite a while. It sounded like the data had to be moved in addition to translated, so an open code solution might handle both the move and the translate (one byte at a time).

--Art C.

At 02:22 PM 3/24/2008, john gilmore wrote:

> Look at the TRTE, TRanslate and Test Extended, instruction on pp. 7-231ff of the current PROP. Looping is still required when a condition-code value of 3 is set, but only a branch back to the same, already executed TRTE is required to accomplish it. In particular, there is no requirement for a running count of the number of bytes that remain to be translated.

Art Celestini
Celestini Development Services
Wyckoff, NJ
Phone: 201-670-1674
http://celestini.com

Mail sent to the From address used in this post will be rejected by our server. Please send off-list email to: ibmmainat-signcelestinidotcom.
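The difference from TR is easiest to see in a model. A translate-and-test style scan uses each data byte to select a table entry, leaves the data untouched, and stops at the first byte whose table entry is non-zero. The Python sketch below captures only that behaviour; the function name and the return convention (a tuple, or None when nothing is found) are choices made for this illustration, not part of the instruction definition.

```python
def translate_and_test(data, table):
    """Scan data left to right and stop at the first byte whose table entry is non-zero.

    Returns (index, function_byte) for that byte, or None if every selected
    entry is zero. The data itself is never modified, unlike with TR.
    """
    for i, b in enumerate(data):
        function_byte = table[b]
        if function_byte != 0:
            return i, function_byte
    return None
```

For example, with a table that is zero everywhere except at the entry for a comma, the scan reports the position of the first comma and changes nothing, so it cannot serve as a drop-in replacement for a translate.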
Re: Long translate (TR) instruction?
-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of Kirk Wolf
Sent: Monday, March 24, 2008 12:31 PM
To: IBM-MAIN@bama.ua.edu
Subject: Long translate (TR) instruction?

[...]

I don't think you have a choice, in the general case. That is because all the new TRxx type instructions seem to terminate when the data in your buffer equals the contents of the low-order byte of general register 0. I.e. they stop at an end-of-buffer type character, like a null in a C string. If you can tolerate this behaviour, then I'd look at the TRE or TROO instruction. The TRE seems easier to use, to me.

--
John McKown
Senior Systems Programmer
HealthMarkets
Keeping the Promise of Affordable Coverage
Administrative Services Group
Information Technology
Re: Long translate (TR) instruction?
McKown, John wrote:
> I don't think you have a choice, in the general case. That is because all the new TRxx type instructions seem to terminate when the data in your buffer equals to the contents of the low order byte general register 0. I.e. they stop at an end of buffer type character, like a null in a C string. If you can tolerate this behaviour, then I'd look at the TRE or TROO instruction. The TRE seems easier to use, to me.

The following fragment should work if you prefer looping TRE over traditional TR. TRE requires you to manually translate the so-called stop character with an MVC. But, at least there's no EXecute for the final segment.

         LM    R14,R15,xx          Load string ptr and its length
         LA    R1,xx               Ptr to translation table
         XR    R0,R0               Set stop char = x'00'
         DO INF                    Do for translate
           TRE   R14,R1            Translate the string
           DOEXIT Z                Exit if no more data
           IF O                    If iterate needed
             ITERATE ,             Process another segment
           ENDIF ,                 EndIf
           MVC   0(1,R14),0(R1)    Translate x'00' to whatever
           LA    R14,1(,R14)       Advance past stop character
           AHI   R15,-1            Decrement length remaining
           DOEXIT NP               Exit if no more data
         ENDDO ,                   EndDo for translate

--
Edward E Jaffe
Phoenix Software International, Inc
5200 W Century Blvd, Suite 800
Los Angeles, CA 90045
310-338-0400 x318
[EMAIL PROTECTED]
http://www.phoenixsoftware.com/
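As an aid to reading the fragment above, here is a rough C model of its control flow. It is a sketch under stated assumptions, not the instruction's definition: `tre_model` is a hypothetical stand-in for TRE that returns 0 when the length is exhausted, 1 when it stops at a byte equal to the stop character (leaving that byte untranslated), and 3 when it gives up after some amount of work. The `TRE_MODEL_MAX` cap is invented purely to exercise the condition-code-3 path; the real amount processed per execution is CPU-determined.

```c
#include <assert.h>
#include <stddef.h>

enum { CC_DONE = 0, CC_STOP = 1, CC_PARTIAL = 3 };

/* Invented cap on bytes handled per call, only to model condition code 3. */
#define TRE_MODEL_MAX 4096

/* Hypothetical model of TRE: advances *addr and reduces *len as it goes. */
static int tre_model(unsigned char **addr, size_t *len,
                     const unsigned char table[256], unsigned char stop)
{
    size_t done = 0;
    while (*len > 0) {
        if (**addr == stop)
            return CC_STOP;          /* stop byte found, left untranslated */
        if (done == TRE_MODEL_MAX)
            return CC_PARTIAL;       /* caller must reissue the instruction */
        **addr = table[**addr];
        (*addr)++;
        (*len)--;
        done++;
    }
    return CC_DONE;                  /* whole operand processed */
}

/* Model of the DO INF loop: translate len bytes in place, stop char x'00'. */
void translate_tre_loop(unsigned char *buf, size_t len,
                        const unsigned char table[256])
{
    const unsigned char stop = 0x00;
    for (;;) {
        int cc = tre_model(&buf, &len, table, stop);
        if (cc == CC_DONE)
            break;                   /* DOEXIT Z */
        if (cc == CC_PARTIAL)
            continue;                /* IF O / ITERATE */
        assert(len > 0);             /* a stop byte implies data remains */
        *buf = table[stop];          /* MVC 0(1,R14),0(R1) */
        buf++;                       /* LA R14,1(,R14) */
        if (--len == 0)
            break;                   /* AHI R15,-1 / DOEXIT NP */
    }
}
```

The point the model makes visible is that data containing many x'00' bytes forces one extra pass through the loop per stop byte, so the choice of stop character matters for how often the manual single-byte translation runs.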
Re: Long translate (TR) instruction?
Thanks everyone for all the advice.

Kirk