Re: bareword test on ebcdic.
--- Yitzchak Scott-Thoennes <[EMAIL PROTECTED]> wrote:
> On Thu, Jul 28, 2005 at 12:35:13AM -0700, rajarshi das wrote:
> > Nicholas Clark wrote:
> >> If you put those 3 bytes directly between the '{' and '}' characters in
> >> the EBCDIC version of that 1 liner, does it also print 3500?
> >
> > I am unable to put those three bytes in the 1-liner you mentioned above,
> > since I am unable to print the chars corresponding to those bytes
> > (www.kostis.net/charsets/ebc1047.htm) on the command line.
> >
> >> I think that the regression tests tended to do something like
> >>
> >> if (ord 'A' == 65) {
> >>     # Do the ASCII/UTF-8 version
> >> } else {
> >>     # Assume EBCDIC
> >> }
>
> I tried to fix the attribution above; apologies if I got it wrong.
>
> I think the way you want to test this is something like:
>
> $key = "\x{0442}\x{0435}\x{0441}\x{0442}";
> if ( $hash{$key} eq eval "\$hash{$key}" )

But, would doing something like,

    $key = "\x{0442}\x{0435}\x{0441}\x{0442}";

be within the scope of a bareword test ? Also, does

    eval "\$hash{$key}"

as in the 'if' condition remain within the scope of a bareword test ?

> It's unclear to me whether $key needs to be different for EBCDIC.

\x{0442} is the unicode value for the character that we are trying to test.
So, as long as we are testing the same character, $key needs to be the same
on both platforms.

> Are you just using perl on z/OS, or are you building it yourself?

I am building perl on z/OS and using it.

> If the latter, Dave Mitchell has been looking for someone to test
> some parser changes he made on an EBCDIC platform so they can be
> integrated into the 5.8.x series.

Thanks,
Rajarshi.
Re: bareword test on ebcdic.
On Thu, Jul 28, 2005 at 12:35:13AM -0700, rajarshi das wrote:
> Nicholas Clark wrote:
>> If you put those 3 bytes directly between the '{' and '}' characters in
>> the EBCDIC version of that 1 liner, does it also print 3500?
>
> I am unable to put those three bytes in the 1-liner you mentioned above,
> since I am unable to print the chars corresponding to those bytes
> (www.kostis.net/charsets/ebc1047.htm) on the command line.
>
>> I think that the regression tests tended to do something like
>>
>> if (ord 'A' == 65) {
>>     # Do the ASCII/UTF-8 version
>> } else {
>>     # Assume EBCDIC
>> }

I tried to fix the attribution above; apologies if I got it wrong.

I think the way you want to test this is something like:

    $key = "\x{0442}\x{0435}\x{0441}\x{0442}";
    if ( $hash{$key} eq eval "\$hash{$key}" )

It's unclear to me whether $key needs to be different for EBCDIC.

Are you just using perl on z/OS, or are you building it yourself?
If the latter, Dave Mitchell has been looking for someone to test
some parser changes he made on an EBCDIC platform so they can be
integrated into the 5.8.x series.
Re: bareword test on ebcdic.
On Wed, Jul 27, 2005 at 11:01:08PM +0100, Nicholas Clark wrote:
> My question is, what are the bytes in UTF-EBCDIC that encode code point 3500?

http://www.unicode.org/reports/tr16/

I *think* codepoint 3500, ie 0xdac, ie

    [0011][01101][01100]

maps to the i8 bytes

    1110[0011] 101[01101] 101[01100], ie 0xe3, 0xad, 0xac

which after going through the i8 to UTF-EBCDIC byte conversion, comes out as

    0xba, 0x54, 0x53

Obvious really.

--
Thank God I'm an atheist.
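The bit-packing step in the message above can be checked with a few lines of Perl. This is only a sketch: the helper names utf8_bytes_3 and i8_bytes_3 are made up for illustration, both handle only code points that fit the 3-byte form, and the final table-driven I8 to UTF-EBCDIC byte conversion described in UTR #16 is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper: UTF-8 bytes for a code point in the 3-byte range
# (U+0800..U+FFFF), laid out as 1110xxxx 10xxxxxx 10xxxxxx.
sub utf8_bytes_3 {
    my ($cp) = @_;
    return (
        0xE0 | ($cp >> 12),
        0x80 | (($cp >> 6) & 0x3F),
        0x80 | ($cp & 0x3F),
    );
}

# Hypothetical helper: I8 bytes (the intermediate form used by UTF-EBCDIC)
# for a code point in the 3-byte range (U+0400..U+3FFF), laid out as
# 1110xxxx 101xxxxx 101xxxxx. Trailing bytes carry 5 bits, not 6.
sub i8_bytes_3 {
    my ($cp) = @_;
    return (
        0xE0 | ($cp >> 10),
        0xA0 | (($cp >> 5) & 0x1F),
        0xA0 | ($cp & 0x1F),
    );
}

my $cp = 3500;    # 0xdac, the code point used in the one-liner
printf "UTF-8: %s\n", join ' ', map { sprintf('0x%02x', $_) } utf8_bytes_3($cp);
printf "I8:    %s\n", join ' ', map { sprintf('0x%02x', $_) } i8_bytes_3($cp);
```

For 3500 this gives 0xe0 0xb6 0xac (224, 182, 172) for UTF-8 and 0xe3 0xad 0xac for I8, matching the figures quoted in the thread. The bytes 0xba, 0x54, 0x53 only appear after the I8 bytes are mapped through the UTF-EBCDIC conversion table, which this sketch does not reproduce.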
Re: bareword test on ebcdic.
Nicholas Clark <[EMAIL PROTECTED]> wrote:
> On Tue, Jul 26, 2005 at 08:48:10AM -0700, rajarshi das wrote:
> > > For the code points being tested
> > > ("\x{0442}\x{0435}\x{0441}\x{0442}")
> > > does the perl source file contain the correct byte
> > > sequence in UTF-EBCDIC?
> > Yes it does, since I ran the test,
> > if (($hash{"\x{0442}\x{0435}\x{0441}\x{0442}"}) eq
> > ($hash{eval '"\x{0442}\x{0435}\x{0441}\x{0442}"'}))
> > print "ok\n";
> > and the test ran fine, if that is what you mean by the
> > source file containing the correct byte sequence. Or
> > am I mistaken ?
>
> You are mistaken, I'm afraid. bareword means no quotes.
>
> In ASCII & UTF-8 land, the 1 liner
>
> $ perl -le 'use utf8; $a{ඬ}++; print map {ord} keys %a'
>
> gives
>
> 3500
>
> The 3 bytes in the source code between '{' and '}' are 224, 182 and 172
> which are the UTF-8 encoding for the code point 3500.
>
> My question is, what are the bytes in UTF-EBCDIC that encode code point 3500?

The equivalent bytes on UTF-EBCDIC are 186, 84 and 83.

> If you put those 3 bytes directly between the '{' and '}' characters in
> the EBCDIC version of that 1 liner, does it also print 3500?

I am unable to put those three bytes in the 1-liner you mentioned above,
since I am unable to print the chars corresponding to those bytes
(www.kostis.net/charsets/ebc1047.htm) on the command line.

> > > If so, *that* would explain the failures, and be the
> > > thing that needs
> > > correcting. The test file would need if/else with a
> > > different test on EBCDIC.
> > what would you suggest be put in the if/ else ?
>
> I think that the regression tests tended to do something like
>
> if (ord 'A' == 65) {
>     # Do the ASCII/UTF-8 version
> } else {
>     # Assume EBCDIC
> }
>
> Nicholas Clark

Thanks,
Rajarshi.
Re: bareword test on ebcdic.
On Tue, Jul 26, 2005 at 08:48:10AM -0700, rajarshi das wrote:
> > For the code points being tested
> > ("\x{0442}\x{0435}\x{0441}\x{0442}")
> > does the perl source file contain the correct byte
> > sequence in UTF-EBCDIC?
> Yes it does, since I ran the test,
> if (($hash{"\x{0442}\x{0435}\x{0441}\x{0442}"}) eq
> ($hash{eval '"\x{0442}\x{0435}\x{0441}\x{0442}"'}))
> print "ok\n";
> and the test ran fine, if that is what you mean by the
> source file containing the correct byte sequence. Or
> am I mistaken ?

You are mistaken, I'm afraid. bareword means no quotes.

In ASCII & UTF-8 land, the 1 liner

    $ perl -le 'use utf8; $a{ඬ}++; print map {ord} keys %a'

gives

    3500

The 3 bytes in the source code between '{' and '}' are 224, 182 and 172
which are the UTF-8 encoding for the code point 3500.

My question is, what are the bytes in UTF-EBCDIC that encode code point 3500?

If you put those 3 bytes directly between the '{' and '}' characters in
the EBCDIC version of that 1 liner, does it also print 3500?

> > If so, *that* would explain the failures, and be the
> > thing that needs
> > correcting. The test file would need if/else with a
> > different test on EBCDIC.
> what would you suggest be put in the if/ else ?

I think that the regression tests tended to do something like

    if (ord 'A' == 65) {
        # Do the ASCII/UTF-8 version
    } else {
        # Assume EBCDIC
    }

Nicholas Clark
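The if/else suggested above could be fleshed out roughly as follows. This is a sketch rather than the layout of any actual test file: native_charset is a hypothetical helper, and the two branches only print a label where the platform-specific bareword test would go.

```perl
use strict;
use warnings;

# Hypothetical helper: report which character set this perl was built for.
# ord('A') is 65 on ASCII-based platforms and 193 on EBCDIC ones.
sub native_charset {
    return ord('A') == 65 ? 'ascii' : 'ebcdic';
}

if (native_charset() eq 'ascii') {
    # The source bytes for the bareword key would be UTF-8 here.
    print "running the ASCII/UTF-8 version\n";
} else {
    # The source bytes for the bareword key would need to be
    # valid UTF-EBCDIC here (assumption: anything not ASCII is EBCDIC).
    print "running the EBCDIC version\n";
}
```

The check relies only on ord, so it needs no modules and is safe to run before any encoding-sensitive code is parsed.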
Re: bareword test on ebcdic.
--- Nicholas Clark <[EMAIL PROTECTED]> wrote:
> On Tue, Jul 26, 2005 at 08:12:16AM -0700, rajarshi das wrote:
>
> > I basically want to know if there are alternate ways
> > of representing barewords (as I mentioned in question
> > 2) above) ?
>
> No. By definition there can not be.
> You're failing to grasp what is meant by "bareword".
> There is only one representation.
>
> > Also, any pointers that you have regarding where to
> > look to fix this ?
>
> Not much better than "in toke.c or utf8.c"
>
> However, based on a comment I've spotted at the top of utfebcdic.h I *think*
> that the internal encoding of perl on an EBCDIC system is UTF-EBCDIC rather
> than UTF-8. The byte sequence in the source file for the bareword will need
> to be valid UTF-EBCDIC.
>
> For the code points being tested ("\x{0442}\x{0435}\x{0441}\x{0442}")
> does the perl source file contain the correct byte sequence in UTF-EBCDIC?

Yes it does, since I ran the test,

    if (($hash{"\x{0442}\x{0435}\x{0441}\x{0442}"}) eq
        ($hash{eval '"\x{0442}\x{0435}\x{0441}\x{0442}"'}))
        print "ok\n";

and the test ran fine, if that is what you mean by the source file
containing the correct byte sequence. Or am I mistaken ?

> Does the byte sequence in UTF-EBCDIC for those 4 code points differ from the
> byte sequence in UTF-8?

Yes the byte sequence for the 4 code points is different on UTF-EBCDIC from
the sequence in UTF-8.

> Does the source file happen to have the UTF-8 byte sequence?

It has the UTF-EBCDIC byte sequence on the ebcdic platform.

> If so, *that* would explain the failures, and be the thing that needs
> correcting. The test file would need if/else with a different test on EBCDIC.

what would you suggest be put in the if/ else ?

> Nicholas Clark

Thanks,
Rajarshi.