Re: bareword test on ebcdic.

2005-07-31 Thread rajarshi das


--- Yitzchak Scott-Thoennes <[EMAIL PROTECTED]> wrote:

> On Thu, Jul 28, 2005 at 12:35:13AM -0700, rajarshi das wrote:
> > Nicholas Clark wrote:
> >> If you put those 3 bytes directly between the '{' and '}' characters in
> >> the EBCDIC version of that 1 liner, does it also print 3500?
>
> > I am unable to put those three bytes in the 1-liner you mentioned above,
> > since I am unable to print the chars corresponding to those bytes
> > (www.kostis.net/charsets/ebc1047.htm) on the command line.
>
> >> I think that the regression tests tended to do something like
> >>
> >> if (ord 'A' == 65) {
> >> # Do the ASCII/UTF-8 version
> >> } else {
> >> # Assume EBCDIC
> >> }
>
> I tried to fix the attribution above; apologies if I got it wrong.
>
> I think the way you want to test this is something like:
>
>   $key = "\x{0442}\x{0435}\x{0441}\x{0442}";
>   if ( $hash{$key} eq eval "\$hash{$key}" )
But would doing something like
$key = "\x{0442}\x{0435}\x{0441}\x{0442}";
be within the scope of a bareword test?

Also, does eval "\$hash{$key}" as in the 'if'
condition remain within the scope of a bareword test?


> 
> It's unclear to me whether $key needs to be different for EBCDIC.
\x{0442} is the Unicode code point of the character that
we are trying to test. So, as long as we are testing
the same character, $key needs to be the same on both
platforms.
> 
> Are you just using perl on z/OS, or are you building it yourself?
I am building perl on z/OS and using it.

> If the latter, Dave Mitchell has been looking for someone to test
> some parser changes he made on an EBCDIC platform so they can be
> integrated into the 5.8.x series.
> 

Thanks,
Rajarshi.






Re: bareword test on ebcdic.

2005-07-28 Thread Yitzchak Scott-Thoennes
On Thu, Jul 28, 2005 at 12:35:13AM -0700, rajarshi das wrote:
> Nicholas Clark wrote:
>> If you put those 3 bytes directly between the '{' and '}' characters in
>> the EBCDIC version of that 1 liner, does it also print 3500?

> I am unable to put those three bytes in the 1-liner you mentioned above, 
> since I am unable to print the chars corresponding to those bytes 
> (www.kostis.net/charsets/ebc1047.htm) on the command line. 

>> I think that the regression tests tended to do something like
>> 
>> if (ord 'A' == 65) {
>> # Do the ASCII/UTF-8 version
>> } else {
>> # Assume EBCDIC
>> }

I tried to fix the attribution above; apologies if I got it wrong.

I think the way you want to test this is something like:

  $key = "\x{0442}\x{0435}\x{0441}\x{0442}";
  if ( $hash{$key} eq eval "\$hash{$key}" )
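A complete version of the test suggested above might look like the following minimal sketch. The hash contents and the value 42 are illustrative assumptions, as are the names `$code` and `$ok`. The point is that the string handed to `eval` contains the four Cyrillic characters with no quotes around them, so perl's lexer has to treat them as a bareword hash key.

```perl
use strict;
use warnings;

# U+0442 U+0435 U+0441 U+0442 (Cyrillic for "test").
my $key  = "\x{0442}\x{0435}\x{0441}\x{0442}";
my %hash = ($key => 42);    # illustrative value

# "\$hash{$key}" interpolates $key, so the code passed to eval is
# '$hash{' followed by the four characters and '}', with no quotes:
# a bareword key. The string holds characters above 0xFF, so it is
# stored as UTF-8 internally and eval parses it as characters.
my $code = "\$hash{$key}";
my $via_bareword = eval $code;
die "eval failed: $@" if $@;

my $ok = defined $via_bareword && $hash{$key} eq $via_bareword;
print $ok ? "ok\n" : "not ok\n";
```

Because eval STRING sees the enclosing lexical `%hash`, no globals are needed.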

It's unclear to me whether $key needs to be different for EBCDIC.

Are you just using perl on z/OS, or are you building it yourself?
If the latter, Dave Mitchell has been looking for someone to test
some parser changes he made on an EBCDIC platform so they can be
integrated into the 5.8.x series.


Re: bareword test on ebcdic.

2005-07-28 Thread Dave Mitchell
On Wed, Jul 27, 2005 at 11:01:08PM +0100, Nicholas Clark wrote:
> My question is, what are the bytes in UTF-EBCDIC that encode code point 3500?

http://www.unicode.org/reports/tr16/

I *think* codepoint 3500, ie 0xdac, ie [0011][01101][01100]

maps to the I8 bytes

1110[0011] 101[01101] 101[01100], ie 0xe3, 0xad, 0xac

which after going through the I8 to UTF-EBCDIC byte conversion comes out
as

0xba, 0x54, 0x53

Obvious really.
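The derivation above can be checked mechanically. This sketch encodes a code point into I8, the intermediate form defined in Unicode TR16, handling only the 1-, 2- and 3-byte forms (code points below 0x4000). The second step is a per-byte table lookup; `%i8_to_utfebcdic` here is a hypothetical partial table holding only the three entries quoted in this thread, not the full 256-entry table from TR16 or perl's utfebcdic.h. `i8_bytes` is likewise a hypothetical helper name.

```perl
use strict;
use warnings;

# Encode a code point as I8 bytes (Unicode TR16). Only the 1-, 2- and
# 3-byte forms are handled, i.e. code points below 0x4000.
sub i8_bytes {
    my ($cp) = @_;
    return ($cp) if $cp < 0xA0;                 # 1 byte, unchanged
    return (0xC0 | ($cp >> 5),                  # 110yyyyy
            0xA0 | ($cp & 0x1F))                # 101xxxxx
        if $cp < 0x400;
    return (0xE0 | ($cp >> 10),                 # 1110zzzz
            0xA0 | (($cp >> 5) & 0x1F),         # 101yyyyy
            0xA0 | ($cp & 0x1F))                # 101xxxxx
        if $cp < 0x4000;
    die sprintf("U+%04X needs a longer I8 sequence\n", $cp);
}

# Hypothetical partial I8 -> UTF-EBCDIC table: only the three byte
# values quoted in this thread, not the full mapping.
my %i8_to_utfebcdic = (0xE3 => 0xBA, 0xAD => 0x54, 0xAC => 0x53);

my @i8 = i8_bytes(3500);
my @ue = map { $i8_to_utfebcdic{$_} } @i8;
printf "I8:         %s\n", join ' ', map { sprintf('0x%02x', $_) } @i8;
printf "UTF-EBCDIC: %s\n", join ' ', map { sprintf('0x%02x', $_) } @ue;
```

Run as-is, it should print 0xe3 0xad 0xac for I8 and 0xba 0x54 0x53 (186, 84, 83) for UTF-EBCDIC, matching the figures in this thread.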


-- 
Thank God I'm an atheist.


Re: bareword test on ebcdic.

2005-07-28 Thread rajarshi das
Nicholas Clark <[EMAIL PROTECTED]> wrote:

> On Tue, Jul 26, 2005 at 08:48:10AM -0700, rajarshi das wrote:
> > > For the code points being tested
> > > ("\x{0442}\x{0435}\x{0441}\x{0442}")
> > > does the perl source file contain the correct byte
> > > sequence in UTF-EBCDIC?
> > Yes it does, since I ran the test
> > if (($hash{"\x{0442}\x{0435}\x{0441}\x{0442}"}) eq
> >     ($hash{eval '"\x{0442}\x{0435}\x{0441}\x{0442}"'})) {
> >     print "ok\n";
> > }
> > and the test ran fine, if that is what you mean by the
> > source file containing the correct byte sequence. Or
> > am I mistaken?
>
> You are mistaken, I'm afraid. bareword means no quotes.
>
> In ASCII & UTF-8 land, the 1 liner
>
> $ perl -le 'use utf8; $a{ඬ}++; print map {ord} keys %a'
>
> gives
>
> 3500
>
> The 3 bytes in the source code between '{' and '}' are 224, 182 and 172
> which are the UTF-8 encoding for the code point 3500.
>
> My question is, what are the bytes in UTF-EBCDIC that encode code point 3500?

The equivalent bytes on UTF-EBCDIC are 186, 84 and 83.

> If you put those 3 bytes directly between the '{' and '}' characters in
> the EBCDIC version of that 1 liner, does it also print 3500?

I am unable to put those three bytes in the 1-liner you mentioned above,
since I am unable to print the chars corresponding to those bytes
(www.kostis.net/charsets/ebc1047.htm) on the command line.

> > > If so, *that* would explain the failures, and be the thing that needs
> > > correcting. The test file would need if/else with a different test on EBCDIC.
> > what would you suggest be put in the if/else?
>
> I think that the regression tests tended to do something like
>
> if (ord 'A' == 65) {
>   # Do the ASCII/UTF-8 version
> } else {
>   # Assume EBCDIC
> }
>
> Nicholas Clark

Thanks,
Rajarshi.

Re: bareword test on ebcdic.

2005-07-27 Thread Nicholas Clark
On Tue, Jul 26, 2005 at 08:48:10AM -0700, rajarshi das wrote:

> > For the code points being tested
> > ("\x{0442}\x{0435}\x{0441}\x{0442}")
> > does the perl source file contain the correct byte
> > sequence in UTF-EBCDIC?
> Yes it does, since I ran the test
> if (($hash{"\x{0442}\x{0435}\x{0441}\x{0442}"}) eq
>     ($hash{eval '"\x{0442}\x{0435}\x{0441}\x{0442}"'})) {
>     print "ok\n";
> }
> and the test ran fine, if that is what you mean by the
> source file containing the correct byte sequence. Or
> am I mistaken?

You are mistaken, I'm afraid. bareword means no quotes.

In ASCII & UTF-8 land, the 1 liner

$ perl -le 'use utf8; $a{ඬ}++; print map {ord} keys %a'

gives

3500


The 3 bytes in the source code between '{' and '}' are 224, 182 and 172
which are the UTF-8 encoding for the code point 3500.
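On an ASCII build of perl, those three bytes can be printed without typing the character at all: `utf8::encode` converts a character string to its encoded octets in place. A minimal sketch; on an EBCDIC build the same call should yield perl's internal UTF-EBCDIC bytes instead, so the output is platform-dependent.

```perl
use strict;
use warnings;

my $s = chr 3500;            # U+0DAC, the character in the 1 liner
utf8::encode($s);            # characters -> encoded octets, in place
my @bytes = unpack 'C*', $s; # one number per octet
print "@bytes\n";
```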

My question is, what are the bytes in UTF-EBCDIC that encode code point 3500?
If you put those 3 bytes directly between the '{' and '}' characters in
the EBCDIC version of that 1 liner, does it also print 3500?

> > If so, *that* would explain the failures, and be the thing that needs
> > correcting. The test file would need if/else with a different test on EBCDIC.
> what would you suggest be put in the if/else?

I think that the regression tests tended to do something like

if (ord 'A' == 65) {
  # Do the ASCII/UTF-8 version
} else {
  # Assume EBCDIC
}
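Applied to the bareword test, that pattern might look like the following sketch. It keeps the raw source bytes for one bareword as `\x` escapes (so nothing has to be typed on the command line), picks UTF-8 or UTF-EBCDIC bytes by platform, and `eval`s a small piece of code that contains them unquoted under `use utf8`. The EBCDIC bytes are the ones reported in this thread for code point 3500 and are not verified here; the eval-based structure and the names `$bytes`, `$code`, `$got` and `%h` are illustrative assumptions, not necessarily how perl's own test file does it.

```perl
use strict;
use warnings;

# Raw source bytes for the character U+0DAC (code point 3500).
my $bytes = ord('A') == 65
    ? "\xE0\xB6\xAC"          # ASCII platform: UTF-8 (224, 182, 172)
    : "\xBA\x54\x53";         # EBCDIC platform: UTF-EBCDIC, per this thread

# Build the program text as a byte string. The bytes sit between '{'
# and '}' with no quotes, so perl must parse them as a bareword key;
# 'use utf8' makes the rest of the eval'd source decode as characters.
my $code = 'use utf8; my %h; $h{' . $bytes . '}++; join ",", map { ord } keys %h';

my $got = eval $code;
die "eval failed: $@" if $@;
print $got == 3500 ? "ok\n" : "not ok - got $got\n";
```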

Nicholas Clark


Re: bareword test on ebcdic.

2005-07-26 Thread rajarshi das


--- Nicholas Clark <[EMAIL PROTECTED]> wrote:

> On Tue, Jul 26, 2005 at 08:12:16AM -0700, rajarshi das wrote:
>
> > I basically want to know if there are alternate ways
> > of representing barewords (as I mentioned in question
> > 2) above)?
>
> No. By definition there can not be.
> You're failing to grasp what is meant by "bareword".
> There is only one representation.
>
> > Also, any pointers that you have regarding where to
> > look to fix this?
>
> Not much better than "in toke.c or utf8.c"
>
> However, based on a comment I've spotted at the top of utfebcdic.h I *think*
> that the internal encoding of perl on an EBCDIC system is UTF-EBCDIC rather
> than UTF-8. The byte sequence in the source file for the bareword will need
> to be valid UTF-EBCDIC.
>
> For the code points being tested ("\x{0442}\x{0435}\x{0441}\x{0442}")
> does the perl source file contain the correct byte sequence in UTF-EBCDIC?
Yes it does, since I ran the test
if (($hash{"\x{0442}\x{0435}\x{0441}\x{0442}"}) eq
    ($hash{eval '"\x{0442}\x{0435}\x{0441}\x{0442}"'})) {
    print "ok\n";
}
and the test ran fine, if that is what you mean by the
source file containing the correct byte sequence. Or
am I mistaken?

> 
> Does the byte sequence in UTF-EBCDIC for those 4 code points differ from the
> byte sequence in UTF-8?
> 
Yes, the byte sequence for the 4 code points is
different on UTF-EBCDIC from the sequence in UTF-8.
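The difference can be made concrete for the four code points in the test. This sketch prints the UTF-8 bytes (via `utf8::encode`, assuming an ASCII build of perl) next to the I8 bytes computed from the TR16 bit layout. `i8_bytes` and `hex_list` are hypothetical helpers, and `i8_bytes` only covers code points below 0x4000. I8 is the intermediate form; the final I8-to-UTF-EBCDIC step is a byte-for-byte table lookup (omitted here), so it preserves length: three bytes per character in UTF-EBCDIC against two in UTF-8 already shows the sequences must differ.

```perl
use strict;
use warnings;

# I8 (Unicode TR16) encoding for code points below 0x4000.
sub i8_bytes {
    my ($cp) = @_;
    return ($cp) if $cp < 0xA0;
    return (0xC0 | ($cp >> 5), 0xA0 | ($cp & 0x1F)) if $cp < 0x400;
    return (0xE0 | ($cp >> 10), 0xA0 | (($cp >> 5) & 0x1F), 0xA0 | ($cp & 0x1F))
        if $cp < 0x4000;
    die "code point too large for this sketch\n";
}

# Format a list of byte values as upper-case hex pairs.
sub hex_list { join ' ', map { sprintf('%02X', $_) } @_ }

for my $cp (0x0442, 0x0435, 0x0441, 0x0442) {
    my $s = chr $cp;
    utf8::encode($s);                    # UTF-8 octets on an ASCII perl
    printf "U+%04X  UTF-8: %-6s  I8: %s\n",
        $cp, hex_list(unpack 'C*', $s), hex_list(i8_bytes($cp));
}
```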

> Does the source file happen to have the UTF-8 byte sequence?
It has the UTF-EBCDIC byte sequence on the EBCDIC
platform.
> 
> If so, *that* would explain the failures, and be the thing that needs
> correcting. The test file would need if/else with a different test on EBCDIC.
What would you suggest be put in the if/else?
> 
> Nicholas Clark
> 
Thanks,
Rajarshi.
> 
> 

