Re: use bytes

David Graff Wed, 19 May 2004 01:38:05 -0700


[EMAIL PROTECTED] said:
> I'm confused, can someone tell me why:
>
>     #!/usr/bin/perl
>     use bytes;
>     $x = chr( 400 );
>     print "Length is ", length( $x ), "\n";
>
> prints 1, while
>
>     #!/usr/bin/perl
>     $x = chr( 400 );
>     use bytes;
>     print "Length is ", length( $x ), "\n";
>
> prints 2?


The position of the "use bytes" pragma matters. It is lexically scoped, so
it affects only the code that follows it in the enclosing block or file. In
that code, values that could be wide characters are handled as raw bytes
instead of being interpreted as Unicode characters.
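
To make the scoping concrete, here is a minimal sketch that confines the
pragma to a bare block. The variable names ($chars, $bytes, $after) are
just illustrative, not anything from the original question:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = chr(400);            # one character, U+0190

my $chars = length($x);      # pragma not in effect here: counts characters

my $bytes;
{
    use bytes;               # in effect only until the closing brace
    $bytes = length($x);     # counts the bytes of the internal UTF-8 form
}

my $after = length($x);      # outside the block again: counts characters

print "chars=$chars bytes=$bytes after=$after\n";

# prints "chars=1 bytes=2 after=1"
```

The same string gives a different length depending only on whether the
call sits in code where "use bytes" is in effect.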

There is a third case, without "use bytes" in there at all, which would 
also print 1.  But here is a version that might be more enlightening:

#!/usr/bin/perl

$x = chr(400);
printf( "set x = %x; length of %x is %d\n", 400, ord($x), length($x) );

# prints "set x = 190; length of 190 is 1"
# note that "190" here means Unicode code point U+0190 (Latin capital letter open E, i.e. epsilon)

use bytes;
printf( "byte length of x is %d : %x %x\n", length($x), map{ord()} split( //, $x ));

# prints "byte length of x is 2 : c6 90"
# where "c6 90" is the two-byte UTF-8 representation of U+0190

# still using bytes at this point...

$x = chr(400);  # doesn't do what you want: can't have byte characters > 255

printf( "set x = %x; x is really %x with length %d\n", 400, ord($x), length($x));

# prints "set x = 190; x is really 90 with length 1"
# note that the bits above 0xFF have been ignored: 0x190 & 0xFF is 0x90.
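
If all you want is the byte count, one way to avoid that chr() surprise is
to keep the pragma out of scope altogether. The following is only a sketch
under my own assumptions: it loads the bytes module with an empty import
list so that bytes::length() is callable without enabling the pragma, and
it uses encode_utf8() from the core Encode module for comparison. The
variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use bytes ();                 # load the module without enabling the pragma
use Encode qw(encode_utf8);   # core module, used here only for comparison

my $x = chr(400);             # still U+0190, since the pragma is not in effect

my $internal = bytes::length($x);        # bytes in Perl's internal representation
my $encoded  = length(encode_utf8($x));  # bytes in an explicit UTF-8 encoding

printf( "ord=%x chars=%d internal=%d encoded=%d\n",
        ord($x), length($x), $internal, $encoded );

# prints "ord=190 chars=1 internal=2 encoded=2"
```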


Hope that clears things up.

        Dave Graff
