[EMAIL PROTECTED] said:
> I'm confused, can someone tell me why:
>
> #!/usr/bin/perl
> use bytes;
> $x = chr( 400 );
> print "Length is ", length( $x ), "\n";
>
> prints 1, while
>
> #!/usr/bin/perl
> $x = chr( 400 );
> use bytes;
> print "Length is ", length( $x ), "\n";
>
> prints 2?

The position of the "use bytes" pragma matters because it only affects the code that follows it. In that code, values that could be wide characters are handled as raw bytes instead of being interpreted as Unicode. In your first script, chr(400) is itself evaluated under "use bytes", so $x ends up as a single byte and its length is 1. In your second script, $x is created as a real wide character first, and length() under "use bytes" then counts the two bytes of its UTF-8 representation.

There is a third case, without "use bytes" in there at all, which would also print 1. But here is a version that might be more enlightening:

    #!/usr/bin/perl

    $x = chr(400);
    printf( "set x = %x; length of %x is %d\n", 400, ord($x), length($x) );
    # prints "set x = 190; length of 190 is 1"
    # note that "190" here means Unicode code point U+0190
    # (LATIN CAPITAL LETTER OPEN E, also known as epsilon)

    use bytes;
    printf( "byte length of x is %d : %x %x\n", length($x), map{ord()} split( //, $x ) );
    # prints "byte length of x is 2 : c6 90"
    # where "c6 90" is the two-byte UTF-8 representation of U+0190

    # still using bytes at this point...
    $x = chr(400);  # doesn't do what you want: can't have byte characters > 255
    printf( "set x = %x; x is really %x with length %d\n", 400, ord($x), length($x) );
    # prints "set x = 190; x is really 90 with length 1"
    # note that the bits above 0xFF have been ignored.

Hope that clears things up.

Dave Graff
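
P.S. A minimal sketch of two related idioms, in case they help. "use bytes" only takes effect from where it appears to the end of the enclosing block, so it can be confined to a pair of braces. If all you want is the byte count, bytes::length() can be called after loading the module with an empty import list, which leaves character semantics on everywhere. The variable names here are just for illustration, and note that the bytes documentation recommends the pragma mainly for debugging.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use bytes ();    # load the module without turning on byte semantics

my $x = chr(400);                 # one character, U+0190

my $chars = length($x);           # character semantics
my $bytes = bytes::length($x);    # size of the internal UTF-8 form

my $scoped;
{
    use bytes;                    # in effect only until the closing brace
    $scoped = length($x);
}
my $after = length($x);           # back to character semantics

printf "chars=%d bytes=%d scoped=%d after=%d\n",
    $chars, $bytes, $scoped, $after;
# prints "chars=1 bytes=2 scoped=2 after=1"
```

Keeping the pragma inside a small block like this avoids the surprise in the last example above, where chr(400) silently loses its high bits once "use bytes" is in effect.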