Re: What does ord mean?

2009-03-07 Thread Bill Stephenson

On Mar 5, 2009, at 2:23 PM, Sherm Pendley wrote:

> On Thu, Mar 5, 2009 at 2:17 PM, Bill Stephenson bi...@perlhelp.com wrote:
>
>> Okay, but now I'm curious. What does ord mean? (or do)
>
> It's an abbreviation of ordinal, and returns the position of the character
> within its charset - i.e., its ordinal value, as opposed to its text value.

Thank you Sherm. I appreciate the clear descriptive answer.

And thank you too, David.

macbill% perldoc -f ord

    ord EXPR
    ord     Returns the numeric (the native 8-bit encoding, like ASCII or
            EBCDIC, or Unicode) value of the first character of EXPR.  If
            EXPR is omitted, uses $_.

            For the reverse, see "chr".  See perlunicode and encoding for
            more about Unicode.
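To make that entry concrete, here is a small sketch showing ord() reading only the first character, falling back to $_ when called with no argument, and chr() going the other way. The sample strings are arbitrary and all plain ASCII, so no encoding question comes into it.

```perl
use strict;
use warnings;

# ord() returns the numeric value of the FIRST character only.
print ord("A"), "\n";        # 65
print ord("Apple"), "\n";    # also 65; the rest of the string is ignored

# Called with no argument, ord() works on $_.
for ("a", "z") {
    my $n = ord;             # same as ord($_)
    print "$_ => $n\n";      # a => 97, then z => 122
}

# chr() is the reverse: number in, character out.
print chr(65), "\n";         # A
print chr(ord("x")) eq "x" ? "round trip ok\n" : "round trip failed\n";
```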

I like Sherm's explanation better than the Perl Docs though ;)

Kindest Regards,

--
Bill Stephenson



Re: What does ord mean?

2009-03-05 Thread Bill Stephenson

Okay, but now I'm curious. What does ord mean? (or do)

Kindest Regards,

--
Bill



Re: What does ord mean?

2009-03-05 Thread Sherm Pendley
On Thu, Mar 5, 2009 at 2:17 PM, Bill Stephenson bi...@perlhelp.com wrote:

> Okay, but now I'm curious. What does ord mean? (or do)

It's an abbreviation of ordinal, and returns the position of the character
within its charset - i.e., its ordinal value, as opposed to its text value.

Perl's ord() function is encoding-aware, but *only* if Perl knows what
encoding the passed-in string is using, which it doesn't by default. If you
put "use utf8;" at the top of your script, Perl knows that literal strings in
the source are utf-8 encoded, and flags them appropriately. Likewise if you
use the :utf8 I/O layer to open a file handle, like this:

    open(my $fh, '<:utf8', $filename) or die "Could not open $filename: $!";

Bytes that are input from $fh are then assumed to be utf-8 encoded, and Perl
sets an internal flag on the scalar to indicate this.
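A quick way to see the layer at work is sketched below. Two assumptions keep it short: an in-memory filehandle stands in for a real file, so no scratch file is needed, and it uses the stricter :encoding(UTF-8) layer instead of :utf8 (for well-formed input like this, both decode the same way). The helper name first_char_info is made up for the example.

```perl
use strict;
use warnings;

# The two bytes a UTF-8 file holds for NO-BREAK SPACE (U+00A0).
my $file_contents = "\xC2\xA0";

# Hypothetical helper: open the data with the given mode/layer, slurp it,
# and report ord() and length() of what Perl read.
sub first_char_info {
    my ($mode) = @_;
    open(my $fh, $mode, \$file_contents) or die "Could not open: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    return (ord($text), length($text));
}

my ($raw_ord, $raw_len) = first_char_info('<');                   # no layer
my ($dec_ord, $dec_len) = first_char_info('<:encoding(UTF-8)');   # decoded

print "no layer:    ord=$raw_ord length=$raw_len\n";    # ord=194 length=2
print "UTF-8 layer: ord=$dec_ord length=$dec_len\n";    # ord=160 length=1
```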

Here's what happened to the OP: when scalars do *not* have their utf-8 flag
set, ord(), length(), and other builtin functions fall back to assuming that
they're encoded with one byte per character. If that assumption is incorrect,
as it is in the OP's case where the character takes two bytes in utf-8
encoding, then the results from these functions will likewise be incorrect:
ord() reports the first of the two bytes (194) rather than the character
itself (160).
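The same thing can be reproduced without a file. This sketch assumes the OP's editor saved the script as UTF-8, so that without "use utf8" the pasted character reached Perl as the two bytes C2 A0. Decoding those bytes with the core Encode module yields one character and sets the internal flag described above, which utf8::is_utf8 reports (the flag is an internal detail, shown here only to match the explanation).

```perl
use strict;
use warnings;
use Encode qw(decode);

# What the script effectively contained without "use utf8": the two
# UTF-8 bytes of NO-BREAK SPACE, seen by Perl as two separate characters.
my $bytes = "\xC2\xA0";

# Decoding turns the two bytes into one character (U+00A0).
my $chars = decode('UTF-8', $bytes);

for my $pair (['undecoded', $bytes], ['decoded', $chars]) {
    my ($label, $s) = @$pair;
    printf "%-9s flag=%d length=%d ord=%d\n",
        $label, (utf8::is_utf8($s) ? 1 : 0), length($s), ord($s);
}
# undecoded flag=0 length=2 ord=194
# decoded   flag=1 length=1 ord=160
```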

sherm--

-- 
Cocoa programming in Perl: http://camelbones.sourceforge.net


Re: What does ord mean?

2009-03-05 Thread David Cantrell
On Thu, Mar 05, 2009 at 01:17:38PM -0600, Bill Stephenson wrote:

> Okay, but now I'm curious. What does ord mean? (or do)

perldoc -f ord

-- 
David Cantrell | Nth greatest programmer in the world

What profiteth a man, if he win a flame war, yet lose his cool?


Re: What does ord mean?

2009-03-04 Thread Chas. Owens
On Wed, Mar 4, 2009 at 19:58, Vic Norton v...@norton.name wrote:
> I am confused. I can't figure out what ord does. For example I've pasted the
> NO-BREAK-SPACE (00A0) between the quotation marks in the ord of the
> following line.
>   print ord(" "), "\n";
> When I run this line in a Perl script I get
>   194
> What does this 194 mean? As far as I know A0 = 10x16 = 160. And 194
> certainly can't be an octal numeral.
[snip]

I have a similar problem when copy and pasting in the terminal, but
when I write the characters to a file (or a pipe) it appears to do the
right thing:

    perl -le 'print "\x{A0}"' | perl -nle 'print ord'

I assume the problem has to do with Unicode vs UTF-8.
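A likely explanation, assuming the terminal is set to UTF-8: a pasted NO-BREAK SPACE arrives as the UTF-8 bytes C2 A0, whereas print "\x{A0}" with no output layer writes the single byte A0, which the second perl then reads back as 160. The sketch below captures what print actually writes, using in-memory filehandles and two made-up helpers (bytes_written, hex_bytes), to compare no layer against a UTF-8 output layer.

```perl
use strict;
use warnings;

# Hypothetical helper: print $string through a handle opened with $layer
# and return the raw bytes that came out the other side.
sub bytes_written {
    my ($layer, $string) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer)
        or die "Could not open in-memory handle: $!";
    print {$fh} $string;
    close $fh or die "Could not close in-memory handle: $!";
    return $buffer;
}

# Hypothetical helper: show a byte string as hex pairs, e.g. "C2 A0".
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

# No output layer: U+00A0 goes out as the single byte A0.
my $plain   = bytes_written('', "\x{A0}");
# UTF-8 output layer, as a UTF-8 terminal expects: two bytes, C2 A0.
my $encoded = bytes_written(':encoding(UTF-8)', "\x{A0}");

print "no layer:    ", hex_bytes($plain), "\n";     # A0
print "UTF-8 layer: ", hex_bytes($encoded), "\n";   # C2 A0
```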

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.


Re: What does ord mean?

2009-03-04 Thread Paul G. Hackett

NO-BREAK SPACE is 00A0, which in UTF-8 is xC2 xA0.  Hex xC2 = Decimal 194.
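The arithmetic behind that answer, spelled out as a sketch. UTF-8 stores code points from U+0080 to U+07FF in two bytes laid out as 110xxxxx 10xxxxxx: the lead byte carries the top five of the eleven bits and the trailing byte the low six. The helper utf8_two_bytes is hypothetical and handles only that range.

```perl
use strict;
use warnings;

# Hypothetical helper: encode one code point in the two-byte UTF-8 range
# (U+0080..U+07FF). Layout is 110xxxxx 10xxxxxx: the lead byte gets the
# top 5 of the 11 bits, the trailing byte the low 6.
sub utf8_two_bytes {
    my ($cp) = @_;
    die sprintf("U+%04X is outside the two-byte range\n", $cp)
        if $cp < 0x80 || $cp > 0x7FF;
    return (0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F));
}

my ($lead, $trail) = utf8_two_bytes(0xA0);        # NO-BREAK SPACE
printf "U+00A0 -> %02X %02X\n", $lead, $trail;    # U+00A0 -> C2 A0
print "lead byte in decimal: $lead\n";            # 194, the number ord() gave
```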

best,

Paul

Quoting Vic Norton v...@norton.name:

> I am confused. I can't figure out what ord does. For example I've
> pasted the NO-BREAK-SPACE (00A0) between the quotation marks in the ord
> of the following line.
>   print ord(" "), "\n";
> When I run this line in a Perl script I get
>   194
> What does this 194 mean? As far as I know A0 = 10x16 = 160. And 194
> certainly can't be an octal numeral.
>
> Regards,
>
> Vic


Re: What does ord mean?

2009-03-04 Thread John Delacour

At 20:44 -0500 4/3/09, Paul G. Hackett wrote:
> NO-BREAK SPACE is 00A0, which in UTF-8 is xC2 xA0.  Hex xC2 = Decimal 194.

so

#!/usr/bin/perl -w
use strict;
use utf8;
print ord(" "), "\n";

JD