On Fri, 10 Jan 2003 20:39:10 +0200
Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote:

> On Fri, Jan 10, 2003 at 07:28:00PM +0100, Merijn van den Kroonenberg wrote:
> > You might be looking for these:
> > 
> > 
> >     # ISO 8859-1 to UTF-8
> >     s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
> > 
> >     # UTF-8 to ISO 8859-1
> >     s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
> > 
> > I think that will work (they are not mine, so don't blame me if not ;-)
> 
> They are mine :-) so I feel free to say that they don't do &#NNN;
> conversion... but they certainly could be changed to do so.

(Answer)

# Assumes this script is saved as ISO 8859-1, so each accented letter is one byte.
$string = qq/ABC ÀÁÂÃÄÅÆ/;

$string =~ s/([\x80-\xff])/"&#".ord($1).";"/ge;

print "$string\n";
# gets "ABC &#192;&#193;&#194;&#195;&#196;&#197;&#198;"
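
If the data is UTF-8 bytes rather than ISO 8859-1, the UTF-8 to
ISO 8859-1 substitution quoted above can be changed in the same way.
This is only a sketch: it assumes the string holds raw UTF-8 bytes and
that only U+0080..U+00FF need converting, and the sub name is made up
for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the two-byte UTF-8 sequences for
# U+0080..U+00FF into decimal character references.
sub utf8_bytes_to_numeric_entities {
    my ($bytes) = @_;
    # The lead byte (\xC2 or \xC3) carries the top two bits of the
    # code point; the continuation byte carries the low six bits.
    $bytes =~ s{([\xC2\xC3])([\x80-\xBF])}
               {"&#" . (((ord($1) << 6) & 0xC0) | (ord($2) & 0x3F)) . ";"}ge;
    return $bytes;
}

# "\xC3\x80" is the UTF-8 encoding of U+00C0, "\xC3\xAA" of U+00EA.
print utf8_bytes_to_numeric_entities("ABC \xC3\x80\xC3\xAA"), "\n";
# gets "ABC &#192;&#234;"
```

The input is written with \x escapes so the result does not depend on
the encoding the script file is saved in.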


(Another answer)
Gisle Aas's HTML::Entities may help.
It handles all three kinds of character reference:
decimal <&#234;>, hexadecimal <&#xea;>, and named <&ecirc;>.

distributed from:
  http://search.cpan.org/author/GAAS/HTML-Parser-3.26/

use HTML::Entities;

$string = qq/ABC ÀÁÂÃÄÅÆ/;

# The second argument lists the characters to encode (a character-class body).
print encode_entities($string, "\x80-\xff"), "\n";
# gets "ABC &Agrave;&Aacute;&Acirc;&Atilde;&Auml;&Aring;&AElig;"


$encoded = qq/ABC &Agrave;&#214;&#xDD;Æ/;

print decode_entities($encoded), "\n";
# gets "ABC ÀÖÝÆ"
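
If HTML::Entities is not installed, the numeric references alone can be
decoded with a substitution. This is a rough sketch, not a replacement
for decode_entities: the sub name is hypothetical, it only handles
values 128..255, and it leaves named references such as &Agrave; as
they are.

```perl
use strict;
use warnings;

# Hypothetical helper: decode &#NNN; and &#xHH; references whose value
# is in 128..255 back into single ISO 8859-1 bytes. Anything else,
# including named references, is left untouched.
sub numeric_entities_to_latin1 {
    my ($text) = @_;
    $text =~ s{&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));}{
        my $code = defined $1 ? hex($1) : $2;
        ($code >= 0x80 && $code <= 0xFF) ? chr($code) : $&;
    }ge;
    return $text;
}

my $decoded = numeric_entities_to_latin1("ABC &#214;&#xDD;&Agrave;");
print $decoded eq "ABC \xD6\xDD&Agrave;" ? "ok\n" : "not ok\n";
# gets "ok"
```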

> > Greetings, Merijn
> > 
> > ----- Original Message -----
> > From: "Narins, Josh" <[EMAIL PROTECTED]>
> > To: <[EMAIL PROTECTED]>
> > Sent: Friday, January 10, 2003 6:54 PM
> > Subject: beginniner's 5.6.1 latin1<->utf8 question
> > 
> > 
> > >
> > > At one point I had a regex which perfectly converts the string A below
> > > into a series of &#234; strings.
> > > This is nice for me, because I just sling them out on the web, and as
> > > entities, they always seem to work.
> > >
> > > I've lost the regex, can't seem to find it. I know it had chr or ord in
> > > it.
> > >
> > > I've been reading the perl-unicode archives, and googling, but I just
> > > don't see it.
> > >
> > > This is for perl5.6.1 with Sun's (reputedly?) sick iconv.
> > >
> > > If someone could tap me in the right direction...
> > >
> > > Thx in advance
> > >
> Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There is this special
> biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen

SADAHIRO Tomoyuki
