On Fri, 10 Jan 2003 20:39:10 +0200 Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote:
> On Fri, Jan 10, 2003 at 07:28:00PM +0100, Merijn van den Kroonenberg wrote:
> > You might be looking for these:
> >
> > # ISO 8859-1 to UTF-8
> > s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
> >
> > # UTF-8 to ISO 8859-1
> > s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
> >
> > I think that will work (they are not mine, so don't blame me if not ;-)
>
> They are mine :-) so I feel free to say that they don't do &#NNN;
> conversion... but they certainly could be changed to work so.

(Answer)

$string = qq/ABC ÀÁÂÃÄÅÆ/;
$string =~ s/([\x80-\xff])/"&#".ord($1).";"/ge;
print "$string\n";
# gets "ABC &#192;&#193;&#194;&#195;&#196;&#197;&#198;"

(Another answer)

Gisle Aas's HTML::Entities may help.
It's aware of other types of character references too:
i.e. <&ecirc;>, <&#234;>, and <&#xea;>.
distributed from: http://search.cpan.org/author/GAAS/HTML-Parser-3.26/

use HTML::Entities;
$string = qq/ABC ÀÁÂÃÄÅÆ/;
print encode_entities($string, "\x80-\xff");
# gets "ABC &Agrave;&Aacute;&Acirc;&Atilde;&Auml;&Aring;&AElig;"

$encoded = qq/ABC &Agrave;&#214;&#xDD;&AElig;/;
print decode_entities($encoded), "\n";
# gets "ABC ÀÖÝÆ"

> > Greetings, Merijn
> >
> > ----- Original Message -----
> > From: "Narins, Josh" <[EMAIL PROTECTED]>
> > To: <[EMAIL PROTECTED]>
> > Sent: Friday, January 10, 2003 6:54 PM
> > Subject: beginniner's 5.6.1 latin1<->utf8 question
> >
> > > At one point I had a regex which perfectly converts the string A below
> > > into a series of &#234; strings.
> > > This is nice for me, because I just sling them out on the web, and as
> > > entities, they always seem to work.
> > >
> > > I've lost the regex, can't seem to find it. I know it had chr or ord in it.
> > >
> > > I've been reading the perl-unicode archives, and googling, but I just
> > > don't see it.
> > >
> > > This is for perl5.6.1 with Sun's (reputedly?) sick iconv.
> > >
> > > If someone could tap me in the right direction...
> > >
> > > Thx in advance
>
> Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/
> "There is this special biologist word we use for 'stable'. It is 'dead'."
> -- Jack Cohen

SADAHIRO Tomoyuki
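For anyone who wants both directions without pulling in a module, here is a minimal core-Perl sketch of the (Answer) regex wrapped in a sub, plus a reverse conversion for numeric references only. The sub names to_numeric_entities and from_numeric_entities are hypothetical (they come from neither the thread nor HTML::Entities), and the sketch assumes Latin-1 data, as in the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): the (Answer) substitution in a
# sub. Each character in 0x80-0xFF becomes a decimal reference like &#234;.
sub to_numeric_entities {
    my ($s) = @_;
    $s =~ s/([\x80-\xff])/'&#' . ord($1) . ';'/ge;
    return $s;
}

# Hypothetical helper: reverse direction for numeric references only,
# decimal (&#234;) or hexadecimal (&#xEA;). A single pass, so the output
# of one replacement is never decoded again ("&#38;#xEA;" -> "&#xEA;").
# Named references such as &ecirc; are left as-is; decode_entities from
# HTML::Entities handles those.
sub from_numeric_entities {
    my ($s) = @_;
    $s =~ s/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/chr(defined $1 ? hex($1) : $2)/ge;
    return $s;
}

# Escapes keep the example independent of the source file's encoding:
# "ABC " followed by A-grave, A-acute and AE ligature in Latin-1.
my $latin1 = "ABC \xC0\xC1\xC6";
my $html   = to_numeric_entities($latin1);
print "$html\n";    # ABC &#192;&#193;&#198;

print from_numeric_entities($html) eq $latin1 ? "round trip ok\n" : "mismatch\n";
```

One caveat: on perl 5.8 and later, if the string holds decoded characters above 0xFF, the class [\x80-\xff] will not match them; a class such as [^\x00-\x7f] would cover every non-ASCII character.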