On Fri, Jan 10, 2003 at 07:28:00PM +0100, Merijn van den Kroonenberg wrote:
> You might be looking for these:
> 
> 
>     # ISO 8859-1 to UTF-8
>     s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
> 
>     # UTF-8 to ISO 8859-1
>     s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
> 
> I think that will work (they are not mine, so don't blame me if not ;-)

They are mine :-) so I feel free to say that they don't do &#NNN;
conversion... but they could certainly be changed to do so.
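
For illustration, here is a rough sketch of how the two substitutions
quoted above might be adapted to emit &#NNN; entities instead of
converting between encodings.  It is only a sketch: the sub names are
made up for this example, the input is assumed to be a plain byte
string, and the UTF-8 variant only handles the two-byte sequences for
U+0080..U+00FF (lead byte C2 or C3), just like the original regex.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each high ISO 8859-1 byte with a decimal
# numeric character reference.  Latin-1 byte values are equal to their
# Unicode code points, so ord() gives the number directly.
sub latin1_to_entities {
    my ($s) = @_;
    $s =~ s/([\x80-\xFF])/'&#' . ord($1) . ';'/eg;
    return $s;
}

# Hypothetical helper: the same for UTF-8 bytes, restricted to the
# two-byte sequences that encode U+0080..U+00FF.  The code point is
# rebuilt with the same bit arithmetic as the UTF-8 to ISO 8859-1 regex.
sub utf8_latin1_range_to_entities {
    my ($s) = @_;
    $s =~ s/([\xC2\xC3])([\x80-\xBF])/'&#' . ((ord($1) << 6 & 0xC0) | (ord($2) & 0x3F)) . ';'/eg;
    return $s;
}

print latin1_to_entities("f\xEAte"), "\n";                 # prints f&#234;te
print utf8_latin1_range_to_entities("f\xC3\xAAte"), "\n";  # prints f&#234;te
```

Plain ASCII passes through both subs untouched, and characters beyond
U+00FF in UTF-8 input are left as raw bytes, so longer sequences would
need a more general decoder.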

> Greetings, Merijn
> 
> ----- Original Message -----
> From: "Narins, Josh" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Friday, January 10, 2003 6:54 PM
> Subject: beginniner's 5.6.1 latin1<->utf8 question
> 
> 
> >
> > At one point I had a regex which perfectly converts the string A below into
> > a series of &#234; strings.
> > This is nice for me, because I just sling them out on the web, and as
> > entities, they always seem to work.
> >
> > I've lost the regex, can't seem to find it. I know it had chr or ord in it.
> >
> > I've been reading the perl-unicode archives, and googling, but I just don't
> > see it.
> >
> > This is for perl5.6.1 with Sun's (reputedly?) sick iconv.
> >
> > If someone could tap me in the right direction...
> >
> > Thx in advance
> >
> > ------------------------------------------------------------------------------
> > This message is intended only for the personal and confidential use of the
> > designated recipient(s) named above.  If you are not the intended recipient
> > of this message you are hereby notified that any review, dissemination,
> > distribution or copying of this message is strictly prohibited.  This
> > communication is for information purposes only and should not be regarded as
> > an offer to sell or as a solicitation of an offer to buy any financial
> > product, an official confirmation of any transaction, or as an official
> > statement of Lehman Brothers.  Email transmission cannot be guaranteed to be
> > secure or error-free.  Therefore, we do not represent that this information
> > is complete or accurate and it should not be relied upon as such.  All
> > information is subject to change without notice.
> >
> >

-- 
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen
