clearing the utf8 flag

2004-11-09 Thread Paul Bijnens
stead of: $s = encode("utf8", $s) if Encode::is_utf8($s); which seems superior to avoid double utf8 encodings, should the utf8-flag be lost. And it's faster. Or even simply: Encode::_utf8_off($s) The problem is that I'm usually wrong. Am I this time? Am I missing

Re: getBytes in perl ?!?

2004-12-02 Thread Paul Bijnens
cter (not byte) (i.e. one large number for the smiling face character). Or do you want one number for each utf8-encoded byte (i.e. three hex pairs for the smiling face character)? perl -le '$s="Smile \x{263A}"; print unpack("H*",$s)' Perl Version : 5.6.1 Apparently, I

Re: filtering out non-Japanese

2004-12-15 Thread Paul Bijnens
-CSD -ne 'print if /^[\p{Hiragana}\p{Katakana}\p{Kanji}]+$/' f > f-clean.tok but replacing the [...] class with a group (?:...) does work: perl -CSD -ne 'print if /^(?:\p{Hiragana}|\p{Katakana}|\p{Kanji})+$/' f > f-clean.tok -- Paul Bijnens, Xplanation

losing utf8 flag on strings?

2005-01-13 Thread Paul Bijnens
alone. Why do I have to force the utf8 flag using decode("utf8",..) ? One of my guesses is that the problem lies in XS-processing of strings where the utf8 flag is not set correctly. True? Why does nobody else complain then? Is my setup wrong? (Tried this on different instal

Re: losing utf8 flag on strings?

2005-01-16 Thread Paul Bijnens
Nick Ing-Simmons wrote: Paul Bijnens <[EMAIL PROTECTED]> writes: Can anyone explain what I'm doing wrong? I was about to contact the author of HTML::Entities, when I noticed HTML::Parser 3.45 was released on 6 Jan 2005. Installed it -- and guess what? Now it works as expected! I gue

Re: List of unsupported unicode characters?

2007-01-10 Thread Paul Bijnens
is just the "no-break space". What exactly is your problem with that character? -- Paul Bijnens, xplanation Technology Services Tel +32 16 397.511 Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM Fax +32 16 397.512 http://www.xpl

Re: Use of encoding/decoding and 3-param open

2007-11-15 Thread Paul Bijnens
; would raise an error in "UTF-8". For input, both get the correct characters, assuming the input bytestream was indeed correct. Or am I missing something? -- Paul Bijnens, xplanation Technology Services Tel +32 16 397.511 Technologielaan 21 bus 2

Re: UTF-16BE -> UTF-8 encoding() error

2007-11-29 Thread Paul Bijnens
on of iconv, I ran perl on Windows, so, perhaps there is a problem only with the Windows port? Otherwise: 1) Please be aware of this error 2) Any suggestions (other than pre-translating via "iconv" ;-) -- Paul Bijnens, xplanation Technology Services Tel +32 16 397.511 Techno