Re: pugs CGI.pm

BÁRTHÁZI András Wed, 13 Apr 2005 12:34:28 -0700

Hi,

ah! That makes perfect sense, thanks for clarifying matters! :)
Ok, then it seems we need to have a builtin, such that:
  new_builtin(0xE2) ~ new_builtin(0x82) ~ new_builtin(0xAC) eq
  "\xE2\x82\xAC"
I think that, conceptually, it cannot be done, because you cannot store a byte in a character string, and ~ is for concatenating character strings, not byte strings. In practice you can do it, because Pugs' (and, as far as I know, Parrot's) internal string representation is UTF-8.
Parrot's not UTF-8 internally. It can do UTF-8 if it must, but we prefer not, since UTF-8 sucks in so very many ways.

Parrot's encoding-neutral. You can (or will be able to, when I finish some library code) mix Unicode, Latin-3, Shift-JIS, EBCDIC, and EUC-KR string data in a program if you want to. (Though I'd generally recommend against it.)


So, then here's a solution:
http://barthazi.hu/decode.pugs

It hasn't been heavily tested (the euro sign, all the Hungarian letters and some other characters work), but I think it can work in all possible situations.
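
For readers without access to the script, here is a rough illustration of the underlying idea: turning raw byte values such as 0xE2 0x82 0xAC into the characters they encode. It is a sketch in Python, not the Pugs code from the link above, and the function name decode_utf8 is made up for this example. It assumes the input is a list of integers in the range 0..255 and does not reject overlong or surrogate encodings.

```python
def decode_utf8(byte_values):
    """Turn a sequence of byte values (ints 0..255) into a character string.

    Hypothetical helper for illustration only. It handles 1- to 4-byte
    sequences but does not reject overlong forms or surrogate code points.
    """
    chars = []
    i = 0
    n = len(byte_values)
    while i < n:
        lead = byte_values[i]
        # The lead byte tells us how many continuation bytes follow
        # and which of its low bits carry code point data.
        if lead < 0x80:
            extra, code = 0, lead
        elif 0xC2 <= lead <= 0xDF:
            extra, code = 1, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            extra, code = 2, lead & 0x0F
        elif 0xF0 <= lead <= 0xF4:
            extra, code = 3, lead & 0x07
        else:
            raise ValueError("invalid lead byte 0x%02X at offset %d" % (lead, i))
        if i + extra >= n:
            raise ValueError("truncated sequence at offset %d" % i)
        # Each continuation byte has the form 10xxxxxx and adds 6 bits.
        for b in byte_values[i + 1 : i + 1 + extra]:
            if b & 0xC0 != 0x80:
                raise ValueError("invalid continuation byte 0x%02X" % b)
            code = (code << 6) | (b & 0x3F)
        chars.append(chr(code))
        i += 1 + extra
    return "".join(chars)


# The three bytes from the thread decode to the euro sign, U+20AC.
euro = decode_utf8([0xE2, 0x82, 0xAC])
```

The same function covers the Hungarian letters, since each of them is a two-byte sequence in UTF-8 (for example 0xC5 0x91 for U+0151).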

Bye,
  Andras
