Hi,

ah! That makes perfect sense, thanks for clarifying matters! :)

Ok, then it seems we need to have a builtin, such that:
  new_builtin(0xE2) ~ new_builtin(0x82) ~ new_builtin(0xAC) eq
  "\xE2\x82\xAC"

I think - conceptually - it cannot be done, because you cannot store a byte in a character string, and ~ concatenates character strings, not byte strings. In practice, you can do it, because Pugs' (and, as far as I know, Parrot's) internal string representation is UTF-8.

Parrot's not UTF-8 internally. It can do UTF-8 if it must, but we prefer not, since UTF-8 sucks in so very many ways.


Parrot's encoding-neutral. You can (or will be able to, once I finish some library code) mix Unicode, Latin-3, Shift-JIS, EBCDIC, and EUC-KR string data in one program if you want to. (Though I'd generally recommend against it.)

So, then here's a solution: http://barthazi.hu/decode.pugs

It hasn't been heavily tested (the euro sign, all the Hungarian letters and a few other characters work), but I think it should work in all possible situations.
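
For readers who cannot fetch the script, here is a minimal sketch of the general idea: turning a sequence of UTF-8 bytes into a character string by hand. It is written in Python rather than Pugs, and it is not the content of decode.pugs; the function name decode_utf8 is only an illustration. It assumes well-formed input and does not reject overlong encodings or surrogate code points.

```python
def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes into a character string, one sequence at a time.

    Illustrative sketch only: overlong encodings and surrogates are not rejected.
    """
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                # 0xxxxxxx: single-byte (ASCII) character
            code, extra = lead, 0
        elif lead >> 5 == 0b110:       # 110xxxxx: one continuation byte follows
            code, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:      # 1110xxxx: two continuation bytes follow
            code, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:     # 11110xxx: three continuation bytes follow
            code, extra = lead & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")

        tail = data[i + 1 : i + 1 + extra]
        if len(tail) != extra:
            raise ValueError(f"truncated sequence at offset {i}")
        for b in tail:
            if b >> 6 != 0b10:         # continuation bytes look like 10xxxxxx
                raise ValueError(f"invalid continuation byte 0x{b:02X}")
            code = (code << 6) | (b & 0x3F)

        chars.append(chr(code))
        i += 1 + extra
    return "".join(chars)


# The three bytes from the example above form one character, the euro sign.
euro = decode_utf8(bytes([0xE2, 0x82, 0xAC]))
assert euro == "\u20ac"
```

The point of the sketch is the same as in the discussion: the three values 0xE2, 0x82 and 0xAC are bytes, and they only become the single character U+20AC once they are explicitly decoded as UTF-8.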

Bye,
  Andras
