Hello MirclMax,

>A page I used to grab and parse through Rebol  using a
>page: read to-url <urlhere>
>command ..  seems to have switched from sending its content as ISO-8859-1
>(Latin1) and seems to be doing it as UTF-8 now.

>I really need the content now stored in "page" to be ISO-8859-1. So, can
>anyone tell me a way to force the "read" to pull it down as ISO-8859-1?
>Barring that, does anyone have any functions to convert "page" to it?   (I
>would actually need another function to go in the reverse direction I think)

This is something I've wanted to have a whack at for quite a while, so
I threw something together. It seems to work OK, but I haven't tested it
on illegal or broken UTF-8 to see what it does in such cases. Anything that
can't be expressed as ISO-8859-1 is converted to "?", but you can easily
modify it to substitute some other string, or ignore such characters.

You can use it like this:

  page: utf-iso read <some url>

Cheers,
Eric

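utf-iso: func [
    {convert a string from UTF-8 encoding to ISO-8859-1}
    s [string!]
    /local res normal skipn skipped stretch one iso
] compose [
    ; compose builds the bitsets once, when the function is defined
    normal:  (make bitset! [#"^(00)" - #"^(7F)"])   ; plain ASCII bytes
    iso:     (make bitset! [#"^(C2)" - #"^(C3)"])   ; lead bytes for U+0080 - U+00FF
    skipn:   (make bitset! [#"^(C0)" - #"^(FF)"])   ; lead bytes (C2/C3 pairs are tried first)
    skipped: (make bitset! [#"^(80)" - #"^(BF)"])   ; continuation bytes
    res: copy ""
    parse/all s [
        any [
            ; a run of ASCII is copied unchanged
            copy stretch some normal (append res stretch) |
            ; a two-byte sequence that fits in ISO-8859-1
            copy one iso copy stretch skipped
            (append res to-char
                (first one) - #"^(C0)" * #"^(40)" +
                ((first stretch) - #"^(80)")) |
            ; any other sequence has no ISO-8859-1 equivalent
            skipn any skipped (append res "?") |
            ; stray continuation bytes (broken UTF-8) are marked with "!"
            some skipped (append res "!")
        ]
    ]
    res
]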
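
For the reverse direction asked about in the question (ISO-8859-1 to
UTF-8), a minimal, untested sketch could look like the following. The
name iso-utf is hypothetical, and the code assumes Rebol 2, where a
string! holds one byte per character. Every ISO-8859-1 byte from 80 to FF
becomes a two-byte UTF-8 sequence: a lead byte of C2 or C3, followed by a
continuation byte carrying the low six bits.

```rebol
iso-utf: func [
    {convert a string from ISO-8859-1 encoding to UTF-8 (sketch)}
    s [string!]
    /local res n
][
    res: copy ""
    foreach c s [
        n: to-integer c
        either n < 128 [
            ; ASCII is identical in both encodings
            append res c
        ][
            ; lead byte: C2 for 80 - BF, C3 for C0 - FF
            append res to-char either n < 192 [194] [195]
            ; continuation byte: 80 plus the low six bits
            append res to-char 128 + (n // 64)
        ]
    ]
    res
]
```

It would be called the same way as utf-iso, for example:

  utf-page: iso-utf page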
