Some questions: when you replaced "-" with a space, what did you replace
it in? I mean, were you using a text editor to look at output from your
program? If so, it may be that the text editor is saving as US-ASCII
instead of Unicode.
Has anyone confirmed that Rebol won't write Unicode?
Can you post the XML?
You might at any rate consider writing ISO-8859-1 for the encoding, as
windows-1252 is Windows-specific and ISO-8859-1 is cross-platform.
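
To make the difference between the two labels concrete, here is a small Python sketch (Python is my choice for illustration, it is not used anywhere in this thread, and the sample strings are made up). The two encodings agree on most bytes but differ in the 0x80-0x9F range, where windows-1252 has printable characters such as curly quotes and ISO-8859-1 has control characters.

```python
# Compare how ISO-8859-1 and windows-1252 treat the same text and bytes.
# The sample strings are illustrative, not taken from the thread.

def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

accented = "caf\u00e9"           # e-acute is in both encodings
quoted = "\u201cquoted\u201d"    # curly quotes exist only in windows-1252

same_bytes = accented.encode("iso-8859-1") == accented.encode("windows-1252")
curly_in_latin1 = can_encode(quoted, "iso-8859-1")
curly_in_cp1252 = can_encode(quoted, "windows-1252")

# Byte 0x93 is a left curly quote in windows-1252 but a C1 control in ISO-8859-1.
as_cp1252 = b"\x93".decode("windows-1252")
as_latin1 = b"\x93".decode("iso-8859-1")

print(same_bytes, curly_in_latin1, curly_in_cp1252)
print(repr(as_cp1252), repr(as_latin1))
```

So ISO-8859-1 is only a safe label if the output never contains characters from that range.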

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf
Of Andrew Martin
Sent: Friday, July 05, 2002 11:40 AM
To: [EMAIL PROTECTED]
Subject: [REBOL] Re: Rebol & XML encoding; use encoding="windows-1252"

Actually, I'm fairly sure now that I'm partially wrong!

I believe it's a bug in the MS operating system.

I've been reading Ed Batutis' web site here:
        http://www.batutis.com/i18n/papers/mlang/samples/
and I've been trying out his MLangDet on my Windows XP system (with all
the latest upgrades from Microsoft) on a text file, and came across an
interesting problem with the MLangDet software. With a simple .txt file
that contains just the following:

Telephone: +64-6-9748241

with one empty line before and after, the MLangDet program reports this
.txt file as Unicode (UTF-7). If I simply replace both of the "-" with a
space, like this:

Telephone: +64 6 9748241

Then MLangDet reports the .txt file as US-ASCII.

I've also noticed that in MS Internet Explorer, when the first line of
text is placed in XML/XHTML, the browser also declares that the page is
now UTF-7 (instead of UTF-8) and shows the telephone number as:
        6-9748241
    instead of:
        +64-6-9748241

I think this behaviour is because both MS Internet Explorer and MLangDet
use the same operating system function to detect the various encoding
schemes.

When I turn off MS Internet Explorer automatic detection, then the
correct telephone number is shown.

This is a very curious problem!
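
One possible explanation, offered as an assumption rather than anything confirmed about MLang or Internet Explorer: in UTF-7 (RFC 2152) a "+" opens a run of base64 characters and a "-" can explicitly close it, so "+64-" has the shape of a terminated UTF-7 shifted sequence, while the version with spaces has no closing "-". A lenient decoder that consumed that run without producing a character would leave "6-9748241", which matches the display described above. The Python sketch below (hypothetical helper name, illustrative regex) only checks for that shape; it is not the detection algorithm Windows uses.

```python
import re

# Matches "+", then one or more base64 characters, then an explicit closing "-".
# This is an illustrative pattern, not the heuristic MLang actually uses.
UTF7_SHIFT = re.compile(r"\+[A-Za-z0-9+/]+-")

def find_utf7_like_runs(text):
    """Return substrings shaped like explicitly terminated UTF-7 shift sequences."""
    return UTF7_SHIFT.findall(text)

with_hyphens = "Telephone: +64-6-9748241"
with_spaces = "Telephone: +64 6 9748241"

runs_hyphens = find_utf7_like_runs(with_hyphens)
runs_spaces = find_utf7_like_runs(with_spaces)

# What is left if a lenient decoder drops the matched run entirely.
remainder = UTF7_SHIFT.sub("", with_hyphens)

print(runs_hyphens, runs_spaces)
print(remainder)
```

If this guess is right, any international phone number written with a leading "+" and hyphens could trigger the same misdetection.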

Andrew Martin
ICQ: 26227169 http://valley.150m.com/
-><-

----- Original Message -----
From: "Andrew Martin" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, July 05, 2002 12:31 PM
Subject: [REBOL] Rebol & XML encoding; use encoding="windows-1252"


> After a long and exhausting day or two, I discovered that I've been using
> the wrong XML character encoding. For Rebol running on Windows PCs creating
> XML or XHTML files or driving a CGI program from Rebol scripts or plain text
> files (like windows .txt files), it's best to use this tag:
>
>         <?xml version="1.0" encoding="windows-1252"?>
>
> The problem one gets from not using the above tag is that MS Internet
> Explorer (but not Opera or Netscape!) sometimes generates CGI query strings
> that can look like Chinese characters or long strings of gibberish.
>
> I tried the unicode encoding of "UTF-8" and "UTF-16" but get the problem
> that Rebol doesn't understand scripts written in unicode. Rebol seems only
> to read 8 bit characters, not the 16 bits (I think?) of unicode.
>
> This site:
>         http://www.w3schools.com/xml/xml_encoding.asp
>     helped me the most.
>
> Andrew Martin
> ICQ: 26227169 http://valley.150m.com/
> -><-
>
>
> --
> To unsubscribe from this list, please send an email to
> [EMAIL PROTECTED] with "unsubscribe" in the
> subject, without the quotes.
>
