Some questions: when you replaced "-" with a space, what did you replace it in? I mean, were you using a text editor to look at the output from your program? Because then it could be that the text editor is saving as US-ASCII instead of Unicode. Has anyone confirmed that Rebol won't write Unicode? Can you post the XML? At any rate, you might consider writing ISO-8859-1 for the encoding, as windows-1252 is Windows-specific and ISO-8859-1 is cross-platform.
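To make the windows-1252 versus ISO-8859-1 suggestion concrete, here is a minimal Python sketch. Python is used only for illustration and says nothing about Rebol's own behaviour. It assumes the goal is simply to produce bytes that match the encoding named in the XML declaration. The two encodings agree on every byte except 0x80 to 0x9F, where windows-1252 has printable characters such as the euro sign and curly quotes while ISO-8859-1 has control codes. The helper names `encode_for_xml` and `differing_bytes` are hypothetical.

```python
def encode_for_xml(text: str, encoding: str) -> bytes:
    """Encode text for an XML file whose declaration names `encoding`.

    Characters the encoding cannot represent are written as numeric
    character references (for example &#8364; for the euro sign).
    """
    return text.encode(encoding, errors="xmlcharrefreplace")


def differing_bytes() -> list[int]:
    """Return the byte values that windows-1252 and ISO-8859-1 decode differently."""
    result = []
    for value in range(256):
        byte = bytes([value])
        # A few bytes (e.g. 0x81) are undefined in windows-1252; "replace" maps them to U+FFFD.
        if byte.decode("windows-1252", errors="replace") != byte.decode("iso-8859-1"):
            result.append(value)
    return result


if __name__ == "__main__":
    sample = "Price: \u20ac5, \u201cquoted\u201d, caf\u00e9"
    print(encode_for_xml(sample, "windows-1252"))  # b'Price: \x805, \x93quoted\x94, caf\xe9'
    print(encode_for_xml(sample, "iso-8859-1"))  # b'Price: &#8364;5, &#8220;quoted&#8221;, caf\xe9'
    print(len(differing_bytes()))  # 32
```

With ISO-8859-1, characters such as the euro sign fall back to character references that any XML parser understands, so choosing the cross-platform encoding loses no information; only the raw bytes get longer.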
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Andrew Martin
Sent: Friday, July 05, 2002 11:40 AM
To: [EMAIL PROTECTED]
Subject: [REBOL] Re: Rebol & XML encoding; use encoding="windows-1252"

Actually, I'm fairly sure now that I'm partially wrong! I believe it's a bug in the MS operating system.

I've been reading Ed Batutis' web site here:
http://www.batutis.com/i18n/papers/mlang/samples/
and I've been trying out his MLangDet on my Windows XP system (with all the latest upgrades from Microsoft) on a text file, and came across an interesting problem with the MLangDet software. With a simple .txt file that contains just the following:

Telephone: +64-6-9748241

with one empty line before and after, the MLangDet program reports this .txt file as Unicode (UTF-7). If I simply replace both of the "-" with a space, like this:

Telephone: +64 6 9748241

then MLangDet reports the .txt file as US-ASCII.

I've also noticed that in MS Internet Explorer, when the first line of text is placed in XML/XHTML, the browser also declares that the page is now UTF-7 (instead of UTF-8) and shows the telephone number as:

6-9748241

instead of:

+64-6-9748241

I think this behaviour is because both MS Internet Explorer and MLangDet use the same operating system function to detect the various encoding schemes. When I turn off MS Internet Explorer's automatic detection, the correct telephone number is shown.

This is a very curious problem!

Andrew Martin
ICQ: 26227169
http://valley.150m.com/
-><-

----- Original Message -----
From: "Andrew Martin" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, July 05, 2002 12:31 PM
Subject: [REBOL] Rebol & XML encoding; use encoding="windows-1252"

> After a long and exhausting day or two, I discovered that I've been using
> the wrong XML character encoding.
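The behaviour described above is consistent with how UTF-7 (RFC 2152) works: "+" starts a run of modified base64 and "-" explicitly ends it, so "+64-" is a well-formed UTF-7 shift sequence, whereas "+64 6" lacks the explicit terminator. The Python sketch below illustrates this with two hypothetical helpers. `looks_like_utf7` is a made-up heuristic, not the algorithm MLang actually uses (the thread does not describe it). `lenient_utf7_decode` decodes UTF-7 but silently drops leftover bits that do not fill a 16-bit unit; that leniency is an assumption, chosen because it reproduces the "6-9748241" that Internet Explorer displayed. Python's built-in "utf-7" codec is stricter about such leftover bits, which is why the sketch rolls its own.

```python
import re
import string

# Modified base64 alphabet used inside UTF-7 shift sequences (RFC 2152).
B64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
B64_VALUE = {c: i for i, c in enumerate(B64)}

# Hypothetical heuristic: "+", one or more base64 characters, then an explicit "-".
UTF7_SHIFT = re.compile(r"\+[A-Za-z0-9+/]+-")


def looks_like_utf7(data: bytes) -> bool:
    """Made-up detector: 7-bit data containing an explicitly terminated shift sequence."""
    if any(b > 0x7F for b in data):
        return False
    return UTF7_SHIFT.search(data.decode("ascii")) is not None


def lenient_utf7_decode(text: str) -> str:
    """Decode UTF-7, silently dropping bits that do not fill a 16-bit unit."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        i += 1
        if ch != "+":
            out.append(ch)  # directly encoded character
            continue
        if i < len(text) and text[i] == "-":
            out.append("+")  # "+-" encodes a literal "+"
            i += 1
            continue
        buffer, bits, units = 0, 0, bytearray()
        while i < len(text) and text[i] in B64_VALUE:
            buffer = (buffer << 6) | B64_VALUE[text[i]]
            bits += 6
            i += 1
            if bits >= 16:
                bits -= 16
                units += ((buffer >> bits) & 0xFFFF).to_bytes(2, "big")
                buffer &= (1 << bits) - 1  # keep only the unused low bits
        # Fewer than 16 leftover bits are discarded here (the lenient part).
        if i < len(text) and text[i] == "-":
            i += 1  # an explicit "-" terminator is absorbed
        out.append(units.decode("utf-16-be", errors="replace"))
    return "".join(out)


if __name__ == "__main__":
    dashed = "Telephone: +64-6-9748241"
    spaced = "Telephone: +64 6 9748241"
    print(looks_like_utf7(dashed.encode("ascii")))  # True
    print(looks_like_utf7(spaced.encode("ascii")))  # False
    print(lenient_utf7_decode(dashed))  # Telephone: 6-9748241
```

The "+64" run carries only 12 bits, less than one 16-bit code unit, so a decoder that tolerates the incomplete unit produces no character for it and then swallows the "-", which is exactly how "+64-" disappears from the displayed number.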
> For Rebol running on Windows PCs creating XML or XHTML files, or driving a
> CGI program from Rebol scripts or plain text files (like Windows .txt
> files), it's best to use this tag:
>
> <?xml version="1.0" encoding="windows-1252"?>
>
> The problem one gets from not using the above tag is that MS Internet
> Explorer (but not Opera or Netscape!) sometimes generates CGI query strings
> that look like Chinese characters or long strings of gibberish.
>
> I tried the Unicode encodings "UTF-8" and "UTF-16" but got the problem
> that Rebol doesn't understand scripts written in Unicode. Rebol seems only
> to read 8-bit characters, not the 16 bits (I think?) of Unicode.
>
> This site:
> http://www.w3schools.com/xml/xml_encoding.asp
> helped me the most.
>
> Andrew Martin
> ICQ: 26227169 http://valley.150m.com/
> -><-
>
> --
> To unsubscribe from this list, please send an email to
> [EMAIL PROTECTED] with "unsubscribe" in the
> subject, without the quotes.
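The advice to declare encoding="windows-1252" only works if the bytes written really are windows-1252. Here is a minimal Python sketch of that rule (Python is used only for illustration; the helper `xml_bytes` is hypothetical and assumes the declared name is also a codec name Python recognises). It also shows why a reader that expects one byte per character can choke on UTF-16: for ASCII text, UTF-16 output is full of NUL bytes, while UTF-8 output is byte-for-byte ASCII.

```python
import xml.etree.ElementTree as ET


def xml_bytes(body: str, encoding: str) -> bytes:
    """Hypothetical helper: prefix an XML declaration and encode with that same encoding.

    The declared name doubles as the Python codec name, so it must be one
    Python recognises (e.g. "windows-1252", "ISO-8859-1", "UTF-8", "UTF-16").
    Characters the encoding cannot represent become numeric character references.
    """
    prolog = f'<?xml version="1.0" encoding="{encoding}"?>\n'
    return (prolog + body).encode(encoding, errors="xmlcharrefreplace")


if __name__ == "__main__":
    data = xml_bytes("<p>caf\u00e9 \u20ac \u4e2d</p>", "windows-1252")
    # e-acute and the euro sign become single bytes; the CJK character becomes &#20013;.
    print(data)
    # The parser reads the declaration and decodes the bytes accordingly.
    print(ET.fromstring(data).text)
    # UTF-16 output contains NUL bytes, which trip up 8-bit, text-oriented readers.
    print(b"\x00" in xml_bytes("<p>A</p>", "UTF-16"))  # True
    print(b"\x00" in xml_bytes("<p>A</p>", "UTF-8"))  # False
```

Numeric character references keep characters outside windows-1252 intact, so the 8-bit declaration does not limit what the document can contain.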
