Hi all,
If anyone runs into the same problem I had, the following function
(native to Delphi) solves it:
  // Declared in unit System:
  type
    UTF8String = type string;

  function Utf8ToAnsi(const S: UTF8String): string;
Darn, it was so simple!!! (BTW, if you happen to see a weird char in the
resulting String, check the Font you are using to display it...)
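For anyone who wants to try it, here is a minimal sketch of how the call could be applied to a response body that was stored in a stream. It assumes a pre-Unicode Delphi of that era (Delphi 6 or 7); StreamToUtf8 and the sample bytes are hypothetical names and data made up for the illustration, not part of ICS or the RTL.

```pascal
program Utf8Demo;
{$APPTYPE CONSOLE}

uses
  Classes;

// Hypothetical helper: copies the whole content of a stream into a
// UTF8String without interpreting the bytes.
function StreamToUtf8(Stream: TStream): UTF8String;
var
  Len: Integer;
begin
  Len := Stream.Size;
  SetLength(Result, Len);
  Stream.Position := 0;
  if Len > 0 then
    Stream.ReadBuffer(Result[1], Len);
end;

const
  // "cafe" with an acute accent on the e, encoded as UTF-8:
  // the accented letter takes two bytes ($C3 $A9).
  Sample: array[0..4] of Byte = ($63, $61, $66, $C3, $A9);
var
  Body: TMemoryStream;
  Raw: UTF8String;
begin
  Body := TMemoryStream.Create;
  try
    // Stands in for the stream the HTTP component filled.
    Body.WriteBuffer(Sample, SizeOf(Sample));
    Raw := StreamToUtf8(Body);
    Writeln('Bytes received: ', Length(Raw));
    // Utf8ToAnsi maps the UTF-8 bytes to the current ANSI code page.
    Writeln('Chars after conversion: ', Length(Utf8ToAnsi(Raw)));
  finally
    Body.Free;
  end;
end.
```

The byte count and the character count differ as soon as the text contains anything outside the ASCII range, which is exactly what made the raw buffer look "buggy". Characters that do not exist in the current ANSI code page cannot survive the conversion.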
Cheers,
Marcelo Grossi
----- Original Message -----
From: "Marcelo Grossi" <[EMAIL PROTECTED]>
To: "ICS support mailing" <[email protected]>
Sent: Friday, July 21, 2006 11:22 AM
Subject: Re: [twsocket] HttpCli UTF-8 Coding Issue
More precisely (http://en.wikipedia.org/wiki/UTF-8):

Code point range | Bytes | Binary representation               | Notes
-----------------+-------+-------------------------------------+------
000000-00007F    | 1     | 0xxxxxxx                            | ASCII equivalence range
000080-0007FF    | 2     | 110xxxxx 10xxxxxx                   | Latin letters with diacritics, plus the Greek, Cyrillic, Armenian, Hebrew, Arabic, Syriac and Thaana alphabets
000800-00FFFF    | 3     | 1110xxxx 10xxxxxx 10xxxxxx          | Rest of the Basic Multilingual Plane, which contains virtually all characters in common use
010000-10FFFF    | 4     | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx | Other planes of Unicode (the rest)
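The bit patterns listed above mean the first byte of a sequence is enough to tell how many bytes the character occupies. A small sketch of that check follows; Utf8SequenceLength is a hypothetical name chosen for the illustration, not an RTL or JEDI function.

```pascal
program Utf8LeadByte;
{$APPTYPE CONSOLE}

// Returns the number of bytes in the UTF-8 sequence that starts with
// Lead, or 0 if Lead matches none of the four lead-byte patterns
// (for example a continuation byte, 10xxxxxx).
function Utf8SequenceLength(Lead: Byte): Integer;
begin
  if (Lead and $80) = $00 then
    Result := 1          // 0xxxxxxx
  else if (Lead and $E0) = $C0 then
    Result := 2          // 110xxxxx
  else if (Lead and $F0) = $E0 then
    Result := 3          // 1110xxxx
  else if (Lead and $F8) = $F0 then
    Result := 4          // 11110xxx
  else
    Result := 0;
end;

begin
  Writeln(Utf8SequenceLength($41));  // 'A', ASCII range: prints 1
  Writeln(Utf8SequenceLength($C3));  // lead byte of a 2-byte sequence: prints 2
  Writeln(Utf8SequenceLength($E2));  // lead byte of a 3-byte sequence: prints 3
  Writeln(Utf8SequenceLength($F0));  // lead byte of a 4-byte sequence: prints 4
  Writeln(Utf8SequenceLength($A9));  // continuation byte: prints 0
end.
```

This only classifies the lead byte by its bit pattern. It does not check that the following bytes are valid continuation bytes, and it does not reject lead bytes that the patterns allow but that can never appear in well-formed UTF-8, so a real decoder needs more validation than this.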
Thanks a bunch, but I really can't find anything in that Jedi ... their
online help system even work?
Marcelo Grossi
----- Original Message -----
From: "Robert Chafer" <[EMAIL PROTECTED]>
To: "ICS support mailing" <[email protected]>
Sent: Friday, July 21, 2006 10:45 AM
Subject: Re: [twsocket] HttpCli UTF-8 Coding Issue
The first 128 code points of UTF-8 (byte values 0 to 127) are identical to
ASCII; the byte values from 128 to 255 are used in multi-byte sequences to
represent all the other Unicode characters. Take a look at the JEDI
library; it has converters.
On Fri, 21 Jul 2006 10:25:17 -0300, you wrote:
> Thank you all for your answers,
>
> I found out the error. It was, as most of you have probably realized by
> now, me! : ) I read the UTF-8 specs on Wiki and it says clearly to my face:
> "uses up to 4 bytes per character depending on the character ...". Dunno
> how I missed that ..
> So, what I have to do now is find a UTF-8 to ASCII converter (by
> approximation, of course) or build one (which I was already doing). Anyways,
> thanks to all of you folks who took some time to answer me!
>
> Really appreciate it!
>
> Marcelo Grossi
>
> ----- Original Message -----
> From: "Francois PIETTE" <[EMAIL PROTECTED]>
> To: "ICS support mailing" <[email protected]>
> Sent: Friday, July 21, 2006 4:44 AM
> Subject: Re: [twsocket] HttpCli UTF-8 Coding Issue
>
>
> >> With the HTTP component, you always get the data exactly as the server
> >> sent it. The HTTP component does not do any processing on the data
> >> itself. It is stored as is in the stream you provide for storage.
>
> > Then how come Mozilla Firefox doesn't have this weird char problem?
>
> Firefox is much more than an HTTP component. It has an engine which
> interprets the document AND the header sent by the server.
>
> > I just used a TMemoryStream instead of using my old TStringStream,
> > debugged the contents of the Buffer and it is as buggy as it was.
>
> How do you know it is buggy? I'm sure the problem is that you don't
> interpret the data as it is encoded. There are many, many ways to
> represent characters. This is not only about the code used (one byte,
> two bytes, multiple bytes, varying number of bytes) but also about
> character sets (the mapping between a given code and the character
> "image").
>
> > How come the server is sending me something and the browser something
> > else?
>
> The browser doesn't send anything. The browser interprets what the server
> sent.
> It may happen that the server doesn't send the same thing to your program
> as it sends to the browser. Why? Because an HTTP request is composed of a
> URL but also a header with many kinds of information the client gives to
> help the server send the correct content.
>
> Use a sniffer to compare the request the browser sends (pay attention to
> the header lines) and what the server returns. Build the same request with
> the HTTP component and verify that the server sends the exact same content
> (it will for sure if the request is the same in all details).
>
>
> > Because I truly don't believe that Mozilla Firefox is parsing
> > that kind of data. It doesn't even respect the same amount of bytes per
> > char ...). I don't get it.. Me stupid!!! 8/
>
> I'm sure the browser parses the data and the header to show you the
> correct page.
>
> Contribute to the SSL Effort. Visit http://www.overbyte.be/eng/ssl.html
> --
> [EMAIL PROTECTED]
> http://www.overbyte.be
>
>
> --
> To unsubscribe or change your settings for TWSocket mailing list
> please goto http://www.elists.org/mailman/listinfo/twsocket
> Visit our website at http://www.overbyte.be
--
Rob Chafer
Silverfrost
--
To unsubscribe or change your settings for TWSocket mailing list
please goto http://www.elists.org/mailman/listinfo/twsocket
Visit our website at http://www.overbyte.be