It depends on how you interpret the bytes you are downloading.
Look at this page:

http://www.expansys.fr/

Now change the encoding from ISO-8859-1 to UTF-8 (in IE, right-click
the page and choose Encoding; in Firefox, View -> Character Encoding).
See how (in IE) the accented characters turn into Chinese? That is
because the way you process the characters has to match the encoding
used to send them.
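
To make that concrete, here is a small sketch in Python (used only for illustration; it is not ICS or Delphi code, and the function names are made up for this example). It shows that ASCII letters take 1 byte in UTF-8 while accented letters and curly quotes take 2 or 3, and that reading those bytes as ISO-8859-1 produces the kind of garbage described in the quoted message. It assumes the page really is UTF-8.

```python
# Illustration only: the helper names below are hypothetical.

def utf8_byte_length(ch: str) -> int:
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode every byte as ISO-8859-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the mistake: recover the original bytes, then decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    for ch in ("a", "\u00c9", "\u201c"):  # 'a', E-acute, left curly quote
        print(repr(ch), utf8_byte_length(ch), "byte(s)")

    original = "\u00c9 o resultado"
    garbled = misread_as_latin1(original)
    print(repr(garbled))          # each UTF-8 byte shown as its own character
    print(repair(garbled) == original)
```

The stream is not mixing formats: every byte in it is UTF-8, and plain ASCII just happens to be the 1-byte case. Decoding the whole buffer as UTF-8 in one step, rather than converting the weird characters one by one, is the point of the sketch.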


On Thu, 20 Jul 2006 14:23:06 -0300, you wrote:

>  Hello,
>  
>      I posted a message a few days ago about an HTML page being retrieved
>  with weird chars (through ICS's HttpCli). As very well suggested by JP in
>  his reply to my message, the page was indeed UTF-8 coded. But the question
>  remains (as I am currently building a weird char converter as they appear on
>  the captured page ... [yes, very dumb on my behalf]), how can I get the
>  retrieved characters as UTF-8? I mean, UTF-8 uses more than 1 Byte per char
>  and on the TStringStream I'm using to retrieve the data from the HttpCli I
>  get mixed type chars.
>      All the letters (a..z, A..Z, 0..9 and some other chars) are being
>  retrieved as 1 ASCII Byte except for some weird chars that are coming in some
>  other format using more than 1 Byte (by more than 1 Byte I don't mean 2
>  Bytes, I mean 2 or 3 Bytes depending on the case). Below I send you some
>  example strings taken directly from my application:
>  
>      What I get:
>     a história do município de .. estrela do agronegócio â?oprêmio é
>  acima de tudo o reconhecimento do jornalismo, com foco no cidadão, que
>  estamos fazendo. Ã? o resultado de um trabalho feito dentro de uma empresa
>  pública de comunicaçãoâ?o
>  
>      What I was supposed to get:
>      a história do município de .. estrela do agronegócio "prêmio é acima de
>  tudo o reconhecimento do jornalismo, com foco no cidadão, que estamos
>  fazendo. É o resultado de um trabalho feito dentro de uma empresa pública de
>  comunicação"
>  
>      Note: The weird chars can come in 2 or 3 Bytes. The char " comes as 3
>  Bytes (â?o). On the other hand the char É comes in 2 Bytes (Ã?).
>      Note2.: The texts are in Brazilian Portuguese.
>  
>      The question is: Is the problem in the TStringStream, which for some
>  reason is returning some ASCII chars and some UTF-8 chars? Or is the
>  problem that I missed some property of THttpCli, making the retrieved page
>  look so strange? Or does the problem lie somewhere else, far beyond my little
>  knowledge?
>  
>      Please help! :'(
>  
>  Best regards,
>  
>  Marcelo Grossi 
--

Rob Chafer
Silverfrost