Re: [twsocket] HttpCli UTF-8 Coding Issue

2006-07-21 Thread Marcelo Grossi
Hi all,

If anyone runs into the problem I had, the following function (native to
Delphi) solves it:

{ From unit System (abridged) }
UTF8String = String;
function Utf8ToAnsi(const S: UTF8String): string;

Darn, it was so simple! (BTW, if you still see a weird char in the
resulting string, check the font you are using to display it.)
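For readers who are not on Delphi, here is a rough Python sketch of the same idea. As far as I understand it, Utf8ToAnsi decodes the UTF-8 bytes and re-encodes them in the system ANSI code page; the sketch assumes that code page is Windows-1252. The function name utf8_to_ansi and the "?" replacement policy are my own choices for illustration, not part of any library.

```python
def utf8_to_ansi(raw: bytes, codepage: str = "cp1252") -> bytes:
    """Decode UTF-8 bytes and re-encode them in a single-byte code page.

    Characters the code page cannot represent are replaced with "?".
    The default code page is an assumption, not something the thread states.
    """
    text = raw.decode("utf-8")
    return text.encode(codepage, errors="replace")


# The two UTF-8 bytes C3 89 (capital E with acute) become the single byte C9.
print(utf8_to_ansi(b"\xc3\x89"))
```

A character that has no slot in the target code page cannot survive this conversion, which is a different problem from a missing glyph in the display font.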

Cheers,

Marcelo Grossi


Re: [twsocket] HttpCli UTF-8 Coding Issue

2006-07-21 Thread Marcelo Grossi
More precisely (http://en.wikipedia.org/wiki/UTF-8):

UTF-8 range (hex) - n bytes - binary representation (info)

000000-00007F - 1 byte  - 0xxxxxxx (ASCII equivalence range)
000080-0007FF - 2 bytes - 110xxxxx 10xxxxxx (Latin letters with diacritics,
Greek, Cyrillic, Armenian, Hebrew, Arabic, Syriac and Thaana alphabets)
000800-00FFFF - 3 bytes - 1110xxxx 10xxxxxx 10xxxxxx (rest of the Basic
Multilingual Plane, which contains virtually all characters in common use)
010000-10FFFF - 4 bytes - 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (other planes
of Unicode ... the rest)
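The ranges above can be checked with a few lines of Python. This is only a quick sketch; the four sample characters are my own picks, one per row of the table, and utf8_bits is a helper name invented here.

```python
# One sample per range: U+0041, U+00E9, U+20AC and U+10348.
samples = ["A", "\u00e9", "\u20ac", "\U00010348"]


def utf8_bits(ch: str) -> str:
    """Return the UTF-8 bytes of one character as space-separated binary."""
    return " ".join(f"{b:08b}" for b in ch.encode("utf-8"))


for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):06X} - {len(encoded)} byte(s) - {utf8_bits(ch)}")
```

The leading bits of the first byte (0, 110, 1110, 11110) tell a decoder how many bytes belong to the character, and every following byte starts with 10.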

Thanks a bunch, but I really can't find anything in that Jedi ... does their
online help system even work?

Marcelo Grossi


-- 
To unsubscribe or change your settings for TWSocket mailing list
please goto http://www.elists.org/mailman/listinfo/twsocket
Visit our website at http://www.overbyte.be


Re: [twsocket] HttpCli UTF-8 Coding Issue

2006-07-21 Thread Arno Garrels
Robert Chafer wrote:
> The first 128 code points in UTF-8 (the 7-bit values) are plain ASCII;
> bytes with the top bit set (values 128-255) are used to represent all the
> other Unicode characters.  Take a look at the JEDI library; it has
> converters.

This easy to understand article may help as well:
http://www.joelonsoftware.com/articles/Unicode.html

---
Arno Garrels [TeamICS]
http://www.overbyte.be/eng/overbyte/teamics.html




Re: [twsocket] HttpCli UTF-8 Coding Issue

2006-07-21 Thread Robert Chafer

The first 128 code points in UTF-8 (the 7-bit values) are plain ASCII;
bytes with the top bit set (values 128-255) are used to represent all the
other Unicode characters.  Take a look at the JEDI library; it has
converters.
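To make the ASCII versus top-bit split concrete, here is a small Python sketch. It counts characters in UTF-8 data by skipping continuation bytes (those of the form 10xxxxxx). The function name count_characters is made up for this example, and the sketch assumes the input is valid UTF-8.

```python
def count_characters(raw: bytes) -> int:
    """Count code points in valid UTF-8 data.

    Every character contributes exactly one byte that is NOT a
    continuation byte (10xxxxxx), so counting those bytes is enough.
    """
    return sum(1 for b in raw if (b & 0xC0) != 0x80)


data = "comunicação".encode("utf-8")
# Byte count first, then character count: the two accented letters
# take two bytes each.
print(len(data), count_characters(data))
```

Pure ASCII text never sets the top bit, which is why the plain letters in a UTF-8 page look fine even when the page is read with the wrong encoding.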

--

Rob Chafer
Silverfrost


Re: [twsocket] HttpCli UTF-8 Coding Issue

2006-07-21 Thread Marcelo Grossi
Thank you all for your answers,

I found the error. It was, as most of you have probably realized by now,
me! :) I read the UTF-8 spec on Wikipedia and it says it right to my face:
"uses up to 4 bytes per character depending on the character ...". Dunno how
I missed that.
So what I have to do now is find a UTF-8 to ASCII converter (by
approximation, of course) or build one (which I was already doing). Anyway,
thanks to all of you who took the time to answer me!

Really appreciate it!

Marcelo Grossi



Re: [twsocket] HttpCli UTF-8 Coding Issue

2006-07-21 Thread Francois PIETTE
>> With the HTTP component, you always get the data exactly as the server
>> sent it. The HTTP component does not do any processing on the data
>> itself. It is stored as is in the stream you provide for storage.

> Then how come Mozilla Firefox doesn't have this weird char problem?

Firefox is much more than an HTTP component. It has an engine which
interprets the document AND the header sent by the server.

> I just used a TMemoryStream instead of my old TStringStream, debugged
> the contents of the Buffer and it is as buggy as it was.

How do you know it is buggy? I'm sure the problem is that you don't
interpret the data the way it is encoded. There are many, many ways to
represent characters. They differ not only in the code used (one byte, two
bytes, multiple bytes, a varying number of bytes) but also in the character
set (the mapping between a given code and the character "image").
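The two dimensions mentioned above (the code used and the character set) can be shown separately. The Python sketch below is only an illustration; the choice of ISO-8859-1, UTF-8 and ISO-8859-5 as examples is mine.

```python
import unicodedata

# Same character, different codes: the byte count depends on the encoding.
one_byte = "\u00c9".encode("iso-8859-1")
two_bytes = "\u00c9".encode("utf-8")
print(one_byte.hex(), two_bytes.hex())

# Same code, different characters: what a byte means depends on the
# character set used to read it.
byte = b"\xc9"
as_latin = byte.decode("iso-8859-1")
as_cyrillic = byte.decode("iso-8859-5")
print(unicodedata.name(as_latin))
print(unicodedata.name(as_cyrillic))
```

Neither reading of the byte is wrong in itself; only the encoding the sender actually used tells you which one is intended.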

> How come the server is sending me something and the browser something
> else?

The browser doesn't send anything. The browser interprets what the server
sent.
It may happen that the server doesn't send the same thing to your program
as it sends to the browser. Why? Because an HTTP request is composed of a
URL but also of a header with many kinds of information that the client
gives to help the server send the correct content.

Use a sniffer to compare the request the browser sends (pay attention to
the header lines) and what the server returns. Build the same request with
the HTTP component and verify that the server sends the exact same content
(it will for sure if the request is the same in all details).
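Once both requests are captured, comparing the header lines by eye gets tedious, so a small helper can list the differences. This Python sketch is only a suggestion: the function names are invented, and the two captures below are hypothetical header sets made up for the example, not real traffic from any browser or from the HTTP component.

```python
def parse_headers(raw: str) -> dict:
    """Turn captured 'Name: value' lines into a dict keyed by lowercase name."""
    headers = {}
    for line in raw.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def header_differences(a: dict, b: dict) -> dict:
    """Return {name: (value_in_a, value_in_b)} for every header that differs."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Hypothetical captures, invented for illustration only.
browser = parse_headers("""
Accept: text/html
Accept-Charset: ISO-8859-1,utf-8
User-Agent: ExampleBrowser/1.0
""")
component = parse_headers("""
Accept: text/html
User-Agent: ExampleClient/1.0
""")

for name, (b_val, c_val) in header_differences(browser, component).items():
    print(f"{name}: browser={b_val!r} component={c_val!r}")
```

Any header that shows up in this list is a candidate reason for the server answering the two clients differently.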


> Because I truly don't believe that Mozilla Firefox is parsing
> that kind of data. (It doesn't even respect the same number of bytes per
> char ...) I don't get it.. Me stupid!!! 8/

I'm sure the browser parses the data and the header to show you the correct
page.

Contribute to the SSL Effort. Visit http://www.overbyte.be/eng/ssl.html
--
[EMAIL PROTECTED]
http://www.overbyte.be




Re: [twsocket] HttpCli UTF-8 Coding Issue

2006-07-20 Thread Robert Chafer
Firefox knows because it says so in the HTML itself:

http://www.expansys.fr/



http://www.expansys.cn/


In addition the http headers indicate (for the Chinese one above):

Connection: close
Date: Thu, 20 Jul 2006 23:52:55 GMT
Server: Microsoft-IIS/6.0
Content-Type: text/html; Charset=utf-8
Cache-Control: private

You can see the headers with THttpCli.
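Given a Content-Type header like the one above, picking out the charset and decoding the body accordingly could look like the Python sketch below. The helper names are invented for the example, and the ISO-8859-1 fallback for a missing charset is an assumption on my part (it was the historical HTTP default for text types), not something THttpCli does for you.

```python
from email.message import Message


def charset_from_content_type(value: str, default: str = "iso-8859-1") -> str:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset(default)


def decode_body(body: bytes, content_type: str) -> str:
    """Decode the raw body using the charset announced in the header."""
    charset = charset_from_content_type(content_type)
    return body.decode(charset, errors="replace")


print(charset_from_content_type("text/html; Charset=utf-8"))
```

A browser additionally looks at the charset declared inside the HTML itself; this sketch only covers the header.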

--

Rob Chafer
Silverfrost


Re: [twsocket] HttpCli UTF-8 Coding Issue

2006-07-20 Thread Maurizio Lotauro
Marcelo Grossi <[EMAIL PROTECTED]> writes:

> Hi,
>
> Then how come Mozilla Firefox doesn't have this weird char problem?

What you have in the stream is the body that the server sent to you. Most
probably the header contains useful information about how to interpret
the body.
Are you receiving HTML, or something else?


Bye, Maurizio.




Re: [twsocket] HttpCli UTF-8 Coding Issue

2006-07-20 Thread Marcelo Grossi
Hi,

Then how come Mozilla Firefox doesn't have this weird char problem? I
just used a TMemoryStream instead of my old TStringStream, debugged
the contents of the Buffer and it is as buggy as it was.
How come the server is sending me something and the browser something
else? :'( Because I truly don't believe that Mozilla Firefox is parsing
that kind of data. (It doesn't even respect the same number of bytes per
char ...) I don't get it.. Me stupid!!! 8/

Thank you for your time,

Marcelo Grossi



Re: [twsocket] HttpCli UTF-8 Coding Issue

2006-07-20 Thread Francois PIETTE
> How do I get the chance to interpret the characters with HttpCli? I
> don't set any property whatsoever regarding the encoding of the data I'm
> receiving. The data arrives in the TStringStream already looking the way
> I showed in my last message ... How do I get the "raw" data or something?

With the HTTP component, you always get the data exactly as the server sent
it. The HTTP component does not do any processing on the data itself. It is
stored as is in the stream you provide for storage.

Contribute to the SSL Effort. Visit http://www.overbyte.be/eng/ssl.html
--
[EMAIL PROTECTED]
http://www.overbyte.be




Re: [twsocket] HttpCli UTF-8 Coding Issue

2006-07-20 Thread Marcelo Grossi
Hi Robert,

How do I get the chance to interpret the characters with HttpCli? I
don't set any property whatsoever regarding the encoding of the data I'm
receiving. The data arrives in the TStringStream already looking the way
I showed in my last message ... How do I get the "raw" data or something?

Cheers,

Marcelo Grossi


Re: [twsocket] HttpCli UTF-8 Coding Issue

2006-07-20 Thread Angus Robertson - Magenta Systems Ltd
> UTF-8 uses more than 1 byte per char and on the TStringStream I'm
> using to retrieve the data from the HttpCli I get mixed type chars.

The component returns a binary stream; it may be a zip file or text in
various character sets, with one or more bytes per character.

Delphi supports the two-bytes-per-character WideString data type, aka
Unicode; you need to read some help on converting strings to
WideString.  The Jedi library includes UTF-8 to WideString conversion.
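As a language-neutral illustration of the UTF-8 to wide string step, here is a short Python sketch. It assumes the wide string holds UTF-16 in little-endian order, as on Windows; the function name is invented and this is not the Jedi routine.

```python
def utf8_to_utf16le(raw: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as UTF-16 (little-endian)."""
    return raw.decode("utf-8").encode("utf-16-le")


# A character from the Basic Multilingual Plane takes one 16-bit unit.
bmp = utf8_to_utf16le("\u00e9".encode("utf-8"))

# A character outside it (here U+10348) takes two units, a surrogate pair.
astral = utf8_to_utf16le("\U00010348".encode("utf-8"))

print(len(bmp), len(astral))
```

So "two bytes per character" holds for the characters in this thread, but not for every Unicode character.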

Angus


Re: [twsocket] HttpCli UTF-8 Coding Issue

2006-07-20 Thread Robert Chafer
It depends on how you interpret the characters you are downloading.
Look at this page:

http://www.expansys.fr/

Now change the encoding from ISO-8859-1 to UTF-8 (in IE, right-click
the page and choose Encoding; in FF, View->Character Encoding). You see
how (in IE) the accented characters turn into Chinese?  This is
because the way you process the characters depends on the encoding
used to send them.
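The opposite mistake, reading UTF-8 bytes as a single-byte character set, gives the kind of garbage quoted below, and it can be reproduced and undone in a few lines of Python. This is only a sketch: it assumes the wrong reading was Windows-1252, and the function names are invented.

```python
def misread_as_cp1252(text: str) -> str:
    """Show how UTF-8 text looks when its bytes are read as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo the misreading: recover the original bytes, then decode as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


original = "\u201cprêmio. É"  # starts with a left curly quotation mark
garbled = misread_as_cp1252(original)

# The capital E with acute turns into 2 characters, the quotation mark into 3.
print(len(misread_as_cp1252("É")), len(misread_as_cp1252("\u201c")))
print(repair(garbled) == original)
```

The round trip only works while every byte has a meaning in the wrong character set; a few byte values are undefined in Windows-1252 (0x9D, for instance), and text containing them cannot be recovered this way.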


On Thu, 20 Jul 2006 14:23:06 -0300, you wrote:

>  Hello,
>  
>  I posted a message a few days ago about an HTML page being retrieved
>  with weird chars (through ICS's HttpCli). As JP rightly suggested in
>  his reply to my message, the page was indeed UTF-8 encoded. But the
>  question remains (as I am currently building a converter for the weird
>  chars as they appear on the captured page ... [yes, very dumb on my
>  part]): how can I get the retrieved characters as UTF-8? I mean, UTF-8
>  uses more than 1 byte per char, and in the TStringStream I'm using to
>  retrieve the data from the HttpCli I get mixed-type chars.
>  All the letters (a..z, A..Z, 0..9 and some other chars) are being
>  retrieved as 1 ASCII byte, except for some weird chars that are coming in
>  some other format using more than 1 byte (by more than 1 byte I don't
>  mean 2 bytes, I mean 2 or 3 bytes depending on the case). Below are some
>  example strings taken directly from my application:
>  
>  What I get:
> a história do município de .. estrela do agronegócio â?oprêmio é
>  acima de tudo o reconhecimento do jornalismo, com foco no cidadão, que
>  estamos fazendo. Ã? o resultado de um trabalho feito dentro de uma empresa
>  pública de comunicaçãoâ?o
>  
>  What I was supposed to get:
>  a história do município de .. estrela do agronegócio "prêmio é acima de
>  tudo o reconhecimento do jornalismo, com foco no cidadão, que estamos
>  fazendo. É o resultado de um trabalho feito dentro de uma empresa pública de
>  comunicação"
>  
>  Note: The weird chars can come in 2 or 3 bytes. The char " comes as 3
>  bytes (â?o). On the other hand, the char É comes in 2 bytes (Ã?).
>  Note 2: The texts are in Brazilian Portuguese.
>
>  The question is: is the problem in the TStringStream, which for some
>  reason is returning some ASCII chars and some UTF-8 chars? Or is the
>  problem that I missed some property of THttpCli, making the retrieved
>  page look so strange? Or does the problem lie somewhere else, far beyond
>  my little knowledge?
>  
>  Please help! :'(
>  
>  Best regards,
>  
>  Marcelo Grossi 
--

Rob Chafer
Silverfrost