Peter Van Hove wrote:
>>> Interesting !!!
>>> So POST and GET data, when sent to a host, are not unicode ?
>> 
>> They are not sent as UTF-16 Unicode, correct.
>> 
>>> And so what I provide to the component when I do a GET, in unicode,
>>> is converted in the component to something else (UTF 8 ?)
>> 
>> Yes, in V7 a GET request is first converted to UTF-8 and then URL-encoded,
>> so the component user need not care about the encoding (see functions
>> UrlEncode/UrlDecode in OverbyteIcsUrl.pas). In V5 and V6 the
>> AnsiString with the current system code page was URL-encoded as is.
> 
> I converted to UTF-8 before POSTing and was successful.
> Next I tried a special character "ù", because in UTF-8 this code point
> is made up of 2 bytes.
> This failed, and the POSTed data appears to contain 2 strange
> characters instead of the sent "ù".

Content-Type "application/x-www-form-urlencoded", which is probably
what you should send, expects all non-alphanumeric characters to be
URL-encoded.
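
To illustrate the rule above, here is a minimal sketch in standard C++. It is not the ICS UrlEncode() implementation; percentEncode is a hypothetical helper that assumes its input is already a byte string in the target charset. "ù" is U+00F9, which UTF-8 encodes as the two bytes 0xC3 0xB9, so it should go on the wire as "%C3%B9". The sketch also escapes a space as "%20" rather than "+", and escapes "-", "_" and "." as well, which is stricter than necessary.

```cpp
#include <string>

// Hypothetical helper, NOT the ICS UrlEncode(): percent-encodes every byte
// that is not an ASCII letter or digit. The input is assumed to be already
// converted to the target charset (for example UTF-8).
std::string percentEncode(const std::string& bytes)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : bytes) {
        const bool alnum = (c >= '0' && c <= '9') ||
                           (c >= 'A' && c <= 'Z') ||
                           (c >= 'a' && c <= 'z');
        if (alnum) {
            out += static_cast<char>(c);   // letters and digits pass through
        } else {
            out += '%';                    // everything else becomes %XX
            out += hex[c >> 4];            // high nibble
            out += hex[c & 0x0F];          // low nibble
        }
    }
    return out;
}
```

Calling percentEncode("\xC3\xB9") with the UTF-8 bytes of "ù" yields "%C3%B9". If the two raw bytes are posted without this step, or are read in a single-byte code page, that would probably explain the 2 strange characters you saw.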

The charset to use for posted data depends on:

The form attribute "accept-charset", if specified.

If the form doesn't specify a charset, common browsers use the charset of
the HTML document, for instance:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
specifies UTF-8.

If no document charset is available you could use the charset of the
HTTP Content-Type header, if available.
Otherwise use ASCII or ANSI with the system code page.
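
The order of precedence described above could be sketched like this in standard C++. pickPostCharset and its parameters are hypothetical names, not part of ICS; an empty string stands for "not specified", and the sketch assumes accept-charset holds a single charset name rather than a list.

```cpp
#include <string>

// Hypothetical helper, not part of ICS: returns the charset to use for
// posted form data. An empty argument means "not specified".
std::string pickPostCharset(const std::string& formAcceptCharset,
                            const std::string& documentCharset,
                            const std::string& httpHeaderCharset,
                            const std::string& systemFallback)
{
    // 1. The form's accept-charset attribute wins if present.
    if (!formAcceptCharset.empty())
        return formAcceptCharset;
    // 2. Otherwise the charset declared by the HTML document (meta tag).
    if (!documentCharset.empty())
        return documentCharset;
    // 3. Otherwise the charset from the HTTP Content-Type header.
    if (!httpHeaderCharset.empty())
        return httpHeaderCharset;
    // 4. Last resort: ASCII or ANSI with the system code page.
    return systemFallback;
}
```

For a page with the meta tag shown above and no accept-charset on the form, pickPostCharset("", "utf-8", "", "windows-1252") returns "utf-8".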

While writing this, I think I introduced a bug in function
UrlEncode, which always converted to UTF-8 in V7.
I just checked in a change: both UrlEncode and UrlDecode now take a
CodePage argument.


> 
>> With POST requests however the send stream is sent as is and the
>> component user is responsible to format and encode the stream content
>> properly. Posted data may, for example, contain multiple parts all
>> with a different Charset and Content-Transfer-Encoding part-header.
> 
> So I suppose that if I want to send UTF8 I need to put a header
> before the data first ?

This is required with Content-Type "multipart/form-data" only.
I don't think you want to send multipart/form-data, but rather
Content-Type "application/x-www-form-urlencoded".
 
> Should I URL encode the POST data as well ?

Yes.

> Is there an ICS function for it ?

Yes, now there is ;-). Update your local work copy and use UrlEncode(),
located in unit OverbyteIcsUrl.pas; string conversion is included.

AnsiString AStr;
AStr = UrlEncode(YourString, CP_UTF8); // UTF8
AStr = UrlEncode(YourString, CP_ACP);  // Ansi with current system code page

If you get an implicit string cast warning above, don't worry, it's
totally safe. You can get rid of the warning with an explicit cast of the
function result to AnsiString. Then write the AnsiString to your stream.

--
Arno Garrels

 
> 
> 
>> The Absolute Minimum Every Software Developer Absolutely, Positively
>> Must Know About Unicode and Character Sets (No Excuses!)
>> 
>> http://www.joelonsoftware.com/articles/Unicode.html
> 
> I also read these and can suggest them:
> 
> http://edn.embarcadero.com/article/38437
> http://edn.embarcadero.com/article/38498
> http://edn.embarcadero.com/article/38693
> http://edn.embarcadero.com/article/images/38980/Delphi_and_Unicode.pdf