Peter Van Hove wrote:
>>> Interesting !!!
>>> So POST and GET data, when sent to a host, are not unicode ?
>>
>> They are not sent as UTF-16 Unicode, correct.
>>
>>> And so what I provide to the component when I do a GET, in unicode,
>>> is converted in the component to something else (UTF 8 ?)
>>
>> Yes, in V7 a GET request is converted to UTF-8 first, then URL-encoded,
>> so the component user need not care about the encoding (see functions
>> UrlEncode/UrlDecode in OverbyteIcsUrl.pas). In V5 and V6 the
>> AnsiString with the current system code page was URL-encoded as is.
>
> I converted to UTF before POSTing and was successful.
> Next I tried a special character "ù", because in UTF8 this codepoint
> is made up from 2 bytes.
> This failed, and the POSTed data appears to contain 2 strange
> characters instead of the sent "ù".
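One plausible explanation for the "2 strange characters", sketched in standard C++ under the assumption that the receiving side decoded the posted bytes as Latin-1 (or Windows-1252, which maps these two bytes the same way): "ù" (U+00F9) is the two bytes C3 B9 in UTF-8, and a single-byte decoder turns each byte into its own character. EncodeUtf8 and DecodeAsLatin1 are hypothetical helpers for illustration, not ICS functions.

```cpp
#include <cassert>
#include <string>

// Encodes one Unicode scalar value (assumed valid, i.e. not a surrogate)
// as a UTF-8 byte sequence of 1 to 4 bytes.
std::string EncodeUtf8(char32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                       // 0xxxxxxx
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));         // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));       // 10xxxxxx
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));        // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));        // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Simulates a receiver that treats every byte as one Latin-1 character:
// byte value N becomes code point U+00NN.
std::u32string DecodeAsLatin1(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) out += static_cast<char32_t>(b);
    return out;
}
```

For "ù", EncodeUtf8 yields the bytes C3 B9, and DecodeAsLatin1 turns them into "Ã¹": two characters where one was sent. The fix is to agree on the charset with the server and URL-encode the bytes, as described in the reply.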
Content-Type "application/x-www-form-urlencoded", which is probably what
you should send, expects all non-alphanumeric characters to be
URL-encoded. The charset to use for posted data depends on the following:

The form attribute "accept-charset", _if_ specified.

If the form doesn't specify a charset, common browsers use the charset of
the HTML document; for instance,
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
specifies UTF-8.

If no document charset is available, you could use the charset of the
HTTP Content-Type header, _if_ available.

Otherwise use ASCII, or ANSI with the system code page.

While writing this, I noticed that I had introduced a bug in function
UrlEncode: in V7 it always converted to UTF-8. I just checked in a
change; both UrlEncode and UrlDecode now take a CodePage argument.

>
>> With POST requests, however, the send stream is sent as is and the
>> component user is responsible for formatting and encoding the stream
>> content properly. Posted data may, for example, contain multiple parts,
>> each with a different Charset and Content-Transfer-Encoding part-header.
>
> So I suppose that if I want to send UTF8 I need to put a header
> before the data first ?

This is required with Content-Type "multipart/form-data" only. I don't
think you want to send multipart/form-data, but rather Content-Type
"application/x-www-form-urlencoded".

> Should I URL encode the POST data as well ?

Yes.

> Is there an ICS function for it ?

Yes, now there is ;-). Update your local working copy and use UrlEncode(),
located in unit OverbyteIcsUrl.pas; string conversion is included:

    AnsiString AStr;
    AStr = UrlEncode(YourString, CP_UTF8); // UTF-8
    AStr = UrlEncode(YourString, CP_ACP);  // ANSI with current system code page

If you get an implicit string cast warning above, don't worry; it's safe,
because the URL-encoded result contains only ASCII characters. Get rid of
the warning by explicitly casting the function result to AnsiString. Then
write the AnsiString to your stream.
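For readers outside ICS, here is a minimal standard C++ sketch of the form-urlencoding step, assuming the text has already been converted to the target code page (UTF-8 or ANSI) as a byte string. FormUrlEncode, BuildFormBody, IsFormSafe and IsAscii are hypothetical names, not the ICS UrlEncode API; the escaping follows the WHATWG application/x-www-form-urlencoded serializer (ASCII letters, digits and * - . _ kept, space as '+', everything else as %XX), and ICS's UrlEncode may differ in details such as how it encodes spaces.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Bytes left unescaped by the form-urlencoded serializer.
bool IsFormSafe(unsigned char c) {
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') || c == '*' || c == '-' || c == '.' ||
           c == '_';
}

// True if every byte is 7-bit ASCII.
bool IsAscii(const std::string& s) {
    for (unsigned char c : s)
        if (c >= 0x80) return false;
    return true;
}

// Percent-encodes a byte string. The input must already be in the charset
// the server expects; no charset conversion happens here.
std::string FormUrlEncode(const std::string& bytes) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(bytes.size() * 3);
    for (unsigned char c : bytes) {
        if (IsFormSafe(c)) {
            out += static_cast<char>(c);
        } else if (c == ' ') {
            out += '+';                    // space becomes '+' in form bodies
        } else {
            out += '%';                    // any other byte becomes %XX
            out += kHex[c >> 4];
            out += kHex[c & 0x0F];
        }
    }
    assert(IsAscii(out));  // encoded output is always pure ASCII
    return out;
}

// Joins name/value pairs into "name1=value1&name2=value2",
// encoding both names and values.
std::string BuildFormBody(
    const std::vector<std::pair<std::string, std::string>>& fields) {
    std::string body;
    for (const auto& field : fields) {
        if (!body.empty()) body += '&';
        body += FormUrlEncode(field.first);
        body += '=';
        body += FormUrlEncode(field.second);
    }
    return body;
}
```

The same "ù" becomes "%C3%B9" when the input bytes are UTF-8 but "%F9" when they are Windows-1252, which is why the code page must match what the server expects. The ASCII-only result is also why the cast to AnsiString mentioned above loses nothing.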
--
Arno Garrels

>
>
>> The Absolute Minimum Every Software Developer Absolutely, Positively
>> Must Know About Unicode and Character Sets (No Excuses!)
>>
>> http://www.joelonsoftware.com/articles/Unicode.html
>
> I also read these and can suggest them:
>
> http://edn.embarcadero.com/article/38437
> http://edn.embarcadero.com/article/38498
> http://edn.embarcadero.com/article/38693
> http://edn.embarcadero.com/article/images/38980/Delphi_and_Unicode.pdf

--
To unsubscribe or change your settings for TWSocket mailing list
please go to http://lists.elists.org/cgi-bin/mailman/listinfo/twsocket
Visit our website at http://www.overbyte.be