ICS V8.50 in the overnight zip now includes various new functions to assist with determining the character set and codepage of HTML content received from HTTP servers, and with converting it correctly to Delphi Unicode strings.
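As a rough illustration of the four detection rules listed in this post, here is a minimal stand-alone sketch. It is not the ICS code and does not use any ICS units. The names DetectHtmlCharset, ExtractCharset, BomCharset, MetaCharset and LooksLikeUtf8 are made up for the example. It assumes the rules are tried in the order given with the first match winning, that the meta tag sits in the first 1024 bytes, and Delphi XE2 or later on a desktop target (1-based strings).

```pascal
program CharsetOrderSketch;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Hypothetical helper: value following 'charset=' in Text, lower-cased, or ''.
function ExtractCharset(const Text: string): string;
var
  P, Start: Integer;
begin
  Result := '';
  P := Pos('charset=', LowerCase(Text));
  if P = 0 then Exit;
  Inc(P, Length('charset='));
  while (P <= Length(Text)) and CharInSet(Text[P], ['"', '''', ' ']) do Inc(P);
  Start := P;
  while (P <= Length(Text)) and
        not CharInSet(Text[P], ['"', '''', ' ', ';', '>', '/']) do Inc(P);
  Result := LowerCase(Copy(Text, Start, P - Start));
end;

// Rule 2: BOM. Four-byte marks are tested first, UTF-32LE also starts FF FE.
function BomCharset(const B: TBytes): string;
begin
  Result := '';
  if (Length(B) >= 4) and (B[0] = $FF) and (B[1] = $FE) and (B[2] = 0) and (B[3] = 0) then
    Result := 'utf-32le'
  else if (Length(B) >= 4) and (B[0] = 0) and (B[1] = 0) and (B[2] = $FE) and (B[3] = $FF) then
    Result := 'utf-32be'
  else if (Length(B) >= 3) and (B[0] = $EF) and (B[1] = $BB) and (B[2] = $BF) then
    Result := 'utf-8'
  else if (Length(B) >= 2) and (B[0] = $FF) and (B[1] = $FE) then
    Result := 'utf-16le'
  else if (Length(B) >= 2) and (B[0] = $FE) and (B[1] = $FF) then
    Result := 'utf-16be';
end;

// Rule 3: very simplistic meta scan, treating the head as single-byte text.
function MetaCharset(const B: TBytes): string;
var
  Head: string;
  I, N: Integer;
begin
  N := Length(B);
  if N > 1024 then N := 1024;
  SetLength(Head, N);
  for I := 0 to N - 1 do
    Head[I + 1] := Char(B[I]);
  if Pos('<meta', LowerCase(Head)) = 0 then Exit('');
  Result := ExtractCharset(Head);
end;

// Rule 4: True only if the buffer is structurally valid UTF-8 and contains
// at least one multi-byte sequence (overlong forms are not rejected here).
function LooksLikeUtf8(const B: TBytes): Boolean;
var
  I, Follow: Integer;
  SawMultiByte: Boolean;
begin
  Result := False;
  SawMultiByte := False;
  I := 0;
  while I < Length(B) do
  begin
    if B[I] < $80 then Follow := 0
    else if (B[I] and $E0) = $C0 then Follow := 1
    else if (B[I] and $F0) = $E0 then Follow := 2
    else if (B[I] and $F8) = $F0 then Follow := 3
    else Exit; // invalid lead byte
    if Follow > 0 then SawMultiByte := True;
    Inc(I);
    while Follow > 0 do
    begin
      if (I >= Length(B)) or ((B[I] and $C0) <> $80) then Exit;
      Inc(I);
      Dec(Follow);
    end;
  end;
  Result := SawMultiByte;
end;

// Assumption: rules are tried in order and the first non-empty answer wins.
function DetectHtmlCharset(const Body: TBytes; const ContentTypeHdr: string): string;
begin
  Result := ExtractCharset(ContentTypeHdr);                         // rule 1
  if Result = '' then Result := BomCharset(Body);                   // rule 2
  if Result = '' then Result := MetaCharset(Body);                  // rule 3
  if (Result = '') and LooksLikeUtf8(Body) then Result := 'utf-8';  // rule 4
  if Result = '' then Result := 'ansi';                             // fallback
end;

begin
  // Header names a charset, so rule 1 decides
  Writeln(DetectHtmlCharset(TEncoding.ASCII.GetBytes('<p>x</p>'), 'text/html; charset=ISO-8859-1'));
  // No charset in the header, UTF-8 BOM at the front, so rule 2 decides
  Writeln(DetectHtmlCharset(TBytes.Create($EF, $BB, $BF, $41), 'text/html'));
  // No header charset or BOM, meta tag present, so rule 3 decides
  Writeln(DetectHtmlCharset(TEncoding.ASCII.GetBytes('<meta charset="utf-16">'), 'text/html'));
  // Bare UTF-8 pound sign (C2 A3), so rule 4 decides
  Writeln(DetectHtmlCharset(TBytes.Create($C2, $A3), 'text/html'));
end.
```

In a real application the ICS functions named in this post should be used instead; the sketch only shows why a page with no header charset, no BOM and no meta tag can still be recognised as UTF-8 when a browser would fall back to ANSI.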
The character set is determined according to the following rules:

1 - HTTP Content-Type header, which always states the content type and more rarely the character set.
2 - HTML content BOM: two, three or four bytes at the front.
3 - HTML content meta charset.
4 - HTML auto detect for UTF-8. Note that browsers don't do this and assume ANSI if no charset is specified.

I created some Unicode test pages that illustrate various characters represented as symbols and/or entities (like &pound; or &#9741;), using ANSI, UTF-8 and UTF-16, with and without BOMs and charset. Note that Firefox has limited UTF-16 support and seems to ignore CSS. The web site uses the ICS web server.

https://www.telecom-tariffs.co.uk/testing/

The new functions are in OverbyteIcsCharsetUtils.pas and OverbyteIcsUtils.pas: IcsFindHtmlCharset, IcsFindHtmlCodepage, IcsContentCodepage, IcsMoveTBytesToString and IcsHtmlToStr, which take either a TBytes buffer or a stream as input. There is also IcsMoveStringToTBytes, which takes a Unicode string as input and creates a TBytes buffer.

To convert the received HTML stream to a Unicode string with the correct codepage, use IcsHtmlToStr in OverbyteIcsCharsetUtils. It is not yet used in OverbyteIcsHttpProt.pas, to avoid linking in various charset tables that applications may not need. The last argument determines whether entities such as &amp;, &pound; and &#9741; are converted to characters (& £ and ☍) for display, instead of being left as HTML.

    UnicodeStr := IcsHtmlToStr(RcvdStream, HdrContentType, true);

This is illustrated in the OverbyteIcsHttpsTst sample application, along with the separate functions that may be used as an alternative.

OverbyteIcsProxy.pas has also been updated to check the body meta for a charset and to convert HTML TBytes buffers to Unicode instead of ANSI for the events that allow bodies to be examined and updated. This is illustrated in the OverbyteIcsProxySslServer sample.

Angus