Fastream Technologies wrote:
> Hello Arno,
>
> If the function is ready, I would like to test it in our special unit
> for HTML folder listings. Can you post it here or send privately?
Yes, I checked it in already. Install the TortoiseSVN client to get
access to the ICS SVN repository. In ICS v6, URL encoding and decoding
do not use UTF-8. The file names in directory listings are now
_displayed_ correctly, but their links have not changed; they are still
ANSI. I can easily change URL encoding in v6 to UTF-8 as well; however,
I wonder whether that would break existing applications.

--
Arno Garrels

>
> Best Regards,
> SZ
> On Fri, Oct 10, 2008 at 4:41 PM, Arno Garrels <[EMAIL PROTECTED]>
> wrote:
>
>> Arno Garrels wrote:
>>> Francois PIETTE wrote:
>>>>> But 3 bytes looks like UTF-8 ?
>>>>
>>>> I don't know. You said it was UTF-16 if not encoded.
>>>
>>> I installed IIS 7 on my Vista box and I found that IIS 7
>>> uses UTF-7 in directory listings.
>>
>> Arrgh, typo above, IIS v7 uses UTF-8 of course!
>>
>>> The HTTP header contains
>>> the "charset=UTF-8" content-type extension.
>>>
>>>
>>> However I think the ICS server should continue to use HTML
>>> entities.
>>> HTML entities represent both iso-8859-1 (Latin1) and Unicode
>>> character numbers (in Unicode the first 256 chars are the same as
>>> Latin1). So in order to create a _valid_ mapping an AnsiString MUST
>>> be converted with current ANSI code page to a
>>> UnicodeString/WideString first! This can be achieved easily in
>>> TextToHtmlText() by a local WideString variable that is assigned
>>> parameter Src : String. Characters above #255 must then be
>>> represented as numerical HTML entities (&#nnnn;). That's all, fully
>>> backwards compatible and
>>> works in D2009 as well :)
>>>
>>> --
>>> Arno Garrels
>>>
>>>
>>>>
>>>> ----- Original Message -----
>>>> From: "Arno Garrels" <[EMAIL PROTECTED]>
>>>> To: "ICS support mailing" <twsocket@elists.org>
>>>> Sent: Thursday, October 09, 2008 7:03 PM
>>>> Subject: Re: [twsocket] HTML encoding in HttpSrv func.
>>>> TextToHtmlText()
>>>>
>>>>
>>>>> Francois PIETTE wrote:
>>>>>>> The twothird character is not 'encoded' either as "&#8532;"
>>>>>>> (decimal) or as "&#x2154;" (hex)? If so, IIS sends plain UTF-16!
>>>>>>
>>>>>> Yes, no encoding at all. Just the 3 bytes. So UTF-16.
>>>>>
>>>>> But 3 bytes looks like UTF-8 ?
>>>>>
>>>>> --
>>>>> Arno Garrels
>>>>>
>>>>>>
>>>>>> --
>>>>>> [EMAIL PROTECTED]
>>>>>> http://www.overbyte.be
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: "Arno Garrels" <[EMAIL PROTECTED]>
>>>>>> To: "ICS support mailing" <twsocket@elists.org>
>>>>>> Sent: Thursday, October 09, 2008 5:26 PM
>>>>>> Subject: Re: [twsocket] HTML encoding in HttpSrv func.
>>>>>> TextToHtmlText()
>>>>>>
>>>>>>
>>>>>>> Francois Piette wrote:
>>>>>>>>> Yes, if someone has Apache or a newer IIS installed he could
>>>>>>>>> help. Create a file name with characters not in current ANSI
>>>>>>>>> code page by copying those characters from the Windows
>>>>>>>>> application charmap.exe. Then start a packet sniffer and log
>>>>>>>>> a directory listing.
>>>>>>>>
>>>>>>>> Using IIS6 on W2K3.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>> The twothird character (U+2154) is sent in the dirlist as 3
>>>>>>>> characters : 0xE2 0x85 0x94. In the href link, the 3 characters
>>>>>>>> are expressed as %e2%85%94
>>>>>>>
>>>>>>> That's UTF-8 URL-encoded.
>>>>>>>
>>>>>>>> while they are binary in the text itself.
>>>>>>>
>>>>>>> The twothird character is not 'encoded' either as "&#8532;"
>>>>>>> (decimal) or as "&#x2154;" (hex)? If so, IIS sends plain UTF-16!
>>>>>>>
>>>>>>>> There is nothing in the html header to tell which code page or
>>>>>>>> charset is used.
>>>>>>>> --
>>>>>>>
>>>>>>> Browsers seem to be very good at detecting the correct character
>>>>>>> set nowadays.
>>>>>>>
>>>>>>> --
>>>>>>> Arno Garrels
--
To unsubscribe or change your settings for TWSocket mailing list
please go to http://lists.elists.org/cgi-bin/mailman/listinfo/twsocket
Visit our website at http://www.overbyte.be
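
For anyone who wants to check the byte-level observations in this thread without a packet sniffer, here is a minimal sketch. It is in Python rather than Delphi, purely for illustration. The helper `text_to_html_entities` and its `codepage` parameter are hypothetical names and are not the ICS TextToHtmlText() implementation. It assumes cp1252 as the ANSI code page and only covers the "characters above #255" rule; it does not escape markup characters.

```python
from urllib.parse import quote

TWO_THIRDS = "\u2154"  # the "twothird" character from the thread

# UTF-8 needs three bytes for U+2154, UTF-16 needs two.
utf8_bytes = TWO_THIRDS.encode("utf-8")
utf16_bytes = TWO_THIRDS.encode("utf-16-le")

# Percent-encoding the UTF-8 bytes gives the form seen in the href link
# (quote() uses UTF-8 by default and emits upper-case hex digits).
href_form = quote(TWO_THIRDS, safe="")


def text_to_html_entities(ansi_bytes: bytes, codepage: str = "cp1252") -> str:
    """Hypothetical helper, not the ICS code.

    Decode the ANSI bytes with the given code page first, then write every
    character above 255 as a decimal numeric entity (&#nnnn;).
    """
    text = ansi_bytes.decode(codepage)
    return "".join(ch if ord(ch) <= 255 else f"&#{ord(ch)};" for ch in text)


print(utf8_bytes.hex(" "))   # e2 85 94
print(utf16_bytes.hex(" "))  # 54 21
print(href_form)             # %E2%85%94
# 0xE9 maps to U+00E9 and is kept; 0x80 maps to U+20AC and becomes an entity.
print(text_to_html_entities(b"caf\xe9 \x80"))
```

The three bytes 0xE2 0x85 0x94 reported for IIS 6 match the UTF-8 result, not the UTF-16 one. The last line shows why the ANSI string has to be decoded with its code page before entities are generated: byte 0x80 is the euro sign in cp1252, so the valid entity is &#8364; and not &#128;.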