Re: [twsocket] HTML encoding in HttpSrv func. TextToHtmlText()

Arno Garrels Thu, 09 Oct 2008 05:46:13 -0700

Francois Piette wrote:
>>> In your example, char #162 is replaced by "&cent;" in the html
>>> output. This represents the cent character whatever the code page is.
>> 
>> Actually that is the bug, since #162 is the cent sign in CP 1252 but
>> not in CP 1251. This function is used to generate directory listings,
>> so most file names including characters above #128 will be wrong
>> when the server does not run on Windows CP 1252.
> 
> Ah ! I understand now what you mean.
> The table used by TextToHtml should be based on the CP.


But how? As far as I know, and I searched a lot, there are no entity tables
available for code pages other than ISO-8859-1.

> I suggest adding an optional second argument to TextToHtml so that
> the user can specify the code page to be used. This argument could be
> the array to be used for conversion. 

That would only work _IF_ different entity tables for different
code pages existed, which is IMO _not_ the case. That's why HTML entity
encoding should not be used for our purpose (mapped to character numbers).
Those entities are nice to have when you design webpages manually.
Their only purpose was to display particular characters _independent_ of
the browser's or the HTML page's ANSI code page.
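
To illustrate the mismatch, here is a minimal sketch in Python (not Delphi,
and not the ICS code). The function name text_to_html and its codepage
parameter are hypothetical. It assumes one possible alternative to a
byte-indexed entity table: decode the ANSI bytes with the server's code page
first, then emit numeric character references for everything above ASCII.

```python
import html

def text_to_html(raw: bytes, codepage: str) -> str:
    """Hypothetical helper: decode ANSI bytes, then escape for HTML."""
    text = raw.decode(codepage)  # the byte-to-character mapping depends on the code page
    out = []
    for ch in text:
        if ch in '<>&"':
            out.append(html.escape(ch))   # markup characters become named entities
        elif ord(ch) > 127:
            out.append(f"&#{ord(ch)};")   # numeric reference to the Unicode code point
        else:
            out.append(ch)
    return "".join(out)

# The same byte #162 is a different character under each code page.
cent = text_to_html(b"\xa2", "cp1252")      # cent sign, U+00A2
short_u = text_to_html(b"\xa2", "cp1251")   # Cyrillic small short u, U+045E
```

With a table indexed by byte value, both calls would have produced "&cent;",
which is only right for the CP 1252 case.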

--
Arno Garrels


-- 
To unsubscribe or change your settings for TWSocket mailing list
please goto http://lists.elists.org/cgi-bin/mailman/listinfo/twsocket
Visit our website at http://www.overbyte.be
