DZ-Jay wrote:
> Actually, I think Arno is correct, but it's a bit more complex than
> that:
> The entity conversion depends strictly on the local character set.
> That is, each character set *may* map differently (as Arno just
> discovered for the "cent" character between CP-1252 and CP-1251);
> there is no "universal" conversion, because the entities
> represent semantically equivalent characters in differing
> representations, not specific character codes.
> For this reason, the best solution is usually to use Unicode (UTF-8)
> in HTML output.  
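
To make the CP-1252 / CP-1251 difference concrete, here is a minimal sketch in Python (not ICS code; the variable names are only for illustration) that decodes the same byte value 162 under both code pages, using Python's built-in "cp1252" and "cp1251" codecs:

```python
# Decode the same single byte (162, i.e. 0xA2) under two Windows code pages.
raw = bytes([162])

as_cp1252 = raw.decode("cp1252")  # Western European code page
as_cp1251 = raw.decode("cp1251")  # Cyrillic code page

# Show each resulting character together with its Unicode code point.
print(repr(as_cp1252), hex(ord(as_cp1252)))
print(repr(as_cp1251), hex(ord(as_cp1251)))

# Once encoded as UTF-8, the two characters no longer share a byte value.
print(as_cp1252.encode("utf-8"), as_cp1251.encode("utf-8"))
```

The byte is identical in both cases; only the code page used to interpret it differs, which is why a table keyed on the raw byte value can only be right for one locale.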

Probably correct if speed doesn't matter and internationalization
is the goal.

I guess that characters written in decimal notation below #255 are treated
as ANSI in the context of the content charset specified in the HTML header,
is that correct? If so, the fastest fix would be just to add the correct
content charset to the HTML header and to use decimal notation, provided
that internationalization doesn't matter much.

Character numbers above #255 are rendered as Unicode code points (tested).
I only wonder whether browsers also treat characters below #255 as Unicode
code points once they have found one character above #255?
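
One way to probe this question outside a browser is to ask an HTML-aware parser. The sketch below uses `html.unescape` from Python's standard library, which resolves numeric character references without being told any page charset. It shows how that one parser behaves and is only an indication of what browsers do, not a test of any specific browser:

```python
from html import unescape

# Numeric references are resolved here without any knowledge of a page charset.
low = unescape("&#162;")    # a reference below 255
high = unescape("&#1118;")  # a reference above 255 (1118 == 0x45E)

# Print the Unicode code point each reference resolved to.
print(hex(ord(low)), hex(ord(high)))
```

In this parser both references come back as Unicode code points, independent of each other and of any charset; this matches my reading of the HTML 4 specification, where numeric references refer to the document character set (ISO 10646) rather than to the declared encoding.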

> If you specify UTF-8 as the content character set in
> the HTML header, then you only need to encode as entities the
> metacharacters:  ampersand, non-breaking-space, and left- and
> right-angled brackets.
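
A minimal sketch of that approach, in Python rather than Delphi: `text_to_html_utf8` is a hypothetical helper (it is not part of ICS and is not how `TextToHtmlText()` is implemented). It escapes only the metacharacters listed above and leaves every other character to the UTF-8 encoding:

```python
from html import escape


def text_to_html_utf8(text: str) -> bytes:
    """Hypothetical helper: escape markup metacharacters, emit UTF-8."""
    # escape() with quote=False rewrites only '&', '<' and '>'.
    escaped = escape(text, quote=False)
    # Replace non-breaking spaces afterwards, so the '&' of '&nbsp;'
    # is not escaped a second time.
    escaped = escaped.replace("\u00a0", "&nbsp;")
    # Everything else (cent sign, Cyrillic letters, ...) stays as-is.
    return escaped.encode("utf-8")


print(text_to_html_utf8("5\u00a2 < 1\u00a0\u045e & more"))
```

This only works if the page really is served with UTF-8 declared as its content charset; no per-code-page entity table is needed in that case.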


Arno Garrels

> As for HttpSrv.TextToHtmlText() method, it should take the content
> character set into consideration.  However, if the mappings are too
> different, maintaining many different tables may not be practical.
> dZ.
> On Oct 9, 2008, at 05:09, Arno Garrels wrote:
>> Francois Piette wrote:
>>>> Or am I missing something?
>>> I think so. Using "html entities" makes sure the correct character is
>>> represented whatever character set or character code is used by
>>> the browser.
>> That's correct, but the server maps the wrong HTML entities if it
>> doesn't run in a locale that uses CP 1252!
>> For example:
>> Currently char #162 is hard-coded to represent the cent sign:
>> HTML Entity: 'cent', { #162 cent sign }
>> In windows-1251, however, #162 maps to the Cyrillic small letter
>> short U.
> --
> DZ-Jay [TeamICS]