Re: Mysterious hex C2 AD
On 2 Mar, James Bursa ja...@netsurf-browser.org wrote:
> On Sunday 01 March 2009, Bernard Boase wrote:
>> Just looked at the site www.world-science.net: NetSurf renders much of its text with inter-syllable sequences  which, in the original HTML, are all hex C2 AD.
> What version of NetSurf? In Page - Info, what does Encoding say?
> NetSurf doesn't support soft hyphens, but it should be displaying these as regular hyphens. What you're seeing suggests that it isn't interpreting that page as UTF-8 correctly.

Page - Info does say Encoding: UTF-8 (from meta), and, yes, r6772 displays them as regular hyphens (my original question arose from an earlier release). So I suppose it comes down to asking whether there are plans for NetSurf to handle soft hyphens like other browsers.

Thanks for the info.

-- Bernard
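The soft-hyphen behaviour asked about here (invisible unless a word is actually broken at that point, in which case a visible hyphen appears at the end of the line) can be sketched as a greedy line wrapper. This is a minimal illustration under stated assumptions, not NetSurf's or Firefox's code: `wrap_with_soft_hyphens` is a hypothetical helper, widths are counted in characters rather than font metrics, words are separated by single spaces, and each word is broken at most once.

```python
SHY = "\u00ad"  # SOFT HYPHEN; its UTF-8 encoding is the byte pair C2 AD


def wrap_with_soft_hyphens(text: str, width: int) -> list[str]:
    """Greedy wrapping that treats U+00AD as an optional break point.

    Hypothetical helper for illustration. Soft hyphens are dropped when a
    word fits; when a word is split at one, a visible '-' ends the line.
    A word that cannot be split to fit is moved whole to the next line
    (and may overflow it).
    """
    lines: list[str] = []
    current = ""
    for word in text.split(" "):
        visible = word.replace(SHY, "")  # what the word looks like unbroken
        sep = " " if current else ""
        if len(current) + len(sep) + len(visible) <= width:
            current += sep + visible
            continue
        # Try the longest prefix ending at a soft hyphen that still fits.
        pieces = word.split(SHY)
        placed = False
        for k in range(len(pieces) - 1, 0, -1):
            head = "".join(pieces[:k]) + "-"
            if len(current) + len(sep) + len(head) <= width:
                lines.append(current + sep + head)
                current = "".join(pieces[k:])  # remainder starts the next line
                placed = True
                break
        if not placed:
            if current:
                lines.append(current)
            current = visible
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    text = "inter\u00adsyl\u00adla\u00adble hy\u00adphen\u00ada\u00adtion"
    print(wrap_with_soft_hyphens(text, 40))  # ['intersyllable hyphenation']
    print(wrap_with_soft_hyphens(text, 20))  # ['intersyllable hy-', 'phenation']
```

With a wide enough line the soft hyphens simply vanish; only when the wrapper needs to split "hyphenation" does a hyphen become visible.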
Mysterious hex C2 AD
Just looked at the site www.world-science.net: NetSurf renders much of its text with inter-syllable sequences  which, in the original HTML, are all hex C2 AD. Is this legitimate HTML, perhaps for automatic hyphenation or something? Should NetSurf edit it out? Firefox does. Whilst the HTML entity &#xC2AD; seems to be valid, http://www.fileformat.info/info/unicode/char/c2ad/index.htm tells us that U+C2AD is not a valid Unicode character. Any idea what's going on?

-- Bernard
Re: Mysterious hex C2 AD
In article bdf1473550.bo...@boase.demon.co.uk, Bernard Boase b.bo...@bcs.org wrote:
> Just looked at the site www.world-science.net: NetSurf renders much of its text with inter-syllable sequences  which, in the original HTML, are all hex C2 AD.

This is UTF-8 for the soft hyphen, and it seems NetSurf isn't handling this encoding. A soft hyphen is intended to give a browser a hint as to where a word could be split across a line boundary, as in printed hyphenation. If there is no need to break across a line boundary, the hyphen should be silently ignored, as Firefox does.

> Is this legitimate HTML, perhaps for automatic hyphenation or something? Should NetSurf edit it out? Firefox does. Whilst the HTML entity &#xC2AD; seems to be valid, http://www.fileformat.info/info/unicode/char/c2ad/index.htm tells us that U+C2AD is not a valid Unicode character.

I'm sorry to say that all of the different 'encodings' in that web document are generated on the fly as the document is served, automagically but blindly. If the code is not valid Unicode, then that is it: all bets are off! The UTF-8 sequence C2 AD is the correct encoding of the Unicode code point U+00AD; try looking at http://www.fileformat.info/info/unicode/char/00ad/index.htm

Keith

-- Inspired!
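Keith's point that C2 AD is the UTF-8 form of U+00AD, rather than the code point U+C2AD, can be checked by hand. In this minimal Python sketch (the helper `decode_two_byte_utf8` is hypothetical, written only for illustration), the UTF-8 marker bits are stripped from each byte and the payload bits are joined: 0xC2 contributes `00010`, 0xAD contributes `101101`, and together they give `0b00010101101`, which is 0xAD. The same code also shows that an HTML numeric reference names a code point, so the soft hyphen is written `&#xAD;` or `&shy;`, not `&#xC2AD;`.

```python
import html
import unicodedata


def decode_two_byte_utf8(b0: int, b1: int) -> int:
    """Decode a two-byte UTF-8 sequence 110xxxxx 10yyyyyy to its code point.

    Hypothetical helper for illustration. Lead bytes 0xC0 and 0xC1 are
    rejected because they can only produce overlong encodings.
    """
    if not (0xC2 <= b0 <= 0xDF) or (b1 & 0xC0) != 0x80:
        raise ValueError("not a valid two-byte UTF-8 sequence")
    # Keep the 5 payload bits of the lead byte and the 6 of the continuation byte.
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


if __name__ == "__main__":
    cp = decode_two_byte_utf8(0xC2, 0xAD)
    print(f"U+{cp:04X}", unicodedata.name(chr(cp)))  # U+00AD SOFT HYPHEN

    # Python's built-in codec agrees with the hand-decoded value.
    assert bytes([0xC2, 0xAD]).decode("utf-8") == chr(cp)

    # Numeric references name code points, not bytes: &#xAD; and &shy; are
    # the soft hyphen, while &#xC2AD; refers to a different character.
    assert html.unescape("&#xAD;") == html.unescape("&shy;") == "\u00ad"
    assert html.unescape("&#xC2AD;") == "\uc2ad"

    # U+C2AD itself needs three bytes in UTF-8.
    print(chr(0xC2AD).encode("utf-8").hex(" "))  # ec 8a ad
```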
Re: Mysterious hex C2 AD
On Sunday 01 March 2009, Bernard Boase wrote:
> Just looked at the site www.world-science.net: NetSurf renders much of its text with inter-syllable sequences  which, in the original HTML, are all hex C2 AD.

What version of NetSurf? In Page - Info, what does Encoding say?

NetSurf doesn't support soft hyphens, but it should be displaying these as regular hyphens. What you're seeing suggests that it isn't interpreting that page as UTF-8 correctly.

James

-- James Bursa, NetSurf developer http://www.netsurf-browser.org/
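James's diagnosis, that the page is not being decoded as UTF-8, would produce stray characters of exactly this kind. A minimal Python sketch, assuming the misread encoding is ISO 8859-1 (Latin-1); the thread does not say which encoding the earlier NetSurf release actually fell back to, though Windows-1252 gives the same result for these two bytes. Each C2 AD pair decodes as two characters, U+00C2 ('Â') followed by U+00AD, so a browser that shows soft hyphens as ordinary hyphens would display 'Â-' between syllables.

```python
# Hypothetical sample text containing soft hyphens, as a server might send it.
original = "inter\u00adsyl\u00adla\u00adble"
raw = original.encode("utf-8")  # each U+00AD becomes the two bytes C2 AD

# Correct: decode with the charset the page declares (UTF-8 from meta).
assert raw.decode("utf-8") == original

# Wrong: decode the same bytes as Latin-1; every C2 byte turns into 'Â' (U+00C2).
misread = raw.decode("latin-1")
print(ascii(misread))  # 'inter\xc2\xadsyl\xc2\xadla\xc2\xadble'

# The damage is reversible, because Latin-1 maps each byte to exactly one character.
assert misread.encode("latin-1").decode("utf-8") == original
```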