Re: Mysterious hex C2 AD

2009-03-12 Thread Bernard Boase
On 2 Mar, James Bursa ja...@netsurf-browser.org wrote:

 On Sunday 01 March 2009, Bernard Boase wrote:
 Just looked at the site www.world-science.net

 Netsurf renders much of its text with inter-syllable sequences ­
 which, in the original HTML, are all hex C2 AD.

 What version of NetSurf? In Page - Info, what does Encoding say?

 NetSurf doesn't support soft hyphens but it should be displaying these as
 regular hyphens. What you're seeing seems to show that it isn't interpreting
 that page as UTF-8 correctly.

Page - Info does say Encoding: UTF-8 (from meta), and, yes, r6772 
displays them as regular hyphens (my original question arose from an 
earlier release). So I suppose it comes down to asking whether there 
are plans for NetSurf to handle soft hyphens like other browsers.

Thanks for the info.

-- 
Bernard



Mysterious hex C2 AD

2009-03-01 Thread Bernard Boase
Just looked at the site www.world-science.net

Netsurf renders much of its text with inter-syllable sequences ­ 
which, in the original HTML, are all hex C2 AD.

Is this legitimate HTML perhaps for automatic hyphenation or 
something? Should Netsurf edit it out? Firefox does.

Whilst HTML entity &#xC2AD; seems to be valid,
http://www.fileformat.info/info/unicode/char/c2ad/index.htm
tells us that U+C2AD is not a valid Unicode character.

Any idea what's going on?

-- 
Bernard



Re: Mysterious hex C2 AD

2009-03-01 Thread Keith Hopper
In article bdf1473550.bo...@boase.demon.co.uk,
   Bernard Boase b.bo...@bcs.org wrote:
 Just looked at the site www.world-science.net

 Netsurf renders much of its text with inter-syllable sequences ­
 which, in the original HTML, are all hex C2 AD.

 This is the UTF-8 encoding of the soft hyphen. NetSurf isn't handling it,
it seems. A soft hyphen is a hint to the browser about where a word may be
split across a line boundary, as in printed hyphenation. If there is no
need to break across a line boundary then the hyphen should be silently
ignored, as Firefox does.
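
The behaviour described above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions, not how Firefox or NetSurf
actually lay out text: the helper names strip_soft_hyphens and
break_at_soft_hyphen are made up here, and width is counted in characters
rather than rendered pixels.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def strip_soft_hyphens(text):
    """Drop every soft hyphen: the case where no line break is needed."""
    return text.replace(SOFT_HYPHEN, "")


def break_at_soft_hyphen(word, width):
    """Split word at the last soft hyphen whose head fits in width.

    Returns (head, tail), where head ends in a visible '-', or None if
    no soft hyphen gives a head of at most width characters.
    (Hypothetical helper; width is a character count, an assumption.)
    """
    parts = word.split(SOFT_HYPHEN)
    for i in range(len(parts) - 1, 0, -1):
        head = "".join(parts[:i]) + "-"
        if len(head) <= width:
            # Keep the remaining soft hyphens so the tail can break again.
            return head, SOFT_HYPHEN.join(parts[i:])
    return None


word = "hy\u00adphen\u00adation"
print(strip_soft_hyphens(word))
print(break_at_soft_hyphen(word, 7))
```

When the word fits on the line, only strip_soft_hyphens is needed; the break
helper matters only at the end of a line.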

 Is this legitimate HTML perhaps for automatic hyphenation or
 something? Should Netsurf edit it out? Firefox does.

 Whilst HTML entity &#xC2AD; seems to be valid,
 http://www.fileformat.info/info/unicode/char/c2ad/index.htm
 tells us that U+C2AD is not a valid Unicode character.

 I'm sorry to say that all of the different 'encodings' on that web
document are generated on the fly as the document is being served -
auto-magically, but blindly. If the code is not a valid Unicode code point
then that is it - all bets are off! The bytes C2 AD are the correct UTF-8
encoding of the Unicode code point U+00AD - try looking at

http://www.fileformat.info/info/unicode/char/00ad/index.htm
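
For anyone wanting to check the arithmetic, here is a small Python sketch
(the variable names are only illustrative) showing that U+00AD encodes to
the two bytes C2 AD in UTF-8, and that the code point U+C2AD is a different
thing that encodes to three bytes.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN

# Let Python do the encoding.
encoded = SOFT_HYPHEN.encode("utf-8")
print(" ".join(f"{b:02X}" for b in encoded))

# The same result by hand: code points U+0080..U+07FF use the
# two-byte pattern 110xxxxx 10xxxxxx.
cp = 0xAD
byte1 = 0xC0 | (cp >> 6)    # leading byte carries the top bits
byte2 = 0x80 | (cp & 0x3F)  # continuation byte carries the low 6 bits
manual = bytes([byte1, byte2])

# Reading the two bytes as one number gives U+C2AD, which is a
# different code point with its own three-byte UTF-8 form.
other = "\uc2ad".encode("utf-8")
print(" ".join(f"{b:02X}" for b in other))
```

So the hex C2 AD seen in the page source is a byte sequence, not a code
point number.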

 Keith

-- 
Inspired!



Re: Mysterious hex C2 AD

2009-03-01 Thread James Bursa
On Sunday 01 March 2009, Bernard Boase wrote:
 Just looked at the site www.world-science.net

 Netsurf renders much of its text with inter-syllable sequences ­
 which, in the original HTML, are all hex C2 AD.

What version of NetSurf? In Page - Info, what does Encoding say?

NetSurf doesn't support soft hyphens but it should be displaying these as 
regular hyphens. What you're seeing seems to show that it isn't interpreting 
that page as UTF-8 correctly.
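
As a rough illustration of that kind of misinterpretation, the Python sketch
below decodes the same bytes two ways. The sample word is made up, and
Latin-1 is only an assumed wrong guess; which encoding an older build
actually fell back to is not known from this thread.

```python
# "co-op" with a soft hyphen, as UTF-8 bytes (sample data, not from the site).
raw = b"co\xc2\xadop"

# Correct: the pair C2 AD becomes the single character U+00AD.
as_utf8 = raw.decode("utf-8")

# Wrong guess: each byte becomes its own character, so C2 turns into
# a visible A-circumflex (U+00C2) followed by U+00AD.
as_latin1 = raw.decode("latin-1")

print([f"U+{ord(c):04X}" for c in as_utf8])
print([f"U+{ord(c):04X}" for c in as_latin1])
```

The extra U+00C2 in the second result is the sort of stray character that
shows up between syllables when the page is not read as UTF-8.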

James

-- 
James Bursa, NetSurf developer  http://www.netsurf-browser.org/