On 25.11.2011 at 08:44, Guenter Milde <mi...@users.sf.net> wrote:

> On 2011-11-24, Friedrich Romstedt wrote:
>> Hi,
>
>> I'm experiencing a problem with Sphinx 1.1.2 (and also an earlier
>> version from July): some characters in my HTML <title> are
>> occurring as strange Unicode character sequences in the HTML. Here's
>> an example:
>
>> http://www.roentgen.physik.uni-goettingen.de/~fromstedt/
>
>> Watch the title displayed in the browser (not the headline, but the
>> title). The two-character sequence is hardcoded like this in the
>> HTML, apparently (inspection with Firefox "Show Source").
>
> Show source reveals:
>
>   <title>Welcome to Friedrich Romstedt’s IRP Home! — IRP - Friedrich Romstedt</title>
>
> Is this the same as in your *.html source (if you look at it in a text
> editor) or is there some additional change introduced on its way over the web?
>
> Looks like an encoding problem.
Yes, that's what I've done too, with the same diagnosis.

>> I have no idea how to boil this down since it's just the <title> which
>> is affected.
>
> There is a similar problem with the "pop-up
> anchors" of the section headings. If I move the mouse over a section
> heading, I see something like:
>
>   Welcome to Friedrich Romstedt’s IRP Home!¶

Precisely. Forgot to mention that.

>> In that example, the two-char sequence results from an apostrophe ' .
>
> AFAIK, Sphinx uses "smartypants" to convert
>
>   Character ' (39, 0x27) APOSTROPHE
>   * neutral (vertical) glyph with mixed usage
>
> to
>
>   Character '’' (8217, 0x2019)
>   2019 RIGHT SINGLE QUOTATION MARK
>   = single comma quotation mark
>   * this is the preferred character to use for apostrophe
>
> At least this is what I see in the first section heading.
>
> Maybe you can disable this replacement.

If I knew how; although I would prefer a real fix / explanation.

>> I think that my vim on my institute computer writes out just plain
>> ASCII (i.e., not 16bit Unicode). ``file`` says::
>
>> ASCII English text
>
> If it really writes just ASCII, how do you write Röntgen in your source
> files?

I don't. I write Roentgen. I switched my keyboard to the US layout, since
the DE layout is terrible for programming too.

>> If anyone has an idea how to track this down, it's very much welcome.
>
> In "Romstedt’s", the three bytes of the UTF-8 representation of the
> Character '’' (8217, 0x2019) RIGHT SINGLE QUOTATION MARK
> are treated as three characters.

Three bytes? Meaning ’ is a multi-byte sequence even though it is a single
Unicode character? I remember there are several ways to encode some
characters in Unicode.

> How do you specify the title?

Currently by the headline. I will try the ``.. title::`` directive next.

> What is the locale encoding?

Mixed. The system has German i18n, but I changed the keyboard layout to US,
so I guess LOCALE is us_US.something. I remember MATLAB complaining about
inconsistencies upon startup. I will investigate it.
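
To make the three-byte point concrete, here is a minimal Python 3 sketch of what seems to be happening. It is only an illustration under assumptions: the helper names (utf8_bytes, misdecode, repair) are made up for this mail, and the "wrong" codec is assumed to be Windows-1252 (cp1252), because that is the one that turns the bytes into the visible a-circumflex, euro sign and trade mark sign. It says nothing about where in the chain (Sphinx, web server, browser) the wrong decoding actually happens.

```python
import unicodedata


def utf8_bytes(char):
    """Return the UTF-8 encoding of a character as a bytes object."""
    return char.encode("utf-8")


def misdecode(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8, then decode the bytes with a single-byte codec.

    Every byte becomes one character, which is how one apostrophe
    can turn into three strange characters.
    """
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled, wrong_encoding="cp1252"):
    """Undo misdecode(): recover the bytes, then decode them as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")


if __name__ == "__main__":
    apostrophe = "\u2019"  # RIGHT SINGLE QUOTATION MARK
    raw = utf8_bytes(apostrophe)
    print(len(raw), raw)  # 3 b'\xe2\x80\x99'

    # Names of the three characters that show up instead of the apostrophe.
    for ch in misdecode(apostrophe):
        print(hex(ord(ch)), unicodedata.name(ch))
    # 0xe2 LATIN SMALL LETTER A WITH CIRCUMFLEX
    # 0x20ac EURO SIGN
    # 0x2122 TRADE MARK SIGN

    # The round trip restores the original text.
    print(repair(misdecode("Romstedt\u2019s")) == "Romstedt\u2019s")  # True
```

The same mechanism would explain the pilcrow in the heading anchors: U+00B6 is two bytes in UTF-8, so it shows up as two characters when read as cp1252.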
CU
Friedrich