On 2011-11-25, Friedrich Romstedt wrote:
> On 25.11.2011 at 08:44, Guenter Milde <mi...@users.sf.net> wrote:
>> On 2011-11-24, Friedrich Romstedt wrote:

>>> I'm experiencing some problem with Sphinx 1.1.2 (and also an earlier
>>> version from July), that some characters in my HTML <title> are
>>> occurring as strange Unicode character sequences in the HTML. Here's
>>> an example:

>>> http://www.roentgen.physik.uni-goettingen.de/~fromstedt/

>>> Watch the title displayed in the browser (not the headline, but the
>>> title). The two-character sequence is hardcoded like this in the
>>> HTML, apparently (inspection with Firefox "Show Source").

>> Show source reveals:

>> <title>Welcome to Friedrich Romstedt’s IRP Home! — I
>> RP - Friedrich Romstedt</title>

Interestingly, the section title in the document body reads

  <h1>Welcome to Friedrich Romstedt&#8217;s IRP Home!...

with a numerical replacement for the RIGHT SINGLE QUOTATION MARK.

...

>> There is a similar problem with the "pop-up
>> anchors" of the section headings. If I move the mouse over a section
>> heading, I see something like:

>> Welcome to Friedrich Romstedt’s IRP Home!Â¶

and the HTML source shows this also in other section titles:

  <h2>Overview<a class="headerlink" href="#overview" title="Permalink to this headline">Â¶</a></h2>

>>> In that example, the two-char sequence results from an apostrophe ' .

>> Sphinx uses "smartypants" to convert
>> Character ' (39, 0x27) APOSTROPHE
>> to
>> Character '’' (8217, 0x2019) RIGHT SINGLE QUOTATION MARK

>> Maybe you can disable this replacement.

> If I know how; although I would prefer a real fix / explanation.

I don't know either. It should be somewhere in the Sphinx docs or in the
config file template.

...

>>> If anyone has an idea how to track this down, it's very much welcome.

>> In "Romstedt’s", the three bytes of the UTF-8 representation of the
>> Character '’' (8217, 0x2019) RIGHT SINGLE QUOTATION MARK
>> are treated as three characters.

> Three bytes? Meaning ' is a digraph even though it is Unicoded? I
> remember there are several possibilities to encode some characters in
> Unicode.

It's not a digraph; the UTF-8 encoding of Unicode represents every
character outside the ASCII range as a sequence of two to four bytes.

``Â¶`` is what the two bytes representing U+00B6 PILCROW SIGN in UTF-8
look like when each byte is shown as a separate character.

>> How do you specify the title?

> Currently by the headline. I will try the .. title:: next.

By headline, you mean a section title in the rst source like::

  Welcome to Friedrich Romstedt's IRP Home!
  -----------------------------------------

?

(I am trying to rule out encoding problems in the configuration file or
style sheets.)

>> What is the locale encoding?

... from your other post I see that it is UTF-8 (should be fine).

What are the settings for input encoding and output encoding in the Sphinx
config file?

Something strange is going on here:

* the HTML source declares itself as UTF-8::

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

* Unicode chars in the text are replaced with "numbered entities"
  (like &#8217;)

* Unicode chars in "special places" (moved to the title, auto-added section
  marks) are "double-encoded" in UTF-8.

I cannot tell the reason.

Günter
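
To illustrate what "double-encoded" means here, the effect can be reproduced in a few lines of Python 3. This is only a sketch under assumptions: the helper name ``double_encode`` is made up for the example, and Latin-1 as the wrongly assumed single-byte encoding is a guess. It says nothing about where in Sphinx or in the setup the second encoding step actually happens.

```python
def double_encode(text, wrong_codec="latin-1"):
    """Encode text as UTF-8, then misread each byte as one character.

    wrong_codec is an assumption; any single-byte codec gives a similar effect.
    """
    return text.encode("utf-8").decode(wrong_codec)


pilcrow = "\u00b6"  # PILCROW SIGN
quote = "\u2019"    # RIGHT SINGLE QUOTATION MARK

# One character each, but two and three bytes in UTF-8.
print(pilcrow.encode("utf-8"))  # b'\xc2\xb6'
print(quote.encode("utf-8"))    # b'\xe2\x80\x99'

# Misreading the bytes turns one character into several.
garbled_pilcrow = double_encode(pilcrow)
garbled_quote = double_encode(quote)
print(len(garbled_pilcrow), len(garbled_quote))  # 2 3

# The numbered entity seen in the <h1> is just the decimal code point.
entity = "&#%d;" % ord(quote)
print(entity)  # &#8217;

# The damage is reversible as long as the wrong codec is known.
repaired = garbled_pilcrow.encode("latin-1").decode("utf-8")
print(repaired == pilcrow)  # True
```

If the page is then written out as UTF-8 once more, the garbled characters are what ends up in the file, which would match the symptoms in the title and in the permalink marks.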