[sphinx-dev] Re: Some characters rendering wrongly in HTML &lt;title&gt; output</span></a></span> </h1> <p class="darkgray font13"> <span class="sender pipe"><a href="/search?l=sphinx-dev@googlegroups.com&q=from:%22Guenter+Milde%22" rel="nofollow"><span itemprop="author" itemscope itemtype="http://schema.org/Person"><span itemprop="name">Guenter Milde</span></span></a></span> <span class="date"><a href="/search?l=sphinx-dev@googlegroups.com&q=date:20111125" rel="nofollow">Fri, 25 Nov 2011 07:39:49 -0800</a></span> </p> </div> <div itemprop="articleBody" class="msgBody"> <!--X-Body-of-Message--> <pre>On 2011-11-25, Friedrich Romstedt wrote:
> On 25.11.2011 at 08:44, Guenter Milde &lt;mi...@users.sf.net&gt; wrote:
>> On 2011-11-24, Friedrich Romstedt wrote:</pre><pre>
>>> I'm experiencing a problem with Sphinx 1.1.2 (and also with an earlier
>>> version from July): some characters in my HTML &lt;title&gt; appear
>>> as strange Unicode character sequences in the HTML. Here's
>>> an example:
>>> <a rel="nofollow" href="http://www.roentgen.physik.uni-goettingen.de/~fromstedt/">http://www.roentgen.physik.uni-goettingen.de/~fromstedt/</a>
>>> Watch the title displayed in the browser (not the headline, but the
>>> title). The two-character sequence is apparently hardcoded like this
>>> in the HTML (inspected with Firefox "Show Source").

>> Show source reveals:
>>   &lt;title&gt;Welcome to Friedrich Romstedt’s IRP Home! &amp;mdash; IRP - Friedrich Romstedt&lt;/title&gt;

Interestingly, the section title in the document body reads

  &lt;h1&gt;Welcome to Friedrich Romstedt&amp;#8217;s IRP Home!...

with a numeric character reference for the RIGHT SINGLE QUOTATION MARK.

...

>> There is a similar problem with the "pop-up
>> anchors" of the section headings.
>> If I move the mouse over a section
>> heading, I see something like:
>>   Welcome to Friedrich Romstedt’s IRP Home!Â¶

and the HTML source shows this also in other section titles:

  &lt;h2&gt;Overview&lt;a class="headerlink" href="#overview" title="Permalink to this headline"&gt;Â¶&lt;/a&gt;&lt;/h2&gt;

>>> In that example, the two-character sequence results from an apostrophe ' .

>> Sphinx uses "smartypants" to convert
>>   Character ' (39, 0x27) APOSTROPHE
>> to
>>   Character '’' (8217, 0x2019) RIGHT SINGLE QUOTATION MARK
>> Maybe you can disable this replacement.

> If I knew how; although I would prefer a real fix / explanation.

I don't know either. It should be documented somewhere in the Sphinx docs
or in the config file template.

...

>>> If anyone has an idea how to track this down, it's very much welcome.

>> In "Romstedt’s", the three bytes of the UTF-8 representation of the
>>   Character '’' (8217, 0x2019) RIGHT SINGLE QUOTATION MARK
>> are treated as three characters.

> Three bytes? Meaning ' is a digraph even though it is Unicode? I
> remember there are several possibilities to encode some characters in
> Unicode.

It's not a digraph: the UTF-8 encoding of Unicode represents every
character outside the ASCII range as a sequence of two to four bytes.
``Â¶`` are the two bytes (0xC2 0xB6) representing 00B6 PILCROW SIGN in
UTF-8 encoding, each displayed as a separate character.

>> How do you specify the title?

> Currently by the headline. I will try the .. title:: directive next.

By headline, you mean a section title in the rst source like::

  Welcome to Friedrich Romstedt's IRP Home!
  -----------------------------------------

? (I am trying to rule out encoding problems in the configuration file or
style sheets.)

>> What is the locale encoding?

... from your other post I see that it is UTF-8 (should be fine).

What are the settings for input encoding and output encoding in the
Sphinx config file?
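
A minimal Python 3 sketch of the byte-level explanation above, using only the standard library. It assumes the UTF-8 bytes are misread as cp1252 (Windows-1252) before being written out as UTF-8 again; which codec is actually involved inside Sphinx here is not established, and the helper names utf8_hex(), double_encode() and repair() are hypothetical:

```python
# Sketch only: reproduces "double-encoding" with the standard library.
# Assumption: the UTF-8 bytes are misread as cp1252 (Windows-1252);
# utf8_hex(), double_encode() and repair() are hypothetical helpers.

def utf8_hex(ch):
    """Return the UTF-8 bytes of one character as hex strings."""
    return [f"{b:02X}" for b in ch.encode("utf-8")]

def double_encode(text, wrong_codec="cp1252"):
    """Encode as UTF-8, then decode those bytes with a single-byte codec,
    so every byte becomes a separate character. Writing this result out
    as UTF-8 again gives the double-encoded HTML."""
    return text.encode("utf-8").decode(wrong_codec)

def repair(garbled, wrong_codec="cp1252"):
    """Undo double_encode() by reversing both steps."""
    return garbled.encode(wrong_codec).decode("utf-8")

PILCROW = "\u00b6"   # 00B6 PILCROW SIGN, the auto-added section mark
RSQUO = "\u2019"     # 2019 RIGHT SINGLE QUOTATION MARK, from smartypants

print(utf8_hex(PILCROW))        # ['C2', 'B6']
print(utf8_hex(RSQUO))          # ['E2', '80', '99']
print(double_encode(PILCROW))   # Â¶
print(double_encode(RSQUO))     # â€™
print(repair(double_encode(RSQUO)) == RSQUO)  # True
```

If latin-1 is assumed instead of cp1252, the bytes 0x80 and 0x99 decode to C1 control characters rather than € and ™, so the same apostrophe may show up as fewer visible characters; repair() only works when the misreading codec is guessed correctly.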
Something strange is going on here:

* the HTML source declares itself as UTF-8::

    &lt;meta http-equiv="Content-Type" content="text/html; charset=utf-8" /&gt;

* Unicode characters in the body text are replaced with numeric character
  references ("numbered entities", like &amp;#8217;);

* Unicode characters in "special places" (text moved into the title,
  auto-added section marks) are double-encoded in UTF-8.

I cannot tell the reason.

Günter
</pre> </div> </div> </div> <div class="footer" role="contentinfo"> <ul> <li class="darkgray">jaocrd$jjv$1@dough.gmane.org</li> </ul> </div> </body> </html>