Günter, thanks for your response.

The conf.py did not specify a source_encoding, so I assume it simply
defaulted to 'utf-8-sig'. Explicitly setting the encoding to 'utf-8-sig'
produced the same error.
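
For reference, a minimal sketch of what that explicit setting looks like in conf.py. source_encoding is the Sphinx option discussed in this thread; the rest of the file is omitted here.

```python
# conf.py (fragment): the encoding Sphinx uses to decode reST source files.
# 'utf-8-sig' is UTF-8 that also accepts an optional leading byte order mark.
source_encoding = 'utf-8-sig'
```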

The snippet in the rst document that causes the error (also given in the
original post) is:

*data = 'word,length\nTr\xe4umen,7\nGr\xfc\xdfe,5'*
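
To illustrate the error message, here is a small Python 3 sketch. It assumes the offending bytes are the ones named by the escapes in the literal above (0xe4, 0xfc, 0xdf), which is a guess and not something verified against the file on disk. Those bytes are valid Latin-1 but not valid UTF-8, so the 'utf8' codec rejects them with the same "invalid continuation byte" reason. The offset differs from the traceback's position 36 because only the literal is decoded here.

```python
# Assumption: the raw bytes match the escapes in the snippet (Latin-1 data).
raw = b'word,length\nTr\xe4umen,7\nGr\xfc\xdfe,5'

# 0xe4 announces a three-byte UTF-8 sequence, but the next byte is a plain
# ASCII 'u', so strict UTF-8 decoding fails at that point.
try:
    raw.decode('utf-8')
    utf8_error = None
except UnicodeDecodeError as err:
    utf8_error = err

# The same bytes decode cleanly as Latin-1.
text = raw.decode('latin-1')

# Re-encoding as UTF-8 turns each umlaut into a two-byte sequence.
utf8_bytes = text.encode('utf-8')
```

If the source file really contains such bytes, either re-encoding it as UTF-8 or telling Sphinx the actual encoding through source_encoding should avoid the decode error.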


The complete rst document can be found here:
https://raw.github.com/pydata/pandas/master/doc/source/io.rst

The resulting html should look like this:
http://pandas.pydata.org/pandas-docs/dev/io.html#dealing-with-unicode-data

One thing I just realized is that the other developers who have built the
docs did so exclusively on a Linux box, whereas I am working on an Ubuntu
12.04 virtual machine running on Windows 7. So I'm not entirely convinced
that the input file is broken; it might be a platform-dependent issue.
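
One way to separate the two possibilities would be to check the bytes of the file directly on each machine. The sketch below is only an illustration: find_utf8_error and check_file are hypothetical helper names, not part of Sphinx or Docutils. The file is opened in binary mode so that no platform default encoding is involved.

```python
def find_utf8_error(data):
    """Return (offset, byte value) of the first invalid UTF-8 byte, or None."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


def check_file(path):
    """Apply find_utf8_error to the raw bytes of the file at path."""
    # Binary mode: read the bytes exactly as stored, with no decoding.
    with open(path, 'rb') as handle:
        return find_utf8_error(handle.read())
```

Running check_file on the local copy of io.rst on both the VM and a native Linux checkout would show whether the files differ at the byte level. A result of None means the file itself is valid UTF-8 and the problem lies elsewhere in the build.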




On Thursday, April 18, 2013 2:06:43 AM UTC-5, Guenter Milde wrote:
>
> On 2013-04-17, Conway M wrote: 
>
>
> > I am trying to compile the docs of Pandas 
> > <https://github.com/pydata/pandas>but I am unable to get Sphinx to 
> > compile a document with some unicode.  Is there some flag I need to 
> > specify to let Sphinx correctly build documents with unicode in them? 
>
> The default input encoding is 'utf8', so if your rst document is 
> utf8-encoded, it should be OK. 
>
> If not, please post more details (used encoding, docutils settings). 
> A minimal example (the part of the input file that caused the error) may 
> help further. 
>
> > In this case, I don't want Sphinx to decode the text. 
>
> Docutils/Sphinx will always decode the input into an "unicode" instance 
> and encode the output. All inner processing is done on "unicode" (or 
> derived) objects. 
>
> ... 
>
> >>   File "/usr/local/lib/python2.7/dist-packages/sphinx/environment.py", line 609, in read_doc 
> >>     raise SphinxError(str(err)) 
> >> SphinxError: 'utf8' codec can't decode byte 0xe4 in position 36: invalid continuation byte 
> >> > /usr/local/lib/python2.7/dist-packages/sphinx/environment.py(609)read_doc() 
> >> -> raise SphinxError(str(err)) 
> >> (Pdb) 
>
> It looks like the input file is either broken or not in utf8 encoding 
> (which then?). 
>
> It looks like the input decoding is not done by docutils.io, but by the 
> Sphinx "wrapper" - this means you must tell Sphinx about the correct 
> "source_encoding" 
> http://sphinx-doc.org/config.html#confval-source_encoding.   
> Setting the Docutils config setting "input-encoding" 
> http://docutils.sourceforge.net/docs/user/config.html#input-encoding will 
> not help. 
>
> Günter 
>
>
