[Python-modules-team] Bug#720341: [python-sphinxcontrib.spelling] Spellchecker is not unicode aware in PythonBuiltinsFilter class

Slavko Tue, 20 Aug 2013 11:58:17 -0700

Package: python-sphinxcontrib.spelling
Version: 1.4-1
Severity: normal
Tags: patch


Hi,

the package has a problem with unicode strings/words in the
PythonBuiltinsFilter's _skip method. When I try to check rst
documents written in Slovak, I get:

Exception occurred:
  File "/usr/lib/pymodules/python2.7/sphinx/application.py", line 204,
in build
    self.builder.build_update()
  File "/usr/lib/pymodules/python2.7/sphinx/builders/__init__.py", line
191, in build_update
    self.build(['__all__'], to_build)
  File "/usr/lib/pymodules/python2.7/sphinx/builders/__init__.py", line
252, in build
    self.write(docnames, list(updated_docnames), method)
  File "/usr/lib/pymodules/python2.7/sphinx/builders/__init__.py", line
292, in write
    self.write_doc(docname, doctree)
  File "/usr/lib/pymodules/python2.7/sphinxcontrib/spelling.py", line
295, in write_doc
    for word, suggestions in self.checker.check(node.astext()):
  File "/usr/lib/pymodules/python2.7/sphinxcontrib/spelling.py", line
203, in check
    for word, pos in self.tokenizer(text):
  File "/usr/lib/python2.7/dist-packages/enchant/tokenize/__init__.py",
line 389, in next
    (word,pos) = next(self._tokenizer)
  File "/usr/lib/python2.7/dist-packages/enchant/tokenize/__init__.py",
line 389, in next
    (word,pos) = next(self._tokenizer)
  File "/usr/lib/python2.7/dist-packages/enchant/tokenize/__init__.py",
line 390, in next
    while self._skip(word):
  File "/usr/lib/pymodules/python2.7/sphinxcontrib/spelling.py", line
150, in _skip
    return hasattr(__builtin__, word)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in
position 2: ordinal not in range(128)

After some inspection I found that sphinx passes all words as unicode
strings (<type 'unicode'>), no matter whether they contain non-ASCII
chars or not. Words with non-ASCII chars cause the problem, because the
hasattr function needs a *str* attribute name and, as the traceback
shows, the implicit conversion uses the 'ascii' codec. The solution
seems to be to add "encode..." to convert from unicode to the str type
(at line 149):

   return hasattr(__builtin__, word.encode("utf-8"))

I am not sure whether this is a workaround or a proper solution, but it
seems to work for English texts too. Patch attached.
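
For illustration, the same idea can be sketched as a standalone helper,
outside of the filter class. This is only a minimal sketch under stated
assumptions: the names is_builtin_name and builtins_module are
hypothetical and not part of sphinxcontrib.spelling, and the Python 3
branch is added only so the snippet runs on both interpreters.

```python
# -*- coding: utf-8 -*-
import sys

try:
    import __builtin__ as builtins_module  # Python 2 module name
except ImportError:
    import builtins as builtins_module  # Python 3 module name


def is_builtin_name(word):
    """Return True if word names a built-in Python symbol.

    Hypothetical helper mirroring the patched _skip method.
    """
    # On Python 2, hasattr() converts a unicode attribute name with the
    # 'ascii' codec, which fails for words such as u'dl\xe1'. Encoding
    # to UTF-8 first yields a byte string that hasattr() accepts.
    if sys.version_info[0] == 2 and isinstance(word, unicode):
        word = word.encode("utf-8")
    return hasattr(builtins_module, word)


print(is_builtin_name(u"len"))     # a real built-in name
print(is_builtin_name(u"dl\xe1"))  # a non-ASCII word, not a built-in
```

Since no built-in name in Python 2 contains non-ASCII characters, another
option would be to catch UnicodeEncodeError and return False; the encode
approach just keeps the change to one line.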

regards

--- System information. ---
Architecture: amd64
Kernel:       Linux 3.10-2-amd64

Debian Release: jessie/sid

--- Package information. ---
Depends               (Version) | Installed
===============================-+-============
python                          | 2.7.5-2
python-support      (>= 0.90.0) | 1.0.15
python-docutils                 | 0.10-3
python-enchant                  | 1.6.5-2
python-sphinx                   | 1.1.3+dfsg-8


-- 
Slavko
http://slavino.sk

--- /usr/share/pyshared/sphinxcontrib/spelling.py	2012-08-04 17:14:39.000000000 +0200
+++ /tmp/spelling.py	2013-08-20 18:47:42.000000000 +0200
@@ -146,7 +146,7 @@
     """Ignore names of built-in Python symbols.
     """
     def _skip(self, word):
-        return hasattr(__builtin__, word)
+        return hasattr(__builtin__, word.encode("utf-8"))
 
 class ImportableModuleFilter(Filter):
     """Ignore names of modules that we could import.

