On Tue, Mar 22, 2016 at 3:50 AM, Bernhard Grotz <[email protected]>
wrote:
>
> today I got a problem when calling ``make linkcheck`` inside of a sphinx
> project. Apparently, there is a link inside the project which contains the
> German Umlaut 'Ü' ('\xdc'). When reaching this URL, the linkcheck-builder
> raises
> an Error and stops working:
>
Isn't that correct behavior? As I understand it, URLs can only contain
ASCII characters, though it seems there is movement to extend that to
"IRIs":
https://en.wikipedia.org/wiki/Internationalized_resource_identifier
So the question is -- what is Sphinx's policy here? If it is only to support
the older, robust ASCII-only system, then an Error does make sense here.
>     split[2] = quote_plus(split[2].encode('utf-8'), '/').decode('ascii')
> AttributeError: 'str' object has no attribute 'decode'
>
Then this looks like a py2-py3 error. The old py2 Unicode object has a
decode() method, even though that actually makes no sense, but backwards
compatibility and all that... That should apparently be a bytes object at
that point, not a string.
(And maybe there would be no Error then, as encode_uri maybe CAN encode
that non-ASCII character.)
> .. code-block:: python
>
>     # handle non-ASCII URIs
>     try:
>         req_url.encode('ascii')
>     except UnicodeError:
>         req_url = encode_uri(req_url)
>
> Obviously, "UnicodeError" should be replaced with "UnicodeEncodeError"
> there.
>
if py2 supports that -- otherwise, UnicodeEncodeError appears to be a
subclass of UnicodeError, so this still gets caught.
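
A quick way to confirm the subclass relationship, as a minimal sketch. The
helper name needs_encoding is made up for illustration and is not part of
Sphinx:

```python
# UnicodeEncodeError derives from UnicodeError, so the broader
# except clause in the linkcheck snippet already catches it.
assert issubclass(UnicodeEncodeError, UnicodeError)


def needs_encoding(url):
    """Hypothetical helper: True if url has characters outside ASCII."""
    try:
        url.encode('ascii')
    except UnicodeError:  # also catches UnicodeEncodeError
        return True
    return False
```

So narrowing the clause to UnicodeEncodeError would be tidier, but it should
not change which URLs reach encode_uri().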
> But the problem still keeps the same. Only changing
> ``req_url.encode('ascii')``
> to ``req_url.encode('utf-8')`` helps as a workaround, but then of course
> the
> checks of all URLs containing German Umlaute fail.
>
> Is there a better way to fix this problem?
>
I'm wondering why encode_uri isn't just called every time, anyway, but in
any case, it looks like it needs some py3 testing and fixing... Taking a
quick look:
def encode_uri(uri):
    split = list(urlsplit(uri))
    split[1] = split[1].encode('idna').decode('ascii')

so split[1] should be a unicode object now -- all good

    split[2] = quote_plus(split[2].encode('utf-8'), '/').decode('ascii')

Here is where it barfed:
split[2].encode('utf-8') is now a bytes object -- good.
But we're getting an error on the decode() call, so
quote_plus() must be returning a unicode object.
It looks like quote_plus() is coming from the six module:
urllib.quote(string[, safe])
Replace special characters in string using the %xx escape. Letters, digits,
and the characters '_.-' are never quoted. By default, this function is
intended for quoting the path section of the URL. The optional safe
parameter specifies additional characters that should not be quoted — its
default value is '/'.
Example: quote('/~connolly/') yields '/%7econnolly/'.
urllib.quote_plus(string[, safe])
Like quote(), but also replaces spaces by plus signs, as required for
quoting HTML form values when building up a query string to go into a URL.
Plus signs in the original string are escaped unless they are included in
safe. It also does not have safe default to '/'.
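
For comparison, here is roughly how the two functions behave under py3,
where they live in urllib.parse rather than urllib. This is only a sketch:
the sample path is made up, and the import assumes Python 3 directly
instead of going through six:

```python
from urllib.parse import quote, quote_plus  # Python 3 location

path = '/Über uns/'          # made-up sample path with an Umlaut and a space
raw = path.encode('utf-8')   # bytes, as in encode_uri()

quoted = quote(raw)               # spaces become %20, '/' is kept
plus_default = quote_plus(raw)    # spaces become '+', '/' is escaped too
plus_safe = quote_plus(raw, '/')  # spaces become '+', '/' is kept

# On Python 3 the result is already a str, so it has no .decode() method.
result_type = type(plus_safe)
```

That matches the traceback: the value coming back from quote_plus() is a
text string on py3, so calling .decode('ascii') on it fails.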
If it's doing its job with non-ASCII characters, then it should return an
ASCII-compatible string (i.e. a unicode object in py3), so the .decode()
should not be required.
Of course, it may be there because it returns a py2 string, and we want a
py2 unicode object.
Personally, I think it's odd that it wouldn't return a Unicode string under
py2, but if that's the case, then this may need the ugly:
try:
    split[2].encode('ascii')
except AttributeError:
    pass
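
An alternative to the try/except would be to check the type of whatever
quote_plus() hands back and only decode when it is bytes. The following is
a rough sketch, not Sphinx's actual code: encode_uri_sketch is a made-up
name, it imports from urllib.parse directly (so py3 only, no six), and it
covers just the host and path steps quoted above:

```python
from urllib.parse import quote_plus, urlsplit, urlunsplit


def encode_uri_sketch(uri):
    """Hypothetical variant of encode_uri(): host and path only."""
    split = list(urlsplit(uri))
    # Host name: IDNA-encode, then back to a text string.
    split[1] = split[1].encode('idna').decode('ascii')
    # Path: percent-encode the UTF-8 bytes, keeping '/' unescaped.
    quoted = quote_plus(split[2].encode('utf-8'), '/')
    # Only decode if we actually got bytes back.
    if isinstance(quoted, bytes):
        quoted = quoted.decode('ascii')
    split[2] = quoted
    return urlunsplit(split)
```

With this, a URL that is already pure ASCII should come back unchanged,
which is one argument for simply calling it every time.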
NOTE: I haven't tried to run any of this code.....
-CHB
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
[email protected]