Hi,
Jeroen Ruigrok van der Werven <asmodai at in-nomine.org> writes:
Would people object if such functionality got added to urllib?
I would ;-) There are IRIs, just that nobody wrote a useful module for that.
There are algorithms in the RFC that can convert URIs to IRIs and the other way
round.
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf
Of Jeroen Ruigrok van der Werven
Sent: Wednesday, May 07, 2008 05:20
To: Tom Pinckney
Cc: python-dev@python.org
Subject: Re: [Python-Dev] urllib unicode handling
-On [20080507 04:06], Tom Pinckney
Martin v. Löwis wrote:
The proper way to implement this would be IRIs (RFC 3987),
in particular section 3.1. This is not as simple as just
encoding it as UTF-8, as you might have to apply IDNA to
the host part.
Code doing so just hasn't been contributed yet.
But if someone wanted to do so,
Maybe I didn't understand the RFC quite right, but it seemed like how
to handle hostnames was left as a choice between IDNA encoding the
hostname or replacing the non-ascii characters with dashes? I guess in
practice IDNA is the right decision.
Another part I wasn't clear on is whether
Maybe I didn't understand the RFC quite right, but it seemed like how to
handle hostnames was left as a choice between IDNA encoding the hostname
or replacing the non-ascii characters with dashes? I guess in practice
IDNA is the right decision.
I haven't fully understood it, either, but I
If this is indeed the case, it sounds perfectly legal (according to the
RFC) and perfectly practical (as required by numerous popular websites)
to have urllib.quote and urllib.quote_plus do an automatic UTF-8
encoding of unicode strings before percent encoding them.
It's probably legal, but I
I was assuming urllib.quote/unquote would only be called on text
intended to be used in non-hostname portions of the URIs. I'm not sure
if this is the actual intent of urllib.quote and perhaps the
documentation should be updated to specify what precisely it does and
then people can decide
Hi,
While trying to use urllib in Python 2.5.1 to HTTP GET content from
various web sites, I've run into a problem with urllib.quote
(and .quote_plus): they don't accept unicode strings.
I see that this is an issue that has been discussed before:
see this thread:
Thanks for any thoughts on this,
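
For illustration, the behaviour asked for here (UTF-8 encode a unicode string, then percent-encode the bytes) can be sketched as below. This is only a sketch under stated assumptions: it uses Python 3's urllib.parse.quote rather than the Python 2.5 urllib.quote discussed in the thread, and quote_utf8 is a hypothetical helper name, not an existing urllib function.

```python
from urllib.parse import quote


def quote_utf8(text, safe="/"):
    """Hypothetical helper: encode text as UTF-8, then percent-encode it.

    Intended for non-hostname parts of a URI (path segments, query values).
    """
    if isinstance(text, str):
        # Make the encoding step explicit instead of relying on a default.
        text = text.encode("utf-8")
    # quote() accepts bytes and percent-encodes every byte that is not
    # an unreserved character or listed in `safe`.
    return quote(text, safe=safe)


print(quote_utf8("Motörhead"))  # Mot%C3%B6rhead
```

On Python 2.x the equivalent workaround would presumably be to call .encode('utf-8') on the unicode object yourself before passing it to urllib.quote.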
The proper way to implement this would be IRIs (RFC 3987),
in particular section 3.1. This is not as simple as just
encoding it as UTF-8, as you might have to apply IDNA to
the host part.
Code doing so just hasn't been contributed yet.
Regards,
Martin
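
A rough sketch of the mapping described above, assuming Python 3's urllib.parse and the built-in "idna" codec (which implements the older IDNA 2003 rules): the host part gets IDNA, the other components get UTF-8 plus percent-encoding. iri_to_uri is a hypothetical name, and the sketch ignores userinfo, IPv6 literals and the finer points of RFC 3987 section 3.1, so it is not a complete implementation.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched outside the host. "%" is included so that
# percent-escapes already present in the input are not encoded twice.
_SAFE = "/%:@!$&'()*+,;=~"


def iri_to_uri(iri):
    """Hypothetical helper: convert an IRI string to an ASCII-only URI."""
    parts = urlsplit(iri)

    # Host part: apply IDNA instead of percent-encoding.
    if parts.hostname:
        netloc = parts.hostname.encode("idna").decode("ascii")
        if parts.port is not None:
            netloc += ":%d" % parts.port
    else:
        netloc = parts.netloc

    # Other components: quote() encodes str input as UTF-8 by default,
    # then percent-encodes the resulting bytes.
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE + "?")
    fragment = quote(parts.fragment, safe=_SAFE + "?")

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://bücher.example/straße?q=é"))
# http://xn--bcher-kva.example/stra%C3%9Fe?q=%C3%A9
```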
-On [20080507 04:06], Tom Pinckney ([EMAIL PROTECTED]) wrote:
While in theory UTF-8 is not a standard, sites like Last.fm, Facebook and
Wikipedia seem to have embraced it (as have pretty much all other major web
sites). As with HTML, there is what the standard says and what the actual
browsers