Re: [Zope-dev] Non-ASCII characters in URLs

2008-04-09 Thread Tino Wildenhain

Dieter Maurer wrote:

Wichert Akkerman wrote at 2008-4-7 20:45 +0200:

...

Almost surely, Alexander wants to ask why Zope does not allow
non-ASCII characters in ids.

And, in fact, there are only two reasons:

  *  laziness of the Zope developers:

 without the restriction to ASCII characters
 careful quoting (and unquoting) is necessary
 in order to adhere to RFC 2396 (the modern uri syntax specification)

This is becoming increasingly painful


I will soon have a patch against Zope 2.11b1
which gets rid of this restriction.

If there is consensus, I can add it to the Zope repository.


+1 from my side. Saves me the work of cleaning up my own dirty
patch :-))

___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
http://mail.zope.org/mailman/listinfo/zope-announce

http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Non-ASCII characters in URLs

2008-04-07 Thread Martijn Pieters
On Mon, Apr 7, 2008 at 1:37 AM, Alexander Limi [EMAIL PROTECTED] wrote:
  Is there a good technical explanation for why Zope doesn't allow non-ASCII
 characters in URLs?

Because URLs don't allow non-ASCII characters?

  I'd like to be able to let URLs work like this example from Wikipedia:

  http://ja.wikipedia.org/wiki/メインページ

Your browser translates that into
http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8
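
For reference, the escaped form above is just the UTF-8 bytes of the path segment, percent-encoded. A minimal Python 3 sketch using the standard library's urllib.parse (it illustrates the encoding only, not anything Zope does):

```python
from urllib.parse import quote, unquote

title = "メインページ"

# quote() encodes to UTF-8 by default, then percent-escapes every
# byte that is not an unreserved ASCII character.
escaped = quote(title)
print(escaped)

# unquote() reverses the process, again assuming UTF-8.
assert unquote(escaped) == title
```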

  Is there a fundamental reason (ie. Python objects can only be ASCII) or is
 it simply bugs that need to be fixed?

RFC 1738 (http://www.ietf.org/rfc/rfc1738.txt) doesn't allow non-ascii
characters in URLs.

   No corresponding graphic US-ASCII:

   URLs are written only with the graphic printable characters of the
   US-ASCII coded character set. The octets 80-FF hexadecimal are not
   used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
   control characters; these must be encoded.

Now, Zope could well support UTF-8 ids, and translate URLs
appropriately, but in the meantime you could use the same scheme?

-- 
Martijn Pieters


Re: [Zope-dev] Non-ASCII characters in URLs

2008-04-07 Thread Jonathan


- Original Message - 
From: Martijn Pieters [EMAIL PROTECTED]

To: Alexander Limi [EMAIL PROTECTED]
Cc: zope-dev@zope.org
Sent: Monday, April 07, 2008 4:39 AM
Subject: Re: [Zope-dev] Non-ASCII characters in URLs



On Mon, Apr 7, 2008 at 1:37 AM, Alexander Limi [EMAIL PROTECTED] wrote:
 Is there a good technical explanation for why Zope doesn't allow non-ASCII
 characters in URLs?


Because URLs don't allow non-ASCII characters?


 I'd like to be able to let URLs work like this example from Wikipedia:

 http://ja.wikipedia.org/wiki/メインページ


Your browser translates that into
http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8

 Is there a fundamental reason (ie. Python objects can only be ASCII) or is
 it simply bugs that need to be fixed?


RFC 1738 (http://www.ietf.org/rfc/rfc1738.txt) doesn't allow non-ascii
characters in URLs.

  No corresponding graphic US-ASCII:

  URLs are written only with the graphic printable characters of the
  US-ASCII coded character set. The octets 80-FF hexadecimal are not
  used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
  control characters; these must be encoded.

Now, Zope could well support UTF-8 ids, and translate URLs
appropriately, but in the meantime you could use the same scheme?


IDNA (http://www.ietf.org/rfc/rfc3490.txt) and Punycode 
(http://www.faqs.org/rfcs/rfc3492.html) may be of some use.
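
Note that IDNA and Punycode are defined for host names rather than URL paths. As a rough illustration, Python ships "idna" and "punycode" codecs; the host name below is a made-up example:

```python
# Python's built-in "idna" codec (RFC 3490) converts each non-ASCII
# label of a host name to its ASCII-compatible "xn--" form.
host = "bücher.example"   # hypothetical host name
ascii_host = host.encode("idna")
print(ascii_host)

# The raw "punycode" codec (RFC 3492) works on arbitrary strings and
# is reversible.
label = "メインページ"
encoded = label.encode("punycode")
assert encoded.decode("punycode") == label
```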


Jonathan




Re: [Zope-dev] Non-ASCII characters in URLs

2008-04-07 Thread Dieter Maurer
Martijn Pieters wrote at 2008-4-7 10:39 +0200:
On Mon, Apr 7, 2008 at 1:37 AM, Alexander Limi [EMAIL PROTECTED] wrote:
  Is there a good technical explanation for why Zope doesn't allow non-ASCII
 characters in URLs?

Because URLs don't allow non-ASCII characters?

Almost surely, Alexander wants to ask why Zope does not allow
non-ASCII characters in ids.

And, in fact, there are only two reasons:

  *  laziness of the Zope developers:

 without the restriction to ASCII characters
 careful quoting (and unquoting) is necessary
 in order to adhere to RFC 2396 (the modern uri syntax specification)

  *  there is no way to specify the encoding used for non ASCII characters.

 HTML 4 suggests converting non-ASCII characters first to
 UTF-8 and then URL-escaping the result,
 but most HTTP clients do not follow this suggestion.
 Instead, they use the charset found on the page
 that caused them to construct the URI.

 I have observed that MS WebDAV transfers the URL as given
 for some WebDAV commands and recodes it into UTF-8
 for others.

 Thus, supporting non-ASCII ids may occasionally cause
 surprises.
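
To make the ambiguity concrete, here is a small Python 3 sketch. The helper decode_segment and its "UTF-8 first, then fall back" policy are assumptions for illustration, not existing Zope behaviour:

```python
from urllib.parse import quote, unquote_to_bytes

obj_id = "tüv"

# The same id escapes differently depending on the charset the client picks.
as_latin1 = quote(obj_id, encoding="latin-1")
as_utf8 = quote(obj_id, encoding="utf-8")
print(as_latin1, as_utf8)

def decode_segment(segment, fallback="latin-1"):
    """Hypothetical server-side guess: try UTF-8, else use the fallback."""
    raw = unquote_to_bytes(segment)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)

assert decode_segment(as_latin1) == decode_segment(as_utf8) == obj_id
```

Such a guess is only a heuristic: some byte sequences are valid in both charsets, so the server cannot always tell which one the client meant.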



-- 
Dieter


Re: [Zope-dev] Non-ASCII characters in URLs

2008-04-07 Thread Wichert Akkerman
Previously Dieter Maurer wrote:
 Martijn Pieters wrote at 2008-4-7 10:39 +0200:
 On Mon, Apr 7, 2008 at 1:37 AM, Alexander Limi [EMAIL PROTECTED] wrote:
   Is there a good technical explanation for why Zope doesn't allow non-ASCII
  characters in URLs?
 
 Because URLs don't allow non-ASCII characters?
 
 Almost surely, Alexander wants to ask why Zope does not allow
 non-ASCII characters in ids.
 
 And, in fact, there are only two reasons:
 
   *  laziness of the Zope developers:
 
  without the restriction to ASCII characters
  careful quoting (and unquoting) is necessary
  in order to adhere to RFC 2396 (the modern uri syntax specification)

This is becoming increasingly painful: it means we can't really use Active
Directory's ObjectGUID as userid, and it breaks with LDAP DNs containing
non-ASCII characters (all too common). I really wish Zope IDs were
either binary strings or unicode strings.

   *  there is no way to specify the encoding used for non ASCII characters.
 
  HTML 4 suggests converting non-ASCII characters first to
  UTF-8 and then URL-escaping the result,
  but most HTTP clients do not follow this suggestion.
  Instead, they use the charset found on the page
  that caused them to construct the URI.
 
  I have observed that MS WebDAV transfers the URL as given
  for some WebDAV commands and recodes it into UTF-8
  for others.
 
  Thus, supporting non-ASCII ids may occasionally cause
  surprises.

You mean non-ASCII URIs, not non-ASCII ids here, I suspect.  Somehow I'm
not surprised those are painful :(

Wichert.

-- 
Wichert Akkerman [EMAIL PROTECTED]It is simple to make things.
http://www.wiggy.net/   It is hard to make things simple.


Re: [Zope-dev] Non-ASCII characters in URLs

2008-04-07 Thread Dieter Maurer
Wichert Akkerman wrote at 2008-4-7 20:45 +0200:
 ...
 Almost surely, Alexander wants to ask why Zope does not allow
 non-ASCII characters in ids.
 
 And, in fact, there are only two reasons:
 
   *  laziness of the Zope developers:
 
  without the restriction to ASCII characters
  careful quoting (and unquoting) is necessary
  in order to adhere to RFC 2396 (the modern uri syntax specification)

This is becoming increasingly painful

I will soon have a patch against Zope 2.11b1
which gets rid of this restriction.

If there is consensus, I can add it to the Zope repository.

 ...
   *  there is no way to specify the encoding used for non ASCII characters.
 
  HTML 4 suggests converting non-ASCII characters first to
  UTF-8 and then URL-escaping the result,
  but most HTTP clients do not follow this suggestion.
  Instead, they use the charset found on the page
  that caused them to construct the URI.
 
  I have observed that MS WebDAV transfers the URL as given
  for some WebDAV commands and recodes it into UTF-8
  for others.
 
  Thus, supporting non-ASCII ids may occasionally cause
  surprises.

You mean non-ASCII URIs, not non-ASCII ids here, I suspect.  Somehow I'm
not surprised those are painful :(

No, I mean non-ASCII ids.

They lead to URIs with some escaped characters, and for some commands MS WebDAV
unescapes the URIs, interprets them in some default charset (windows-1252
in our case), recodes them in UTF-8,
escapes them again and then uses them in the commands.
Examples are the COPY and MOVE commands. If an object has
a non-ASCII character in its id, say tüv, its URL
may look like http:.../t%FCv. Used in a COPY or MOVE,
it is however represented as http:.../t%C3%BCv.
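
The recoding described above can be reproduced in a few lines of Python 3. The function name recode_like_webdav is made up; it only mimics the observed steps and is not taken from any WebDAV client:

```python
from urllib.parse import quote, unquote_to_bytes

def recode_like_webdav(segment, assumed_charset="windows-1252"):
    """Unescape, interpret in a default charset, recode as UTF-8, escape."""
    raw = unquote_to_bytes(segment)        # percent escapes -> raw bytes
    text = raw.decode(assumed_charset)     # bytes -> text in the assumed charset
    return quote(text.encode("utf-8"))     # text -> UTF-8 bytes, escaped again

recoded = recode_like_webdav("t%FCv")
print(recoded)
```

A server that stored the id under its windows-1252 form would then fail to find the object under the recoded path unless it normalises both forms to the same id.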



-- 
Dieter


Re: [Zope-dev] Non-ASCII characters in URLs

2008-04-06 Thread Paul Winkler
On Sun, Apr 06, 2008 at 04:37:22PM -0700, Alexander Limi wrote:
 Hi,

 Is there a good technical explanation for why Zope doesn't allow non-ASCII 
 characters in URLs?

I suspect it's only for hysterical raisins.  The code in question is
in OFS/ObjectManager.py, in the checkValidId() function.  Non-ASCII
characters trigger a match on the bad_id regular expression search.
As I recall, if you look at the revision history, that code is very
old.
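
For readers who have not seen that code, the check works roughly as in the following Python sketch. The character class, the function name and the exception type are simplified assumptions; consult OFS/ObjectManager.py for the real pattern:

```python
import re

# Illustrative whitelist: any character outside it counts as "bad".
# (Assumed for this sketch; Zope's actual bad_id pattern may differ.)
bad_id = re.compile(r"[^a-zA-Z0-9\-_~,.$()# @]").search

def check_valid_id_sketch(obj_id):
    """Raise ValueError if obj_id contains a character outside the whitelist."""
    if bad_id(obj_id) is not None:
        raise ValueError(
            "The id %r contains characters illegal in URLs." % obj_id)
    return obj_id

check_valid_id_sketch("index_html")       # passes
try:
    check_valid_id_sketch("メインページ")   # any non-ASCII character matches
except ValueError as exc:
    print(exc)
```

Because the pattern is a whitelist of ASCII characters, lifting the restriction is less about the regular expression itself and more about the quoting and unquoting that has to happen elsewhere.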

There might even be an existing bug filed about this; I don't
remember.

-- 

Paul Winkler
http://www.slinkp.com


Re: [Zope-dev] Non-ASCII characters in URLs

2008-04-06 Thread Andreas Jung



--On 6. April 2008 16:37:22 -0700 Alexander Limi [EMAIL PROTECTED] wrote:


Hi,

Is there a good technical explanation for why Zope doesn't allow
non-ASCII characters in URLs?

I'd like to be able to let URLs work like this example from Wikipedia:

http://ja.wikipedia.org/wiki/メインページ

When I try adding an object with ID メインページ in Zope 2, I get
the following error message:

Error Type: BadRequest
Error Value: The id
&#12513;&#12452;&#12531;&#12506;&#12540;&#12472;
 contains characters illegal in URLs.

Is there a fundamental reason (ie. Python objects can only be ASCII) or
is it simply bugs that need to be fixed?
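
The numbers in the error message quoted above are HTML numeric character references for the six characters of the id, which can be checked in Python 3 with the standard html module:

```python
import html

refs = "&#12513;&#12452;&#12531;&#12506;&#12540;&#12472;"
decoded = html.unescape(refs)
print(decoded)

# Each reference is simply the decimal code point of one character.
assert [ord(ch) for ch in decoded] == [12513, 12452, 12531, 12506, 12540, 12472]
```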



As Paul indicated: the issue dates back to the times when there was only
ASCII in the URL world. Object IDs in particular have to be ASCII. Well... Zope
came from the US :-)

Andreas
