Hello vitalie,

On Wed, 3 May 2006 23:28:22 +0300 GMT (04/05/2006, 03:28 +0700 GMT),
vitalie vrabie wrote:

>> The important question here is: are there rules about which characters
>> a URL can contain and which it cannot?

vv> yes, there is rfc3986.

vv> and it doesn't seem to be obsoleted or updated by any other rfc,
vv> judging by http://www.rfc-editor.org/rfc-index2.html

I found this in RFC3986:

2.4.  When to Encode or Decode

   [...]

   When a URI is dereferenced, the components and subcomponents
   significant to the scheme-specific dereferencing process (if any)
   must be parsed and separated before the percent-encoded octets
   within those components can be safely decoded, as otherwise the
   data may be mistaken for component delimiters. The only exception
   is for percent-encoded octets corresponding to characters in the
   unreserved set, which can be decoded at any time. For example,
   the octet corresponding to the tilde ("~") character is often
   encoded as "%7E" by older URI processing implementations; the "%7E"
   can be replaced by "~" without changing its interpretation.
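
To make the quoted rule concrete, here is a rough sketch in Python. It
assumes the unreserved set as defined in section 2.3 of the same RFC
(ALPHA, DIGIT, "-", ".", "_", "~"). The function name decode_unreserved
is made up for illustration and is not part of any library:

```python
import re

# Unreserved characters, assuming the definition in RFC 3986 section 2.3:
# ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# Matches one percent-encoded octet such as "%7E" or "%2f".
_PCT_OCTET = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(uri: str) -> str:
    """Decode only the percent-encoded octets that map to unreserved characters.

    Every other percent-encoded octet is left exactly as it was, since
    decoding it early could turn data into a component delimiter.
    """
    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0)

    return _PCT_OCTET.sub(replace, uri)
```

With this sketch, "http://example.com/%7Euser" becomes
"http://example.com/~user", while "a%2Fb" stays unchanged because "/"
is not in the unreserved set.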

I understand this means that not only the tilde but also umlauts are
allowed. Or am I reading this wrong?

-- 

Cheers,
Thomas.

Trees that grow in smoggy cities are needed to make carbon paper.
http://thomas.fernandez.hat-gar-keine-homepage.de/

Message reply created with The Bat! 3.80.03
under Windows XP 5.1 Build 2600 Service Pack 2


