On Feb 14, 2010, at 12:04 AM, mdipierro wrote:

> We could create a switch in routes that allows changing the URL
> validation regex.
> I would take a patch. I would not change the default though.

We should revisit the logic (btw, is this something we could do on the dev list 
instead?)

            rewrite.filter_in()
            ...
            path = regex_space.sub('_', path)
            match = regex_url.match(path)
            if not match:
                raise HTTP(400,
            ...
            request.application = match.group('a') or 'init'
            request.controller = match.group('c') or 'default'
            request.function = match.group('f') or 'index'
            raw_extension = match.group('e')
            request.extension = raw_extension or 'html'
            request.args = \
                List((match.group('s') and match.group('s').split('/')) or [])
            ...
            parse_get_post_vars(request, environ)


where:

regex_space = re.compile(r'(\+|\s|%20)+')

# pattern to find valid paths in url /application/controller/...
#   this could be:
#     for static pages:
#        /<b:application>/static/<x:file>
#     for dynamic pages:
#        /<a:application>[/<c:controller>[/<f:function>[.<e:ext>][/<s:sub>]]]
#   application, controller, function and ext may only contain [a-zA-Z0-9_]
#   file and sub may also contain '-', '=', '.' and '/'
regex_url = re.compile(r'''
     (^                              # static pages
         /(?P<b> \w+)                # b=app
         /static                     # /b/static
         /(?P<x> (\w[\-\=\./]?)* )   # x=file
     $)
     |                               # dynamic pages
     (^(                             # (/a/c/f.e/s)
         /(?P<a> \w+ )               # /a=app
         (                           # (/c.f.e/s)
             /(?P<c> \w+ )           # /a/c=controller
             (                       # (/f.e/s)
                 /(?P<f> \w+ )       # /a/c/f=function
                 (                   # (.e)
                     \.(?P<e> \w+ )  # /a/c/f.e=extension
                 )?
                 (                   # (/s)
                     /(?P<s>         # /a/c/f.e/s=sub
                     ( [\...@][\=\./]? )+
                     )
                 )?
             )?
         )?
     )?
     /?$)    # trailing slash
     ''', re.X)

So *runs* of spaces (where a space is any whitespace character, +, or %20) are 
first collapsed to single underscores, and then we apply the URL test. (Note 
that vars are treated separately; they've already been moved to 
env.query_string.)
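
To make the space handling concrete, here is a minimal sketch of just that 
step. regex_space is the pattern quoted above; normalize_spaces is a 
hypothetical helper name used only for this illustration, not a web2py 
function.

```python
import re

# Pattern quoted above: one or more of '+', a whitespace character, or '%20'.
regex_space = re.compile(r'(\+|\s|%20)+')


def normalize_spaces(path):
    """Collapse each run of space-like tokens into a single underscore."""
    return regex_space.sub('_', path)


# An encoded space becomes one underscore.
print(normalize_spaces('/init/default/index/Chapter%201'))
# -> /init/default/index/Chapter_1

# A mixed run (' ', '+', '%20', ' ') still yields a single underscore.
print(normalize_spaces('/a/c/f/x +%20 y'))
# -> /a/c/f/x_y
```

Note that the substitution is lossy: once the path has been rewritten, a 
controller can no longer tell whether an arg originally held a space, a plus 
sign, or several of either.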

We're interested in match group 's', which is a somewhat peculiar pattern. I 
see now that it could be written a little more clearly:

                     ( [...@-][=./]? )+

I'm really not sure what the intention is here, but this (and the spaces 
conversion) is where we would need to address alternatives to args parsing. 
Massimo? What's the intent?
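
For discussion, here is a rough sketch of how the two steps interact on 
DenesL's example. It is not web2py's code: simple_url is a cut-down stand-in 
for the dynamic-page branch of regex_url, parse_path is a hypothetical helper, 
and the [\w\-] class used for args is an assumption made for illustration, not 
the exact class in group 's'.

```python
import re

regex_space = re.compile(r'(\+|\s|%20)+')

# Simplified stand-in for the dynamic-page branch of regex_url.
# ASSUMPTION: the [\w\-] class for args is illustrative only; it is not
# claimed to be the real character class of group 's'.
simple_url = re.compile(r'''
    ^
    (/(?P<a>\w+)                        # application
      (/(?P<c>\w+)                      # controller
        (/(?P<f>\w+)                    # function
          (\.(?P<e>\w+))?               # optional extension
          (/(?P<s>([\w\-][=./]?)+))?    # optional args
        )?
      )?
    )?
    /?$                                 # optional trailing slash
    ''', re.X)


def parse_path(path):
    """Return (app, controller, function, ext, args), or None if rejected."""
    path = regex_space.sub('_', path)
    match = simple_url.match(path)
    if not match:
        return None  # the real code raises HTTP(400) at this point
    args = match.group('s')
    return (match.group('a') or 'init',
            match.group('c') or 'default',
            match.group('f') or 'index',
            match.group('e') or 'html',
            args.split('/') if args else [])


# Missing components fall back to the defaults.
print(parse_path('/'))
# The encoded space is accepted because it is rewritten before matching.
print(parse_path('/myapp/default/show/Chapter%201'))
# The leftover '%A1' is not matched by \w, so the whole request is rejected.
print(parse_path('/myapp/default/show/Cap%A1tulo%201'))
```

Under these assumptions the first call returns the defaults, the second 
returns args of ['Chapter_1'], and the third returns None. That suggests any 
switch in routes would have to cover both the space conversion and the args 
character class, not just one of them.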

> 
> On Feb 13, 4:48 pm, Jonathan Lundell wrote:
>> On Feb 13, 2010, at 2:16 PM, DenesL wrote:
>> 
>>> 1) Is that the only reason?
>>> So there is no other objection to having args with accented chars in
>>> them, for example?
>>> Foreign languages would benefit from having args without such
>>> restrictions.
>> 
>>> By 'access filesystems' are you referring to 'static'?
>>> We could apply the restrictions only if the URL has c='static' or some
>>> other mechanism when it needs to access filesystems.
>> 
>>> 2) And what about the first question:
>>> Shouldn't URL create only URLs that are usable with web2py?
>> 
>> A lot of our systems are perfectly happy with spaces in filenames. It can be 
>> a PITA from the command line, but not from the GUI or programmatically. Seems 
>> like an unnecessary restriction on web2py's part.
>> 
>> 
>> 
>>> On 13 feb, 15:37, mdipierro wrote:
>>>> because args can be used to access filesystem and having spaces in
>>>> there causes trouble.
>> 
>>>> On Feb 13, 12:02 pm, DenesL wrote:
>> 
>>>>> Shouldn't URL create only URLs that are usable with web2py?
>> 
>>>>> Example:
>>>>> u=URL(r=request, args='Capítulo 1')
>>>>> produces a URL ending with /Cap%A1tulo%201
>>>>> which generates an "Invalid request" when used.
>> 
>>>>> And why are these characters not allowed in URLs?
>> 
>>>>> Denes.

