On Feb 14, 2010, at 12:04 AM, mdipierro wrote:
> We could create a switch in routes that allows changing the URL
> validation regex.
> I would take a patch. I would not change the default though.
We should revisit the logic (btw, is this something we could do on the dev list
instead?)
rewrite.filter_in()
...
    path = regex_space.sub('_', path)
    match = regex_url.match(path)
    if not match:
        raise HTTP(400,
        ...
    request.application = match.group('a') or 'init'
    request.controller = match.group('c') or 'default'
    request.function = match.group('f') or 'index'
    raw_extension = match.group('e')
    request.extension = raw_extension or 'html'
    request.args = \
        List((match.group('s') and match.group('s').split('/')) or [])
    ...
    parse_get_post_vars(request, environ)
where:
regex_space = re.compile('(\+|\s|%20)+')
# pattern to find valid paths in url /application/controller/...
# this could be:
# for static pages:
# /<b:application>/static/<x:file>
# for dynamic pages:
# /<a:application>[/<c:controller>[/<f:function>[.<e:ext>][/<s:sub>]]]
# application, controller, function and ext may only contain [a-zA-Z0-9_]
# file and sub may also contain '-', '=', '.' and '/'
regex_url = re.compile(r'''
    (^                              # static pages
        /(?P<b> \w+)                # b=app
        /static                     # /b/static
        /(?P<x> (\w[\-\=\./]?)* )   # x=file
    $)
    |                               # dynamic pages
    (^(                             # (/a/c/f.e/s)
        /(?P<a> \w+ )               # /a=app
        (                           # (/c.f.e/s)
            /(?P<c> \w+ )           # /a/c=controller
            (                       # (/f.e/s)
                /(?P<f> \w+ )       # /a/c/f=function
                (                   # (.e)
                    \.(?P<e> \w+ )  # /a/c/f.e=extension
                )?
                (                   # (/s)
                    /(?P<s>         # /a/c/f.e/s=sub
                    ( [\...@][\=\./]? )+
                    )
                )?
            )?
        )?
    )?
    /?$)                            # trailing slash
''', re.X)
So *runs* of spaces (any whitespace character, +, or %20) are first collapsed
to a single underscore, and then we apply the URL test. (Note that vars are
treated separately; they've already been moved to env.query_string.)
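
To make the two steps concrete, here is a minimal sketch that normalizes space runs and then validates the path. It is not the real filter_in: regex_dynamic is a reduced stand-in for the dynamic branch of regex_url (groups a, c, f and e only, no static branch and no 's' group), filter_in_sketch is a made-up name, and re.ASCII is an assumption added so that \w behaves under Python 3 the way it does by default under Python 2.

```python
import re

# regex_space as quoted above, written as a raw string.
regex_space = re.compile(r'(\+|\s|%20)+')

# Reduced stand-in for the dynamic branch of regex_url: groups a, c, f, e only.
# re.ASCII mimics the ASCII-only \w that Python 2 uses by default.
regex_dynamic = re.compile(r'''
    ^(
        /(?P<a> \w+ )
        (
            /(?P<c> \w+ )
            (
                /(?P<f> \w+ )
                ( \.(?P<e> \w+ ) )?
            )?
        )?
    )?
    /?$
''', re.X | re.ASCII)

def filter_in_sketch(path):
    """Normalize space runs, then validate; returns (a, c, f, e) or None."""
    path = regex_space.sub('_', path)
    match = regex_dynamic.match(path)
    if not match:
        return None  # the real filter_in raises HTTP(400, ...) at this point
    return (match.group('a') or 'init',
            match.group('c') or 'default',
            match.group('f') or 'index',
            match.group('e') or 'html')

print(filter_in_sketch('/'))                               # ('init', 'default', 'index', 'html')
print(filter_in_sketch('/myapp/blog/my%20+%20page.json'))  # ('myapp', 'blog', 'my_page', 'json')
print(filter_in_sketch('/myapp/blog/Cap%A1tulo'))          # None
```

The last call fails because '%' is not a word character, so a percent-encoded accented letter that reaches the regex unchanged is rejected.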
We're interested in match group 's', which is a somewhat peculiar pattern. I
see now that it could be written a little more clearly:
( [...@-][=./]? )+
I'm really not sure what the intention is here, but this (and the spaces
conversion) is where we would need to address alternatives to args parsing.
Massimo? What's the intent?
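
For discussion, a minimal sketch of what a switchable args check might look like. Everything here is hypothetical: STRICT_ARGS, RELAXED_ARGS and parse_args are made-up names, and the strict character class is a stand-in, not the one in rewrite.py. Only the overall shape (one allowed character, optionally followed by one of '=', '.', '/') is borrowed from the 's' group. It is written for Python 3, where re.ASCII is needed to get the ASCII-only \w that Python 2 uses by default.

```python
import re

# Hypothetical stand-ins; neither is the pattern used in rewrite.py.
# Each arg character may be followed by at most one of '=', '.', '/'.
STRICT_ARGS = re.compile(r'([\w\-][=./]?)+$', re.ASCII)  # ASCII-only \w
RELAXED_ARGS = re.compile(r'([\w\-][=./]?)+$')           # Unicode-aware \w

def parse_args(sub, pattern=STRICT_ARGS):
    """Return the list of args if sub passes the check, else None."""
    if not pattern.match(sub):
        return None
    return sub.split('/')

print(parse_args('Chapter_1/page-2'))               # ['Chapter_1', 'page-2']
print(parse_args('Cap\u00edtulo_1'))                # None
print(parse_args('Cap\u00edtulo_1', RELAXED_ARGS))  # ['Capítulo_1']
```

Keeping the strict pattern as the default argument matches the "would not change the default" constraint, while a routes-level setting could pass a different pattern.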
>
> On Feb 13, 4:48 pm, Jonathan Lundell <[email protected]> wrote:
>> On Feb 13, 2010, at 2:16 PM, DenesL wrote:
>>
>>> 1) Is that the only reason?
>>> So there is no other objection to have args with accented chars in
>>> them for example?
>>> Foreign languages would benefit from having args without such
>>> restrictions.
>>
>>> By 'access filesystems' are you referring to 'static'?
>>> We could apply the restrictions only if the URL has c='static' or some
>>> other mechanism when it needs to access filesystems.
>>
>>> 2) And what about the first question:
>>> Shouldn't URL create only URLs that are usable with web2py?
>>
>> A lot of our systems are perfectly happy with spaces in filenames. It can be
>> a PITA from the command line, but not from the GUI or programmatically. Seems
>> like an unnecessary restriction on web2py's part.
>>
>>
>>
>>> On 13 feb, 15:37, mdipierro <[email protected]> wrote:
>>>> because args can be used to access filesystem and having spaces in
>>>> there causes trouble.
>>
>>>> On Feb 13, 12:02 pm, DenesL <[email protected]> wrote:
>>
>>>>> Shouldn't URL create only URLs that are usable with web2py?
>>
>>>>> Example:
>>>>> u=URL(r=request, args='Capítulo 1')
>>>>> produces a URL ending with /Cap%A1tulo%201
>>>>> which generates an "Invalid request" when used.
>>
>>>>> And why are these characters not allowed in URLs?
>>
>>>>> Denes.