Re: pre_translate_name hook vs location/directory walk (was: svn commit: r1879079)

2020-06-22 Thread Yann Ylavic
On Mon, Jun 22, 2020 at 2:57 PM Eric Covener  wrote:
>
> On Mon, Jun 22, 2020 at 8:28 AM Yann Ylavic  wrote:
> >
> > > Allow for URI-path pre_translate_name before (and/or instead of) decoding.
> > >
> > > Only if no hook takes "ownership" of the URI (returning OK), apply
> > > percent decoding for the rest of request handling. Otherwise r->uri 
> > > remains
> > > encoded meaning that further location/directory/file/if/.. sections 
> > > (walks)
> > > should that into account.
> >
> > This change allows a module to avoid URI decoding in
> > pre_translate_name, such that r->uri remains encoded for the whole
> > request handling.
> >
> > For instance (same series), mod_proxy will use that hook when
> > "ProxyMappingDecoded off" is configured, and return OK if a ProxyPass
> > matches. The decoding was already disabled (before this commit) for
> > the forward-proxy case (ProxyRequests on) since r->proxyreq was set at
> > an early stage, but now it "works" for the reverse-proxy case too.
> >
> > This means that when "ProxyMappingDecoded off" is configured (the
> > default is still "on", though), for instance a  > a fixup RewriteRule or a non-early RequestHeader rule will have to be
> > expressed with the encoded form since reserved characters will not be
> > decoded as before (note that unreserved characters will always be
> > decoded anyway).
> >
> > I don't know how much it can break existing configurations, and wonder
> > if we should have a core directive that still controls whether r->uri
> > should be decoded, regardless of ProxyMappingDecoded or any further
> > module pre_translate_name hook with its own directives.
> >
> > For mod_proxy this is not really an issue to have r->uri modified
> > after the (pre_)translate phase, because the handler will later work
> > on r->filename regardless (the "proxy:..." set by proxy_detect or
> > proxy_trans), so maybe we should simply always decode r->uri after the
> > pre_translate hooks have run. Though I find it easier in a proxy
> > context to match a URI-path (LocationMatch, RewriteRule..) in its
> > encoded form, and not worry about original vs decoded separators. That
> > may be just my preference, but a new core directive looks sane to me..
> >
> > Thoughts?
>
> I have not followed too closely so take with a grain of salt / for the
> sake of argument:

Thanks, always welcome!

>
> From an externals perspective (maybe being detached here is a benefit)
> it might be better to have something like "ProxyUseOriginalURL" that

Yeah, much better name. Will use *URI which is somewhat closer to the
RFC's URI-path nomenclature.

> has the following caveats:
> 1. it's partially decoded
> 2. it is required for whatever the servlet mapping option is (iiuc?)

Correct, I'm a bit reluctant to issue servlet mapping on decoded URIs..

> 3. it will affect the string used in comparisons/substitutions for
> , Rewrite, , (*CAUTION*)
>- Some details is needed here about the partial decoding, for
> example %2f or multiple leading slashes so "simple" context root
> configs can be written without too much paranoia
> 4. If there are risks of re-injecting URL's with [PT] or per-dir
> rewrites try to explain that it's a complex combination to worry
> about.

Possibly we are better off decoding r->uri in any case then, and not
change its current semantics.
As I said it won't affect the mod_proxy pre_trans case since its
translation already occurred and was saved in r->filename.
If we later want to let the user work with either the decoded or
encoded URI then we will add r->encoded_uri or alike, with some
opacity and helpers to keep it in sync with r->uri. But that would
affect module writers too, that's why I was thinking more of a "core"
directive than ProxyUseOriginalURI, or no directive at all with
trunk/2.next/3 material only.

>
> I would not worry too much about "existing configs" as this is all
> opt-in and it's probably a lost cause anyway.
>
> I would instead give people new built-in vars in the important places
> (rewrite, expr) to be able to match the partially decoded URI that the
> proxy will be working on so a savvy user can safely interoperate
> without monstrous regexes.

The hard part here (IMHO) is to keep encoded_uri and decoded_uri in
sync, depending on whichever is updated/accessed by a module or e.g. a
RewriteRule. It may be too much of an API/ABI change for the simple
proxy encoded URI case.

For now we can probably live with r->uri being always
normalized/decoded as before (and 

Re: pre_translate_name hook vs location/directory walk (was: svn commit: r1879079)

2020-06-22 Thread Eric Covener
On Mon, Jun 22, 2020 at 8:28 AM Yann Ylavic  wrote:
>
> > Allow for URI-path pre_translate_name before (and/or instead of) decoding.
> >
> > Only if no hook takes "ownership" of the URI (returning OK), apply
> > percent decoding for the rest of request handling. Otherwise r->uri remains
> > encoded meaning that further location/directory/file/if/.. sections (walks)
> > should that into account.
>
> This change allows a module to avoid URI decoding in
> pre_translate_name, such that r->uri remains encoded for the whole
> request handling.
>
> For instance (same series), mod_proxy will use that hook when
> "ProxyMappingDecoded off" is configured, and return OK if a ProxyPass
> matches. The decoding was already disabled (before this commit) for
> the forward-proxy case (ProxyRequests on) since r->proxyreq was set at
> an early stage, but now it "works" for the reverse-proxy case too.
>
> This means that when "ProxyMappingDecoded off" is configured (the
> default is still "on", though), for instance a  a fixup RewriteRule or a non-early RequestHeader rule will have to be
> expressed with the encoded form since reserved characters will not be
> decoded as before (note that unreserved characters will always be
> decoded anyway).
>
> I don't know how much it can break existing configurations, and wonder
> if we should have a core directive that still controls whether r->uri
> should be decoded, regardless of ProxyMappingDecoded or any further
> module pre_translate_name hook with its own directives.
>
> For mod_proxy this is not really an issue to have r->uri modified
> after the (pre_)translate phase, because the handler will later work
> on r->filename regardless (the "proxy:..." set by proxy_detect or
> proxy_trans), so maybe we should simply always decode r->uri after the
> pre_translate hooks have run. Though I find it easier in a proxy
> context to match a URI-path (LocationMatch, RewriteRule..) in its
> encoded form, and not worry about original vs decoded separators. That
> may be just my preference, but a new core directive looks sane to me..
>
> Thoughts?

I have not followed too closely so take with a grain of salt / for the
sake of argument:

>From an externals perspective (maybe being detached here is a benefit)
it might be better to have something like "ProxyUseOriginalURL" that
has the following caveats:
1. it's partially decoded
2. it is required for whatever the servlet mapping option is (iiuc?)
3. it will affect the string used in comparisons/substitutions for
, Rewrite, , (*CAUTION*)
   - Some details is needed here about the partial decoding, for
example %2f or multiple leading slashes so "simple" context root
configs can be written without too much paranoia
4. If there are risks of re-injecting URL's with [PT] or per-dir
rewrites try to explain that it's a complex combination to worry
about.

I would not worry too much about "existing configs" as this is all
opt-in and it's probably a lost cause anyway.

I would instead give people new built-in vars in the important places
(rewrite, expr) to be able to match the partially decoded URI that the
proxy will be working on so a savvy user can safely interoperate
without monstrous regexes.


Re: pre_translate_name hook vs location/directory walk (was: svn commit: r1879079)

2020-06-22 Thread Yann Ylavic
On Mon, Jun 22, 2020 at 2:28 PM Yann Ylavic  wrote:
>
> Also, since we are at it, I'm on the fence about running the
> pre_translate hooks before quick handlers (thus before
> ap_process_request_internal() too), good or bad idea?

Sorry, I meant running the *normalization* (including decoding
unreserved characters) before the quick handlers.


pre_translate_name hook vs location/directory walk (was: svn commit: r1879079)

2020-06-22 Thread Yann Ylavic
> Allow for URI-path pre_translate_name before (and/or instead of) decoding.
>
> Only if no hook takes "ownership" of the URI (returning OK), apply
> percent decoding for the rest of request handling. Otherwise r->uri remains
> encoded meaning that further location/directory/file/if/.. sections (walks)
> should that into account.

This change allows a module to avoid URI decoding in
pre_translate_name, such that r->uri remains encoded for the whole
request handling.

For instance (same series), mod_proxy will use that hook when
"ProxyMappingDecoded off" is configured, and return OK if a ProxyPass
matches. The decoding was already disabled (before this commit) for
the forward-proxy case (ProxyRequests on) since r->proxyreq was set at
an early stage, but now it "works" for the reverse-proxy case too.

This means that when "ProxyMappingDecoded off" is configured (the
default is still "on", though), for instance a uri
should be decoded, regardless of ProxyMappingDecoded or any further
module pre_translate_name hook with its own directives.

For mod_proxy this is not really an issue to have r->uri modified
after the (pre_)translate phase, because the handler will later work
on r->filename regardless (the "proxy:..." set by proxy_detect or
proxy_trans), so maybe we should simply always decode r->uri after the
pre_translate hooks have run. Though I find it easier in a proxy
context to match a URI-path (LocationMatch, RewriteRule..) in its
encoded form, and not worry about original vs decoded separators. That
may be just my preference, but a new core directive looks sane to me..

Thoughts?


Also, since we are at it, I'm on the fence about running the
pre_translate hooks before quick handlers (thus before
ap_process_request_internal() too), good or bad idea?


Regards;
Yann.