Re: pre_translate_name hook vs location/directory walk (was: svn commit: r1879079)
On Mon, Jun 22, 2020 at 2:57 PM Eric Covener wrote: > > On Mon, Jun 22, 2020 at 8:28 AM Yann Ylavic wrote: > > > > > Allow for URI-path pre_translate_name before (and/or instead of) decoding. > > > > > > Only if no hook takes "ownership" of the URI (returning OK), apply > > > percent decoding for the rest of request handling. Otherwise r->uri > > > remains > > > encoded meaning that further location/directory/file/if/.. sections > > > (walks) > > > should that into account. > > > > This change allows a module to avoid URI decoding in > > pre_translate_name, such that r->uri remains encoded for the whole > > request handling. > > > > For instance (same series), mod_proxy will use that hook when > > "ProxyMappingDecoded off" is configured, and return OK if a ProxyPass > > matches. The decoding was already disabled (before this commit) for > > the forward-proxy case (ProxyRequests on) since r->proxyreq was set at > > an early stage, but now it "works" for the reverse-proxy case too. > > > > This means that when "ProxyMappingDecoded off" is configured (the > > default is still "on", though), for instance a > a fixup RewriteRule or a non-early RequestHeader rule will have to be > > expressed with the encoded form since reserved characters will not be > > decoded as before (note that unreserved characters will always be > > decoded anyway). > > > > I don't know how much it can break existing configurations, and wonder > > if we should have a core directive that still controls whether r->uri > > should be decoded, regardless of ProxyMappingDecoded or any further > > module pre_translate_name hook with its own directives. > > > > For mod_proxy this is not really an issue to have r->uri modified > > after the (pre_)translate phase, because the handler will later work > > on r->filename regardless (the "proxy:..." set by proxy_detect or > > proxy_trans), so maybe we should simply always decode r->uri after the > > pre_translate hooks have run. Though I find it easier in a proxy > > context to match a URI-path (LocationMatch, RewriteRule..) in its > > encoded form, and not worry about original vs decoded separators. That > > may be just my preference, but a new core directive looks sane to me.. > > > > Thoughts? > > I have not followed too closely so take with a grain of salt / for the > sake of argument: Thanks, always welcome! > > From an externals perspective (maybe being detached here is a benefit) > it might be better to have something like "ProxyUseOriginalURL" that Yeah, much better name. Will use *URI which is somewhat closer to the RFC's URI-path nomenclature. > has the following caveats: > 1. it's partially decoded > 2. it is required for whatever the servlet mapping option is (iiuc?) Correct, I'm a bit reluctant to issue servlet mapping on decoded URIs.. > 3. it will affect the string used in comparisons/substitutions for > , Rewrite, , (*CAUTION*) >- Some details is needed here about the partial decoding, for > example %2f or multiple leading slashes so "simple" context root > configs can be written without too much paranoia > 4. If there are risks of re-injecting URL's with [PT] or per-dir > rewrites try to explain that it's a complex combination to worry > about. Possibly we are better off decoding r->uri in any case then, and not change its current semantics. As I said it won't affect the mod_proxy pre_trans case since its translation already occurred and was saved in r->filename. If we later want to let the user work with either the decoded or encoded URI then we will add r->encoded_uri or alike, with some opacity and helpers to keep it in sync with r->uri. But that would affect module writers too, that's why I was thinking more of a "core" directive than ProxyUseOriginalURI, or no directive at all with trunk/2.next/3 material only. > > I would not worry too much about "existing configs" as this is all > opt-in and it's probably a lost cause anyway. > > I would instead give people new built-in vars in the important places > (rewrite, expr) to be able to match the partially decoded URI that the > proxy will be working on so a savvy user can safely interoperate > without monstrous regexes. The hard part here (IMHO) is to keep encoded_uri and decoded_uri in sync, depending on whichever is updated/accessed by a module or e.g. a RewriteRule. It may be too much of an API/ABI change for the simple proxy encoded URI case. For now we can probably live with r->uri being always normalized/decoded as before (and
Re: pre_translate_name hook vs location/directory walk (was: svn commit: r1879079)
On Mon, Jun 22, 2020 at 8:28 AM Yann Ylavic wrote: > > > Allow for URI-path pre_translate_name before (and/or instead of) decoding. > > > > Only if no hook takes "ownership" of the URI (returning OK), apply > > percent decoding for the rest of request handling. Otherwise r->uri remains > > encoded meaning that further location/directory/file/if/.. sections (walks) > > should that into account. > > This change allows a module to avoid URI decoding in > pre_translate_name, such that r->uri remains encoded for the whole > request handling. > > For instance (same series), mod_proxy will use that hook when > "ProxyMappingDecoded off" is configured, and return OK if a ProxyPass > matches. The decoding was already disabled (before this commit) for > the forward-proxy case (ProxyRequests on) since r->proxyreq was set at > an early stage, but now it "works" for the reverse-proxy case too. > > This means that when "ProxyMappingDecoded off" is configured (the > default is still "on", though), for instance a a fixup RewriteRule or a non-early RequestHeader rule will have to be > expressed with the encoded form since reserved characters will not be > decoded as before (note that unreserved characters will always be > decoded anyway). > > I don't know how much it can break existing configurations, and wonder > if we should have a core directive that still controls whether r->uri > should be decoded, regardless of ProxyMappingDecoded or any further > module pre_translate_name hook with its own directives. > > For mod_proxy this is not really an issue to have r->uri modified > after the (pre_)translate phase, because the handler will later work > on r->filename regardless (the "proxy:..." set by proxy_detect or > proxy_trans), so maybe we should simply always decode r->uri after the > pre_translate hooks have run. Though I find it easier in a proxy > context to match a URI-path (LocationMatch, RewriteRule..) in its > encoded form, and not worry about original vs decoded separators. That > may be just my preference, but a new core directive looks sane to me.. > > Thoughts? I have not followed too closely so take with a grain of salt / for the sake of argument: >From an externals perspective (maybe being detached here is a benefit) it might be better to have something like "ProxyUseOriginalURL" that has the following caveats: 1. it's partially decoded 2. it is required for whatever the servlet mapping option is (iiuc?) 3. it will affect the string used in comparisons/substitutions for , Rewrite, , (*CAUTION*) - Some details is needed here about the partial decoding, for example %2f or multiple leading slashes so "simple" context root configs can be written without too much paranoia 4. If there are risks of re-injecting URL's with [PT] or per-dir rewrites try to explain that it's a complex combination to worry about. I would not worry too much about "existing configs" as this is all opt-in and it's probably a lost cause anyway. I would instead give people new built-in vars in the important places (rewrite, expr) to be able to match the partially decoded URI that the proxy will be working on so a savvy user can safely interoperate without monstrous regexes.
Re: pre_translate_name hook vs location/directory walk (was: svn commit: r1879079)
On Mon, Jun 22, 2020 at 2:28 PM Yann Ylavic wrote: > > Also, since we are at it, I'm on the fence about running the > pre_translate hooks before quick handlers (thus before > ap_process_request_internal() too), good or bad idea? Sorry, I meant running the *normalization* (including decoding unreserved characters) before the quick handlers.
pre_translate_name hook vs location/directory walk (was: svn commit: r1879079)
> Allow for URI-path pre_translate_name before (and/or instead of) decoding. > > Only if no hook takes "ownership" of the URI (returning OK), apply > percent decoding for the rest of request handling. Otherwise r->uri remains > encoded meaning that further location/directory/file/if/.. sections (walks) > should that into account. This change allows a module to avoid URI decoding in pre_translate_name, such that r->uri remains encoded for the whole request handling. For instance (same series), mod_proxy will use that hook when "ProxyMappingDecoded off" is configured, and return OK if a ProxyPass matches. The decoding was already disabled (before this commit) for the forward-proxy case (ProxyRequests on) since r->proxyreq was set at an early stage, but now it "works" for the reverse-proxy case too. This means that when "ProxyMappingDecoded off" is configured (the default is still "on", though), for instance a uri should be decoded, regardless of ProxyMappingDecoded or any further module pre_translate_name hook with its own directives. For mod_proxy this is not really an issue to have r->uri modified after the (pre_)translate phase, because the handler will later work on r->filename regardless (the "proxy:..." set by proxy_detect or proxy_trans), so maybe we should simply always decode r->uri after the pre_translate hooks have run. Though I find it easier in a proxy context to match a URI-path (LocationMatch, RewriteRule..) in its encoded form, and not worry about original vs decoded separators. That may be just my preference, but a new core directive looks sane to me.. Thoughts? Also, since we are at it, I'm on the fence about running the pre_translate hooks before quick handlers (thus before ap_process_request_internal() too), good or bad idea? Regards; Yann.