Re: Overriding mod_rewrite from another module

2011-01-03 Thread Joshua Marantz
OK, I tried to find a more robust alternative but could not. I was thinking
I could duplicate whatever mod_rewrite was doing to set the request filename,
but that appears to be complex and probably no less brittle.

I have another query on this. In reality we do *not* want our rewritten
resources to be associated with a filename at all. Apache should never look
for such things in the file system under ../htdocs -- they will not be
there. We also do not need Apache to validate or authenticate these static
resources.

In particular, we have found that there is some path through Apache that
imposes what looks like a file-system-based limitation on URL segments (e.g.
around 256 bytes).  This limitation is inconvenient and, as far as I can
tell, superfluous.  URL limits imposed by proxies and browsers are more like
2k bytes, which would allow us to encode more metadata in URLs (e.g.
sprites).  Is there some magic setting we could put into the request
structure to tell Apache not to interpret the request as being mapped from a
file, but just to pass it through to our handler?

Thanks!
-Josh

On Sat, Jan 1, 2011 at 6:24 AM, Ben Noordhuis i...@bnoordhuis.nl wrote:

 On Sat, Jan 1, 2011 at 00:16, Joshua Marantz jmara...@google.com wrote:
  Thanks for the quick response and the promising idea for a hack. Looking at
  mod_rewrite.c this does indeed look a lot more surgical, if, perhaps,
  fragile, as mod_rewrite.c doesn't expose that string-constant in any formal
  interface (even as a #define in a .h). Nevertheless the solution is
  easy-to-implement and easy-to-test, so...thanks!

 You're welcome, Joshua. :)

 You could try persuading a core committer to add this as a
 (semi-)official extension. Nick Kew reads this list, Paul Querna often
 idles in #node.js at freenode.net.

  I'm also still wondering if there's a good source of official documentation
  for the detailed semantics of interfaces like ap_hook_translate_name.
  Neither a Google search, a stackoverflow.com search, nor the Apache Modules
  book (http://www.amazon.com/Apache-Modules-Book-Application-Development/dp/0132409674/ref=sr_1_1?ie=UTF8qid=1293837117sr=8-1)
  offers much detail. code.google.com fares a little better but just points
  to 4 existing usages.

 This question comes up often. In my experience the online
 documentation is almost always outdated, incomplete or outright wrong.
 I don't bother looking things up, I go straight to the source.

 It's a kind of job security, I suppose. There are only a handful of
 people that truly and deeply understand Apache. We can ask any hourly
 rate we want!



Re: Overriding mod_rewrite from another module

2011-01-03 Thread Joshua Marantz
I have implemented Ben's hack in mod_pagespeed in
http://code.google.com/p/modpagespeed/source/detail?r=345 .  It works great.
 But I am concerned that a subtle change to mod_rewrite.c will break this
hack silently.  We would catch it in our regression tests, but the large
number of Apache users that have downloaded mod_pagespeed do not generally
run our regression tests.

I have another idea for a solution that I'd like to see opinions on.
Looking at Nick Kew's book, it seems like I could set request->filename to
whatever I wanted, return OK, but then also shunt off access_checker for my
rewritten resources.  The access checking on mod_pagespeed resources is
redundant, because the resource will either be served from cache (in which
case it had to be authenticated to get into the cache in the first place) or
will be decoded and the original resource(s) fetched from the same server
with full authentication.

I'd appreciate any comments on this approach.

-Josh






Re: Overriding mod_rewrite from another module

2011-01-03 Thread Joshua Marantz
On Mon, Jan 3, 2011 at 4:50 PM, Ben Noordhuis i...@bnoordhuis.nl wrote:

   This means that returning OK from my handler does not prevent
  mod_authz_host's handler from being called.

 You're mistaken, Joshua. The access_checker hook by default is empty.
 mod_authz_host is a module and it can be disabled (if you're on a
 Debian/Ubuntu system, run `a2dismod authz_host` and reload Apache).


My perspective is that my team has implemented an Apache module that was
launched on Nov 3 2010.  Since its launch, we've encountered a variety of
compatibility reports with other modules, notably mod_rewrite.

My goal is not to remove authentication from the server, only to keep it from
messing with my module's rewritten resources. The above statement is just
observing that, while it's possible to shunt off mod_rewrite by returning OK
from an upstream handler, the same is not true of mod_authz_host because it's
invoked with a different magic macro.
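The "different magic macro" is the strategy httpd uses to run each hook: in httpd 2.2, translate_name is declared with AP_IMPLEMENT_HOOK_RUN_FIRST and access_checker with AP_IMPLEMENT_HOOK_RUN_ALL. The standalone C below is a toy model of those two strategies, not httpd code: `run_first`, `run_all`, `our_hook` and `deny_hook` are hypothetical names, and only the return-code values are taken from httpd.h.

```c
#include <stddef.h>

/* Same values as OK, DECLINED and DONE in httpd.h. */
enum { OK = 0, DECLINED = -1, DONE = -2 };

typedef int (*hook_fn)(void *r);

/* Toy RUN_FIRST dispatcher (the strategy httpd 2.2 uses for translate_name):
 * the first hook returning anything other than DECLINED decides the result,
 * and the remaining hooks never run. */
static int run_first(hook_fn *hooks, size_t n, void *r)
{
    for (size_t i = 0; i < n; i++) {
        int rv = hooks[i](r);
        if (rv != DECLINED)
            return rv;
    }
    return DECLINED;
}

/* Toy RUN_ALL dispatcher (the strategy httpd 2.2 uses for access_checker):
 * OK and DECLINED both mean "carry on", so an earlier OK cannot stop a later
 * module such as mod_authz_host; any other value ends the chain. */
static int run_all(hook_fn *hooks, size_t n, void *r)
{
    for (size_t i = 0; i < n; i++) {
        int rv = hooks[i](r);
        if (rv != OK && rv != DECLINED)
            return rv;
    }
    return OK;
}

/* Hypothetical sample hooks; 'calls' counts how many of them actually ran. */
static int calls;
static int our_hook(void *r)  { (void)r; calls++; return OK; }
static int deny_hook(void *r) { (void)r; calls++; return 403; /* HTTP_FORBIDDEN */ }
```

With the same two hooks, `run_first` stops after `our_hook` returns OK, while `run_all` keeps going and returns `deny_hook`'s 403. That is why an early OK silences mod_rewrite's translate_name hook but not an access_checker such as mod_authz_host.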

 With respect to the URL length, I'm fairly sure it's nearly 8K (grep
 for HUGE_STRING_LEN in core_filters.c).


There may exist some buffer in Apache that's 8k.  But I have traced through
failing requests earlier that were more like 256 bytes.  This was reported
as mod_pagespeed Issue 9 (http://code.google.com/p/modpagespeed/issues/detail?id=9) and
resolved by limiting the number of css files that could be combined together
so that we did not exceed the pathname limitations.  I'm pretty sure it was
due to some built-in filter or core element in httpd trying to map the URL
to a filename (which is not necessary as far as mod_pagespeed is concerned)
and bumping into an OS path limitation (showing up as 403 Forbidden).
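If that guess is right, the limit that matters is a per-component one (NAME_MAX, commonly 255 bytes on Linux file systems) rather than total URL length: a combined-CSS URL would fail as soon as any single '/'-separated segment grows past it. The helpers below are hypothetical and hardcode 255 as an assumption; on a real system `pathconf(dir, _PC_NAME_MAX)` reports the actual value for a given directory.

```c
#include <stddef.h>

/* Assumed per-component limit (NAME_MAX on many Linux file systems). */
#define ASSUMED_NAME_MAX 255

/* Hypothetical helper: length of the longest '/'-separated segment of a URL
 * path, ignoring any query string. */
static size_t longest_segment(const char *path)
{
    size_t best = 0, cur = 0;
    for (const char *p = path; *p != '\0' && *p != '?'; p++) {
        if (*p == '/') {
            cur = 0;               /* a new segment starts after each slash */
        } else if (++cur > best) {
            best = cur;
        }
    }
    return best;
}

/* Nonzero if every segment could also exist as a file or directory name. */
static int segments_fit_on_disk(const char *path)
{
    return longest_segment(path) <= ASSUMED_NAME_MAX;
}
```

Under this assumption, limiting how many CSS files are combined (the fix for Issue 9) works because it caps the length of the one long segment, not the URL as a whole.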

 I confess I'm not entirely sure what you are trying to accomplish.
 You're serving up custom content and you're afraid mod_rewrite is
 going to munch the URL? Or is it more involved than that?


That's exactly right. The simplest example is that mod_pagespeed can extend
the cache lifetime of a JS file indefinitely, without compromising the site
owner's ability to propagate changes quickly, by putting an MD5 hash of the
file's content into the URL.

old: <script src="scripts/hacks.js"></script>
new: <script src="scripts/hacks.js.pagespeed.ce.HASH.js"></script>

If some mod_rewrite rule munges scripts/hacks.js.pagespeed.ce.HASH.js,
then mod_pagespeed will fail to serve it.
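The rewritten name in the example follows the pattern `<original>.pagespeed.<filter>.<hash>.<ext>`. Assuming that layout (it is inferred from this one example, not from mod_pagespeed's source), a hypothetical recognizer such as `parse_pagespeed_name` shows why a rule that reorders or strips the suffix leaves the module unable to decode the request:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser for names of the assumed form
 *   <original name>.pagespeed.<filter id>.<hash>.<ext>
 * e.g. "scripts/hacks.js.pagespeed.ce.HASH.js". On success it copies the
 * original name ("scripts/hacks.js") into out and returns true. */
static bool parse_pagespeed_name(const char *name, char *out, size_t out_size)
{
    static const char marker_text[] = ".pagespeed.";
    const char *marker = strstr(name, marker_text);
    if (marker == NULL || marker == name)
        return false;

    /* Exactly three non-empty dot-separated fields must follow the marker:
     * filter id, hash and extension. */
    const char *rest = marker + strlen(marker_text);
    int fields = 1;
    size_t field_len = 0;
    for (const char *p = rest; *p != '\0'; p++) {
        if (*p == '.') {
            if (field_len == 0)
                return false;
            fields++;
            field_len = 0;
        } else {
            field_len++;
        }
    }
    if (fields != 3 || field_len == 0)
        return false;

    size_t len = (size_t)(marker - name);
    if (len + 1 > out_size)
        return false;
    memcpy(out, name, len);
    out[len] = '\0';
    return true;
}
```

A munged name such as `scripts/hacks.js.ce.pagespeed.HASH.js` is rejected, because only two fields follow the `.pagespeed.` marker.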

The issue is most simply stated in a Stack Overflow article:
http://stackoverflow.com/questions/4099659/mod-rewrite-mod-pagespeed-rewritecond

In this case, the user had hand-entered a mod_rewrite rule that broke
mod_pagespeed so it made sense for him to fix it there.  However, we have
heard reports of other cases where a user installs some content-generation
software that generates mod_rewrite rules that break mod_pagespeed. Such
users may not even know what mod_rewrite is, so they can't easily work
around the broken rules. This issue is reported as mod_pagespeed
Issue 63 (http://code.google.com/p/modpagespeed/issues/detail?id=63).

Hope this clears things up.

I'm still interested in your opinion on my solution where I (inspired by
your hack) save the original URL in request->notes and then use *that* in my
resource handler in lieu of request->unparsed_uri. This change is now
committed to svn trunk (but not released in a formal patch) as
http://code.google.com/p/modpagespeed/source/detail?r=348 .
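The request-notes approach can be modelled in a few lines of standalone C. This is a simplified, single-request sketch: `struct toy_request`, `save_original_uri`, `toy_rewrite_everything` and `uri_for_handler` are hypothetical, and in a real module the saved copy would live in r->notes, written with `apr_table_set()` and read back with `apr_table_get()` under a key of the module's choosing.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for request_rec with just the fields this sketch needs.
 * saved_uri plays the role of an entry in r->notes. */
struct toy_request {
    char uri[256];
    char saved_uri[256];
    bool has_saved_uri;
};

/* Hypothetical translate_name-style hook ordered before mod_rewrite:
 * remember the URI of our resources before anything can rewrite it. */
static void save_original_uri(struct toy_request *r)
{
    if (strstr(r->uri, ".pagespeed.") != NULL) {
        snprintf(r->saved_uri, sizeof r->saved_uri, "%s", r->uri);
        r->has_saved_uri = true;
    }
}

/* Stand-in for a generated RewriteRule that sends everything to a CMS. */
static void toy_rewrite_everything(struct toy_request *r)
{
    snprintf(r->uri, sizeof r->uri, "%s", "/index.php");
}

/* The resource handler decodes the saved URI and ignores mod_rewrite's output. */
static const char *uri_for_handler(const struct toy_request *r)
{
    return r->has_saved_uri ? r->saved_uri : r->uri;
}
```

The design choice is that mod_rewrite is left running untouched; the module simply stops trusting the URI after the translate phase for its own resources.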

-Josh


Re: Overriding mod_rewrite from another module

2011-01-03 Thread Eric Covener
 The access checking on mod_pagespeed resources is
 redundant, because the resource will either be served from cache (in which
 case it had to be authenticated to get into the cache in the first place) or
 will be decoded and the original resource(s) fetched from the same server
 with full authentication.

Re: suppressing mod_authz_host: This doesn't sound like it guards
against a user that meets the AAA conditions causing the resource to
be cached and served to users who would not have met the AAA
restrictions.  Maybe you are missing a map_to_storage callback to tell
the core that this thing will really, really not be served from the
filesystem.

Re: suppressing rewrite.  Your comments in the src imply that rewrite
is doing some of what you're also suppressing in
server/core.c:ap_core_translate_name().  Also, it's odd that your
scheme for suppressing mod_rewrite wasn't a no-op for rewrite in
htaccess context, since these use the RUN_ALL fixups hook to do their
magic, but maybe you're catching a break there?


Re: Overriding mod_rewrite from another module

2011-01-03 Thread Joshua Marantz
On Mon, Jan 3, 2011 at 6:15 PM, Eric Covener cove...@gmail.com wrote:

  The access checking on mod_pagespeed resources is
  redundant, because the resource will either be served from cache (in which
  case it had to be authenticated to get into the cache in the first place) or
  will be decoded and the original resource(s) fetched from the same server
  with full authentication.

 Re: suppressing mod_authz_host: This doesn't sound like it guards
 against a user that meets the AAA conditions causing the resource to
 be cached and served to users who would not have met the AAA
 restrictions.


This is a good point, but I think I'm covered. mod_pagespeed will only
rewrite resources that are publicly cacheable. What does AAA stand for?
Authorization & Authentication in Apache or something? In any case I've
abandoned, for the moment, the attempt to bypass mod_authz_host on a
per-request basis.


 Maybe you are missing a map_to_storage callback to tell
 the core that this thing will really, really not be served from the
 filesystem.


I was not aware of the concept of a map_to_storage callback at all.  I
will have to investigate.  This may be very helpful.  Thanks.


 Re: suppressing rewrite.  Your comments in the src imply that rewrite
 is doing some of what you're also suppressing in
 server/core.c:ap_core_translate_name().  Also, it's odd that your
 scheme for suppressing mod_rewrite wasn't a no-op for rewrite in
 htaccess context, since these use the RUN_ALL fixups hook to do their
 magic, but maybe you're catching a break there?


It's quite possible that the previous hack, where we set the note
mod_rewrite_rewritten, would break if mod_rewrite.c:hook_uri2file's
functional component could get called by mod_rewrite.c:hook_fixup, but I
haven't analyzed the module deeply enough to understand it at that level.

But I think the present hack, where we don't turn off mod_rewrite but just
ignore its output via our own request note, will be more robust. At least I
hope it will.

In my testing 2 weeks ago I had trouble invoking mod_rewrite from .htaccess.
I'll have to try again.

-Josh


Re: Overriding mod_rewrite from another module

2011-01-03 Thread Ben Noordhuis
On Mon, Jan 3, 2011 at 23:19, Joshua Marantz jmara...@google.com wrote:
 My goal is not to remove authentication from the server, only to keep it from
 messing with my module's rewritten resources. The above statement is just
 observing that, while it's possible to shunt off mod_rewrite by returning OK
 from an upstream handler, the same is not true of mod_authz_host because it's
 invoked with a different magic macro.

My bad, I parsed your post as 'mod_authz_host is a core module and
cannot be removed' which is obviously false but not what you meant.

Yes, all access_checker hooks are run. You can't prevent it but you can
catch the 403 on the rebound and complain loudly in the logs.
Actually, that's a lie. You can prevent it and that might also answer
this next bit...

 There may exist some buffer in Apache that's 8k.  But I have traced through
 failing requests earlier that were more like 256 bytes.  This was reported
 as mod_pagespeed Issue 9 (http://code.google.com/p/modpagespeed/issues/detail?id=9) and
 resolved by limiting the number of css files that could be combined together
 so that we did not exceed the pathname limitations.  I'm pretty sure it was
 due to some built-in filter or core element in httpd trying to map the URL
 to a filename (which is not necessary as far as mod_pagespeed is concerned)
 and bumping into an OS path limitation (showing up as 403 Forbidden).

This might be the doing of core_map_to_storage(). Never run into it
myself (with URLs up to 4K, anyway) but there you go.

Okay, here is a dirty secret: if you hook map_to_storage and return
DONE, you bypass Apache's authentication stack - and nearly all other
hooks too. Probably an exceedingly bad idea.

You can however use it to prevent core_map_to_storage() from running.
Just return OK and you're set.
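The difference between returning OK and DONE from map_to_storage can be sketched as a simplified model of the request phases, based on a reading of httpd 2.2's request processing: a non-OK result from map_to_storage ends processing before the access checks, and the content handler is only invoked when processing returned OK. `toy_process_request` and `struct trace` are hypothetical, and translate_name, the location walk and the other auth hooks are left out.

```c
#include <stdbool.h>

/* Same values as OK, DECLINED and DONE in httpd.h. */
enum { OK = 0, DECLINED = -1, DONE = -2 };

/* Records which later stages ran, so the outcome can be inspected. */
struct trace {
    bool core_map_to_storage;
    bool access_checker;
    bool handler;
};

/* Simplified model; 'ours' is what a hypothetical map_to_storage hook,
 * ordered before the core's, returns for this request. */
static void toy_process_request(int ours, struct trace *t)
{
    *t = (struct trace){ false, false, false };

    /* map_to_storage is RUN_FIRST, so core_map_to_storage() (the directory
     * and file walk) only runs when our hook declines. */
    int rv = ours;
    if (rv == DECLINED) {
        t->core_map_to_storage = true;
        rv = OK;                   /* assume the walk itself succeeds */
    }

    /* Any non-OK result (DONE included) ends processing here. */
    if (rv != OK)
        return;

    t->access_checker = true;      /* mod_authz_host and friends */
    t->handler = true;             /* the content handler */
}
```

In this model OK skips only core_map_to_storage() and its file-system walk while keeping the access checks and the handler; DONE skips all of them, which means the map_to_storage hook itself would have to produce the whole response.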

 I'm still interested in your opinion on my solution where I (inspired by
 your hack) save the original URL in request->notes and then use *that* in my
 resource handler in lieu of request->unparsed_uri. This change is now
 committed to svn trunk (but not released in a formal patch) as
 http://code.google.com/p/modpagespeed/source/detail?r=348 .

Sounds fine, that's the kind of stuff request notes are for.


Overriding mod_rewrite from another module

2010-12-31 Thread Joshua Marantz
I need to find the best way to prevent mod_rewrite from renaming resources
that are generated by a different module, specifically mod_pagespeed.  This
needs to be done from within mod_pagespeed, rather than asking the site
admin to tweak his rule set.

By reading mod_rewrite.c, I found a mechanism that appears to work.  But it
has its own issues and I'm having trouble finding any relevant doc about the
mechanism:

ap_hook_translate_name(bypass_translators, NULL, NULL, APR_HOOK_FIRST - 1);

bypass_translators returns OK for resources generated by the module,
preventing mod_rewrite from disturbing them.  It returns DECLINED for other
resources.
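How the APR_HOOK_FIRST - 1 ordering plays out can be modelled in standalone C. The ordering values mirror apr_hooks.h (APR_HOOK_FIRST is 0, APR_HOOK_REALLY_LAST is 30), and in httpd 2.2 mod_rewrite and the core register their translate_name hooks at APR_HOOK_FIRST and APR_HOOK_REALLY_LAST respectively. Everything else here (`run_translate_name`, `toy_rewrite`, `toy_core_translate`, the request struct) is a hypothetical stand-in, not httpd code.

```c
#include <stdlib.h>

enum { OK = 0, DECLINED = -1 };

/* Ordering values copied from apr_hooks.h (APR_HOOK_FIRST, APR_HOOK_REALLY_LAST). */
enum { TOY_HOOK_FIRST = 0, TOY_HOOK_REALLY_LAST = 30 };

struct toy_request {
    int is_our_resource;
    const char *filename;      /* stays NULL unless some translator sets it */
    int rewrite_ran;
};

typedef int (*translate_fn)(struct toy_request *r);

struct registered_hook {
    translate_fn fn;
    int order;
};

static int by_order(const void *a, const void *b)
{
    const struct registered_hook *x = a, *y = b;
    return (x->order > y->order) - (x->order < y->order);
}

/* Hypothetical bypass_translators: claim our resources, decline the rest. */
static int bypass_translators(struct toy_request *r)
{
    return r->is_our_resource ? OK : DECLINED;
}

/* Stand-in for mod_rewrite's translate_name hook; no rule matches here. */
static int toy_rewrite(struct toy_request *r)
{
    r->rewrite_ran = 1;
    return DECLINED;
}

/* Stand-in for the core translator, which maps the URI under the docroot. */
static int toy_core_translate(struct toy_request *r)
{
    r->filename = "docroot-path";
    return OK;
}

/* RUN_FIRST over the hooks sorted by their order value. */
static int run_translate_name(struct registered_hook *hooks, size_t n,
                              struct toy_request *r)
{
    qsort(hooks, n, sizeof hooks[0], by_order);
    for (size_t i = 0; i < n; i++) {
        int rv = hooks[i].fn(r);
        if (rv != DECLINED)
            return rv;
    }
    return DECLINED;
}
```

For one of the module's resources, the OK from bypass_translators ends the phase before mod_rewrite runs, but it also pre-empts the core's translator, so filename is never set. That matches the missing-filename log messages mentioned in this post.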

The trouble is that httpd seems to report error messages in the log for the
lack of a filename. We can set request->filename to something, but that
causes the requests to fail completely on some servers. We haven't yet
isolated the difference between servers that can handle the fake filename
and those that can't.

Is there a better way to solve the original problem: preventing mod_rewrite
from corrupting mod_pagespeed's resources?

Or is there better doc on the semantics of the request->filename field in the
context of a resource that is not stored as a file?  Or on
ap_hook_translate_name?



Re: Overriding mod_rewrite from another module

2010-12-31 Thread Ben Noordhuis
On Fri, Dec 31, 2010 at 18:17, Joshua Marantz jmara...@google.com wrote:
 Is there a better way to solve the original problem: preventing mod_rewrite
 from corrupting mod_pagespeed's resources?

From memory and from a quick peek at mod_rewrite.c: in your
translate_name hook, set a mod_rewrite_rewritten note in r->notes
with value 0 and return DECLINED. That'll trick mod_rewrite into
thinking that it has already processed the request.
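Ben's trick differs from returning OK in one useful way: the hook returns DECLINED, so later translators, including the core's, still run and set a filename. The toy model below assumes, as Ben describes, that mod_rewrite skips a request that already carries the mod_rewrite_rewritten note; the actual check inside mod_rewrite.c is not reproduced, and every name besides the note key is hypothetical. In a real module the note would be set with `apr_table_set(r->notes, "mod_rewrite_rewritten", "0")`.

```c
#include <stddef.h>
#include <string.h>

enum { OK = 0, DECLINED = -1 };

#define MAX_NOTES 4

/* Toy request: a tiny key/value list stands in for the r->notes table. */
struct toy_request {
    const char *note_keys[MAX_NOTES];
    const char *note_vals[MAX_NOTES];
    size_t n_notes;
    int is_our_resource;
    const char *filename;
};

/* Appends a note (unlike apr_table_set, this toy never replaces a key). */
static void note_set(struct toy_request *r, const char *key, const char *val)
{
    if (r->n_notes < MAX_NOTES) {
        r->note_keys[r->n_notes] = key;
        r->note_vals[r->n_notes] = val;
        r->n_notes++;
    }
}

static const char *note_get(const struct toy_request *r, const char *key)
{
    for (size_t i = 0; i < r->n_notes; i++)
        if (strcmp(r->note_keys[i], key) == 0)
            return r->note_vals[i];
    return NULL;
}

/* Our hook, ordered before mod_rewrite: tag our resources, then DECLINE so
 * the remaining translators (including the core's) still run. */
static int mark_as_rewritten(struct toy_request *r)
{
    if (r->is_our_resource)
        note_set(r, "mod_rewrite_rewritten", "0");
    return DECLINED;
}

/* Stand-in for mod_rewrite, assumed to skip requests carrying the note. */
static int toy_rewrite(struct toy_request *r)
{
    if (note_get(r, "mod_rewrite_rewritten") != NULL)
        return DECLINED;
    r->filename = "rewritten-target";   /* pretend a RewriteRule matched */
    return OK;
}

/* Stand-in for the core translator, which maps the URI under the docroot. */
static int toy_core_translate(struct toy_request *r)
{
    r->filename = "docroot-path";
    return OK;
}

/* RUN_FIRST over the three hooks in their registration order. */
static int toy_translate_name(struct toy_request *r)
{
    int (*chain[])(struct toy_request *) = {
        mark_as_rewritten, toy_rewrite, toy_core_translate
    };
    for (size_t i = 0; i < sizeof chain / sizeof chain[0]; i++) {
        int rv = chain[i](r);
        if (rv != DECLINED)
            return rv;
    }
    return DECLINED;
}
```

Tagged requests end up with the core's filename and are untouched by the rewrite stand-in, while other requests are still rewritten normally.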


Re: Overriding mod_rewrite from another module

2010-12-31 Thread Joshua Marantz
Thanks for the quick response and the promising idea for a hack.  Looking at
mod_rewrite.c this does indeed look a lot more surgical, if, perhaps,
fragile, as mod_rewrite.c doesn't expose that string-constant in any formal
interface (even as a #define in a .h).  Nevertheless the solution is
easy-to-implement and easy-to-test, so...thanks!

I'm also still wondering if there's a good source of official documentation
for the detailed semantics of interfaces like ap_hook_translate_name.
Neither a Google search, a stackoverflow.com search, nor the Apache Modules
book (http://www.amazon.com/Apache-Modules-Book-Application-Development/dp/0132409674/ref=sr_1_1?ie=UTF8qid=1293837117sr=8-1)
offers much detail.
code.google.com fares a little better but just points to 4 existing usages.

-Josh
