New RewriteMap Help/Suggestions

2013-04-25 Thread Jim Riggs
I am in the process of preparing a patch to add a new RewriteMap type and could 
use some input from all of you on the best implementation. What I am creating 
is basically a clone of the txt map type, except that each line is a regexp 
followed by a replacement (with potential back-references).

"You can just do those as RewriteRules; no need for a map," you say. True, 
except that I am looking at an automated, application-created list, and I don't 
really want the application to be writing out configuration files or .htaccess 
files. I would be much more comfortable with the app writing out a map file 
that has limited functionality and scope. Plus, it would be easy for the app to 
just create 'regexp replacement' lines at build time.
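
For illustration, here is a minimal sketch of what looking up a key against one 'regexp replacement' line could involve. It is not the proof-of-concept code from the patch: it uses POSIX regex.h rather than httpd's own regex wrappers so that it stands alone, the function names lookup_one() and expand_backrefs() are hypothetical, and the $N back-reference syntax is an assumption borrowed from RewriteRule.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS 10 /* $0 through $9 */

/*
 * Hypothetical helper: expand "$N" back-references in `repl` using the
 * match offsets in `m` (offsets are relative to `key`). Writes a
 * NUL-terminated result to `out`; returns 0 on success, -1 if it won't fit.
 */
int expand_backrefs(const char *repl, const char *key,
                    const regmatch_t *m, char *out, size_t outlen)
{
    size_t n = 0;

    assert(outlen > 0);
    for (const char *p = repl; *p != '\0'; p++) {
        if (p[0] == '$' && p[1] >= '0' && p[1] <= '9') {
            const regmatch_t *g = &m[p[1] - '0'];
            p++;                        /* consume the digit */
            if (g->rm_so < 0) {
                continue;               /* group did not participate */
            }
            size_t len = (size_t)(g->rm_eo - g->rm_so);
            if (n + len >= outlen) {
                return -1;
            }
            memcpy(out + n, key + g->rm_so, len);
            n += len;
        } else {
            if (n + 1 >= outlen) {
                return -1;
            }
            out[n++] = *p;
        }
    }
    out[n] = '\0';
    return 0;
}

/*
 * Hypothetical lookup for a single map line: compile `pattern`, match it
 * against `key`, and write the expanded replacement to `out`.
 * Returns 1 on match, 0 on no match, -1 on error.
 */
int lookup_one(const char *pattern, const char *repl,
               const char *key, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
        return -1;
    }
    rc = regexec(&re, key, MAX_GROUPS, m, 0);
    regfree(&re);                       /* compiled and discarded per lookup */
    if (rc == REG_NOMATCH) {
        return 0;
    }
    if (rc != 0) {
        return -1;
    }
    return expand_backrefs(repl, key, m, out, outlen) == 0 ? 1 : -1;
}
```

With a map line such as `^/old/([a-z]+)/([0-9]+)$ /new/$2/$1`, calling lookup_one() with the key /old/item/42 would produce /new/42/item. Note that the regcomp()/regfree() pair runs on every call, which is the per-lookup cost any precompilation scheme would aim to remove.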

So, I have created a crude, working proof-of-concept of this. It basically 
copies all of the functionality of the txt maps, including the cache, but in 
the lookup_map_regexpfile() function, it compiles the regexp for each line, 
attempts a match, and returns the backref-substituted replacement. (This pair 
gets cached.) This works beautifully as is, but it is horribly inefficient to 
have to compile the REs every time we come in with a new key/URL. So, I am 
thinking of precompiling all of them, and I see three options:

1. Precompile and store all of the REs at config load time.
2. Compile and store all of the REs the first time we hit 
lookup_map_regexpfile() or when the map file is updated.
3. Compile and store each RE as we read through the map file in 
lookup_map_regexpfile() until a match is found and bail (full list will be 
built over time).

#1 is nice, because all of the work is done up front and lookups will be fast 
from then on. The problem, though, is that I would like this map to 
reload/refresh when the map file changes, as the other map types do. #2 and #3 
solve this. With #2, I worry about the cost of recompiling everything if the map 
file gets updated and we get a thundering herd. With #3, there is some 
coordination to manage with respect to which lines have been compiled and which 
ones haven't.
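
As a rough sketch of option #2 (compile everything on first use, and again whenever the file's mtime changes), something like the following could work. Everything here is an assumption made for illustration: the compiled_map layout, the fixed-size limits, the function names, and the use of POSIX regex.h and stat() in place of APR pools and file APIs. It is single-threaded and does nothing about the thundering herd.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

#define MAX_ENTRIES 64

/* One compiled map line (hypothetical layout). */
typedef struct {
    regex_t re;
    char repl[256];
} map_entry;

/* A compiled map plus the mtime of the file it was built from. */
typedef struct {
    map_entry entries[MAX_ENTRIES];
    size_t count;
    time_t mtime;
    int loaded;
} compiled_map;

/* Free every compiled regexp and mark the map as not loaded. */
void map_clear(compiled_map *map)
{
    for (size_t i = 0; i < map->count; i++) {
        regfree(&map->entries[i].re);
    }
    map->count = 0;
    map->loaded = 0;
}

/* Compile every "regexp replacement" line of `path`. Returns 0 on success. */
int map_load(compiled_map *map, const char *path, time_t mtime)
{
    char line[1024], pattern[512];
    FILE *fp;

    assert(map != NULL && path != NULL);
    fp = fopen(path, "r");
    if (fp == NULL) {
        return -1;                      /* keep the old map on failure */
    }
    map_clear(map);
    while (map->count < MAX_ENTRIES && fgets(line, sizeof line, fp) != NULL) {
        map_entry *e = &map->entries[map->count];
        /* Skip lines starting with '#' and lines without two fields. */
        if (line[0] == '#' ||
            sscanf(line, "%511s %255s", pattern, e->repl) != 2) {
            continue;
        }
        if (regcomp(&e->re, pattern, REG_EXTENDED) == 0) {
            map->count++;               /* lines that fail to compile are dropped */
        }
    }
    fclose(fp);
    map->mtime = mtime;
    map->loaded = 1;
    return 0;
}

/*
 * Return the index of the first entry matching `key`, or -1. The map is
 * (re)compiled only if it was never loaded or the file's mtime changed.
 */
int map_find(compiled_map *map, const char *path, const char *key)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        return -1;
    }
    if (!map->loaded || st.st_mtime != map->mtime) {
        if (map_load(map, path, st.st_mtime) != 0) {
            return -1;
        }
    }
    for (size_t i = 0; i < map->count; i++) {
        if (regexec(&map->entries[i].re, key, 0, NULL, 0) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

Two limitations of the sketch are worth noting: because lines are split on whitespace, a regexp cannot itself contain spaces, and an update that lands within the same second as the previous build would go unnoticed at one-second mtime resolution.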

Does anyone have thoughts as to:

1. When/how should the map REs be compiled/precompiled? One of the options 
above or something else?
2. Where should the compiled REs be stored: in an existing pool or a new one?

Thanks for any input.

- Jim



Re: New RewriteMap Help/Suggestions

2013-04-25 Thread Yehuda Katz
On Thu, Apr 25, 2013 at 10:35 AM, Jim Riggs apache-li...@riggs.me wrote:

> So, I have created a crude, working proof-of-concept of this. It basically
> copies all of the functionality of the txt maps, including the cache, but
> in the lookup_map_regexpfile() function, it compiles the regexp for each
> line, attempts a match, and returns the backref-substituted replacement.
> (This pair gets cached.) This works beautifully as is, but it is horribly
> inefficient to have to compile the REs every time we come in with a new
> key/URL. So, I was thinking of precompiling all of them and see three
> options:
>
> 1. Precompile and store all of the REs at config load time.

1a. Precompile and store all of the REs at config load time or when the map
file is updated.

> 2. Compile and store all of the REs the first time we hit
> lookup_map_regexpfile() or when the map file is updated.
> 3. Compile and store each RE as we read through the map file in
> lookup_map_regexpfile() until a match is found and bail (full list will be
> built over time).
>
> #1 is nice, because all of the work is done up front and will be fast from
> then on. The problem, though, is that I would like this map to
> reload/refresh if the map file gets changed like the other types do. #2 and
> #3 solve this. With #2 I worry about performance of compiling everything if
> the map file gets updated and we get a thundering herd. With #3 there is
> some coordination to manage with respect to which lines have been compiled
> and which ones haven't.

I think #3 is not a great idea, for the same reason you mentioned: keeping
track of which lines have and have not been compiled.

I have actually seen the problem that you mention in #2 in a live
environment with a (poorly-designed) custom module. Each request tried to
clear the cached results and build them again, which very quickly overloaded
the server.
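
One common way to avoid that failure mode is to let exactly one caller rebuild while everyone else keeps using the stale data. The sketch below shows the idea with C11 atomics. It is not taken from the module described above or from the proposed patch: the rebuild_guard type and both function names are hypothetical, the "version" would typically be the map file's mtime, and in httpd the worker processes do not share memory, so a real implementation would need a cross-process lock or a per-process guard instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical guard that lets only one caller at a time rebuild a table. */
typedef struct {
    atomic_flag rebuilding;     /* set while some caller is rebuilding */
    atomic_long built_version;  /* version (e.g. file mtime) of the live table */
} rebuild_guard;

/*
 * Returns true if the caller has won the right to rebuild for
 * `current_version`. Returns false if the table is already up to date or
 * another caller is rebuilding; in both cases the caller should simply keep
 * using the existing table.
 */
bool try_begin_rebuild(rebuild_guard *g, long current_version)
{
    assert(g != NULL);
    if (atomic_load(&g->built_version) == current_version) {
        return false;                   /* nothing to do */
    }
    if (atomic_flag_test_and_set(&g->rebuilding)) {
        return false;                   /* someone else is already on it */
    }
    /* Re-check: a rebuild may have finished between the two tests above. */
    if (atomic_load(&g->built_version) == current_version) {
        atomic_flag_clear(&g->rebuilding);
        return false;
    }
    return true;
}

/* Publish the new version and release the guard. */
void end_rebuild(rebuild_guard *g, long new_version)
{
    assert(g != NULL);
    atomic_store(&g->built_version, new_version);
    atomic_flag_clear(&g->rebuilding);
}
```

The winner of try_begin_rebuild() would build the new table off to the side, swap it in, and then call end_rebuild(). Callers that get false never block and never trigger a second rebuild, which is the property the overloaded server was missing.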

You could potentially use something like ap_hook_monitor to watch the file
for changes, paired with 1a (not sure how much load that might add). My
regular Apache module reference (Nick Kew's The Apache Modules Book, which I
keep on my office bookshelf) mentions it briefly (pages 67, 268, 337).

- Y