Never mind. With another hour's work, I solved it. For reference, here's
my set of rewrite rules:
RewriteEngine on
RewriteLog /var/www/popline/logs/rewrite.log
#Turn off rewritelog with level 0. 2 is useful/normal.
RewriteLogLevel 0
RewriteRule ^/docs$ /docs/index.html
RewriteRule ^/docs/$ /docs/index.html
#If this matches, don't do any rewriting
RewriteRule ^/docs/index.* - [L]
#If this matches, don't do any rewriting, so error pages come up correctly
RewriteRule ^/404.shtml - [L]
#If this matches, don't do any rewriting. For Google sitemap program
RewriteRule ^/docs/sitemap.* - [L]
#If the file doesn't exist, rewrite and ... submit
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^/docs/[0-9]{4}/([0-9]{6})\.html /docs/$1 [R,L]
#If this matches, don't do any rewriting
RewriteRule ^/docs/[0-9]{4}/[0-9]{6}\.html - [L]
#Note that in RewriteRule below, must use %3F for '?' after
#'icswppro.dll'. '?' has special meaning in Rewrite substitutions.
RewriteRule ^/docs/([0-9]{6})$ http://db.jhuccp.org/ics-wpd/exec/icswppro.dll?BU=http://db.jhuccp.org/ics-wpd/exec/icswppro.dll&QF0=DocNo&QI0=$1&TN=Popline&AC=QBE_QUERY&MR=30\%DL=1&&RL=1&&RF=LongRecordDisplay&DF=LongRecordDisplay [P]
#If this matches, don't do any rewriting
RewriteRule ^/docs/[0-9]{4}.* - [L]
#NOTE: If you want to do a Google sitemap verify, the next line must
# be commented out, so that Apache doesn't return a 200 code (forwarding
# it on to db.jhuccp.org) for a non-existent page.
RewriteRule ^/.*$ http://db.jhuccp.org/ics-wpd/popweb/basic.html [R,L]
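One caveat on the RewriteCond above, offered as a sketch rather than a
tested fix: the patterns start with '/', which suggests these rules sit
in the main server or a virtual host rather than in .htaccess. In that
context mod_rewrite runs before the URL is mapped to the filesystem, and
the Apache documentation notes that REQUEST_FILENAME may then still hold
the URL path, so the '!-f' test can succeed even for files that exist.
Assuming the documents live directly under DocumentRoot (an assumption,
not something stated above), the existence test can be spelled out
explicitly:

```apache
# Sketch only: assumes server or <VirtualHost> context (not .htaccess)
# and that /docs/... maps directly onto files under DocumentRoot.
# The existence check is built from DOCUMENT_ROOT plus the request URI
# instead of relying on REQUEST_FILENAME at this stage of processing.
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteRule ^/docs/[0-9]{4}/([0-9]{6})\.html /docs/$1 [R,L]
```

With this, a request such as /docs/0784/045796.html is redirected to
/docs/045796 only when no such file exists under DocumentRoot. Using
R=301 in place of R would mark the move as permanent, which may help
search engines drop the old URLs.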
Thanks, again, for being here if needed.
-Kevin
-----Original Message-----
From: Zembower, Kevin [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 05, 2006 3:20 PM
To: [email protected]
Subject: [EMAIL PROTECTED] Help with rewrite for errors?
I have a number of documents in HTML files like this:
www.popline.org/docs/0784/045796.html
www.popline.org/docs/0429/209471.html
www.popline.org/docs/0003/690206.html
In most of these records, the link is broken (as it is in these three
examples). This is the result of old URLs still being indexed by Google.
However, in these three cases, the original document can be found by
removing the 4 digit directory and the '.html' thusly:
www.popline.org/docs/045796
www.popline.org/docs/209471
www.popline.org/docs/690206
Because of the nature of our system, these resolve correctly.
Can anyone help me with a set of RewriteRules that will, whenever a 404
error is generated, transform the URL as indicated and resubmit it?
Here are the current Rewrite rules in my system:
RewriteEngine on
RewriteLog /var/www/popline/logs/rewrite.log
#Turn off rewritelog with level 0. 2 is useful/normal.
RewriteLogLevel 0
RewriteRule ^/docs$ /docs/index.html
RewriteRule ^/docs/$ /docs/index.html
#If this matches, don't do any rewriting
RewriteRule ^/docs/index.* - [L]
#If this matches, don't do any rewriting, so error pages come up correctly
RewriteRule ^/error/.* - [L]
#If this matches, don't do any rewriting, so error pages come up correctly
RewriteRule ^/404.shtml - [L]
#If this matches, don't do any rewriting. For Google sitemap program
RewriteRule ^/docs/sitemap.* - [L]
#If this matches, don't do any rewriting
RewriteRule ^/docs/[0-9]{4}/[0-9]{6}\.html - [L]
#Note that in RewriteRule below, must use %3F for '?' after
#'icswppro.dll'. '?' has special meaning in Rewrite substitutions.
RewriteRule ^/docs/([0-9]{6})$ http://db.jhuccp.org/ics-wpd/exec/icswppro.dll?BU=http://db.jhuccp.org/ics-wpd/exec/icswppro.dll&QF0=DocNo&QI0=$1&TN=Popline&AC=QBE_QUERY&MR=30\%DL=1&&RL=1&&RF=LongRecordDisplay&DF=LongRecordDisplay [P]
#If this matches, don't do any rewriting
RewriteRule ^/docs/[0-9]{4}.* - [L]
RewriteRule ^/.*$ http://db.jhuccp.org/ics-wpd/popweb/basic.html [R,L]
Here's an example from the current rewrite log of a 404 generation:
10.253.200.90 - - [05/Oct/2006:15:08:01 --0400] [www.popline.org/sid#8275268][rid#82e1570/initial] (2) init rewrite engine with requested uri /docs/0784/045796.html
10.253.200.90 - - [05/Oct/2006:15:08:01 --0400] [www.popline.org/sid#8275268][rid#82e1570/initial] (1) pass through /docs/0784/045796.html
10.253.200.90 - - [05/Oct/2006:15:08:01 --0400] [www.popline.org/sid#8275268][rid#82e2d30/initial/redir#1] (2) init rewrite engine with requested uri /404.shtml
10.253.200.90 - - [05/Oct/2006:15:08:01 --0400] [www.popline.org/sid#8275268][rid#82e2d30/initial/redir#1] (1) pass through /404.shtml
Here's an earlier excerpt from the rewrite log, before I filtered out
the 'HTTP_NOT_FOUND' information:
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e3760/initial] (2) init rewrite engine with requested uri /docs/0211/772369.html
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e3760/initial] (1) pass through /docs/0211/772369.html
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e5070/initial/redir#1] (2) init rewrite engine with requested uri /error/HTTP_NOT_FOUND.html.var
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e5070/initial/redir#1] (2) rewrite /error/HTTP_NOT_FOUND.html.var -> http://db.jhuccp.org/ics-wpd/popweb/basic.html
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e5070/initial/redir#1] (2) explicitly forcing redirect with http://db.jhuccp.org/ics-wpd/popweb/basic.html
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e5070/initial/redir#1] (1) escaping http://db.jhuccp.org/ics-wpd/popweb/basic.html for redirect
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e5070/initial/redir#1] (1) redirect to http://db.jhuccp.org/ics-wpd/popweb/basic.html [REDIRECT/302]
My question is not so much how to transform the submitted URL into the
one without the directory and '.html'. Instead, I don't understand how
to detect the 404 condition and then invoke the rewrite rule.
Thanks in advance for all your help and suggestions.
-Kevin
Kevin Zembower
Internet Services Group manager
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
111 Market Place, Suite 310
Baltimore, Maryland 21202
410-659-6139
---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server
Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: [EMAIL PROTECTED]
" from the digest: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------