Andrzej Bialecki wrote:
> [EMAIL PROTECTED] wrote:
>> What is the best way to accomplish this?
>>
>> One thing I was thinking was to index the staging site, then open up
>> CrawlDb and LinkDb (any others?), loop through them and write out a
>> new version of those files, changing the keys (URLs) along the way,
>> for instance from http://STAGING.example.com/foo/bar.html to
>> http://WWW.example.com/foo/bar.html
>>
>> Has anyone done this?  Does this sound realistic/doable?
>> Is there a better/faster/easier way?
>>   e.g. changing URLs immediately at fetch/parse/index time?
>>   e.g. changing URLs on the fly at search time when displaying results?
> 
> There is another option: when fetching, configure Nutch to use a URL
> rewriting proxy, which will rewrite your requests for www.example.com
> to staging.example.com on the fly, fetch the response, and return the
> content. The only thing left to do would be to rewrite the absolute
> outlinks contained in the content from staging to www, but this can be
> done in URLNormalizers.
> 
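Andrzej's remaining step, rewriting absolute outlinks from staging to www "in URLNormalizers", comes down to a small string transformation. Below is a minimal standalone sketch of just that transformation, assuming the host names from the example (staging.example.com and www.example.com); the class name StagingUrlRewriter is hypothetical. Inside Nutch the logic would sit in a URLNormalizer plugin (or be written as a pattern for the urlnormalizer-regex plugin's regex-normalize.xml), but that plugin wiring is not shown here.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the class name and both host names are assumptions
// based on the example URLs in this thread.
final class StagingUrlRewriter {
    // Matches the scheme plus the staging host at the start of a URL. The
    // lookahead requires the host to end there (port, path, query, fragment
    // or end of string), so e.g. staging.example.com.evil.org is left alone.
    private static final Pattern STAGING_PREFIX = Pattern.compile(
            "^(https?://)staging\\.example\\.com(?=[:/?#]|$)",
            Pattern.CASE_INSENSITIVE);

    private StagingUrlRewriter() {}

    /** Returns url with the staging host swapped for www; other URLs come back unchanged. */
    static String rewrite(String url) {
        return STAGING_PREFIX.matcher(url).replaceFirst("$1www.example.com");
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://STAGING.example.com/foo/bar.html",
            "http://staging.example.com:8080/search?q=nutch",
            "http://staging.example.com.evil.org/",
        };
        for (String url : urls) {
            System.out.println(url + " -> " + rewrite(url));
        }
    }
}
```

The same function could also be applied to CrawlDb and LinkDb keys if you take the offline route the original poster described.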

You could also let your reverse proxy do the rewriting, using something
like http://apache.webthing.com/mod_proxy_html/. I have been using
something like that to rewrite massive amounts of HTML in real time for
AA purposes, mapping web applications into a different URL space.
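In essence, a filter like mod_proxy_html rewrites link attributes in the HTML as it passes through the proxy. The Java sketch below only illustrates that idea on an in-memory string using a regular expression. It is not how mod_proxy_html works internally, and the class name HtmlLinkRewriter and the host names are assumptions. A regex misses many real-world cases (links built in JavaScript, CSS url(...) references, unusual markup), so a parser-based filter is the safer choice in production.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the class name and host names are assumptions.
final class HtmlLinkRewriter {
    // Group 1: an href= or src= attribute plus an optional opening quote.
    // Group 2: the scheme. The lookahead requires the staging host to end
    // at a port, path, query, fragment, quote, whitespace or tag end.
    private static final Pattern STAGING_LINK = Pattern.compile(
            "(\\b(?:href|src)\\s*=\\s*[\"']?)(https?://)staging\\.example\\.com(?=[:/?#\"'\\s>]|$)",
            Pattern.CASE_INSENSITIVE);

    private HtmlLinkRewriter() {}

    /** Rewrites absolute staging links in href/src attributes to the www host. */
    static String rewrite(String html) {
        return STAGING_LINK.matcher(html).replaceAll("$1$2www.example.com");
    }

    public static void main(String[] args) {
        String page = "<a href=\"http://staging.example.com/foo/bar.html\">bar</a>"
                + " <img src='https://staging.example.com/logo.png'>"
                + " <a href=\"http://other.example.org/\">other</a>";
        System.out.println(rewrite(page));
    }
}
```

Plain text outside href/src attributes is deliberately left untouched, which roughly mirrors a link-mapping filter and avoids rewriting URLs that merely appear in page copy.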

--
 Sami Siren