I meant that you could just crawl http://external_url.com/y/z/ directly. But yes, if you have pages from someone else's server stored locally, you will need to rewrite the BASE component of the URL in the search results.

For that you could probably just hack search.jsp (but don't tell anyone I told you to) to rewrite the URLs. Go to ~tomcat/webapps/ROOT and edit search.jsp. You'll need to know some Java to do that, but look for Hits and url; it should be easy enough to work out where to put the string replace.
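
A minimal sketch of that string replace, assuming the local prefix is file:///x/ and the public one is http://mysite.com/ (both taken from the example URLs in this thread; adjust them to your layout). The class and method names here are hypothetical and are not part of Nutch; in search.jsp you would apply the same logic to the url string before it is printed.

```java
public class UrlRewriter {
    // Assumed prefixes, based on the example URLs in this thread.
    static final String LOCAL_PREFIX = "file:///x/";
    static final String PUBLIC_PREFIX = "http://mysite.com/";

    // Swap the local file prefix for the public one.
    // URLs that do not start with the local prefix are returned unchanged.
    static String rewrite(String url) {
        if (url != null && url.startsWith(LOCAL_PREFIX)) {
            return PUBLIC_PREFIX + url.substring(LOCAL_PREFIX.length());
        }
        return url;
    }

    public static void main(String[] args) {
        // Prints http://mysite.com/y/z/foo.html
        System.out.println(rewrite("file:///x/y/z/foo.html"));
    }
}
```

Matching on the prefix only, rather than replacing every occurrence of the string, leaves any URL that was crawled over http untouched.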

Winton



OK, so you merge your other crawls into the same search dir. That's
understood, thanks.

My other question concerns what happens when you do a search in Nutch. Right now,
it returns links to "file:///x/y/z/......./foo.html" and I was wondering if
there was a simple way to change that link to be
"http://mysite.com/y/z/...../foo.html" when Nutch returns the data. It seems
like you can't change it, since it's using the same link it used to crawl the
data.

Not without modifying the code. I don't think it respects <BASE>, for
example, if you crawl it as file:///.
Frankly, if you can, just serve it through DOCROOT. It will be less painful in
the end!

- Serving URL - You can change it if you know how to set up Tomcat.

How do I serve it through DOCROOT? Is that in Tomcat? And also, won't Nutch
still return links when I do a search in the form of
file:///x/y/z......foo.html? That's the part in Nutch I'm trying to
change. Thanks.

-Ryan

On Sat, Jul 5, 2008 at 10:23 PM, Winton Davies <[EMAIL PROTECTED]>
wrote:

 Oh sorry, I misunderstood the question. I think you can only serve from 1
 directory (aka Crawl by default). Of course you can create multiple
 instances that serve from different crawls, but then you'd have to deal with
 joining them together.

 You can definitely MERGE multiple crawl directories.

 W

