Re: Indexing static html files

Ryan Smith Sun, 06 Jul 2008 09:33:44 -0700

OK, so you merge your other crawls into the same search dir. That's
understood, thanks.

My other question concerns searching in Nutch.  Right now it returns links
of the form "file:///x/y/z/......./foo.html", and I was wondering whether
there is a simple way to change that link to
"http://mysite.com/y/z/...../foo.html" when Nutch returns the data.  It seems
you can't change it, since Nutch uses the same link it used to crawl the
data.
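
For illustration only, the rewrite being asked about amounts to a prefix swap
on each result URL. The following is a minimal sketch in Python, outside Nutch
itself; it does not show how Nutch stores or renders URLs. The prefixes come
from the placeholder URLs above, and the helper name rewrite_url and the
example path are hypothetical.

```python
# Hypothetical prefixes, taken from the placeholder URLs in this thread.
FILE_PREFIX = "file:///x/"
HTTP_PREFIX = "http://mysite.com/"


def rewrite_url(url, file_prefix=FILE_PREFIX, http_prefix=HTTP_PREFIX):
    """Swap the crawl-time file prefix for the public site prefix.

    URLs that do not start with file_prefix are returned unchanged.
    """
    if url.startswith(file_prefix):
        return http_prefix + url[len(file_prefix):]
    return url


if __name__ == "__main__":
    # The path below is made up for illustration.
    print(rewrite_url("file:///x/y/z/foo.html"))
```

Where such a swap would have to be applied (at index time or when results are
displayed) depends on the Nutch setup and is not settled by this sketch.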

>Not without modifying the code. I don't think it respects <BASE>, for
>example, if you crawl it as File:///
>Frankly, if you can, just serve it thru DOCROOT - it will be less painful in
>the end!
>
>- Serving URL - You can change it if you know how to set up Tomcat.

How do I serve it thru DOCROOT?  Is that in Tomcat?  Also, won't Nutch
still return links of the form file:///x/y/z......foo.html when I do a
search?  That's the part of Nutch I'm trying to change.  Thanks.

-Ryan

On Sat, Jul 5, 2008 at 10:23 PM, Winton Davies <[EMAIL PROTECTED]>
wrote:

> Oh, sorry, I misunderstood the question - I think you can only serve from 1
> directory (aka Crawl by default). Of course you can create multiple
> instances that serve from different crawls, but then you'd have to deal with
> joining them together.
>
> You can definitely MERGE multiple crawl directories.
>
> W
>
