Hi,
There is no automated way to switch from file:/... to http://... but I imagine
you could easily change the JSP that handles the search results display and add
a little JSP scriptlet that does something like
url = url.replace("file:/....", "http://mysite.com/").
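
Here is a rough sketch of that prefix swap, untested against Nutch itself. The
class and method names (UrlRewriter, toWebUrl) are made up for illustration, and
the two prefixes are taken from the example paths in your message, so adjust
them to your setup. It uses startsWith instead of replace so that only a
leading file: prefix is rewritten and any other URL passes through untouched.

```java
public class UrlRewriter {
    // Assumed prefixes, taken from the example paths in the original message.
    static final String FILE_PREFIX = "file:/home/htmlfiles/";
    static final String WEB_PREFIX = "http://mysite.com/";

    // Hypothetical helper: maps a crawled file: URL to its public web URL.
    // URLs that do not start with FILE_PREFIX are returned unchanged.
    static String toWebUrl(String url) {
        if (url != null && url.startsWith(FILE_PREFIX)) {
            return WEB_PREFIX + url.substring(FILE_PREFIX.length());
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(toWebUrl("file:/home/htmlfiles/page1.html"));
    }
}
```

In the search JSP you would apply the same logic to each hit's URL just before
it is written into the results page.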
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
> From: ivrokv <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Sunday, May 4, 2008 12:24:12 AM
> Subject: Crawling local filesystem to provide search access from web
>
>
> Hello,
>
> I am using nutch-0.9 to index html files that are stored locally on the same
> server (Server1) that runs nutch. Thus, I am using protocol-file for
> fetching and subsequently indexing. This is working out just great.
>
> My problem is this:
>
> I place the html files in the public folder of my apache server (Server1,
> the same server used for crawling the local files) so that they can be
> accessed at http://mysite.com/page1.html
>
> When I run a search query on the nutch jsp search page, the search results
> have a url which is a local filesystem path such as
> file:/home/htmlfiles/page1.html
>
>
> Is it possible to give nutch the local filesystem path in the urls
> folder for crawling and indexing (a local filesystem path such as
> /home/htmlfiles/page1.html), but at query time have the nutch jsp
> present the web url (http://mysite.com/page1.html) to the search user?
>
> Would this involve some kind of URL normalization in nutch?
>
>
> Ideally I would prefer to crawl the files from the local filesystem rather
> than have them crawled from the website root folder. I have noticed that
> crawling is much faster (since the files are local to nutch) than when I
> crawl from mysite.com, even though in both cases the files are on the same
> physical server.
>
> One obvious solution is to have nutch fetch the html pages from the
> mysite.com root folder, and as a result the url will show up correctly as
> mysite.com/page.html when a search is performed on nutch. I have tried this
> and it works well, but the fetching speed is very slow and I would prefer to
> crawl the files using the file protocol, which appears to be much faster.
>
> Thank you for any advice and help.
>
> Regards
>
> taknev
>
>
>
>
>
> --
> View this message in context:
> http://www.nabble.com/Crawling-local-filesystem-to-provide-search-access-from-web-tp17040516p17040516.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>