Re: [Nutch-general] Re: Crawling local files?

ogjunk-nutch Wed, 27 Jul 2005 15:56:02 -0700

If you choose to go with a Lucene-only approach, you could download the
free code that comes with the Lucene book ( http://lucenebook.com/ )
and look at the little framework for recursive traversal of file system
hierarchies, parsing of various file types (Word, HTML, XML, PDF, etc.)
and indexing with Lucene.  It's small, simple, and extensible.
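
To give a rough idea of the shape of such a framework, here is a minimal
sketch in plain Java.  It is not the code from the book: the class name
FileTreeWalker, the TextExtractor interface, and the register/walk
methods are all made up for illustration.  The Lucene step is left as a
comment, and the "html" handler below just reads raw bytes where a real
one would strip markup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: recursive traversal plus per-extension text extraction.
class FileTreeWalker {

    /** Hypothetical per-file-type hook that turns a file into plain text. */
    interface TextExtractor {
        String extract(Path file) throws IOException;
    }

    private final Map<String, TextExtractor> extractors = new HashMap<>();

    /** Associates a file extension (without the dot) with an extractor. */
    void register(String extension, TextExtractor extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the lower-cased extension of a file name, or "" if it has none. */
    static String extensionOf(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Walks root recursively and extracts text from every supported file. */
    Map<Path, String> walk(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<Path, String> texts = new TreeMap<>();
        for (Path file : files) {
            TextExtractor extractor = extractors.get(extensionOf(file));
            if (extractor == null) {
                continue; // no handler registered for this file type: skip it
            }
            // In a Lucene-based setup, this is the point where the extracted
            // text would be put into a Document and handed to an IndexWriter.
            texts.put(file, extractor.extract(file));
        }
        return texts;
    }

    public static void main(String[] args) throws IOException {
        FileTreeWalker walker = new FileTreeWalker();
        TextExtractor raw = f -> new String(Files.readAllBytes(f), StandardCharsets.UTF_8);
        walker.register("txt", raw);
        walker.register("html", raw); // placeholder: a real handler would strip tags
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        walker.walk(root).forEach((path, text) ->
                System.out.println(path + " (" + text.length() + " chars)"));
    }
}
```

Supporting another file type then comes down to registering one more
extractor for its extension; the traversal code stays untouched.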


Otis


--- Giovanni Novelli <[EMAIL PROTECTED]> wrote:

> In my opinion you should be able to use Lucene directly; indeed, Nutch
> relies on Lucene for indexing and retrieval. In your case you don't
> need to crawl, since the files are just local, static HTML; you need to
> index the files and retrieve them by querying the index, so Lucene
> should be what you need.
