Maybe "regain" would be a solution for you?

http://regain.sourceforge.net/?lang=en
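
If you do go with your option (b) instead, the crawl step could be sketched roughly as below. This is only a minimal sketch under assumptions: SiteCrawler, crawl and the fetcher parameter are made-up names for illustration and are not part of Lucene, Nutch or regain. Links are extracted with a naive regex rather than a real HTML parser, and the hand-off to your existing Lucene indexing code is left as a comment.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Lucene, Nutch or regain. */
final class SiteCrawler {
    // Naive link extraction (assumption): a real crawler would use an HTML parser.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");

    /**
     * Breadth-first crawl starting at startUrl, following only links on the same host.
     * The fetcher returns the HTML for a URL, or null if the page cannot be fetched.
     * Returns a map of url -> tag-stripped text in crawl order. Each entry would then
     * be turned into a document by the existing Lucene indexing code (not shown).
     */
    static Map<String, String> crawl(String startUrl,
                                     Function<String, String> fetcher,
                                     int maxPages) {
        String host = URI.create(startUrl).getHost();
        Map<String, String> pages = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        seen.add(startUrl);

        while (!queue.isEmpty() && pages.size() < maxPages) {
            String url = queue.poll();
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // unreachable page: skip it
            }
            // Strip tags and collapse whitespace to get indexable text.
            String text = TAGS.matcher(html).replaceAll(" ")
                              .replaceAll("\\s+", " ").trim();
            pages.put(url, text);

            Matcher m = HREF.matcher(html);
            while (m.find()) {
                String link;
                try {
                    // Resolve relative links against the current page.
                    link = URI.create(url).resolve(m.group(1).trim()).toString();
                } catch (IllegalArgumentException e) {
                    continue; // malformed link: skip it
                }
                String linkHost = URI.create(link).getHost();
                // Stay on our own site and visit each URL only once.
                if (host != null && host.equals(linkHost) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return pages;
    }
}
```

Passing the fetcher in as a function keeps the traversal testable without network access. In real use it would wrap an HTTP GET against the site, so the JSPs are indexed as rendered output rather than as source files.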

Regards 
Markus


rhodebump wrote:
> 
> I posted this on the lucene list a week ago and haven't heard anything, so 
> please don't give me the cross-post slap;)
> 
> I am successfully using lucene in our application to index 12 different
> types of objects located in a database, and their relationships to each
> other to provide some nice search functionality for our website.  We are
> building lots of lucene queries programmatically to filter based upon
> categories, regions, zip codes, scoring, long/lats...
> 
> My problem is that we also have a lot of content that is not in the
> database (3000+ pages) and that needs to be included in the search
> results.  It's a whole lot of JSPs.
> 
> As I see it, I can either
> a) Migrate this application to nutch
> b) Write/implement a web crawler to crawl our site and inject the crawl
> results into our lucene index.
> 
> I am leaning towards option B, since I think it would only take me a
> couple of days to implement/write a simple crawler and I wouldn't have
> to change much else.
> 
> Can anyone think of any points/counterpoints for using Nutch vs. writing a
> crawler to extend our existing lucene framework?
> 
> Thanks. 
> 

-- 
View this message in context: 
http://www.nabble.com/Implement-crawler-with-custom-lucene-VS--use-nutch--tf3157478.html#a8804698
Sent from the Nutch - User mailing list archive at Nabble.com.
