Re: nutch suitable for blogs?

Ken Krugler Thu, 13 Jul 2006 12:49:57 -0700

Hi Chris,

 Hi all.  First off, I'm using Nutch 0.72.


 I've been playing with nutch for a couple weeks now, and have some
questions relating to indexing blog sites.


[snip]

 Third...  just in general... it seems I've had to goof with nutch's config
enough to make this work in this way, that it makes me want to ask if using
nutch for this purpose is indeed the correct path.  I know Technorati just
directly uses lucene for a similar purpose.  Should that be the path I take
(HTMLParser to fetch and extract text, lucene setup with incremental
indexes)?

We've done something similar, in using Nutch to crawl code repositories. My advice would be to continue down your current path, as there's quite a lot in Nutch besides just the fetching support that proves useful when processing and serving up web-based content.

Eventually you might decide to just use Lucene and various pieces of Nutch as a better solution, but until then I think it's probably faster to use Nutch as your starting point, and also if/when that time comes, you'll have a much better understanding of how best to slice and dice.


-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"
