I'm not exactly new to Nutch, but I haven't used it for a year or so,
and I'm a bit out of touch with the current state of the art.

I see there is some HBase code in the form of patches, but I don't
know whether it is more than proof-of-concept work.

I also see that there is a 1.1 release candidate in the works.

However, I can see no mention of HBase in the release candidate. Is it
there at all?

If I use Nutch I am going to have to develop several plugins of my own
and perhaps change the way that URLs are found for second and
subsequent crawls. I think that HBase would significantly help with
this.


References:
http://www.gossamer-threads.com/lists/lucene/general/99072
([VOTE] Apache Nutch 1.1 Release Candidate #2)
and
http://people.apache.org/~mattmann/apache-nutch-1.1/rc2/CHANGES-1.1.txt
and
https://issues.apache.org/jira/browse/NUTCH-650
