I'm not exactly new to Nutch, but I haven't used it for a year or so, and I'm a bit out of touch with the current "state of the art".
I see there is some HBase code in the form of patches, but I don't know whether this is more than "proof of concept" stuff. I also see that a 1.1 release candidate is in the works. However, I can see no mention of HBase in the release candidate. Is it there at all?

If I use Nutch, I am going to have to develop several plugins of my own and perhaps change the way that URLs are found for second and subsequent crawls. I think that HBase would significantly help with this.

References:
http://www.gossamer-threads.com/lists/lucene/general/99072 ([VOTE] Apache Nutch 1.1 Release Candidate #2)
http://people.apache.org/~mattmann/apache-nutch-1.1/rc2/CHANGES-1.1.txt
https://issues.apache.org/jira/browse/NUTCH-650

