Hi users & devs,

As you probably know, there are currently two active lines of development for Nutch:

* Nutch trunk, a.k.a. Nutch 2.0: this is based on a completely redesigned storage layer that uses Apache Gora, which in turn can use various storage implementations such as HBase, Cassandra, and MySQL. This branch is still largely experimental and unstable, but work is progressing, and at the current pace I think a release should be possible within the next ~6 months. Another important addition on this branch is a REST API that allows using Nutch as a black-box crawling service.

* Nutch branch-1.3: this started as a snapshot of Nutch trunk just before merging with nutchbase (i.e. switching to Gora as a storage layer). This branch is still largely similar to the previous versions of Nutch, and uses Hadoop MapFile/SequenceFile and "segments". As compared with release 1.2 it does NOT ship with any search infrastructure, because all search functionality has been delegated to Solr (via SolrIndexer). This is BTW also true about Nutch trunk.

Regarding branch-1.2 (the maintenance branch created after release 1.2), there have been very few updates there, if any. Nutch committer resources are very limited (when it comes to active committers), so I don't expect any maintenance release from this branch to happen...

I think that, considering the relatively remote release date for Nutch 2.0, it would make sense to roll out a 1.3 release based on branch-1.3, after making sure that all critical patches from trunk have been merged in there.

What do you think?

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
