Hi users & devs,
As you probably know, there are currently two active lines of
development for Nutch:
* Nutch trunk, a.k.a. Nutch 2.0: this is based on a completely
redesigned storage layer that uses Apache Gora, which in turn can use
various storage implementations such as HBase, Cassandra, and MySQL.
This branch is still largely experimental and unstable, but work is
progressing, and at the current pace I think a release should be
possible within the next ~6 months. Another important addition on this
branch is a REST API that allows using Nutch as a black-box crawling
service.
* Nutch branch-1.3: this started as a snapshot of Nutch trunk just
before merging with nutchbase (i.e. switching to Gora as a storage
layer). This branch is still largely similar to the previous versions of
Nutch, and uses Hadoop MapFile/SequenceFile and "segments". Unlike
release 1.2, it does NOT ship with any search infrastructure,
because all search functionality has been delegated to Solr (via
SolrIndexer). This is BTW also true of Nutch trunk.
Regarding branch-1.2 (which is a maintenance branch after release 1.2)
there have been pretty much no updates there, if any. Nutch committer
resources are very limited (when it comes to active committers), so I
don't expect any maintenance release from this branch to happen...
I think that considering the relatively remote release date for Nutch
2.0 it would make sense to roll out a 1.3 release based on branch-1.3,
after making sure that all critical patches from trunk have been merged
in there.
What do you think?
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com