Thanks Lewis!
On 17 March 2014 02:07, Lewis John Mcgibbney <[email protected]> wrote:

> Good Evening,
>
> The Apache Nutch PMC are pleased to announce the immediate release of
> Apache Nutch v1.8.
>
> Apache Nutch is a highly extensible and scalable open source web crawler
> software project. Stemming from Apache Lucene, the project has diversified
> and now comprises two codebases, namely:
>
> Nutch 1.x: A well matured, production ready crawler. 1.x enables fine
> grained configuration, relying on Apache Hadoop data structures, which are
> great for batch processing.
>
> Nutch 2.x: An emerging alternative taking direct inspiration from 1.x, but
> which differs in one key area; storage is abstracted away from any specific
> underlying data store by using Apache Gora for handling object to
> persistent mappings. This means we can implement an extremely flexible
> model/stack for storing everything (fetch time, status, content, parsed
> text, outlinks, inlinks, etc.) into a number of NoSQL storage solutions.
>
> We advise all current users and developers of the 1.X series to upgrade to
> this release. Although this release includes library upgrades to Crawler
> Commons 0.3 and Apache Tika 1.4, it also provides over 30 bug fixes as well
> as 18 improvements. Please see the list of changes
> <http://www.apache.org/dist/nutch/1.8/CHANGES.txt> for a full breakdown, or
> see the release report <http://s.apache.org/oHY>.
>
> As usual in the 1.X series, this release is made available both as source
> and binary. Additionally, developers can find Maven artifacts within Maven
> Central <http://search.maven.org/>. The release is available here
> <http://www.apache.org/dyn/closer.cgi/nutch/>.
>
> Thank you
> Lewis
> (On behalf of the Nutch PMC)
>
> --
> *Lewis*
>

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

