Agreed, thanks Lewis!

Cheers,
Chris

On Jun 8, 2012, at 1:22 AM, Julien Nioche wrote:

> Thanks Lewis!
>
> On 7 June 2012 17:52, lewis john mcgibbney <[email protected]> wrote:
>
>> (apologies for cross posting...)
>>
>> Good Afternoon Everyone,
>>
>> The 1.5 release of Nutch is now available. This release includes
>> several improvements, including upgrades of several major components
>> (Tika 1.1 and Hadoop 1.0.0), improvements to the LinkRank and
>> WebGraph elements, and a number of new plugins covering
>> blacklisting, filtering and parsing, to name a few. Please see the list
>> of changes
>>
>> http://www.apache.org/dist/nutch/CHANGES-1.5.txt
>>
>> made in this version for a full breakdown of the 50-odd improvements
>> the release boasts. A full PMC release statement can be found below
>>
>> http://nutch.apache.org/#07+June+2012+-+Apache+Nutch+1.5+Released
>>
>> Apache Nutch is an open source web-search software project. Stemming
>> from Apache Lucene, it now builds on Apache Solr, adding web-specifics
>> such as a crawler, a link-graph database and parsing support handled
>> by Apache Tika for HTML and an array of other document formats. Nutch
>> can run on a single machine, but gains a lot of its strength from
>> running in a Hadoop cluster. The system can be enhanced (e.g. other
>> document formats can be parsed) using a highly flexible, easily
>> extensible and thoroughly maintained plugin infrastructure.
>>
>> Nutch is available in source and binary form (zip and tar.gz) from the
>> following download page: http://www.apache.org/dyn/closer.cgi/nutch/
>>
>> In the initial 48 hours, the release may not be available on all mirrors.
>> When downloading from a mirror site, please remember to verify the
>> downloads using signatures found on the Apache site:
>>
>> http://www.apache.org/dist/nutch/KEYS
>>
>> For more information on Apache Nutch, visit the project home page:
>> http://nutch.apache.org
>>
>> Thank you very much
>>
>> Lewis John McGibbney (on behalf of the Apache Nutch community)
>>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

