Hi,

I would suggest moving to a recent Nutch version anyway. Not only has the indexer/web part changed, but a lot of bugs have been fixed and some very handy features have been implemented. One notable improvement was the replacement of many document parsers with the third-party Tika parser.

Another good improvement since the old days is the fetcher. It works much better and no longer hangs in some situations.

As for which version to choose, there are two: 1.5.x and 2.0. Version 2.0 contains everything from 1.5.x, but it uses a "database" instead of HDFS to keep data. Both versions send crawled data to Solr, which provides the indexing and searching capabilities.

Unfortunately, there is no easy way to migrate from 1.3 to the newest version; the easiest way will be to re-implement your custom plugins for these versions.

Best Regards
Alexander Aristov

On 8 July 2012 20:10, Ye T Thet wrote:

> Hi Folks,
>
> I am seeking recommendation whether I should use Pre Nutch 1.3 (without
> Solr) or New Nutch (2.x) with Solr integration for my research project.
>
> Little background information,
> I developed prototype for web search engine during my post grad days using
> Nutch as crawler, indexer and searcher. It was developed using < Nutch 1.3,
> meaning not using Solr as searcher.
>
> I am continuing my research after a year of on hold. I noticed a huge
> changes in Nutch such as using SOLR as indexer and searcher, 2.x has
> changed crawling implementation and etc.
>
> The requirements for my project is similar typical web search engine with
> lesser volume (less than 1 million pages for now). Additional requirements
> are
>
> 1. Language Identification, (used language ID plug-in in Nutch using ngram
> profile VS New Nutch used Tika for lang ID)
> 2. Custom lucene analyzer for the analysis (done in Nutch for Pre 1.3 VS
> done in SOLR)
>
> I would appreciate suggestions/comments on whether I should continue with
> pre 1.3 or new Nutch with SOLR.
>
> Thanks,
>
> Y T Thet
>

