Hi

I would suggest moving to a recent Nutch version anyway. Not only has the
indexer/web part changed, but a lot of bugs have been fixed and very handy
features have been implemented. One such noticeable improvement was the
replacement of many document parsers with the third-party Tika parser.

Another good improvement since the old days is the fetcher. It works
much better and no longer hangs in some situations.

As for which version to choose, there are two:

1.5.x and 2.0

Version 2.0 contains everything from 1.5.x, but it uses a "database" instead
of HDFS to keep data.

Both versions send crawled data to Solr, which provides indexing and
searching capabilities.

Unfortunately, there is no easy way to migrate from 1.3 to the newest version;
the easiest way will be to re-implement your custom plugins for these
versions.

Best Regards
Alexander Aristov


On 8 July 2012 20:10, Ye T Thet <[email protected]> wrote:

> Hi Folks,
>
> I am seeking a recommendation on whether I should use pre-1.3 Nutch (without
> Solr) or new Nutch (2.x) with Solr integration for my research project.
>
> A little background information:
> I developed a prototype web search engine during my post-grad days using
> Nutch as crawler, indexer and searcher. It was developed using Nutch < 1.3,
> meaning Solr was not used as the searcher.
>
> I am continuing my research after a year on hold. I noticed huge
> changes in Nutch, such as using Solr as indexer and searcher, and 2.x has
> changed the crawling implementation, etc.
>
> The requirements for my project are similar to a typical web search engine
> with lower volume (less than 1 million pages for now). Additional requirements
> are
>
> 1. Language Identification, (used language ID plug-in in Nutch using ngram
> profile VS New Nutch used Tika for lang ID)
> 2. Custom lucene analyzer for the analysis (done in Nutch for Pre 1.3 VS
> done in SOLR)
>
> I would appreciate suggestions/comments on whether I should continue with
> pre 1.3 or new Nutch with SOLR.
>
> Thanks,
>
> Y T Thet
>
