Thanks for all the suggestions. I think I am getting there :) Personally I think maintaining two version lines causes a lot of confusion for newbies like me.
Perhaps, as someone suggested earlier, just have a big 2.3 or even 3.x with all the functionality of 1.x and 2.x in one bundle and deprecate the 1.x version altogether. That's how the rest of the open source libraries work, don't they?

Iqbal Shaikh
________________________________________
From: Ali Nazemian [[email protected]]
Sent: 29 August 2014 15:15
To: [email protected]
Subject: Re: Nutch Confusion

Dear Iqbal,

Hi,
As far as I know, if you don't need the Gora mapper to run Nutch over HBase, MySQL, etc., it is better to use version 1.x, since some Nutch functionality is not implemented in version 2.x and Nutch 1.x provides better performance for crawling web pages. ES is not a difficult index writer to set up in Nutch 1.x: disable the Solr index writer and enable the ES one by adding it to the plugins part of nutch-site.xml.

Regards.

On Fri, Aug 29, 2014 at 5:38 PM, Iqbal Shaikh <[email protected]> wrote:

> Thanks Julien for the prompt response.
>
> Actually, since the model for the 1.9 version is all plugin based, I
> shouldn't be expecting an ivy.xml like in 2.x to have an Elastic config.
> So ignore that comment.
>
> Yes, I mean HDFS (new to big data and Hadoop). Isn't HBase the default one
> for 1.9 too?
>
> Perhaps this article is a bit misleading
> http://www.infoq.com/articles/nioche-apache-nutch2 based on your
> clarification. Maybe there should be another follow-on to that article.
>
> Thanks,
> Iqbal Shaikh
> ________________________________________
> From: Julien Nioche [[email protected]]
> Sent: 29 August 2014 12:41
> To: [email protected]
> Subject: Re: Nutch Confusion
>
> Hi Iqbal,
>
> > Am doing a POC to help decide if we should be using Nutch 1.9 or 2.2.1
> > version.
> >
> > We would be indexing our crawled data in ElasticSearch 1.x version.
> >
> > I know the 2.2.1 version provides OTB support for Elastic 0.x version but
> > to use 2.x I need to change the code (ElasticWriter.java). This means it's
> > a customised Nutch installation, which I don't prefer.
> >
> > However, even though 1.9 doesn't provide Elastic as default, it does
> > support 1.x OTB, which means no code change at all. And this is a big
> > advantage.
> >
>
> what do you mean by '1.9 doesn't provide ES by default'?
>
> >
> > I don't really need the flexibility provided by GORA as we're ok to use
> > HBase.
>
> do you mean HDFS?
>
> > Also 2.x doesn't seem to have periodic commits compared to 1.9
> >
> > Therefore I was wondering what others think, as am not sure about the
> > Roadmap going forward: are we going to cease 1.x at some point and
> > migrate the missing functionality to 2.x, or are we going to continue to
> > have two parallel versions?
> >
>
> more likely two parallel versions. 2.x is not making much progress. IMHO of
> the two versions 1.x is not the one which is going to die first ;-)
>
> >
> > Any suggestion to help me make my decision please?
> >
>
> See discussion on this list
> (http://www.mail-archive.com/[email protected]/msg12550.html). 1.x is
> more robust, faster and more actively maintained. Since it sounds like you
> don't have any need for any specific features from 2.x, I'd recommend
> using 1.x.
>
> HTH
>
> Julien
>
> >
> > Thanks,
> >
> > Iqbal Shaikh
> > Transform is a trading division of Engine Partners UK LLP, a limited
> > liability partnership registered in England & Wales with registered
> > number OC365812.
> > Our registered office is at 60 Great Portland Street, London W1W 7RT,
> > United Kingdom.
> > A list of our members is open for inspection at our registered office.
> >
>
> --
>
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble

--
A.Nazemian
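
For reference, Ali's suggestion above (disable the Solr index writer and enable the ES one in the plugins part of nutch-site.xml) might look roughly like the fragment below. This is only a sketch under assumptions, not a tested configuration: the plugin ids indexer-solr and indexer-elastic, the plugin.includes value and the elastic.* property names are what I would expect to find in a Nutch 1.x nutch-default.xml and should be checked against the copy shipped with your release. The host, port, cluster and index values are hypothetical placeholders.

```xml
<?xml version="1.0"?>
<!-- Sketch of a nutch-site.xml override; verify every name against
     conf/nutch-default.xml in your own Nutch 1.x install. -->
<configuration>

  <!-- Assumed default plugin list with indexer-solr swapped for
       indexer-elastic. Keep whatever other plugins your crawl needs. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-elastic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>

  <!-- Hypothetical connection settings for a local Elasticsearch node. -->
  <property>
    <name>elastic.host</name>
    <value>localhost</value>
  </property>
  <property>
    <name>elastic.port</name>
    <value>9300</value>
  </property>
  <property>
    <name>elastic.cluster</name>
    <value>elasticsearch</value>
  </property>
  <property>
    <name>elastic.index</name>
    <value>nutch</value>
  </property>

</configuration>
```

Since nutch-site.xml overrides nutch-default.xml, only the properties that differ from the defaults need to appear here.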

