Re: Optimizing Nutch 2.2.1

Talat Uyarer Wed, 19 Mar 2014 12:27:18 -0700

IMHO you shouldn't expect performance in pseudo-distributed mode. Actually,
you should learn how Hadoop runs. I read Hadoop: The Definitive Guide, and I
recommend it as a starting point.
On 19 Mar 2014 at 20:48, "BlackIce" <[email protected]> wrote:


> Thank you,
>
> What are some good starting points for tuning?
>
> thnx
>
>
> On Tue, Mar 18, 2014 at 8:20 PM, Talat Uyarer <[email protected]> wrote:
>
> > Hi,
> >
> > When you use Hadoop in pseudo-distributed mode, it creates 2 map and 2
> > reduce tasks. If you want to speed up a job, you should decrease your
> > map and reduce counts. But optimization is a very general concept. You
> > should tune the Nutch, HDFS, JobTracker and HBase settings.
> >
> > Good luck ;)
> >
> >
> > 2014-03-18 14:00 GMT+02:00 BlackIce <[email protected]>:
> >
> > > Hi,
> > >
> > > I'm using Nutch 2.2.1, HBase 0.90.6 in pseudo-distributed mode, Hadoop
> > > 1.2.1, Oracle Java 8, an Intel i5 quad-core, and 16 GB RAM.
> > >
> > > Currently the Fetch cycle is limited by my Internet connection.
> > >
> > > Parse cycle uses an average of 10% per CPU core
> > >
> > > Updatedb cycle uses average 3% per CPU core
> > >
> > > Currently I'm only running HBase in pseudo-distributed mode, not Nutch.
> > >
> > > As the DB grows everything slows down significantly, but as you can see
> > > CPU resources are not used very much. Heck, during updatedb my web
> > > browsing creates higher utilization spikes than the updatedb process. I
> > > feel that my hardware is very underutilized and adding more physical
> > > machines would be a waste.
> > >
> > > What are the bottlenecks? How can I optimize them? Should I run a
> > > cluster on 3 virtual machines?
> > >
> > > Thank you for any help you can give!
> > >
> > >
> > > Ralf R. Kotowski
> > >
> >
> >
> >
> > --
> > Talat UYARER
> > Website: http://talat.uyarer.com
> > Twitter: http://twitter.com/talatuyarer
> > Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
> >
>
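
For readers following the thread: the "2 map and 2 reduce" figure in the 18 March reply appears to match the Hadoop 1.x per-TaskTracker slot limits, which default to 2 each. As a rough sketch only, and assuming that is what the reply refers to, those limits are set in conf/mapred-site.xml. The values shown are simply the defaults, not a recommendation from anyone in the thread.

```xml
<!-- conf/mapred-site.xml (Hadoop 1.x). Illustrative values only: these are
     the defaults, shown so you know which properties to adjust. -->
<configuration>
  <!-- Maximum number of map tasks one TaskTracker runs at the same time -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- Maximum number of reduce tasks one TaskTracker runs at the same time -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```

These properties are read when the TaskTracker starts, so it has to be restarted after a change. They only matter when jobs are actually submitted to a TaskTracker. If Nutch itself runs in local mode, as described in the original question, they would not affect it.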
