How parallel is parallel in your case? Don't forget that Hadoop in distributed mode will serialize your jobs anyhow.
For the rest, why don't you create two Nutch directories and run things totally independently?

2010/3/8, Pravin Karne <pravin_ka...@persistent.co.in>:
> Hi guys, any pointer on the following?
> Your help will be highly appreciated.
>
> Thanks
> -Pravin
>
> -----Original Message-----
> From: Pravin Karne
> Sent: Friday, March 05, 2010 12:57 PM
> To: nutch-user@lucene.apache.org
> Subject: Two Nutch parallel crawl with two conf folder.
>
> Hi,
>
> I want to run two parallel Nutch crawls with two conf folders.
>
> I am using the crawl command to do this. I have two separate conf folders;
> all files in conf are the same except crawl-urlfilter.txt. In this file we
> have different filters (domain filters).
>
> e.g. the 1st conf has -
> +.^http://([a-z0-9]*\.)*abc.com/
>
> the 2nd conf has -
> +.^http://([a-z0-9]*\.)*xyz.com/
>
> I am starting the two crawls with the above configuration on separate
> consoles (one followed by the other).
>
> I am using the following crawl commands -
>
> bin/nutch --nutch_conf_dir=/home/conf1 crawl urls -dir test1 -depth 1
>
> bin/nutch --nutch_conf_dir=/home/conf2 crawl urls -dir test2 -depth 1
>
> [Note: We have modified nutch.sh for '--nutch_conf_dir']
>
> The urls file has the following entries -
>
> http://www.abc.com
> http://www.xyz.com
> http://www.pqr.com
>
> Expected result:
>
> CrawlDB test1 should contain abc.com's data and CrawlDB test2 should
> contain xyz.com's data.
>
> Actual result:
>
> The url filter of the first run is overridden by the url filter of the
> second run, so both CrawlDBs have xyz.com's data.
>
> Please provide pointers regarding this.
>
> Thanks in advance.
>
> -Pravin

--
-MilleBii-
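The two-independent-directories suggestion above can be sketched as follows. All paths and directory names here are placeholders, not from the thread; and note that the stock bin/nutch script in Nutch 1.x already reads a NUTCH_CONF_DIR environment variable to locate its conf directory (verify against the script shipped with your version), so the custom '--nutch_conf_dir' wrapper flag may be unnecessary:

```shell
# Option 1: two fully separate Nutch installs, each with its own conf/
# (hypothetical paths; edit each copy's conf/crawl-urlfilter.txt so that
# one keeps only abc.com and the other only xyz.com).
cp -r apache-nutch apache-nutch-abc
cp -r apache-nutch apache-nutch-xyz
(cd apache-nutch-abc && bin/nutch crawl urls -dir crawl-abc -depth 1) &
(cd apache-nutch-xyz && bin/nutch crawl urls -dir crawl-xyz -depth 1) &
wait

# Option 2: one install, pointing each run at a different conf directory
# via the NUTCH_CONF_DIR environment variable that bin/nutch honors:
NUTCH_CONF_DIR=/home/conf1 bin/nutch crawl urls -dir test1 -depth 1 &
NUTCH_CONF_DIR=/home/conf2 bin/nutch crawl urls -dir test2 -depth 1 &
wait
```

Either way, each JVM sees only its own crawl-urlfilter.txt, so the two filters can no longer clobber each other; this does not change the point above that a single Hadoop cluster will still serialize the submitted jobs.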