Hello,

Open Nutch in Eclipse.

Run -> Debug Configurations -> right-click 'Java Application' and choose New

-- set the main class:
org.apache.nutch.crawl.Crawl

-- and the program arguments:
urls -dir crawloutput -threads 5 -depth 3 -topN 50

Then set your breakpoints and start debugging with this configuration.
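
If you would rather start the crawl from a small class of your own instead of a run configuration, a minimal sketch could look like the one below. It assumes the Nutch 1.2 classes and the conf directory are on the project's classpath; the class name DebugCrawl and the helper crawlArgs() are names I made up for illustration. It looks the main class up by name so the file compiles on its own; inside the Nutch project you could probably call the Crawl main method directly instead.

```java
import java.lang.reflect.Method;

// Hypothetical launcher class; the name is not part of Nutch.
class DebugCrawl {

    // Main class from the debug configuration described above.
    static final String MAIN_CLASS = "org.apache.nutch.crawl.Crawl";

    // Same program arguments as in the debug configuration.
    static String[] crawlArgs() {
        return new String[] {
            "urls",
            "-dir", "crawloutput",
            "-threads", "5",
            "-depth", "3",
            "-topN", "50"
        };
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the Nutch jars/classes are on the classpath,
        // otherwise this throws ClassNotFoundException.
        Class<?> crawl = Class.forName(MAIN_CLASS);
        Method main = crawl.getMethod("main", String[].class);
        // Cast to Object so the array is passed as the single String[] parameter.
        main.invoke(null, (Object) crawlArgs());
    }
}
```

Debug this class as a Java application and the breakpoints you set in the Nutch sources should be hit in the same way.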

Good Luck :)





________________________________
From: Marseld Dedgjonaj <[email protected]>
To: [email protected]
Sent: Sat, October 2, 2010 4:51:28 PM
Subject: Run crawl from java code

Hi,

I have configured Nutch 1.2 as an Eclipse project.

I need to run the crawl from Java code so that I can follow it in the debugger.

This is the script that I execute on Linux for the crawl:



    bin/nutch inject /home/administrator/nutch/albanian_crawl/crawldb my_urls
    bin/nutch generate /home/administrator/nutch/albanian_crawl/crawldb /home/administrator/nutch/albanian_crawl/segments
    segment=`ls -d /home/administrator/nutch/albanian_crawl/segments/2* | tail -1`
    bin/nutch fetch $segment
    bin/nutch updatedb /home/administrator/nutch/albanian_crawl/crawldb $segment
    bin/nutch mergesegs /home/administrator/nutch/albanian_crawl/segments /home/administrator/nutch/albanian_crawl/segments/*
    bin/nutch invertlinks /home/administrator/nutch/albanian_crawl/linkdb /home/administrator/nutch/albanian_crawl/segments/*
    bin/nutch index /home/administrator/nutch/albanian_crawl/indexes /home/administrator/nutch/albanian_crawl/crawldb /home/administrator/nutch/albanian_crawl/linkdb /home/administrator/nutch/albanian_crawl/segments/*
    bin/nutch dedup /home/administrator/nutch/albanian_crawl/indexes



Can anybody help me translate it into Java?

Thanks in advance,

Marseld.


      
