Hi! Here are the steps I took for crawling: I started all Hadoop daemons, put the URL file into DFS, and then started the crawl. Here is part of the crawl log:
060306 124851 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 124851 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_m_568oxw/job.xml
060306 124851 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 124852 task_r_281ien 0.16666667% reduce > copy >
060306 124852 map 67% reduce 17%
060306 124853 task_r_281ien 0.16666667% reduce > copy >
060306 124853 task_m_568oxw Child starting
060306 124853 task_m_568oxw parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 124853 task_m_568oxw parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 124853 Server connection on port 50050 from 212.58.116.70: starting
060306 124853 task_m_568oxw Client connection to 0.0.0.0:50050: starting
060306 124853 task_m_568oxw parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 124854 task_m_568oxw parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 124854 task_m_568oxw parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_m_568oxw/job.xml
060306 124854 task_m_568oxw parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 124854 task_m_568oxw parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 124854 task_m_568oxw parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 124854 task_m_568oxw parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 124854 Server connection on port 9000 from 127.0.0.1: starting
060306 124854 task_m_568oxw Client connection to 127.0.0.1:9000: starting
060306 124854 task_r_281ien 0.16666667% reduce > copy >
060306 124854 459 Served block blk_-3727406626879829125 to /212.58.116.70
060306 124854 460 Served block blk_-5496623489076405734 to /212.58.116.70
060306 124854 Server connection on port 50050 from 212.58.116.70: starting
060306 124854 task_m_568oxw Client connection to 0.0.0.0:50050: starting
060306 124854 task_m_568oxw 0.99999994% /user/root/tmpdb/segments/20060306124638/parse_data/part-00000/data:0+61
060306 124854 Task task_m_568oxw is done.
060306 124854 Server connection on port 9000 from 127.0.0.1: exiting
060306 124854 Server connection on port 50050 from 212.58.116.70: exiting
060306 124854 Server connection on port 50050 from 212.58.116.70: exiting
060306 124855 Taskid 'task_m_568oxw' has finished successfully.
060306 124855 Task 'task_m_568oxw' has completed.
060306 124855 task_r_281ien 0.16666667% reduce > copy >
060306 124855 task_r_281ien Got 1 map output locations.
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 124855 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_r_281ien/job.xml
060306 124855 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 124856 map 100% reduce 17%
060306 124857 task_r_281ien Child starting
060306 124857 task_r_281ien parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 124858 task_r_281ien parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 124858 Server connection on port 50050 from 212.58.116.70: starting
060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125101 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_r_1iryja/job.xml
060306 125101 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125102 task_m_4uq2k2 done; removing files.
060306 125102 task_r_1iryja Child starting
060306 125103 task_r_1iryja parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125103 task_r_1iryja parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125103 Server connection on port 50050 from 212.58.116.70: starting
060306 125103 task_r_1iryja Client connection to 0.0.0.0:50050: starting
060306 125106 task_r_1iryja parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125106 task_r_1iryja parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125106 task_r_1iryja parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_r_1iryja/job.xml
060306 125106 task_r_1iryja parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125106 task_r_1iryja parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125106 task_r_1iryja parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125106 task_r_1iryja parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125106 task_r_1iryja Client connection to 127.0.0.1:9000: starting
060306 125106 Server connection on port 9000 from 127.0.0.1: starting
060306 125106 task_r_1iryja parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125106 task_m_60jy1g done; removing files.
060306 125106 task_r_1iryja parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125106 568 Received block blk_937594491748799698 from /212.58.116.70
060306 125107 569 Received block blk_-292066997504083183 from /212.58.116.70
060306 125107 Server connection on port 50050 from 212.58.116.70: starting
060306 125107 task_r_1iryja Client connection to 0.0.0.0:50050: starting
060306 125107 task_r_1iryja 1.0% reduce > reduce
060306 125108 Task task_r_1iryja is done.
060306 125108 Server connection on port 9000 from 127.0.0.1: exiting
060306 125108 Server connection on port 50050 from 212.58.116.70: exiting
060306 125108 Server connection on port 50050 from 212.58.116.70: exiting
060306 125109 Taskid 'task_r_1iryja' has finished successfully.
060306 125109 Task 'task_r_1iryja' has completed.
060306 125109 task_m_1vj7kz done; removing files.
060306 125109 map 100% reduce 100%
060306 125109 Job complete: job_hldwxh
060306 125109 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125109 parsing file:/usr/home/duche/nutch-nightly/conf/nutch-default.xml
060306 125109 parsing file:/usr/home/duche/nutch-nightly/conf/crawl-tool.xml
060306 125109 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125109 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125109 parsing file:/usr/home/duche/nutch-nightly/conf/nutch-site.xml
060306 125109 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125109 Client connection to 127.0.0.1:9001: starting
060306 125109 Server connection on port 9001 from 127.0.0.1: starting
060306 125109 Client connection to 127.0.0.1:9000: starting
060306 125109 Server connection on port 9000 from 127.0.0.1: starting
060306 125112 570 Received block blk_-6705899863806264848 from /212.58.116.70
060306 125112 task_m_6cj8z8 done; removing files.
060306 125112 571 Received block blk_-5306879891270215512 from /212.58.116.70
060306 125113 572 Received block blk_-3324229811791817900 from /212.58.116.70
060306 125113 573 Received block blk_7536925015414314323 from /212.58.116.70
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125113 574 Served block blk_-3324229811791817900 to /212.58.116.70
060306 125113 575 Served block blk_7536925015414314323 to /212.58.116.70
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125113 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/jobTracker/job_kspap7.xml
060306 125113 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125114 576 Served block blk_-6705899863806264848 to /212.58.116.70
060306 125115 577 Served block blk_-5306879891270215512 to /212.58.116.70
060306 125115 Running job: job_kspap7
060306 125115 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125115 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125115 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/jobTracker/job_kspap7.xml
060306 125115 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125116 Adding task 'task_m_3hg2gq' to tip tip_1x3j1l, for tracker 'tracker_42329'
060306 125116 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125116 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125116 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125116 578 Served block blk_-3324229811791817900 to /212.58.116.70
060306 125116 579 Served block blk_7536925015414314323 to /212.58.116.70
060306 125116 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125116 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125116 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_m_3hg2gq/job.xml
060306 125116 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125116 map 0% reduce 0%
060306 125117 580 Served block blk_-6705899863806264848 to /212.58.116.70
060306 125117 581 Served block blk_-5306879891270215512 to /212.58.116.70
060306 125117 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125117 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125117 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125117 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125117 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125117 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_m_3hg2gq/job.xml
060306 125117 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125118 Adding task 'task_r_2teqig' to tip tip_nzm5vg, for tracker 'tracker_42329'
060306 125118 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125118 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125118 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125118 582 Served block blk_-3324229811791817900 to /212.58.116.70
060306 125118 583 Served block blk_7536925015414314323 to /212.58.116.70
060306 125118 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125118 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125118 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_r_2teqig/job.xml
060306 125118 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125119 task_m_3hg2gq Child starting
060306 125119 task_m_3hg2gq parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125119 task_m_3hg2gq parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125119 Server connection on port 50050 from 212.58.116.70: starting
060306 125119 task_m_3hg2gq Client connection to 0.0.0.0:50050: starting
060306 125119 584 Served block blk_-6705899863806264848 to /212.58.116.70
060306 125119 task_m_3hg2gq parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125119 task_m_3hg2gq parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125119 task_m_3hg2gq parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_m_3hg2gq/job.xml
060306 125119 task_m_3hg2gq parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125119 task_m_3hg2gq parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125120 task_m_3hg2gq parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125120 task_m_3hg2gq parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125120 Server connection on port 9000 from 127.0.0.1: starting
060306 125120 task_m_3hg2gq Client connection to 127.0.0.1:9000: starting
060306 125120 585 Served block blk_937594491748799698 to /212.58.116.70
060306 125120 586 Served block blk_-292066997504083183 to /212.58.116.70
060306 125120 Server connection on port 50050 from 212.58.116.70: starting
060306 125120 task_m_3hg2gq Client connection to 0.0.0.0:50050: starting
060306 125120 task_m_3hg2gq 1.0% /user/root/dedup-hash-6473753/part-00000:0+126
060306 125120 587 Served block blk_-5306879891270215512 to /212.58.116.70
060306 125120 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125120 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125120 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125120 task_r_2teqig 0.0% reduce > copy >
060306 125120 Task task_m_3hg2gq is done.
060306 125120 Server connection on port 9000 from 127.0.0.1: exiting
060306 125120 Server connection on port 50050 from 212.58.116.70: exiting
060306 125120 Server connection on port 50050 from 212.58.116.70: exiting
060306 125123 task_r_2teqig 0.0% reduce > copy >
060306 125123 Taskid 'task_m_3hg2gq' has finished successfully.
060306 125123 Task 'task_m_3hg2gq' has completed.
060306 125123 task_r_2teqig Got 1 map output locations.
060306 125123 task_r_2teqig 0.0% reduce > copy >
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125123 parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_r_2teqig/job.xml
060306 125123 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125123 map 100% reduce 0%
060306 125125 task_r_2teqig Child starting
060306 125126 task_r_2teqig parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125126 task_r_2teqig parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125126 Server connection on port 50050 from 212.58.116.70: starting
060306 125126 task_r_2teqig Client connection to 0.0.0.0:50050: starting
060306 125126 task_r_2teqig parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125126 task_r_2teqig parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125126 task_r_2teqig parsing /usr/home/duche/nutch-nightly/hadoop/mapred/local/taskTracker/task_r_2teqig/job.xml
060306 125126 task_r_2teqig parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125126 Server connection on port 9000 from 127.0.0.1: starting
060306 125126 task_r_2teqig Client connection to 127.0.0.1:9000: starting
060306 125126 task_r_2teqig parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125126 task_r_2teqig parsing file:/usr/home/duche/nutch-nightly/conf/mapred-default.xml
060306 125126 task_r_2teqig parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125126 Server connection on port 50050 from 212.58.116.70: starting
060306 125126 task_r_2teqig Client connection to 0.0.0.0:50050: starting
060306 125126 task_r_2teqig 1.0% reduce > reduce
060306 125126 Task task_r_2teqig is done.
060306 125127 Server connection on port 9000 from 127.0.0.1: exiting
060306 125127 Server connection on port 50050 from 212.58.116.70: exiting
060306 125127 Server connection on port 50050 from 212.58.116.70: exiting
060306 125129 Taskid 'task_r_2teqig' has finished successfully.
060306 125129 Task 'task_r_2teqig' has completed.
060306 125129 task_m_3hg2gq done; removing files.
060306 125129 map 100% reduce 100%
060306 125129 Job complete: job_kspap7
060306 125130 Dedup: done
060306 125130 Adding /user/root/tmpdb/indexes/part-00000
060306 125130 588 Served block blk_-6249956399366891811 to /212.58.116.70
060306 125131 589 Served block blk_-4384461151725426132 to /212.58.116.70
060306 125131 590 Received block blk_3162385090057235567 from /212.58.116.70
060306 125131 591 Received block blk_3855280644798095426 from /212.58.116.70
060306 125131 crawl finished: tmpdb
060306 125132 Server connection on port 9000 from 127.0.0.1: exiting
060306 125132 Server connection on port 9001 from 127.0.0.1: exiting
060306 125132 Server connection on port 9000 from 127.0.0.1: exiting
060306 125132 Server connection on port 9000 from 127.0.0.1: exiting

But where is the database of crawled sites? ./hadoop dfs -ls returns the following:

060306 125433 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-default.xml
060306 125434 parsing file:/usr/home/duche/nutch-nightly/conf/hadoop-site.xml
060306 125434 No FS indicated, using default:localhost:9000
060306 125434 Server connection on port 9000 from 127.0.0.1: starting
060306 125434 Client connection to 127.0.0.1:9000: starting
Found 3 items
/user/root/dfs <dir>
/user/root/seeds <dir>
/user/root/tmpdb <dir>
060306 125434 Server connection on port 9000 from 127.0.0.1: exiting

But there is no /user/root/tmpdb folder! Anyway, if it does exist, what must I put into nutch-site.xml to point to it?

Thanks,
Regards,
Dima.
