Dear Lewis,
I have run into the same problem. I compiled it the same way you did, but
the problem still occurs. The seed and URL filter configuration works for
a local crawl but fails in deploy mode. Please help me; thank you very much.
The procedure is as follows:
[Jiuling@crawler-3 deploy]$ bin/nutch crawl urls -dir crawls -depth 20
(I have also run it as "bin/hadoop jar apache-nutch-1.6-SNAPSHOT.job
org.apache.nutch.crawl.Crawl urls -dir crawls -depth 20".)
Warning: $HADOOP_HOME is deprecated.
12/09/16 18:40:16 WARN crawl.Crawl: solrUrl is not set, indexing will be
skipped...
12/09/16 18:40:16 INFO crawl.Crawl: crawl started in: crawls
12/09/16 18:40:16 INFO crawl.Crawl: rootUrlDir = urls
12/09/16 18:40:16 INFO crawl.Crawl: threads = 10
12/09/16 18:40:16 INFO crawl.Crawl: depth = 20
12/09/16 18:40:16 INFO crawl.Crawl: solrUrl=null
12/09/16 18:40:16 INFO crawl.Injector: Injector: starting at 2012-09-16
18:40:16
12/09/16 18:40:16 INFO crawl.Injector: Injector: crawlDb: crawls/crawldb
12/09/16 18:40:16 INFO crawl.Injector: Injector: urlDir: urls
12/09/16 18:40:16 INFO crawl.Injector: Injector: Converting injected urls to
crawl db entries.
12/09/16 18:40:23 INFO util.NativeCodeLoader: Loaded the native-hadoop
library
12/09/16 18:40:23 WARN snappy.LoadSnappy: Snappy native library not loaded
12/09/16 18:40:23 INFO mapred.FileInputFormat: Total input paths to process
: 1
12/09/16 18:40:23 INFO mapred.JobClient: Running job: job_201209161612_0047
12/09/16 18:40:24 INFO mapred.JobClient: map 0% reduce 0%
12/09/16 18:40:39 INFO mapred.JobClient: map 100% reduce 0%
12/09/16 18:40:51 INFO mapred.JobClient: map 100% reduce 50%
12/09/16 18:40:54 INFO mapred.JobClient: map 100% reduce 100%
12/09/16 18:40:59 INFO mapred.JobClient: Job complete: job_201209161612_0047
12/09/16 18:40:59 INFO mapred.JobClient: Counters: 30
12/09/16 18:40:59 INFO mapred.JobClient: Job Counters
12/09/16 18:40:59 INFO mapred.JobClient: Launched reduce tasks=2
12/09/16 18:40:59 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16534
12/09/16 18:40:59 INFO mapred.JobClient: Total time spent by all reduces
waiting after reserving slots (ms)=0
12/09/16 18:40:59 INFO mapred.JobClient: Total time spent by all maps
waiting after reserving slots (ms)=0
12/09/16 18:40:59 INFO mapred.JobClient: Launched map tasks=2
12/09/16 18:40:59 INFO mapred.JobClient: Data-local map tasks=2
12/09/16 18:40:59 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=20086
12/09/16 18:40:59 INFO mapred.JobClient: File Input Format Counters
12/09/16 18:40:59 INFO mapred.JobClient: Bytes Read=321
12/09/16 18:40:59 INFO mapred.JobClient: File Output Format Counters
12/09/16 18:40:59 INFO mapred.JobClient: Bytes Written=716
12/09/16 18:40:59 INFO mapred.JobClient: FileSystemCounters
12/09/16 18:40:59 INFO mapred.JobClient: FILE_BYTES_READ=502
12/09/16 18:40:59 INFO mapred.JobClient: HDFS_BYTES_READ=517
12/09/16 18:40:59 INFO mapred.JobClient: FILE_BYTES_WRITTEN=132358
12/09/16 18:40:59 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=716
12/09/16 18:40:59 INFO mapred.JobClient: Map-Reduce Framework
12/09/16 18:40:59 INFO mapred.JobClient: Map output materialized
bytes=514
12/09/16 18:40:59 INFO mapred.JobClient: Map input records=11
12/09/16 18:40:59 INFO mapred.JobClient: Reduce shuffle bytes=231
12/09/16 18:40:59 INFO mapred.JobClient: Spilled Records=18
12/09/16 18:40:59 INFO mapred.JobClient: Map output bytes=472
12/09/16 18:40:59 INFO mapred.JobClient: Total committed heap usage
(bytes)=358285312
12/09/16 18:40:59 INFO mapred.JobClient: CPU time spent (ms)=3070
12/09/16 18:40:59 INFO mapred.JobClient: Map input bytes=213
12/09/16 18:40:59 INFO mapred.JobClient: SPLIT_RAW_BYTES=196
12/09/16 18:40:59 INFO mapred.JobClient: Combine input records=0
12/09/16 18:40:59 INFO mapred.JobClient: Reduce input records=9
12/09/16 18:40:59 INFO mapred.JobClient: Reduce input groups=9
12/09/16 18:40:59 INFO mapred.JobClient: Combine output records=0
12/09/16 18:40:59 INFO mapred.JobClient: Physical memory (bytes)
snapshot=580689920
12/09/16 18:40:59 INFO mapred.JobClient: Reduce output records=9
12/09/16 18:40:59 INFO mapred.JobClient: Virtual memory (bytes)
snapshot=8829870080
12/09/16 18:40:59 INFO mapred.JobClient: Map output records=9
12/09/16 18:40:59 INFO crawl.Injector: Injector: Merging injected urls into
crawl db.
12/09/16 18:41:05 INFO mapred.FileInputFormat: Total input paths to process
: 4
12/09/16 18:41:06 INFO mapred.JobClient: Running job: job_201209161612_0048
12/09/16 18:41:07 INFO mapred.JobClient: map 0% reduce 0%
12/09/16 18:41:22 INFO mapred.JobClient: map 50% reduce 0%
12/09/16 18:41:28 INFO mapred.JobClient: map 100% reduce 0%
12/09/16 18:41:31 INFO mapred.JobClient: map 100% reduce 8%
12/09/16 18:41:37 INFO mapred.JobClient: map 100% reduce 58%
12/09/16 18:41:40 INFO mapred.JobClient: map 100% reduce 100%
12/09/16 18:41:45 INFO mapred.JobClient: Job complete: job_201209161612_0048
12/09/16 18:41:45 INFO mapred.JobClient: Counters: 30
12/09/16 18:41:45 INFO mapred.JobClient: Job Counters
12/09/16 18:41:45 INFO mapred.JobClient: Launched reduce tasks=2
12/09/16 18:41:45 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=26468
12/09/16 18:41:45 INFO mapred.JobClient: Total time spent by all reduces
waiting after reserving slots (ms)=0
12/09/16 18:41:45 INFO mapred.JobClient: Total time spent by all maps
waiting after reserving slots (ms)=0
12/09/16 18:41:45 INFO mapred.JobClient: Launched map tasks=4
12/09/16 18:41:45 INFO mapred.JobClient: Data-local map tasks=4
12/09/16 18:41:45 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=26867
12/09/16 18:41:45 INFO mapred.JobClient: File Input Format Counters
12/09/16 18:41:45 INFO mapred.JobClient: Bytes Read=51222
12/09/16 18:41:45 INFO mapred.JobClient: File Output Format Counters
12/09/16 18:41:45 INFO mapred.JobClient: Bytes Written=51056
12/09/16 18:41:45 INFO mapred.JobClient: FileSystemCounters
12/09/16 18:41:45 INFO mapred.JobClient: FILE_BYTES_READ=46201
12/09/16 18:41:45 INFO mapred.JobClient: HDFS_BYTES_READ=51754
12/09/16 18:41:45 INFO mapred.JobClient: FILE_BYTES_WRITTEN=290892
12/09/16 18:41:45 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=51056
12/09/16 18:41:45 INFO mapred.JobClient: Map-Reduce Framework
12/09/16 18:41:45 INFO mapred.JobClient: Map output materialized
bytes=46237
12/09/16 18:41:45 INFO mapred.JobClient: Map input records=703
12/09/16 18:41:45 INFO mapred.JobClient: Reduce shuffle bytes=46010
12/09/16 18:41:45 INFO mapred.JobClient: Spilled Records=1406
12/09/16 18:41:45 INFO mapred.JobClient: Map output bytes=44774
12/09/16 18:41:45 INFO mapred.JobClient: Total committed heap usage
(bytes)=599851008
12/09/16 18:41:45 INFO mapred.JobClient: CPU time spent (ms)=2690
12/09/16 18:41:45 INFO mapred.JobClient: Map input bytes=50878
12/09/16 18:41:45 INFO mapred.JobClient: SPLIT_RAW_BYTES=532
12/09/16 18:41:45 INFO mapred.JobClient: Combine input records=0
12/09/16 18:41:45 INFO mapred.JobClient: Reduce input records=703
12/09/16 18:41:45 INFO mapred.JobClient: Reduce input groups=694
12/09/16 18:41:45 INFO mapred.JobClient: Combine output records=0
12/09/16 18:41:45 INFO mapred.JobClient: Physical memory (bytes)
snapshot=923774976
12/09/16 18:41:45 INFO mapred.JobClient: Reduce output records=694
12/09/16 18:41:45 INFO mapred.JobClient: Virtual memory (bytes)
snapshot=12767576064
12/09/16 18:41:45 INFO mapred.JobClient: Map output records=703
12/09/16 18:41:45 INFO crawl.Injector: Injector: finished at 2012-09-16
18:41:45, elapsed: 00:01:28
12/09/16 18:41:45 INFO crawl.Generator: Generator: starting at 2012-09-16
18:41:45
12/09/16 18:41:45 INFO crawl.Generator: Generator: Selecting best-scoring
urls due for fetch.
12/09/16 18:41:45 INFO crawl.Generator: Generator: filtering: true
12/09/16 18:41:45 INFO crawl.Generator: Generator: normalizing: true
12/09/16 18:41:51 INFO mapred.FileInputFormat: Total input paths to process
: 2
12/09/16 18:41:51 INFO mapred.JobClient: Running job: job_201209161612_0049
12/09/16 18:41:52 INFO mapred.JobClient: map 0% reduce 0%
12/09/16 18:42:07 INFO mapred.JobClient: map 100% reduce 0%
12/09/16 18:42:16 INFO mapred.JobClient: map 100% reduce 8%
12/09/16 18:42:19 INFO mapred.JobClient: map 100% reduce 66%
12/09/16 18:42:22 INFO mapred.JobClient: map 100% reduce 100%
12/09/16 18:42:27 INFO mapred.JobClient: Job complete: job_201209161612_0049
12/09/16 18:42:27 INFO mapred.JobClient: Counters: 29
12/09/16 18:42:27 INFO mapred.JobClient: Job Counters
12/09/16 18:42:27 INFO mapred.JobClient: Launched reduce tasks=2
12/09/16 18:42:27 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=17772
12/09/16 18:42:27 INFO mapred.JobClient: Total time spent by all reduces
waiting after reserving slots (ms)=0
12/09/16 18:42:27 INFO mapred.JobClient: Total time spent by all maps
waiting after reserving slots (ms)=0
12/09/16 18:42:27 INFO mapred.JobClient: Launched map tasks=2
12/09/16 18:42:27 INFO mapred.JobClient: Data-local map tasks=2
12/09/16 18:42:27 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=20043
12/09/16 18:42:27 INFO mapred.JobClient: File Input Format Counters
12/09/16 18:42:27 INFO mapred.JobClient: Bytes Read=50506
12/09/16 18:42:27 INFO mapred.JobClient: File Output Format Counters
12/09/16 18:42:27 INFO mapred.JobClient: Bytes Written=0
12/09/16 18:42:27 INFO mapred.JobClient: FileSystemCounters
12/09/16 18:42:27 INFO mapred.JobClient: FILE_BYTES_READ=12
12/09/16 18:42:27 INFO mapred.JobClient: HDFS_BYTES_READ=50752
12/09/16 18:42:27 INFO mapred.JobClient: FILE_BYTES_WRITTEN=135098
12/09/16 18:42:27 INFO mapred.JobClient: Map-Reduce Framework
12/09/16 18:42:27 INFO mapred.JobClient: Map output materialized
bytes=24
12/09/16 18:42:27 INFO mapred.JobClient: Map input records=694
12/09/16 18:42:27 INFO mapred.JobClient: Reduce shuffle bytes=18
12/09/16 18:42:27 INFO mapred.JobClient: Spilled Records=0
12/09/16 18:42:27 INFO mapred.JobClient: Map output bytes=0
12/09/16 18:42:27 INFO mapred.JobClient: Total committed heap usage
(bytes)=369360896
12/09/16 18:42:27 INFO mapred.JobClient: CPU time spent (ms)=3330
12/09/16 18:42:27 INFO mapred.JobClient: Map input bytes=50334
12/09/16 18:42:27 INFO mapred.JobClient: SPLIT_RAW_BYTES=246
12/09/16 18:42:27 INFO mapred.JobClient: Combine input records=0
12/09/16 18:42:27 INFO mapred.JobClient: Reduce input records=0
12/09/16 18:42:27 INFO mapred.JobClient: Reduce input groups=0
12/09/16 18:42:27 INFO mapred.JobClient: Combine output records=0
12/09/16 18:42:27 INFO mapred.JobClient: Physical memory (bytes)
snapshot=582873088
12/09/16 18:42:27 INFO mapred.JobClient: Reduce output records=0
12/09/16 18:42:27 INFO mapred.JobClient: Virtual memory (bytes)
snapshot=8829927424
12/09/16 18:42:27 INFO mapred.JobClient: Map output records=0
12/09/16 18:42:27 WARN crawl.Generator: Generator: 0 records selected for
fetching, exiting ...
12/09/16 18:42:28 INFO crawl.Crawl: Stopping at depth=0 - no more URLs to
fetch.
12/09/16 18:42:28 WARN crawl.Crawl: No URLs to fetch - check your seed list
and URL filters.
12/09/16 18:42:28 INFO crawl.Crawl: crawl finished: crawls
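
To make the counters above easier to read, here is a minimal sketch of a log scan that lists the jobs whose mappers read records but emitted none. It is a hypothetical helper, not part of Nutch or Hadoop; the function names and the regular expressions are my own assumptions based on the log format shown above. On this log it would point at job_201209161612_0049 (the Generator job: 694 map input records, 0 map output records), which is consistent with every URL being rejected or skipped at that step, although the counters alone do not say why.

```python
import re

# Assumed line formats, taken from the log above:
#   "... INFO mapred.JobClient: Running job: job_201209161612_0049"
#   "... INFO mapred.JobClient: Map input records=694"
JOB = re.compile(r"Running job: (job_\d+_\d+)")
COUNTER = re.compile(r"mapred\.JobClient:\s+(Map (?:input|output) records)=(\d+)")


def map_record_counts(log_text):
    """Return {job_id: {counter_name: value}} for the map record counters."""
    counts = {}
    current = None
    for line in log_text.splitlines():
        job = JOB.search(line)
        if job:
            # Counters that follow belong to this job until the next one starts.
            current = job.group(1)
            counts[current] = {}
            continue
        counter = COUNTER.search(line)
        if counter and current is not None:
            counts[current][counter.group(1)] = int(counter.group(2))
    return counts


def jobs_dropping_everything(counts):
    """Job ids whose mappers read at least one record but emitted none."""
    return [
        job
        for job, c in counts.items()
        if c.get("Map input records", 0) > 0 and c.get("Map output records") == 0
    ]


# Three lines copied from the Generator job in the log above.
sample = """\
12/09/16 18:41:51 INFO mapred.JobClient: Running job: job_201209161612_0049
12/09/16 18:42:27 INFO mapred.JobClient: Map input records=694
12/09/16 18:42:27 INFO mapred.JobClient: Map output records=0
"""

print(jobs_dropping_everything(map_record_counts(sample)))
# prints ['job_201209161612_0049']
```

The two Injector jobs (0047 and 0048) emit records normally, so only the Generator job would be flagged when the whole log is passed in.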
--
View this message in context:
http://lucene.472066.n3.nabble.com/problem-running-Nutch-1-5-1-in-distributed-mode-simple-crawl-tp4008073p4008102.html