Dear Lewis:
    I have run into the same problem. I compiled the same way you did, but
the problem still occurs. The seed and URL filter configuration works for
a local crawl but fails in deploy mode. Please help me; thank you very much.

    The procedure is as follows:
[Jiuling@crawler-3 deploy]$ bin/nutch crawl urls -dir crawls -depth 20
(*I have also run it as "bin/hadoop jar apache-nutch-1.6-SNAPSHOT.job
org.apache.nutch.crawl.Crawl urls -dir crawls -depth 20"*)
Warning: $HADOOP_HOME is deprecated.

12/09/16 18:40:16 WARN crawl.Crawl: solrUrl is not set, indexing will be
skipped...
12/09/16 18:40:16 INFO crawl.Crawl: crawl started in: crawls
12/09/16 18:40:16 INFO crawl.Crawl: rootUrlDir = urls
12/09/16 18:40:16 INFO crawl.Crawl: threads = 10
12/09/16 18:40:16 INFO crawl.Crawl: depth = 20
12/09/16 18:40:16 INFO crawl.Crawl: solrUrl=null
12/09/16 18:40:16 INFO crawl.Injector: Injector: starting at 2012-09-16
18:40:16
12/09/16 18:40:16 INFO crawl.Injector: Injector: crawlDb: crawls/crawldb
12/09/16 18:40:16 INFO crawl.Injector: Injector: urlDir: urls
12/09/16 18:40:16 INFO crawl.Injector: Injector: Converting injected urls to
crawl db entries.
12/09/16 18:40:23 INFO util.NativeCodeLoader: Loaded the native-hadoop
library
12/09/16 18:40:23 WARN snappy.LoadSnappy: Snappy native library not loaded
12/09/16 18:40:23 INFO mapred.FileInputFormat: Total input paths to process
: 1
12/09/16 18:40:23 INFO mapred.JobClient: Running job: job_201209161612_0047
12/09/16 18:40:24 INFO mapred.JobClient:  map 0% reduce 0%
12/09/16 18:40:39 INFO mapred.JobClient:  map 100% reduce 0%
12/09/16 18:40:51 INFO mapred.JobClient:  map 100% reduce 50%
12/09/16 18:40:54 INFO mapred.JobClient:  map 100% reduce 100%
12/09/16 18:40:59 INFO mapred.JobClient: Job complete: job_201209161612_0047
12/09/16 18:40:59 INFO mapred.JobClient: Counters: 30
12/09/16 18:40:59 INFO mapred.JobClient:   Job Counters 
12/09/16 18:40:59 INFO mapred.JobClient:     Launched reduce tasks=2
12/09/16 18:40:59 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16534
12/09/16 18:40:59 INFO mapred.JobClient:     Total time spent by all reduces
waiting after reserving slots (ms)=0
12/09/16 18:40:59 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0
12/09/16 18:40:59 INFO mapred.JobClient:     Launched map tasks=2
12/09/16 18:40:59 INFO mapred.JobClient:     Data-local map tasks=2
12/09/16 18:40:59 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=20086
12/09/16 18:40:59 INFO mapred.JobClient:   File Input Format Counters 
12/09/16 18:40:59 INFO mapred.JobClient:     Bytes Read=321
12/09/16 18:40:59 INFO mapred.JobClient:   File Output Format Counters 
12/09/16 18:40:59 INFO mapred.JobClient:     Bytes Written=716
12/09/16 18:40:59 INFO mapred.JobClient:   FileSystemCounters
12/09/16 18:40:59 INFO mapred.JobClient:     FILE_BYTES_READ=502
12/09/16 18:40:59 INFO mapred.JobClient:     HDFS_BYTES_READ=517
12/09/16 18:40:59 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=132358
12/09/16 18:40:59 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=716
12/09/16 18:40:59 INFO mapred.JobClient:   Map-Reduce Framework
12/09/16 18:40:59 INFO mapred.JobClient:     Map output materialized
bytes=514
12/09/16 18:40:59 INFO mapred.JobClient:     Map input records=11
12/09/16 18:40:59 INFO mapred.JobClient:     Reduce shuffle bytes=231
12/09/16 18:40:59 INFO mapred.JobClient:     Spilled Records=18
12/09/16 18:40:59 INFO mapred.JobClient:     Map output bytes=472
12/09/16 18:40:59 INFO mapred.JobClient:     Total committed heap usage
(bytes)=358285312
12/09/16 18:40:59 INFO mapred.JobClient:     CPU time spent (ms)=3070
12/09/16 18:40:59 INFO mapred.JobClient:     Map input bytes=213
12/09/16 18:40:59 INFO mapred.JobClient:     SPLIT_RAW_BYTES=196
12/09/16 18:40:59 INFO mapred.JobClient:     Combine input records=0
12/09/16 18:40:59 INFO mapred.JobClient:     Reduce input records=9
12/09/16 18:40:59 INFO mapred.JobClient:     Reduce input groups=9
12/09/16 18:40:59 INFO mapred.JobClient:     Combine output records=0
12/09/16 18:40:59 INFO mapred.JobClient:     Physical memory (bytes)
snapshot=580689920
12/09/16 18:40:59 INFO mapred.JobClient:     Reduce output records=9
12/09/16 18:40:59 INFO mapred.JobClient:     Virtual memory (bytes)
snapshot=8829870080
12/09/16 18:40:59 INFO mapred.JobClient:     Map output records=9
12/09/16 18:40:59 INFO crawl.Injector: Injector: Merging injected urls into
crawl db.
12/09/16 18:41:05 INFO mapred.FileInputFormat: Total input paths to process
: 4
12/09/16 18:41:06 INFO mapred.JobClient: Running job: job_201209161612_0048
12/09/16 18:41:07 INFO mapred.JobClient:  map 0% reduce 0%
12/09/16 18:41:22 INFO mapred.JobClient:  map 50% reduce 0%
12/09/16 18:41:28 INFO mapred.JobClient:  map 100% reduce 0%
12/09/16 18:41:31 INFO mapred.JobClient:  map 100% reduce 8%
12/09/16 18:41:37 INFO mapred.JobClient:  map 100% reduce 58%
12/09/16 18:41:40 INFO mapred.JobClient:  map 100% reduce 100%
12/09/16 18:41:45 INFO mapred.JobClient: Job complete: job_201209161612_0048
12/09/16 18:41:45 INFO mapred.JobClient: Counters: 30
12/09/16 18:41:45 INFO mapred.JobClient:   Job Counters 
12/09/16 18:41:45 INFO mapred.JobClient:     Launched reduce tasks=2
12/09/16 18:41:45 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=26468
12/09/16 18:41:45 INFO mapred.JobClient:     Total time spent by all reduces
waiting after reserving slots (ms)=0
12/09/16 18:41:45 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0
12/09/16 18:41:45 INFO mapred.JobClient:     Launched map tasks=4
12/09/16 18:41:45 INFO mapred.JobClient:     Data-local map tasks=4
12/09/16 18:41:45 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=26867
12/09/16 18:41:45 INFO mapred.JobClient:   File Input Format Counters 
12/09/16 18:41:45 INFO mapred.JobClient:     Bytes Read=51222
12/09/16 18:41:45 INFO mapred.JobClient:   File Output Format Counters 
12/09/16 18:41:45 INFO mapred.JobClient:     Bytes Written=51056
12/09/16 18:41:45 INFO mapred.JobClient:   FileSystemCounters
12/09/16 18:41:45 INFO mapred.JobClient:     FILE_BYTES_READ=46201
12/09/16 18:41:45 INFO mapred.JobClient:     HDFS_BYTES_READ=51754
12/09/16 18:41:45 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=290892
12/09/16 18:41:45 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=51056
12/09/16 18:41:45 INFO mapred.JobClient:   Map-Reduce Framework
12/09/16 18:41:45 INFO mapred.JobClient:     Map output materialized
bytes=46237
12/09/16 18:41:45 INFO mapred.JobClient:     Map input records=703
12/09/16 18:41:45 INFO mapred.JobClient:     Reduce shuffle bytes=46010
12/09/16 18:41:45 INFO mapred.JobClient:     Spilled Records=1406
12/09/16 18:41:45 INFO mapred.JobClient:     Map output bytes=44774
12/09/16 18:41:45 INFO mapred.JobClient:     Total committed heap usage
(bytes)=599851008
12/09/16 18:41:45 INFO mapred.JobClient:     CPU time spent (ms)=2690
12/09/16 18:41:45 INFO mapred.JobClient:     Map input bytes=50878
12/09/16 18:41:45 INFO mapred.JobClient:     SPLIT_RAW_BYTES=532
12/09/16 18:41:45 INFO mapred.JobClient:     Combine input records=0
12/09/16 18:41:45 INFO mapred.JobClient:     Reduce input records=703
12/09/16 18:41:45 INFO mapred.JobClient:     Reduce input groups=694
12/09/16 18:41:45 INFO mapred.JobClient:     Combine output records=0
12/09/16 18:41:45 INFO mapred.JobClient:     Physical memory (bytes)
snapshot=923774976
12/09/16 18:41:45 INFO mapred.JobClient:     Reduce output records=694
12/09/16 18:41:45 INFO mapred.JobClient:     Virtual memory (bytes)
snapshot=12767576064
12/09/16 18:41:45 INFO mapred.JobClient:     Map output records=703
12/09/16 18:41:45 INFO crawl.Injector: Injector: finished at 2012-09-16
18:41:45, elapsed: 00:01:28
12/09/16 18:41:45 INFO crawl.Generator: Generator: starting at 2012-09-16
18:41:45
12/09/16 18:41:45 INFO crawl.Generator: Generator: Selecting best-scoring
urls due for fetch.
12/09/16 18:41:45 INFO crawl.Generator: Generator: filtering: true
12/09/16 18:41:45 INFO crawl.Generator: Generator: normalizing: true
12/09/16 18:41:51 INFO mapred.FileInputFormat: Total input paths to process
: 2
12/09/16 18:41:51 INFO mapred.JobClient: Running job: job_201209161612_0049
12/09/16 18:41:52 INFO mapred.JobClient:  map 0% reduce 0%
12/09/16 18:42:07 INFO mapred.JobClient:  map 100% reduce 0%
12/09/16 18:42:16 INFO mapred.JobClient:  map 100% reduce 8%
12/09/16 18:42:19 INFO mapred.JobClient:  map 100% reduce 66%
12/09/16 18:42:22 INFO mapred.JobClient:  map 100% reduce 100%
12/09/16 18:42:27 INFO mapred.JobClient: Job complete: job_201209161612_0049
12/09/16 18:42:27 INFO mapred.JobClient: Counters: 29
12/09/16 18:42:27 INFO mapred.JobClient:   Job Counters 
12/09/16 18:42:27 INFO mapred.JobClient:     Launched reduce tasks=2
12/09/16 18:42:27 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=17772
12/09/16 18:42:27 INFO mapred.JobClient:     Total time spent by all reduces
waiting after reserving slots (ms)=0
12/09/16 18:42:27 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0
12/09/16 18:42:27 INFO mapred.JobClient:     Launched map tasks=2
12/09/16 18:42:27 INFO mapred.JobClient:     Data-local map tasks=2
12/09/16 18:42:27 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=20043
12/09/16 18:42:27 INFO mapred.JobClient:   File Input Format Counters 
12/09/16 18:42:27 INFO mapred.JobClient:     Bytes Read=50506
12/09/16 18:42:27 INFO mapred.JobClient:   File Output Format Counters 
12/09/16 18:42:27 INFO mapred.JobClient:     Bytes Written=0
12/09/16 18:42:27 INFO mapred.JobClient:   FileSystemCounters
12/09/16 18:42:27 INFO mapred.JobClient:     FILE_BYTES_READ=12
12/09/16 18:42:27 INFO mapred.JobClient:     HDFS_BYTES_READ=50752
12/09/16 18:42:27 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=135098
12/09/16 18:42:27 INFO mapred.JobClient:   Map-Reduce Framework
12/09/16 18:42:27 INFO mapred.JobClient:     Map output materialized
bytes=24
12/09/16 18:42:27 INFO mapred.JobClient:     Map input records=694
12/09/16 18:42:27 INFO mapred.JobClient:     Reduce shuffle bytes=18
12/09/16 18:42:27 INFO mapred.JobClient:     Spilled Records=0
12/09/16 18:42:27 INFO mapred.JobClient:     Map output bytes=0
12/09/16 18:42:27 INFO mapred.JobClient:     Total committed heap usage
(bytes)=369360896
12/09/16 18:42:27 INFO mapred.JobClient:     CPU time spent (ms)=3330
12/09/16 18:42:27 INFO mapred.JobClient:     Map input bytes=50334
12/09/16 18:42:27 INFO mapred.JobClient:     SPLIT_RAW_BYTES=246
12/09/16 18:42:27 INFO mapred.JobClient:     Combine input records=0
12/09/16 18:42:27 INFO mapred.JobClient:     Reduce input records=0
12/09/16 18:42:27 INFO mapred.JobClient:     Reduce input groups=0
12/09/16 18:42:27 INFO mapred.JobClient:     Combine output records=0
12/09/16 18:42:27 INFO mapred.JobClient:     Physical memory (bytes)
snapshot=582873088
12/09/16 18:42:27 INFO mapred.JobClient:     Reduce output records=0
12/09/16 18:42:27 INFO mapred.JobClient:     Virtual memory (bytes)
snapshot=8829927424
12/09/16 18:42:27 INFO mapred.JobClient:     Map output records=0
12/09/16 18:42:27 WARN crawl.Generator: Generator: 0 records selected for fetching, exiting ...
12/09/16 18:42:28 INFO crawl.Crawl: Stopping at depth=0 - no more URLs to fetch.
12/09/16 18:42:28 WARN crawl.Crawl: No URLs to fetch - check your seed list and URL filters.
12/09/16 18:42:28 INFO crawl.Crawl: crawl finished: crawls
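
Reading the counters in the log above, the Generator job (job_201209161612_0049) reports "Map input records=694" but "Map output records=0", so every URL in the crawl db seems to be dropped in the map phase of generation. Below is a rough Python sketch for pulling those two counters out of a JobClient log, per job. The function name `map_record_counts` and the regular expressions are only illustrative assumptions; they are not part of Nutch or Hadoop, and the sample text is copied from three lines of the log above.

```python
import re

# Patterns for the JobClient lines of interest (illustrative, not a Hadoop API).
JOB = re.compile(r"Running job: (job_\d+_\d+)")
COUNTER = re.compile(r"(Map input records|Map output records)=(\d+)")


def map_record_counts(log_text):
    """Return {job_id: {counter_name: value}} for the map record counters."""
    counts = {}
    current = None
    for line in log_text.splitlines():
        job = JOB.search(line)
        if job:
            # A new job starts; later counters belong to it.
            current = job.group(1)
            counts[current] = {}
            continue
        counter = COUNTER.search(line)
        if counter and current is not None:
            counts[current][counter.group(1)] = int(counter.group(2))
    return counts


# Three lines copied from the Generator job in the log above.
sample = """\
12/09/16 18:41:51 INFO mapred.JobClient: Running job: job_201209161612_0049
12/09/16 18:42:27 INFO mapred.JobClient:     Map input records=694
12/09/16 18:42:27 INFO mapred.JobClient:     Map output records=0
"""

result = map_record_counts(sample)
for job_id, c in result.items():
    print(job_id, c)
```

A job whose map input is non-zero but whose map output is zero is the one to look at first; here that is the Generator job, which is consistent with the "check your seed list and URL filters" warning.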



