Hi,

Issue created: https://issues.apache.org/jira/browse/NUTCH-2383.

Thanks,
Yossi.


-----Original Message-----
From: Sebastian Nagel [mailto:wastl.na...@googlemail.com] 
Sent: 02 May 2017 16:08
To: user@nutch.apache.org
Subject: Re: Wrong FS exception in Fetcher

Hi Yossi,

> that 1.13 requires Hadoop 2.7.2 specifically.

That's not a hard requirement. Usually you should use the Hadoop version of 
your running Hadoop cluster. In most cases this causes no problems, but when 
problems do occur, matching the versions is a good first thing to try.
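
If it helps to confirm which Hadoop the job actually runs against, a tiny,
hypothetical helper like the one below (assumed class name, not part of Nutch)
prints the Hadoop version found on the job's classpath so it can be compared
with the version of the running cluster:

  import org.apache.hadoop.util.VersionInfo;

  public class PrintHadoopVersion {
    public static void main(String[] args) {
      // Reports the Hadoop client libraries bundled with the job jar, e.g. 2.7.2.
      System.out.println("Hadoop on classpath: " + VersionInfo.getVersion()
          + " (build " + VersionInfo.getBuildVersion() + ")");
    }
  }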

Thanks for the detailed log. All steps are called in the same way. The method
checkOutputSpecs(FileSystem, JobConf) is first called in the Fetcher.
It probably needs debugging to find out why a local file system is assumed
for the output path here.
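
For illustration only (a hedged sketch, not the Nutch source): the
"Wrong FS ... expected: file:///" message comes from Hadoop's
FileSystem.checkPath() when a FileSystem instance bound to the local file
system is asked about an hdfs:// path; resolving the FileSystem from the
Path itself avoids it.

  // Stand-alone sketch: reproduces the "Wrong FS" error and shows the
  // usual remedy of deriving the FileSystem from the Path.
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class WrongFsDemo {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Path on HDFS, as reported in the log (namenode address assumed).
      Path crawlFetch = new Path(
          "hdfs://localhost:9000/user/root/crawl/segments/20170502060036/crawl_fetch");

      // Wrong: a file:/// FileSystem cannot answer for an hdfs:// path;
      // checkPath() throws IllegalArgumentException ("Wrong FS").
      FileSystem localFs = FileSystem.getLocal(conf);
      try {
        localFs.exists(crawlFetch);
      } catch (IllegalArgumentException e) {
        System.err.println("Wrong FS, as expected: " + e.getMessage());
      }

      // Right: let the Path select its own FileSystem (HDFS here).
      FileSystem fs = crawlFetch.getFileSystem(conf);
      System.out.println("exists = " + fs.exists(crawlFetch));
    }
  }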

Please, open an issue on
  https://issues.apache.org/jira/browse/NUTCH

Thanks,
Sebastian

On 05/02/2017 01:21 PM, Yossi Tamari wrote:
> Thanks Sebastian,
> 
> The output with set -x is below. I'm new to Nutch and was not aware that 1.13 
> requires Hadoop 2.7.2 specifically. While I see it now in pom.xml, it may be 
> a good idea to document this on the download page and provide a download link 
> (since the Hadoop releases page contains 2.7.3 but not 2.7.2). I will try to 
> install 2.7.2 and retest tomorrow.
> 
> root@crawler001:/data/apache-nutch-1.13/runtime/deploy/bin# ./crawl urls 
> crawl 2
> Injecting seed URLs
> /data/apache-nutch-1.13/runtime/deploy/bin/nutch inject crawl/crawldb urls
> + cygwin=false
> + case "`uname`" in
> ++ uname
> + THIS=/data/apache-nutch-1.13/runtime/deploy/bin/nutch
> + '[' -h /data/apache-nutch-1.13/runtime/deploy/bin/nutch ']'
> + '[' 3 = 0 ']'
> + COMMAND=inject
> + shift
> ++ dirname /data/apache-nutch-1.13/runtime/deploy/bin/nutch
> + THIS_DIR=/data/apache-nutch-1.13/runtime/deploy/bin
> ++ cd /data/apache-nutch-1.13/runtime/deploy/bin/..
> ++ pwd
> + NUTCH_HOME=/data/apache-nutch-1.13/runtime/deploy
> + '[' '' '!=' '' ']'
> + '[' /usr/lib/jvm/java-8-oracle/jre/ = '' ']'
> + local=true
> + '[' -f /data/apache-nutch-1.13/runtime/deploy/apache-nutch-1.13.job ']'
> + local=false
> + for f in '"$NUTCH_HOME"/*nutch*.job'
> + NUTCH_JOB=/data/apache-nutch-1.13/runtime/deploy/apache-nutch-1.13.job
> + false
> + JAVA=/usr/lib/jvm/java-8-oracle/jre//bin/java
> + JAVA_HEAP_MAX=-Xmx1000m
> + '[' '' '!=' '' ']'
> + CLASSPATH=/data/apache-nutch-1.13/runtime/deploy/conf
> + 
> CLASSPATH=/data/apache-nutch-1.13/runtime/deploy/conf:/usr/lib/jvm/java-8-oracle/jre//lib/tools.jar
> + IFS=
> + false
> + false
> + JAVA_LIBRARY_PATH=
> + '[' -d /data/apache-nutch-1.13/runtime/deploy/lib/native ']'
> + '[' false = true -a X '!=' X ']'
> + unset IFS
> + '[' '' = '' ']'
> + NUTCH_LOG_DIR=/data/apache-nutch-1.13/runtime/deploy/logs
> + '[' '' = '' ']'
> + NUTCH_LOGFILE=hadoop.log
> + false
> + NUTCH_OPTS=($NUTCH_OPTS -Dhadoop.log.dir="$NUTCH_LOG_DIR")
> + NUTCH_OPTS=("${NUTCH_OPTS[@]}" -Dhadoop.log.file="$NUTCH_LOGFILE")
> + '[' x '!=' x ']'
> + '[' inject = crawl ']'
> + '[' inject = inject ']'
> + CLASS=org.apache.nutch.crawl.Injector
> + EXEC_CALL=(hadoop jar "$NUTCH_JOB")
> + false
> ++ which hadoop
> ++ wc -l
> + '[' 1 -eq 0 ']'
> + exec hadoop jar 
> /data/apache-nutch-1.13/runtime/deploy/apache-nutch-1.13.job 
> org.apache.nutch.crawl.Injector crawl/crawldb urls
> 17/05/02 06:00:24 INFO crawl.Injector: Injector: starting at 2017-05-02 
> 06:00:24
> 17/05/02 06:00:24 INFO crawl.Injector: Injector: crawlDb: crawl/crawldb
> 17/05/02 06:00:24 INFO crawl.Injector: Injector: urlDir: urls
> 17/05/02 06:00:24 INFO crawl.Injector: Injector: Converting injected urls to 
> crawl db entries.
> 17/05/02 06:00:25 INFO Configuration.deprecation: session.id is deprecated. 
> Instead, use dfs.metrics.session-id
> 17/05/02 06:00:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with 
> processName=JobTracker, sessionId=
> 17/05/02 06:00:26 INFO input.FileInputFormat: Total input files to process : 1
> 17/05/02 06:00:26 INFO input.FileInputFormat: Total input files to process : 1
> 17/05/02 06:00:26 INFO mapreduce.JobSubmitter: number of splits:2
> 17/05/02 06:00:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
> job_local307378419_0001
> 17/05/02 06:00:26 INFO mapreduce.Job: The url to track the job: 
> http://localhost:8080/
> 17/05/02 06:00:26 INFO mapreduce.Job: Running job: job_local307378419_0001
> 17/05/02 06:00:26 INFO mapred.LocalJobRunner: OutputCommitter set in config 
> null
> 17/05/02 06:00:26 INFO output.FileOutputCommitter: File Output Committer 
> Algorithm version is 1
> 17/05/02 06:00:26 INFO output.FileOutputCommitter: FileOutputCommitter skip 
> cleanup _temporary folders under output directory:false, ignore cleanup 
> failures: false
> 17/05/02 06:00:26 INFO mapred.LocalJobRunner: OutputCommitter is 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
> 17/05/02 06:00:26 INFO mapred.LocalJobRunner: Waiting for map tasks
> 17/05/02 06:00:26 INFO mapred.LocalJobRunner: Starting task: 
> attempt_local307378419_0001_m_000000_0
> 17/05/02 06:00:26 INFO output.FileOutputCommitter: File Output Committer 
> Algorithm version is 1
> 17/05/02 06:00:26 INFO output.FileOutputCommitter: FileOutputCommitter skip 
> cleanup _temporary folders under output directory:false, ignore cleanup 
> failures: false
> 17/05/02 06:00:26 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
> 17/05/02 06:00:26 INFO mapred.MapTask: Processing split: 
> hdfs://localhost:9000/user/root/crawl/crawldb/current/part-r-00000/data:0+148
> 17/05/02 06:00:26 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
> 17/05/02 06:00:26 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
> 17/05/02 06:00:26 INFO mapred.MapTask: soft limit at 83886080
> 17/05/02 06:00:26 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
> 17/05/02 06:00:26 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
> 17/05/02 06:00:26 INFO mapred.MapTask: Map output collector class = 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> 17/05/02 06:00:26 INFO plugin.PluginRepository: Plugins: looking in: 
> /tmp/hadoop-unjar333276722181778867/classes/plugins
> 17/05/02 06:00:26 INFO plugin.PluginRepository: Plugin Auto-activation mode: 
> [true]
> 17/05/02 06:00:26 INFO plugin.PluginRepository: Registered Plugins:
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Regex URL Filter 
> (urlfilter-regex)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Html Parse Plug-in 
> (parse-html)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         HTTP Framework 
> (lib-http)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         the nutch core 
> extension points (nutch-extensionpoints)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Basic Indexing Filter 
> (index-basic)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Anchor Indexing 
> Filter (index-anchor)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Tika Parser Plug-in 
> (parse-tika)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Basic URL Normalizer 
> (urlnormalizer-basic)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Regex URL Filter 
> Framework (lib-regex-filter)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Regex URL Normalizer 
> (urlnormalizer-regex)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         CyberNeko HTML Parser 
> (lib-nekohtml)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         OPIC Scoring Plug-in 
> (scoring-opic)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Pass-through URL 
> Normalizer (urlnormalizer-pass)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Http Protocol Plug-in 
> (protocol-http)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         ElasticIndexWriter 
> (indexer-elastic)
> 17/05/02 06:00:26 INFO plugin.PluginRepository: Registered Extension-Points:
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Nutch Content Parser 
> (org.apache.nutch.parse.Parser)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Nutch URL Filter 
> (org.apache.nutch.net.URLFilter)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         HTML Parse Filter 
> (org.apache.nutch.parse.HtmlParseFilter)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Nutch Scoring 
> (org.apache.nutch.scoring.ScoringFilter)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Nutch URL Normalizer 
> (org.apache.nutch.net.URLNormalizer)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Nutch Publisher 
> (org.apache.nutch.publisher.NutchPublisher)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Nutch Protocol 
> (org.apache.nutch.protocol.Protocol)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Nutch URL Ignore 
> Exemption Filter (org.apache.nutch.net.URLExemptionFilter)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Nutch Index Writer 
> (org.apache.nutch.indexer.IndexWriter)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Nutch Segment Merge 
> Filter (org.apache.nutch.segment.SegmentMergeFilter)
> 17/05/02 06:00:26 INFO plugin.PluginRepository:         Nutch Indexing Filter 
> (org.apache.nutch.indexer.IndexingFilter)
> 17/05/02 06:00:26 INFO conf.Configuration: found resource regex-normalize.xml 
> at file:/tmp/hadoop-unjar333276722181778867/regex-normalize.xml
> 17/05/02 06:00:26 INFO conf.Configuration: found resource regex-urlfilter.txt 
> at file:/tmp/hadoop-unjar333276722181778867/regex-urlfilter.txt
> 17/05/02 06:00:26 INFO regex.RegexURLNormalizer: can't find rules for scope 
> 'inject', using default
> 17/05/02 06:00:26 INFO mapred.LocalJobRunner:
> 17/05/02 06:00:26 INFO mapred.MapTask: Starting flush of map output
> 17/05/02 06:00:26 INFO mapred.MapTask: Spilling map output
> 17/05/02 06:00:26 INFO mapred.MapTask: bufstart = 0; bufend = 54; bufvoid = 
> 104857600
> 17/05/02 06:00:26 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 
> 26214396(104857584); length = 1/6553600
> 17/05/02 06:00:26 INFO mapred.MapTask: Finished spill 0
> 17/05/02 06:00:26 INFO mapred.Task: 
> Task:attempt_local307378419_0001_m_000000_0 is done. And is in the process of 
> committing
> 17/05/02 06:00:26 INFO mapred.LocalJobRunner: map
> 17/05/02 06:00:26 INFO mapred.Task: Task 
> 'attempt_local307378419_0001_m_000000_0' done.
> 17/05/02 06:00:26 INFO mapred.LocalJobRunner: Finishing task: 
> attempt_local307378419_0001_m_000000_0
> 17/05/02 06:00:26 INFO mapred.LocalJobRunner: Starting task: 
> attempt_local307378419_0001_m_000001_0
> 17/05/02 06:00:26 INFO output.FileOutputCommitter: File Output Committer 
> Algorithm version is 1
> 17/05/02 06:00:26 INFO output.FileOutputCommitter: FileOutputCommitter skip 
> cleanup _temporary folders under output directory:false, ignore cleanup 
> failures: false
> 17/05/02 06:00:26 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
> 17/05/02 06:00:26 INFO mapred.MapTask: Processing split: 
> hdfs://localhost:9000/user/root/urls/seed.txt:0+24
> 17/05/02 06:00:26 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
> 17/05/02 06:00:26 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
> 17/05/02 06:00:26 INFO mapred.MapTask: soft limit at 83886080
> 17/05/02 06:00:26 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
> 17/05/02 06:00:26 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
> 17/05/02 06:00:26 INFO mapred.MapTask: Map output collector class = 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> 17/05/02 06:00:26 INFO conf.Configuration: found resource regex-normalize.xml 
> at file:/tmp/hadoop-unjar333276722181778867/regex-normalize.xml
> 17/05/02 06:00:26 INFO regex.RegexURLNormalizer: can't find rules for scope 
> 'inject', using default
> 17/05/02 06:00:26 INFO mapred.LocalJobRunner:
> 17/05/02 06:00:26 INFO mapred.MapTask: Starting flush of map output
> 17/05/02 06:00:26 INFO mapred.MapTask: Spilling map output
> 17/05/02 06:00:26 INFO mapred.MapTask: bufstart = 0; bufend = 54; bufvoid = 
> 104857600
> 17/05/02 06:00:26 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 
> 26214396(104857584); length = 1/6553600
> 17/05/02 06:00:26 INFO mapred.MapTask: Finished spill 0
> 17/05/02 06:00:26 INFO mapred.Task: 
> Task:attempt_local307378419_0001_m_000001_0 is done. And is in the process of 
> committing
> 17/05/02 06:00:26 INFO mapred.LocalJobRunner: 
> hdfs://localhost:9000/user/root/urls/seed.txt:0+24
> 17/05/02 06:00:26 INFO mapred.Task: Task 
> 'attempt_local307378419_0001_m_000001_0' done.
> 17/05/02 06:00:26 INFO mapred.LocalJobRunner: Finishing task: 
> attempt_local307378419_0001_m_000001_0
> 17/05/02 06:00:26 INFO mapred.LocalJobRunner: map task executor complete.
> 17/05/02 06:00:26 INFO mapred.LocalJobRunner: Waiting for reduce tasks
> 17/05/02 06:00:26 INFO mapred.LocalJobRunner: Starting task: 
> attempt_local307378419_0001_r_000000_0
> 17/05/02 06:00:26 INFO output.FileOutputCommitter: File Output Committer 
> Algorithm version is 1
> 17/05/02 06:00:26 INFO output.FileOutputCommitter: FileOutputCommitter skip 
> cleanup _temporary folders under output directory:false, ignore cleanup 
> failures: false
> 17/05/02 06:00:26 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
> 17/05/02 06:00:26 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle@504b0ec4
> 17/05/02 06:00:26 INFO reduce.MergeManagerImpl: MergerManager: 
> memoryLimit=334338464, maxSingleShuffleLimit=83584616, 
> mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
> 17/05/02 06:00:26 INFO reduce.EventFetcher: 
> attempt_local307378419_0001_r_000000_0 Thread started: EventFetcher for 
> fetching Map Completion Events
> 17/05/02 06:00:26 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle 
> output of map attempt_local307378419_0001_m_000001_0 decomp: 58 len: 62 to 
> MEMORY
> 17/05/02 06:00:26 INFO reduce.InMemoryMapOutput: Read 58 bytes from 
> map-output for attempt_local307378419_0001_m_000001_0
> 17/05/02 06:00:26 INFO reduce.MergeManagerImpl: closeInMemoryFile -> 
> map-output of size: 58, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, 
> usedMemory ->58
> 17/05/02 06:00:26 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle 
> output of map attempt_local307378419_0001_m_000000_0 decomp: 58 len: 62 to 
> MEMORY
> 17/05/02 06:00:26 INFO reduce.InMemoryMapOutput: Read 58 bytes from 
> map-output for attempt_local307378419_0001_m_000000_0
> 17/05/02 06:00:26 INFO reduce.MergeManagerImpl: closeInMemoryFile -> 
> map-output of size: 58, inMemoryMapOutputs.size() -> 2, commitMemory -> 58, 
> usedMemory ->116
> 17/05/02 06:00:26 INFO reduce.EventFetcher: EventFetcher is interrupted.. 
> Returning
> 17/05/02 06:00:26 INFO mapred.LocalJobRunner: 2 / 2 copied.
> 17/05/02 06:00:26 INFO reduce.MergeManagerImpl: finalMerge called with 2 
> in-memory map-outputs and 0 on-disk map-outputs
> 17/05/02 06:00:26 INFO mapred.Merger: Merging 2 sorted segments
> 17/05/02 06:00:26 INFO mapred.Merger: Down to the last merge-pass, with 2 
> segments left of total size: 62 bytes
> 17/05/02 06:00:26 INFO reduce.MergeManagerImpl: Merged 2 segments, 116 bytes 
> to disk to satisfy reduce memory limit
> 17/05/02 06:00:26 INFO reduce.MergeManagerImpl: Merging 1 files, 118 bytes 
> from disk
> 17/05/02 06:00:26 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes 
> from memory into reduce
> 17/05/02 06:00:26 INFO mapred.Merger: Merging 1 sorted segments
> 17/05/02 06:00:26 INFO mapred.Merger: Down to the last merge-pass, with 1 
> segments left of total size: 87 bytes
> 17/05/02 06:00:26 INFO mapred.LocalJobRunner: 2 / 2 copied.
> 17/05/02 06:00:27 INFO zlib.ZlibFactory: Successfully loaded & initialized 
> native-zlib library
> 17/05/02 06:00:27 INFO compress.CodecPool: Got brand-new compressor [.deflate]
> 17/05/02 06:00:27 INFO Configuration.deprecation: mapred.skip.on is 
> deprecated. Instead, use mapreduce.job.skiprecords
> 17/05/02 06:00:27 INFO crawl.Injector: Injector: overwrite: false
> 17/05/02 06:00:27 INFO crawl.Injector: Injector: update: false
> 17/05/02 06:00:27 INFO mapreduce.Job: Job job_local307378419_0001 running in 
> uber mode : false
> 17/05/02 06:00:27 INFO mapreduce.Job:  map 100% reduce 0%
> 17/05/02 06:00:27 INFO mapred.Task: 
> Task:attempt_local307378419_0001_r_000000_0 is done. And is in the process of 
> committing
> 17/05/02 06:00:27 INFO mapred.LocalJobRunner: 2 / 2 copied.
> 17/05/02 06:00:27 INFO mapred.Task: Task 
> attempt_local307378419_0001_r_000000_0 is allowed to commit now
> 17/05/02 06:00:27 INFO output.FileOutputCommitter: Saved output of task 
> 'attempt_local307378419_0001_r_000000_0' to 
> hdfs://localhost:9000/user/root/crawl/crawldb/crawldb-921346783/_temporary/0/task_local307378419_0001_r_000000
> 17/05/02 06:00:27 INFO mapred.LocalJobRunner: reduce > reduce
> 17/05/02 06:00:27 INFO mapred.Task: Task 
> 'attempt_local307378419_0001_r_000000_0' done.
> 17/05/02 06:00:27 INFO mapred.LocalJobRunner: Finishing task: 
> attempt_local307378419_0001_r_000000_0
> 17/05/02 06:00:27 INFO mapred.LocalJobRunner: reduce task executor complete.
> 17/05/02 06:00:28 INFO mapreduce.Job:  map 100% reduce 100%
> 17/05/02 06:00:28 INFO mapreduce.Job: Job job_local307378419_0001 completed 
> successfully
> 17/05/02 06:00:28 INFO mapreduce.Job: Counters: 37
>         File System Counters
>                 FILE: Number of bytes read=652298479
>                 FILE: Number of bytes written=658557993
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=492
>                 HDFS: Number of bytes written=365
>                 HDFS: Number of read operations=46
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=13
>         Map-Reduce Framework
>                 Map input records=2
>                 Map output records=2
>                 Map output bytes=108
>                 Map output materialized bytes=124
>                 Input split bytes=570
>                 Combine input records=0
>                 Combine output records=0
>                 Reduce input groups=1
>                 Reduce shuffle bytes=124
>                 Reduce input records=2
>                 Reduce output records=1
>                 Spilled Records=4
>                 Shuffled Maps =2
>                 Failed Shuffles=0
>                 Merged Map outputs=2
>                 GC time elapsed (ms)=15
>                 Total committed heap usage (bytes)=1044381696
>         Shuffle Errors
>                 BAD_ID=0
>                 CONNECTION=0
>                 IO_ERROR=0
>                 WRONG_LENGTH=0
>                 WRONG_MAP=0
>                 WRONG_REDUCE=0
>         injector
>                 urls_injected=1
>                 urls_merged=1
>         File Input Format Counters
>                 Bytes Read=0
>         File Output Format Counters
>                 Bytes Written=365
> 17/05/02 06:00:28 INFO crawl.Injector: Injector: Total urls rejected by 
> filters: 0
> 17/05/02 06:00:28 INFO crawl.Injector: Injector: Total urls injected after 
> normalization and filtering: 1
> 17/05/02 06:00:28 INFO crawl.Injector: Injector: Total urls injected but 
> already in CrawlDb: 1
> 17/05/02 06:00:28 INFO crawl.Injector: Injector: Total new urls injected: 0
> 17/05/02 06:00:28 INFO crawl.Injector: Injector: finished at 2017-05-02 
> 06:00:28, elapsed: 00:00:04
> Tue May 2 06:00:28 CDT 2017 : Iteration 1 of 2
> Generating a new segment
> /data/apache-nutch-1.13/runtime/deploy/bin/nutch generate -D 
> mapreduce.job.reduces=2 -D mapred.child.java.opts=-Xmx1000m -D 
> mapreduce.reduce.speculative=false -D mapreduce.map.speculative=false -D 
> mapreduce.map.output.compress=true crawl/crawldb crawl/segments -topN 50000 
> -numFetchers 1 -noFilter
> + cygwin=false
> + case "`uname`" in
> ++ uname
> + THIS=/data/apache-nutch-1.13/runtime/deploy/bin/nutch
> + '[' -h /data/apache-nutch-1.13/runtime/deploy/bin/nutch ']'
> + '[' 18 = 0 ']'
> + COMMAND=generate
> + shift
> ++ dirname /data/apache-nutch-1.13/runtime/deploy/bin/nutch
> + THIS_DIR=/data/apache-nutch-1.13/runtime/deploy/bin
> ++ cd /data/apache-nutch-1.13/runtime/deploy/bin/..
> ++ pwd
> + NUTCH_HOME=/data/apache-nutch-1.13/runtime/deploy
> + '[' '' '!=' '' ']'
> + '[' /usr/lib/jvm/java-8-oracle/jre/ = '' ']'
> + local=true
> + '[' -f /data/apache-nutch-1.13/runtime/deploy/apache-nutch-1.13.job ']'
> + local=false
> + for f in '"$NUTCH_HOME"/*nutch*.job'
> + NUTCH_JOB=/data/apache-nutch-1.13/runtime/deploy/apache-nutch-1.13.job
> + false
> + JAVA=/usr/lib/jvm/java-8-oracle/jre//bin/java
> + JAVA_HEAP_MAX=-Xmx1000m
> + '[' '' '!=' '' ']'
> + CLASSPATH=/data/apache-nutch-1.13/runtime/deploy/conf
> + 
> CLASSPATH=/data/apache-nutch-1.13/runtime/deploy/conf:/usr/lib/jvm/java-8-oracle/jre//lib/tools.jar
> + IFS=
> + false
> + false
> + JAVA_LIBRARY_PATH=
> + '[' -d /data/apache-nutch-1.13/runtime/deploy/lib/native ']'
> + '[' false = true -a X '!=' X ']'
> + unset IFS
> + '[' '' = '' ']'
> + NUTCH_LOG_DIR=/data/apache-nutch-1.13/runtime/deploy/logs
> + '[' '' = '' ']'
> + NUTCH_LOGFILE=hadoop.log
> + false
> + NUTCH_OPTS=($NUTCH_OPTS -Dhadoop.log.dir="$NUTCH_LOG_DIR")
> + NUTCH_OPTS=("${NUTCH_OPTS[@]}" -Dhadoop.log.file="$NUTCH_LOGFILE")
> + '[' x '!=' x ']'
> + '[' generate = crawl ']'
> + '[' generate = inject ']'
> + '[' generate = generate ']'
> + CLASS=org.apache.nutch.crawl.Generator
> + EXEC_CALL=(hadoop jar "$NUTCH_JOB")
> + false
> ++ which hadoop
> ++ wc -l
> + '[' 1 -eq 0 ']'
> + exec hadoop jar 
> /data/apache-nutch-1.13/runtime/deploy/apache-nutch-1.13.job 
> org.apache.nutch.crawl.Generator -D mapreduce.job.reduces=2 -D 
> mapred.child.java.opts=-Xmx1000m -D mapreduce.reduce.speculative=false -D 
> mapreduce.map.speculative=false -D mapreduce.map.output.compress=true 
> crawl/crawldb crawl/segments -topN 50000 -numFetchers 1 -noFilter
> 17/05/02 06:00:32 INFO crawl.Generator: Generator: starting at 2017-05-02 
> 06:00:32
> 17/05/02 06:00:32 INFO crawl.Generator: Generator: Selecting best-scoring 
> urls due for fetch.
> 17/05/02 06:00:32 INFO crawl.Generator: Generator: filtering: false
> 17/05/02 06:00:32 INFO crawl.Generator: Generator: normalizing: true
> 17/05/02 06:00:32 INFO crawl.Generator: Generator: topN: 50000
> 17/05/02 06:00:32 INFO Configuration.deprecation: session.id is deprecated. 
> Instead, use dfs.metrics.session-id
> 17/05/02 06:00:32 INFO jvm.JvmMetrics: Initializing JVM Metrics with 
> processName=JobTracker, sessionId=
> 17/05/02 06:00:32 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with 
> processName=JobTracker, sessionId= - already initialized
> 17/05/02 06:00:33 INFO mapred.FileInputFormat: Total input files to process : 
> 1
> 17/05/02 06:00:33 INFO mapreduce.JobSubmitter: number of splits:1
> 17/05/02 06:00:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
> job_local1706016672_0001
> 17/05/02 06:00:33 INFO mapreduce.Job: The url to track the job: 
> http://localhost:8080/
> 17/05/02 06:00:33 INFO mapred.LocalJobRunner: OutputCommitter set in config 
> null
> 17/05/02 06:00:33 INFO mapreduce.Job: Running job: job_local1706016672_0001
> 17/05/02 06:00:33 INFO mapred.LocalJobRunner: OutputCommitter is 
> org.apache.hadoop.mapred.FileOutputCommitter
> 17/05/02 06:00:33 INFO output.FileOutputCommitter: File Output Committer 
> Algorithm version is 1
> 17/05/02 06:00:33 INFO output.FileOutputCommitter: FileOutputCommitter skip 
> cleanup _temporary folders under output directory:false, ignore cleanup 
> failures: false
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: Waiting for map tasks
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: Starting task: 
> attempt_local1706016672_0001_m_000000_0
> 17/05/02 06:00:34 INFO output.FileOutputCommitter: File Output Committer 
> Algorithm version is 1
> 17/05/02 06:00:34 INFO output.FileOutputCommitter: FileOutputCommitter skip 
> cleanup _temporary folders under output directory:false, ignore cleanup 
> failures: false
> 17/05/02 06:00:34 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
> 17/05/02 06:00:34 INFO mapred.MapTask: Processing split: 
> hdfs://localhost:9000/user/root/crawl/crawldb/current/part-r-00000/data:0+148
> 17/05/02 06:00:34 INFO mapred.MapTask: numReduceTasks: 2
> 17/05/02 06:00:34 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
> 17/05/02 06:00:34 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
> 17/05/02 06:00:34 INFO mapred.MapTask: soft limit at 83886080
> 17/05/02 06:00:34 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
> 17/05/02 06:00:34 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
> 17/05/02 06:00:34 INFO mapred.MapTask: Map output collector class = 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> 17/05/02 06:00:34 INFO plugin.PluginRepository: Plugins: looking in: 
> /tmp/hadoop-unjar7886623985863993949/classes/plugins
> 17/05/02 06:00:34 INFO plugin.PluginRepository: Plugin Auto-activation mode: 
> [true]
> 17/05/02 06:00:34 INFO plugin.PluginRepository: Registered Plugins:
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Regex URL Filter 
> (urlfilter-regex)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Html Parse Plug-in 
> (parse-html)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         HTTP Framework 
> (lib-http)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         the nutch core 
> extension points (nutch-extensionpoints)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Basic Indexing Filter 
> (index-basic)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Anchor Indexing 
> Filter (index-anchor)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Tika Parser Plug-in 
> (parse-tika)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Basic URL Normalizer 
> (urlnormalizer-basic)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Regex URL Filter 
> Framework (lib-regex-filter)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Regex URL Normalizer 
> (urlnormalizer-regex)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         CyberNeko HTML Parser 
> (lib-nekohtml)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         OPIC Scoring Plug-in 
> (scoring-opic)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Pass-through URL 
> Normalizer (urlnormalizer-pass)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Http Protocol Plug-in 
> (protocol-http)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         ElasticIndexWriter 
> (indexer-elastic)
> 17/05/02 06:00:34 INFO plugin.PluginRepository: Registered Extension-Points:
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Nutch Content Parser 
> (org.apache.nutch.parse.Parser)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Nutch URL Filter 
> (org.apache.nutch.net.URLFilter)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         HTML Parse Filter 
> (org.apache.nutch.parse.HtmlParseFilter)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Nutch Scoring 
> (org.apache.nutch.scoring.ScoringFilter)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Nutch URL Normalizer 
> (org.apache.nutch.net.URLNormalizer)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Nutch Publisher 
> (org.apache.nutch.publisher.NutchPublisher)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Nutch Protocol 
> (org.apache.nutch.protocol.Protocol)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Nutch URL Ignore 
> Exemption Filter (org.apache.nutch.net.URLExemptionFilter)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Nutch Index Writer 
> (org.apache.nutch.indexer.IndexWriter)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Nutch Segment Merge 
> Filter (org.apache.nutch.segment.SegmentMergeFilter)
> 17/05/02 06:00:34 INFO plugin.PluginRepository:         Nutch Indexing Filter 
> (org.apache.nutch.indexer.IndexingFilter)
> 17/05/02 06:00:34 INFO conf.Configuration: found resource regex-urlfilter.txt 
> at file:/tmp/hadoop-unjar7886623985863993949/regex-urlfilter.txt
> 17/05/02 06:00:34 INFO conf.Configuration: found resource regex-normalize.xml 
> at file:/tmp/hadoop-unjar7886623985863993949/regex-normalize.xml
> 17/05/02 06:00:34 INFO crawl.FetchScheduleFactory: Using FetchSchedule impl: 
> org.apache.nutch.crawl.DefaultFetchSchedule
> 17/05/02 06:00:34 INFO crawl.AbstractFetchSchedule: defaultInterval=2592000
> 17/05/02 06:00:34 INFO crawl.AbstractFetchSchedule: maxInterval=7776000
> 17/05/02 06:00:34 INFO regex.RegexURLNormalizer: can't find rules for scope 
> 'partition', using default
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner:
> 17/05/02 06:00:34 INFO mapred.MapTask: Starting flush of map output
> 17/05/02 06:00:34 INFO mapred.MapTask: Spilling map output
> 17/05/02 06:00:34 INFO mapred.MapTask: bufstart = 0; bufend = 83; bufvoid = 
> 104857600
> 17/05/02 06:00:34 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 
> 26214396(104857584); length = 1/6553600
> 17/05/02 06:00:34 INFO zlib.ZlibFactory: Successfully loaded & initialized 
> native-zlib library
> 17/05/02 06:00:34 INFO compress.CodecPool: Got brand-new compressor [.deflate]
> 17/05/02 06:00:34 INFO mapred.MapTask: Finished spill 0
> 17/05/02 06:00:34 INFO mapred.Task: 
> Task:attempt_local1706016672_0001_m_000000_0 is done. And is in the process 
> of committing
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: 
> hdfs://localhost:9000/user/root/crawl/crawldb/current/part-r-00000/data:0+148
> 17/05/02 06:00:34 INFO mapred.Task: Task 
> 'attempt_local1706016672_0001_m_000000_0' done.
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: Finishing task: 
> attempt_local1706016672_0001_m_000000_0
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: map task executor complete.
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: Waiting for reduce tasks
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: Starting task: 
> attempt_local1706016672_0001_r_000000_0
> 17/05/02 06:00:34 INFO output.FileOutputCommitter: File Output Committer 
> Algorithm version is 1
> 17/05/02 06:00:34 INFO output.FileOutputCommitter: FileOutputCommitter skip 
> cleanup _temporary folders under output directory:false, ignore cleanup 
> failures: false
> 17/05/02 06:00:34 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
> 17/05/02 06:00:34 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle@2fd7e5ad
> 17/05/02 06:00:34 INFO reduce.MergeManagerImpl: MergerManager: 
> memoryLimit=334338464, maxSingleShuffleLimit=83584616, 
> mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
> 17/05/02 06:00:34 INFO reduce.EventFetcher: 
> attempt_local1706016672_0001_r_000000_0 Thread started: EventFetcher for 
> fetching Map Completion Events
> 17/05/02 06:00:34 INFO compress.CodecPool: Got brand-new decompressor 
> [.deflate]
> 17/05/02 06:00:34 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle 
> output of map attempt_local1706016672_0001_m_000000_0 decomp: 87 len: 83 to 
> MEMORY
> 17/05/02 06:00:34 INFO reduce.InMemoryMapOutput: Read 87 bytes from 
> map-output for attempt_local1706016672_0001_m_000000_0
> 17/05/02 06:00:34 INFO reduce.MergeManagerImpl: closeInMemoryFile -> 
> map-output of size: 87, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, 
> usedMemory ->87
> 17/05/02 06:00:34 INFO reduce.EventFetcher: EventFetcher is interrupted.. 
> Returning
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: 1 / 1 copied.
> 17/05/02 06:00:34 INFO reduce.MergeManagerImpl: finalMerge called with 1 
> in-memory map-outputs and 0 on-disk map-outputs
> 17/05/02 06:00:34 INFO mapred.Merger: Merging 1 sorted segments
> 17/05/02 06:00:34 INFO mapred.Merger: Down to the last merge-pass, with 1 
> segments left of total size: 81 bytes
> 17/05/02 06:00:34 INFO reduce.MergeManagerImpl: Merged 1 segments, 87 bytes 
> to disk to satisfy reduce memory limit
> 17/05/02 06:00:34 INFO reduce.MergeManagerImpl: Merging 1 files, 91 bytes 
> from disk
> 17/05/02 06:00:34 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes 
> from memory into reduce
> 17/05/02 06:00:34 INFO mapred.Merger: Merging 1 sorted segments
> 17/05/02 06:00:34 INFO mapred.Merger: Down to the last merge-pass, with 1 
> segments left of total size: 81 bytes
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: 1 / 1 copied.
> 17/05/02 06:00:34 INFO conf.Configuration: found resource regex-normalize.xml 
> at file:/tmp/hadoop-unjar7886623985863993949/regex-normalize.xml
> 17/05/02 06:00:34 INFO crawl.FetchScheduleFactory: Using FetchSchedule impl: 
> org.apache.nutch.crawl.DefaultFetchSchedule
> 17/05/02 06:00:34 INFO crawl.AbstractFetchSchedule: defaultInterval=2592000
> 17/05/02 06:00:34 INFO crawl.AbstractFetchSchedule: maxInterval=7776000
> 17/05/02 06:00:34 INFO regex.RegexURLNormalizer: can't find rules for scope 
> 'generate_host_count', using default
> 17/05/02 06:00:34 INFO mapred.Task: 
> Task:attempt_local1706016672_0001_r_000000_0 is done. And is in the process 
> of committing
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: 1 / 1 copied.
> 17/05/02 06:00:34 INFO mapred.Task: Task 
> attempt_local1706016672_0001_r_000000_0 is allowed to commit now
> 17/05/02 06:00:34 INFO output.FileOutputCommitter: Saved output of task 
> 'attempt_local1706016672_0001_r_000000_0' to 
> hdfs://localhost:9000/user/root/generate-temp-ca817ccd-332b-4fa3-afe3-dab7d80ea711/_temporary/0/task_local1706016672_0001_r_000000
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: reduce > reduce
> 17/05/02 06:00:34 INFO mapred.Task: Task 
> 'attempt_local1706016672_0001_r_000000_0' done.
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: Finishing task: 
> attempt_local1706016672_0001_r_000000_0
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: Starting task: 
> attempt_local1706016672_0001_r_000001_0
> 17/05/02 06:00:34 INFO output.FileOutputCommitter: File Output Committer 
> Algorithm version is 1
> 17/05/02 06:00:34 INFO output.FileOutputCommitter: FileOutputCommitter skip 
> cleanup _temporary folders under output directory:false, ignore cleanup 
> failures: false
> 17/05/02 06:00:34 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
> 17/05/02 06:00:34 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle@29cfa49
> 17/05/02 06:00:34 INFO reduce.MergeManagerImpl: MergerManager: 
> memoryLimit=334338464, maxSingleShuffleLimit=83584616, 
> mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
> 17/05/02 06:00:34 INFO reduce.EventFetcher: 
> attempt_local1706016672_0001_r_000001_0 Thread started: EventFetcher for 
> fetching Map Completion Events
> 17/05/02 06:00:34 INFO reduce.LocalFetcher: localfetcher#2 about to shuffle 
> output of map attempt_local1706016672_0001_m_000000_0 decomp: 2 len: 14 to 
> MEMORY
> 17/05/02 06:00:34 INFO reduce.InMemoryMapOutput: Read 2 bytes from map-output 
> for attempt_local1706016672_0001_m_000000_0
> 17/05/02 06:00:34 INFO reduce.MergeManagerImpl: closeInMemoryFile -> 
> map-output of size: 2, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, 
> usedMemory ->2
> 17/05/02 06:00:34 INFO reduce.EventFetcher: EventFetcher is interrupted.. 
> Returning
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: 1 / 1 copied.
> 17/05/02 06:00:34 INFO reduce.MergeManagerImpl: finalMerge called with 1 
> in-memory map-outputs and 0 on-disk map-outputs
> 17/05/02 06:00:34 INFO mapred.Merger: Merging 1 sorted segments
> 17/05/02 06:00:34 INFO mapred.Merger: Down to the last merge-pass, with 0 
> segments left of total size: 0 bytes
> 17/05/02 06:00:34 INFO reduce.MergeManagerImpl: Merged 1 segments, 2 bytes to 
> disk to satisfy reduce memory limit
> 17/05/02 06:00:34 INFO reduce.MergeManagerImpl: Merging 1 files, 22 bytes 
> from disk
> 17/05/02 06:00:34 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes 
> from memory into reduce
> 17/05/02 06:00:34 INFO mapred.Merger: Merging 1 sorted segments
> 17/05/02 06:00:34 INFO mapred.Merger: Down to the last merge-pass, with 0 
> segments left of total size: 0 bytes
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: 1 / 1 copied.
> 17/05/02 06:00:34 INFO conf.Configuration: found resource regex-normalize.xml 
> at file:/tmp/hadoop-unjar7886623985863993949/regex-normalize.xml
> 17/05/02 06:00:34 INFO crawl.FetchScheduleFactory: Using FetchSchedule impl: 
> org.apache.nutch.crawl.DefaultFetchSchedule
> 17/05/02 06:00:34 INFO crawl.AbstractFetchSchedule: defaultInterval=2592000
> 17/05/02 06:00:34 INFO crawl.AbstractFetchSchedule: maxInterval=7776000
> 17/05/02 06:00:34 INFO mapred.Task: 
> Task:attempt_local1706016672_0001_r_000001_0 is done. And is in the process 
> of committing
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: reduce > reduce
> 17/05/02 06:00:34 INFO mapred.Task: Task 
> 'attempt_local1706016672_0001_r_000001_0' done.
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: Finishing task: 
> attempt_local1706016672_0001_r_000001_0
> 17/05/02 06:00:34 INFO mapred.LocalJobRunner: reduce task executor complete.
> 17/05/02 06:00:34 INFO mapreduce.Job: Job job_local1706016672_0001 running in 
> uber mode : false
> 17/05/02 06:00:34 INFO mapreduce.Job:  map 100% reduce 100%
> 17/05/02 06:00:34 INFO mapreduce.Job: Job job_local1706016672_0001 completed 
> successfully
> 17/05/02 06:00:35 INFO mapreduce.Job: Counters: 35
>         File System Counters
>                 FILE: Number of bytes read=652296139
>                 FILE: Number of bytes written=658571046
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=444
>                 HDFS: Number of bytes written=398
>                 HDFS: Number of read operations=37
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=13
>         Map-Reduce Framework
>                 Map input records=1
>                 Map output records=1
>                 Map output bytes=83
>                 Map output materialized bytes=97
>                 Input split bytes=123
>                 Combine input records=0
>                 Combine output records=0
>                 Reduce input groups=1
>                 Reduce shuffle bytes=97
>                 Reduce input records=1
>                 Reduce output records=1
>                 Spilled Records=2
>                 Shuffled Maps =2
>                 Failed Shuffles=0
>                 Merged Map outputs=2
>                 GC time elapsed (ms)=8
>                 Total committed heap usage (bytes)=1036517376
>         Shuffle Errors
>                 BAD_ID=0
>                 CONNECTION=0
>                 IO_ERROR=0
>                 WRONG_LENGTH=0
>                 WRONG_MAP=0
>                 WRONG_REDUCE=0
>         File Input Format Counters
>                 Bytes Read=148
>         File Output Format Counters
>                 Bytes Written=199
> 17/05/02 06:00:35 INFO crawl.Generator: Generator: Partitioning selected urls 
> for politeness.
> 17/05/02 06:00:36 INFO crawl.Generator: Generator: segment: 
> crawl/segments/20170502060036
> 17/05/02 06:00:36 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with 
> processName=JobTracker, sessionId= - already initialized
> 17/05/02 06:00:36 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with 
> processName=JobTracker, sessionId= - already initialized
> 17/05/02 06:00:36 INFO mapred.FileInputFormat: Total input files to process : 
> 1
> 17/05/02 06:00:36 INFO mapreduce.JobSubmitter: number of splits:1
> 17/05/02 06:00:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
> job_local1332900929_0002
> 17/05/02 06:00:36 INFO mapreduce.Job: The url to track the job: 
> http://localhost:8080/
> 17/05/02 06:00:36 INFO mapreduce.Job: Running job: job_local1332900929_0002
> 17/05/02 06:00:36 INFO mapred.LocalJobRunner: OutputCommitter set in config 
> null
> 17/05/02 06:00:36 INFO mapred.LocalJobRunner: OutputCommitter is 
> org.apache.hadoop.mapred.FileOutputCommitter
> 17/05/02 06:00:36 INFO output.FileOutputCommitter: File Output Committer 
> Algorithm version is 1
> 17/05/02 06:00:36 INFO output.FileOutputCommitter: FileOutputCommitter skip 
> cleanup _temporary folders under output directory:false, ignore cleanup 
> failures: false
> 17/05/02 06:00:36 INFO mapred.LocalJobRunner: Waiting for map tasks
> 17/05/02 06:00:36 INFO mapred.LocalJobRunner: Starting task: 
> attempt_local1332900929_0002_m_000000_0
> 17/05/02 06:00:36 INFO output.FileOutputCommitter: File Output Committer 
> Algorithm version is 1
> 17/05/02 06:00:36 INFO output.FileOutputCommitter: FileOutputCommitter skip 
> cleanup _temporary folders under output directory:false, ignore cleanup 
> failures: false
> 17/05/02 06:00:36 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
> 17/05/02 06:00:36 INFO mapred.MapTask: Processing split: 
> hdfs://localhost:9000/user/root/generate-temp-ca817ccd-332b-4fa3-afe3-dab7d80ea711/fetchlist-1/part-00000:0+199
> 17/05/02 06:00:36 INFO mapred.MapTask: numReduceTasks: 1
> 17/05/02 06:00:36 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
> 17/05/02 06:00:36 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
> 17/05/02 06:00:36 INFO mapred.MapTask: soft limit at 83886080
> 17/05/02 06:00:36 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
> 17/05/02 06:00:36 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
> 17/05/02 06:00:36 INFO mapred.MapTask: Map output collector class = 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> 17/05/02 06:00:36 INFO mapred.LocalJobRunner:
> 17/05/02 06:00:36 INFO mapred.MapTask: Starting flush of map output
> 17/05/02 06:00:36 INFO mapred.MapTask: Spilling map output
> 17/05/02 06:00:36 INFO mapred.MapTask: bufstart = 0; bufend = 104; bufvoid = 
> 104857600
> 17/05/02 06:00:36 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 
> 26214396(104857584); length = 1/6553600
> 17/05/02 06:00:36 INFO mapred.MapTask: Finished spill 0
> 17/05/02 06:00:36 INFO mapred.Task: 
> Task:attempt_local1332900929_0002_m_000000_0 is done. And is in the process 
> of committing
> 17/05/02 06:00:36 INFO mapred.LocalJobRunner: 
> hdfs://localhost:9000/user/root/generate-temp-ca817ccd-332b-4fa3-afe3-dab7d80ea711/fetchlist-1/part-00000:0+199
> 17/05/02 06:00:36 INFO mapred.Task: Task 
> 'attempt_local1332900929_0002_m_000000_0' done.
> 17/05/02 06:00:36 INFO mapred.LocalJobRunner: Finishing task: 
> attempt_local1332900929_0002_m_000000_0
> 17/05/02 06:00:36 INFO mapred.LocalJobRunner: map task executor complete.
> 17/05/02 06:00:36 INFO mapred.LocalJobRunner: Waiting for reduce tasks
> 17/05/02 06:00:36 INFO mapred.LocalJobRunner: Starting task: 
> attempt_local1332900929_0002_r_000000_0
> 17/05/02 06:00:36 INFO output.FileOutputCommitter: File Output Committer 
> Algorithm version is 1
> 17/05/02 06:00:36 INFO output.FileOutputCommitter: FileOutputCommitter skip 
> cleanup _temporary folders under output directory:false, ignore cleanup 
> failures: false
> 17/05/02 06:00:36 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
> 17/05/02 06:00:36 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle@57dcd1f6
> 17/05/02 06:00:36 INFO reduce.MergeManagerImpl: MergerManager: 
> memoryLimit=334338464, maxSingleShuffleLimit=83584616, 
> mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
> 17/05/02 06:00:36 INFO reduce.EventFetcher: 
> attempt_local1332900929_0002_r_000000_0 Thread started: EventFetcher for 
> fetching Map Completion Events
> 17/05/02 06:00:36 INFO reduce.LocalFetcher: localfetcher#3 about to shuffle 
> output of map attempt_local1332900929_0002_m_000000_0 decomp: 108 len: 82 to 
> MEMORY
> 17/05/02 06:00:36 INFO reduce.InMemoryMapOutput: Read 108 bytes from 
> map-output for attempt_local1332900929_0002_m_000000_0
> 17/05/02 06:00:36 INFO reduce.MergeManagerImpl: closeInMemoryFile -> 
> map-output of size: 108, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, 
> usedMemory ->108
> 17/05/02 06:00:36 INFO reduce.EventFetcher: EventFetcher is interrupted.. 
> Returning
> 17/05/02 06:00:36 INFO mapred.LocalJobRunner: 1 / 1 copied.
> 17/05/02 06:00:36 INFO reduce.MergeManagerImpl: finalMerge called with 1 
> in-memory map-outputs and 0 on-disk map-outputs
> 17/05/02 06:00:36 INFO mapred.Merger: Merging 1 sorted segments
> 17/05/02 06:00:36 INFO mapred.Merger: Down to the last merge-pass, with 1 
> segments left of total size: 81 bytes
> 17/05/02 06:00:36 INFO reduce.MergeManagerImpl: Merged 1 segments, 108 bytes 
> to disk to satisfy reduce memory limit
> 17/05/02 06:00:36 INFO reduce.MergeManagerImpl: Merging 1 files, 90 bytes 
> from disk
> 17/05/02 06:00:36 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes 
> from memory into reduce
> 17/05/02 06:00:36 INFO mapred.Merger: Merging 1 sorted segments
> 17/05/02 06:00:36 INFO mapred.Merger: Down to the last merge-pass, with 1 
> segments left of total size: 81 bytes
> 17/05/02 06:00:36 INFO mapred.LocalJobRunner: 1 / 1 copied.
> 17/05/02 06:00:36 INFO mapred.Task: 
> Task:attempt_local1332900929_0002_r_000000_0 is done. And is in the process 
> of committing
> 17/05/02 06:00:36 INFO mapred.LocalJobRunner: 1 / 1 copied.
> 17/05/02 06:00:36 INFO mapred.Task: Task 
> attempt_local1332900929_0002_r_000000_0 is allowed to commit now
> 17/05/02 06:00:36 INFO output.FileOutputCommitter: Saved output of task 
> 'attempt_local1332900929_0002_r_000000_0' to 
> hdfs://localhost:9000/user/root/crawl/segments/20170502060036/crawl_generate/_temporary/0/task_local1332900929_0002_r_000000
> 17/05/02 06:00:36 INFO mapred.LocalJobRunner: reduce > reduce
> 17/05/02 06:00:36 INFO mapred.Task: Task 
> 'attempt_local1332900929_0002_r_000000_0' done.
> 17/05/02 06:00:36 INFO mapred.LocalJobRunner: Finishing task: 
> attempt_local1332900929_0002_r_000000_0
> 17/05/02 06:00:36 INFO mapred.LocalJobRunner: reduce task executor complete.
> 17/05/02 06:00:37 INFO mapreduce.Job: Job job_local1332900929_0002 running in 
> uber mode : false
> 17/05/02 06:00:37 INFO mapreduce.Job:  map 100% reduce 100%
> 17/05/02 06:00:37 INFO mapreduce.Job: Job job_local1332900929_0002 completed 
> successfully
> 17/05/02 06:00:37 INFO mapreduce.Job: Counters: 35
>         File System Counters
>                 FILE: Number of bytes read=869728356
>                 FILE: Number of bytes written=878093356
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=694
>                 HDFS: Number of bytes written=567
>                 HDFS: Number of read operations=53
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=18
>         Map-Reduce Framework
>                 Map input records=1
>                 Map output records=1
>                 Map output bytes=104
>                 Map output materialized bytes=82
>                 Input split bytes=157
>                 Combine input records=0
>                 Combine output records=0
>                 Reduce input groups=1
>                 Reduce shuffle bytes=82
>                 Reduce input records=1
>                 Reduce output records=1
>                 Spilled Records=2
>                 Shuffled Maps =1
>                 Failed Shuffles=0
>                 Merged Map outputs=1
>                 GC time elapsed (ms)=0
>                 Total committed heap usage (bytes)=901775360
>         Shuffle Errors
>                 BAD_ID=0
>                 CONNECTION=0
>                 IO_ERROR=0
>                 WRONG_LENGTH=0
>                 WRONG_MAP=0
>                 WRONG_REDUCE=0
>         File Input Format Counters
>                 Bytes Read=199
>         File Output Format Counters
>                 Bytes Written=169
> 17/05/02 06:00:37 INFO crawl.Generator: Generator: finished at 2017-05-02 
> 06:00:37, elapsed: 00:00:05
> Operating on segment : 20170502060036
> Fetching : 20170502060036
> /data/apache-nutch-1.13/runtime/deploy/bin/nutch fetch -D 
> mapreduce.job.reduces=2 -D mapred.child.java.opts=-Xmx1000m -D 
> mapreduce.reduce.speculative=false -D mapreduce.map.speculative=false -D 
> mapreduce.map.output.compress=true -D fetcher.timelimit.mins=180 
> crawl/segments/20170502060036 -noParsing -threads 50
> + cygwin=false
> + case "`uname`" in
> ++ uname
> + THIS=/data/apache-nutch-1.13/runtime/deploy/bin/nutch
> + '[' -h /data/apache-nutch-1.13/runtime/deploy/bin/nutch ']'
> + '[' 17 = 0 ']'
> + COMMAND=fetch
> + shift
> ++ dirname /data/apache-nutch-1.13/runtime/deploy/bin/nutch
> + THIS_DIR=/data/apache-nutch-1.13/runtime/deploy/bin
> ++ cd /data/apache-nutch-1.13/runtime/deploy/bin/..
> ++ pwd
> + NUTCH_HOME=/data/apache-nutch-1.13/runtime/deploy
> + '[' '' '!=' '' ']'
> + '[' /usr/lib/jvm/java-8-oracle/jre/ = '' ']'
> + local=true
> + '[' -f /data/apache-nutch-1.13/runtime/deploy/apache-nutch-1.13.job ']'
> + local=false
> + for f in '"$NUTCH_HOME"/*nutch*.job'
> + NUTCH_JOB=/data/apache-nutch-1.13/runtime/deploy/apache-nutch-1.13.job
> + false
> + JAVA=/usr/lib/jvm/java-8-oracle/jre//bin/java
> + JAVA_HEAP_MAX=-Xmx1000m
> + '[' '' '!=' '' ']'
> + CLASSPATH=/data/apache-nutch-1.13/runtime/deploy/conf
> + 
> CLASSPATH=/data/apache-nutch-1.13/runtime/deploy/conf:/usr/lib/jvm/java-8-oracle/jre//lib/tools.jar
> + IFS=
> + false
> + false
> + JAVA_LIBRARY_PATH=
> + '[' -d /data/apache-nutch-1.13/runtime/deploy/lib/native ']'
> + '[' false = true -a X '!=' X ']'
> + unset IFS
> + '[' '' = '' ']'
> + NUTCH_LOG_DIR=/data/apache-nutch-1.13/runtime/deploy/logs
> + '[' '' = '' ']'
> + NUTCH_LOGFILE=hadoop.log
> + false
> + NUTCH_OPTS=($NUTCH_OPTS -Dhadoop.log.dir="$NUTCH_LOG_DIR")
> + NUTCH_OPTS=("${NUTCH_OPTS[@]}" -Dhadoop.log.file="$NUTCH_LOGFILE")
> + '[' x '!=' x ']'
> + '[' fetch = crawl ']'
> + '[' fetch = inject ']'
> + '[' fetch = generate ']'
> + '[' fetch = freegen ']'
> + '[' fetch = fetch ']'
> + CLASS=org.apache.nutch.fetcher.Fetcher
> + EXEC_CALL=(hadoop jar "$NUTCH_JOB")
> + false
> ++ which hadoop
> ++ wc -l
> + '[' 1 -eq 0 ']'
> + exec hadoop jar 
> /data/apache-nutch-1.13/runtime/deploy/apache-nutch-1.13.job 
> org.apache.nutch.fetcher.Fetcher -D mapreduce.job.reduces=2 -D 
> mapred.child.java.opts=-Xmx1000m -D mapreduce.reduce.speculative=false -D 
> mapreduce.map.speculative=false -D mapreduce.map.output.compress=true -D 
> fetcher.timelimit.mins=180 crawl/segments/20170502060036 -noParsing -threads 
> 50
> 17/05/02 06:00:43 INFO fetcher.Fetcher: Fetcher: starting at 2017-05-02 
> 06:00:43
> 17/05/02 06:00:43 INFO fetcher.Fetcher: Fetcher: segment: 
> crawl/segments/20170502060036
> 17/05/02 06:00:43 INFO fetcher.Fetcher: Fetcher Timelimit set for : 
> 1493733643194
> 17/05/02 06:00:44 INFO Configuration.deprecation: session.id is deprecated. 
> Instead, use dfs.metrics.session-id
> 17/05/02 06:00:44 INFO jvm.JvmMetrics: Initializing JVM Metrics with 
> processName=JobTracker, sessionId=
> 17/05/02 06:00:44 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with 
> processName=JobTracker, sessionId= - already initialized
> 17/05/02 06:00:44 ERROR fetcher.Fetcher: Fetcher: 
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://localhost:9000/user/root/crawl/segments/20170502060036/crawl_fetch, 
> expected: file:///
>         at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:665)
>         at 
> org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:86)
>         at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:630)
>         at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
>         at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
>         at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:435)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1436)
>         at 
> org.apache.nutch.fetcher.FetcherOutputFormat.checkOutputSpecs(FetcherOutputFormat.java:55)
>         at 
> org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:270)
>         at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:141)
>         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
>         at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
>         at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>         at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:870)
>         at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:486)
>         at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:521)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:495)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
> 
> Error running:
>   /data/apache-nutch-1.13/runtime/deploy/bin/nutch fetch -D 
> mapreduce.job.reduces=2 -D mapred.child.java.opts=-Xmx1000m -D 
> mapreduce.reduce.speculative=false -D mapreduce.map.speculative=false -D 
> mapreduce.map.output.compress=true -D fetcher.timelimit.mins=180 
> crawl/segments/20170502060036 -noParsing -threads 50
> Failed with exit value 255.
> 
> -----Original Message-----
> From: Sebastian Nagel [mailto:wastl.na...@googlemail.com] 
> Sent: 02 May 2017 13:54
> To: user@nutch.apache.org
> Subject: Re: Wrong FS exception in Fetcher
> 
> Hi Yossi,
> 
> strange error, indeed. Is it also reproducible in pseudo-distributed mode 
> using Hadoop 2.7.2, the version Nutch depends on?
> 
> Could you also add the line
>   set -x
> to bin/nutch and run bin/crawl again, to see how all steps are executed?
> 
> Thanks,
> Sebastian
> 
> On 04/30/2017 04:04 PM, Yossi Tamari wrote:
>> Hi,
>>
>>  
>>
>> I'm trying to run Nutch 1.13 on Hadoop 2.8.0 in pseudo-distributed mode.
>>
>> Running the command:
>>
>> deploy/bin/crawl urls crawl 2
>>
>> The Injector and Generator run successfully, but in the Fetcher I get the
>> following error:
>>
>> 17/04/30 08:43:48 ERROR fetcher.Fetcher: Fetcher:
>> java.lang.IllegalArgumentException: Wrong FS:
>> hdfs://localhost:9000/user/root/crawl/segments/20170430084337/crawl_fetch,
>> expected: file:///
>>         at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:665)
>>         at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:86)
>>         at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:630)
>>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
>>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
>>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:435)
>>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1436)
>>         at org.apache.nutch.fetcher.FetcherOutputFormat.checkOutputSpecs(FetcherOutputFormat.java:55)
>>         at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:270)
>>         at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:141)
>>         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>>         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:422)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
>>         at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
>>         at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:422)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
>>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
>>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:870)
>>         at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:486)
>>         at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:521)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>>         at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:495)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>         at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
>>
>>  
>>
>> Error running:
>>
>>   /data/apache-nutch-1.13/runtime/deploy/bin/nutch fetch -D
>> mapreduce.job.reduces=2 -D mapred.child.java.opts=-Xmx1000m -D
>> mapreduce.reduce.speculative=false -D mapreduce.map.speculative=false -D
>> mapreduce.map.output.compress=true -D fetcher.timelimit.mins=180
>> crawl/segments/20170430084337 -noParsing -threads 50
>>
>> Failed with exit value 255.
>>
>>  
>>
>>  
>>
>> Any ideas how to fix this?
>>
>>  
>>
>> Thanks,
>>
>>                Yossi.
>>
>>
> 
> 

