Hi Markus,
Thanks for your attention. The problem is finally solved!
When we run the crawl script with ProcessBuilder, the working directory is /
(root), so Nutch tries to create its temporary files (such as the
generate-temp files) there, and without a sudoer user we get a permission
error.
The solution is to set the working directory of the ProcessBuilder to
somewhere the user has write permission.
We set the working directory to the Nutch bin folder, and now everything
works!
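For reference, here is roughly what our launcher does now. This is only a
minimal sketch: the Nutch path and the crawl arguments below are examples
for our setup, so adjust them to yours.

import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;

public class NutchLauncher {
    public static void main(String[] args) throws Exception {
        // Example path -- point this at your own Nutch installation.
        File nutchBin = new File("/opt/apache-nutch-1.12/bin");

        // Example crawl invocation: seed dir, crawl dir, number of rounds.
        ProcessBuilder pb = new ProcessBuilder("./crawl", "urls", "crawldir", "2");

        // The actual fix: run the script from a directory the Tomcat user
        // can write to, so Nutch creates its generate-temp files there
        // instead of in / (the default working directory in our case).
        pb.directory(nutchBin);
        pb.redirectErrorStream(true); // merge stderr into stdout

        Process p = pb.start();

        // Drain the output so the child process cannot block on a full pipe.
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);
            }
        }
        System.out.println("crawl exited with " + p.waitFor());
    }
}

Any directory the Tomcat user can write to should work; the bin folder was
just convenient for us.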


On Tue, Aug 22, 2017 at 11:47 PM, Markus Jelsma <markus.jel...@openindex.io>
wrote:

> Well, the exception doesn't mention it, but I would guess it has something
> to do with permissions.
>
>
>
> -----Original message-----
> > From:DB Design <sutd...@gmail.com>
> > Sent: Tuesday 22nd August 2017 19:33
> > To: user@nutch.apache.org
> > Subject: run nutch from tomcat with ProcessBuilder
> >
> > Hi,
> > I want to run the Nutch crawler command from a Tomcat web application
> > with Java ProcessBuilder. When I run the crawler command from the
> > terminal, everything is fine, but when it is run with ProcessBuilder the
> > job fails with the error below.
> > Nutch version: 1.12
> > Java version: 8
> > OS: Ubuntu 16.04
> > Tomcat version: 8
> > Solr version: 6.2
> > Thanks for your help.
> > java.lang.Exception: java.io.IOException: Mkdirs failed to create file:/generate-temp-8dca91a6-3610-4802-a534-c0cdf85cde73/_temporary/0/_temporary/attempt_local1345668275_0001_r_000001_0/fetchlist-1
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> > Caused by: java.io.IOException: Mkdirs failed to create file:/generate-temp-8dca91a6-3610-4802-a534-c0cdf85cde73/_temporary/0/_temporary/attempt_local1345668275_0001_r_000001_0/fetchlist-1
> >     at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
> >     at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
> >     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
> >     at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1071)
> >     at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:270)
> >     at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:527)
> >     at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:63)
> >     at org.apache.hadoop.mapred.lib.MultipleSequenceFileOutputFormat.getBaseRecordWriter(MultipleSequenceFileOutputFormat.java:51)
> >     at org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.write(MultipleOutputFormat.java:104)
> >     at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)
> >     at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:422)
> >     at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:342)
> >     at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:110)
> >     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >     at java.lang.Thread.run(Thread.java:748)
> > 2017-08-22 20:58:28,674 INFO  mapreduce.Job - Job job_local1345668275_0001 running in uber mode : false
> > 2017-08-22 20:58:28,674 INFO  mapreduce.Job -  map 100% reduce 50%
> > 2017-08-22 20:58:28,675 INFO  mapreduce.Job - Job job_local1345668275_0001 failed with state FAILED due to: NA
> > 2017-08-22 20:58:28,684 INFO  mapreduce.Job - Counters: 33
> >     File System Counters
> >         FILE: Number of bytes read=1601520
> >         FILE: Number of bytes written=2209492
> >         FILE: Number of read operations=0
> >         FILE: Number of large read operations=0
> >         FILE: Number of write operations=0
> >     Map-Reduce Framework
> >         Map input records=1
> >         Map output records=1
> >         Map output bytes=77
> >         Map output materialized bytes=88
> >         Input split bytes=142
> >         Combine input records=0
> >         Combine output records=0
> >         Reduce input groups=0
> >         Reduce shuffle bytes=88
> >         Reduce input records=0
> >         Reduce output records=0
> >         Spilled Records=1
> >         Shuffled Maps =2
> >         Failed Shuffles=0
> >         Merged Map outputs=2
> >         GC time elapsed (ms)=6
> >         CPU time spent (ms)=0
> >         Physical memory (bytes) snapshot=0
> >         Virtual memory (bytes) snapshot=0
> >         Total committed heap usage (bytes)=531628032
> >     Shuffle Errors
> >         BAD_ID=0
> >         CONNECTION=0
> >         IO_ERROR=0
> >         WRONG_LENGTH=0
> >         WRONG_MAP=0
> >         WRONG_REDUCE=0
> >     File Input Format Counters
> >         Bytes Read=154
> >     File Output Format Counters
> >         Bytes Written=0
> > 2017-08-22 20:58:28,684 ERROR crawl.Generator - Generator: java.io.IOException: Job failed!
> >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
> >     at org.apache.nutch.crawl.Generator.generate(Generator.java:589)
> >     at org.apache.nutch.crawl.Generator.run(Generator.java:764)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >     at org.apache.nutch.crawl.Generator.main(Generator.java:717)
> >
>
