Hi Markus,
I got Nutch up and running on CentOS =) so yay!
However, I would still like to see whether I can get it up and running on
Windows. I enabled debug logging, spotted some small problems, and fixed them.
Now I am getting the error below. Does this one ring a bell for anyone?
$ bin/nutch inject testcrawl/crawldb urls
Injector: starting at 2016-07-14 10:01:08
Injector: crawlDb: testcrawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsedMillis()J
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:278)
        at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:375)
        at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:115)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:493)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:376)
        at org.apache.nutch.crawl.Injector.run(Injector.java:467)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.crawl.Injector.main(Injector.java:441)
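The only lead I have found so far is that Stopwatch.elapsedMillis() was removed
from newer Guava releases, while the Hadoop 2.x jars were built against an
older Guava that still had it, so this may be two Guava versions fighting on
the classpath. Here is a tiny sketch (my own check, nothing Nutch-specific)
that prints which guava jar actually supplies the class when it is run with
the same classpath Nutch uses:

    // Sketch: print which jar Stopwatch is loaded from. Run it with the exact
    // classpath the nutch script builds, e.g.
    //   java -cp "$NUTCH_CLASSPATH" GuavaCheck   (the classpath variable is my assumption)
    import com.google.common.base.Stopwatch;

    public class GuavaCheck {
        public static void main(String[] args) {
            System.out.println(Stopwatch.class
                    .getProtectionDomain()
                    .getCodeSource()
                    .getLocation());
        }
    }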
-----Original Message-----
From: Markus Jelsma [mailto:[email protected]]
Sent: Wednesday, July 13, 2016 10:44 AM
To: [email protected]
Subject: RE: Running into an Issue
Jamal, I really have no idea, but it smells like a Windows-related problem again.
See:
http://stackoverflow.com/questions/27201505/hadoop-exception-in-thread-main-java-lang-nullpointerexception
http://stackoverflow.com/questions/30379441/mapreduce-development-inside-eclipse-on-windows
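The usual culprit in threads like those is a missing winutils.exe: Hadoop on
Windows shells out to it for file permission handling, and when it cannot be
located the failure can surface as exactly that NullPointerException out of
ProcessBuilder.start(). A minimal standalone check, just a sketch, assuming
HADOOP_HOME points at your Hadoop install:

    // Sketch: verify Hadoop's Windows helper binary is where Hadoop looks for it.
    // Assumes the conventional %HADOOP_HOME%\bin\winutils.exe layout.
    import java.io.File;

    public class WinutilsCheck {
        public static void main(String[] args) {
            File winutils = new File(System.getenv("HADOOP_HOME"), "bin/winutils.exe");
            System.out.println(winutils + " exists: " + winutils.exists());
        }
    }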
Markus
-----Original message-----
> From:Jamal, Sarfaraz <[email protected]>
> Sent: Wednesday 13th July 2016 16:09
> To: [email protected]
> Subject: RE: Running into an Issue
>
> Hi Markus,
>
> So I am now running this on a machine that I have administrative
> access to, and I am getting a different error message:
>
> Do you (or anyone) have any ideas?
>
> $ ./nutch inject ../TestCrawl/crawldb ../url [I tried from both the bin
> folder and the root nutch folder]
>
> Injector: starting at 2016-07-13 10:02:54
> Injector: crawlDb: ../TestCrawl/crawldb
> Injector: urlDir: ../url
> Injector: Converting injected urls to crawl db entries.
> Injector: java.lang.NullPointerException
>         at java.lang.ProcessBuilder.start(Unknown Source)
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
>         at org.apache.hadoop.util.Shell.run(Shell.java:418)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
>         at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
>         at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
>         at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
>         at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
>         at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
>         at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:849)
>         at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:1149)
>         at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:58)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:357)
>         at org.apache.nutch.crawl.Injector.run(Injector.java:467)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.nutch.crawl.Injector.main(Injector.java:441)
>
>
>
> -----Original Message-----
> From: Markus Jelsma [mailto:[email protected]]
> Sent: Tuesday, July 12, 2016 4:52 AM
> To: [email protected]
> Subject: RE: Running into an Issue
>
> Hi, there are some Windows API calls in there that I will never understand.
> Are there some kind of symlinks you are working with, or whatever they are
> called in Windows? Something must be keeping Nutch/Hadoop from getting access
> to your disk. Check permissions, disk space, and whatever else you can think of.
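>
> That access0 call lives in Hadoop's native Windows library, so my best guess
> is that hadoop.dll is not on java.library.path. A quick standalone check,
> just a sketch of mine rather than anything shipped with Nutch:
>
>     // Sketch: print where the JVM searches for native libraries, then try
>     // to resolve hadoop.dll; loadLibrary throws the same UnsatisfiedLinkError
>     // if it cannot be found.
>     public class NativeLibCheck {
>         public static void main(String[] args) {
>             System.out.println(System.getProperty("java.library.path"));
>             System.loadLibrary("hadoop");
>             System.out.println("hadoop.dll loaded OK");
>         }
>     }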
>
> M.
>
> -----Original message-----
> > From:Jamal, Sarfaraz <[email protected]>
> > Sent: Monday 11th July 2016 22:46
> > To: Nutch help <[email protected]>
> > Subject: Running into an Issue
> >
> > So I feel I have made some progress on Nutch.
> >
> > However, I am now getting another error that I am having difficulty
> > working through:
> >
> > bin/nutch inject TestCrawl/crawldb url
> >
> > produces the output below.
> >
> > Do you have to run Cygwin as Administrator for it to work?
> >
> > Injector: Converting injected urls to crawl db entries.
> > Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
> >         at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
> >         at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:570)
> >         at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
> >         at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:173)
> >         at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:160)
> >         at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:94)
> >         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:285)
> >         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
> >         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
> >         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
> >         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
> >         at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:131)
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
> >         at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
> >         at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
> >         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> >         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Unknown Source)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> >         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> >         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
> >         at org.apache.nutch.crawl.Injector.inject(Injector.java:376)
> >         at org.apache.nutch.crawl.Injector.run(Injector.java:467)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >         at org.apache.nutch.crawl.Injector.main(Injector.java:441)
> >
>