Hi Lewis,
Here is an update (I spoke to one of our Java guys):
$ set classpath = C:\\apache-nutch-1.11\\lib
$ $classpath
/cygdrive/c/apache-nutch-1.11/lib
$ ../bin/crawl -i urls/ TestCrawl 2
Injecting seed URLs
/cygdrive/c/apache-nutch-1.11/bin/nutch inject TestCrawl/crawldb urls/
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.cli.OptionBuilder.withArgPattern(Ljava/lang/String;I)Lorg/apache/commons/cli/OptionBuilder;
    at org.apache.hadoop.util.GenericOptionsParser.buildGeneralOptions(GenericOptionsParser.java:207)
    at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:370)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:138)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:59)
    at org.apache.nutch.crawl.Injector.main(Injector.java:369)
Error running:
/cygdrive/c/apache-nutch-1.11/bin/nutch inject TestCrawl/crawldb urls/
Failed with exit value 1.
1. I set it using Cygwin notation, and as a regular Windows path.
2. I set it in DOS as well, by appending to what was already there.
In each instance I received the same error.
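For reference, here is a minimal sketch of how the variable could be set from the Cygwin bash prompt. It assumes a Bourne-style shell, where an assignment takes no spaces around '=' and the variable has to be exported before a child process such as the JVM can see it; the JVM reads the upper-case name CLASSPATH. The helper name build_classpath is hypothetical, and the install path is taken from the log above. Whether bin/nutch keeps an externally set CLASSPATH or rebuilds its own is not something this sketch establishes.

```shell
# Assumption: install location taken from the log above.
NUTCH_HOME=${NUTCH_HOME:-/cygdrive/c/apache-nutch-1.11}

# Hypothetical helper: join every jar in a directory into a colon-separated list.
# A bare directory entry on the classpath exposes loose .class files only,
# not the jars stored inside that directory.
build_classpath() {
  dir=$1
  result=
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue   # the pattern matched nothing
    result=${result:+$result:}$jar
  done
  printf '%s\n' "$result"
}

CLASSPATH=$(build_classpath "$NUTCH_HOME/lib")

# A Windows JVM expects Windows-style paths separated by ';'.
# cygpath exists only under Cygwin, so the call is guarded.
if [ -n "$CLASSPATH" ] && command -v cygpath >/dev/null 2>&1; then
  CLASSPATH=$(cygpath -p -w "$CLASSPATH")
fi

export CLASSPATH
echo "$CLASSPATH"
```

An empty line from the final echo means no jars were found under the assumed lib directory.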
Any thoughts? Or do you notice anything I might have missed?
Thanks,
sas
-----Original Message-----
From: Lewis John Mcgibbney [mailto:[email protected]]
Sent: Wednesday, June 15, 2016 11:46 PM
To: [email protected]
Subject: [E] Re: Newbie Question, hadoop error?
Hi Sas,
See response inline :)
On Wed, Jun 15, 2016 at 5:36 AM, <[email protected]> wrote:
> From: "Jamal, Sarfaraz" <[email protected]>
> To: "'[email protected]'" <[email protected]>
> Cc:
> Date: Mon, 13 Jun 2016 17:36:44 -0400
> Subject: Newbie Question, hadoop error?
> Hi Guys,
>
> I am attempting to run nutch using cygwin,
Do you mean the Nutch 1.11 binary distribution?
> and I am having the following problem:
> P.S. I added Hadoop-core to the lib folder already -
>
> I appreciate any insight or comment you guys may have -
>
> $ bin/crawl -i urls/ TestCrawl/ 2
> Injecting seed URLs
> /cygdrive/c/apache-nutch-1.11/bin/nutch inject TestCrawl//crawldb urls/
> Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.cli.OptionBuilder.withArgPattern(Ljava/lang/String;I)Lorg/apache/commons/cli/OptionBuilder;
>     at org.apache.hadoop.util.GenericOptionsParser.buildGeneralOptions(GenericOptionsParser.java:207)
>     at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:370)
>     at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
>     at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:138)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:59)
>     at org.apache.nutch.crawl.Injector.main(Injector.java:369)
> Error running:
> /cygdrive/c/apache-nutch-1.11/bin/nutch inject TestCrawl//crawldb urls/
> Failed with exit value 1.
There are a few issues above.
1) You should change the data structures' parent directory from 'TestCrawl/'
to 'TestCrawl', i.e. remove the trailing forward slash. This will prevent you
from generating the CrawlDB in 'TestCrawl//crawldb' and will generate it in
'TestCrawl/crawldb' instead.
2) The presence of NoSuchMethodError would indicate that the $NUTCH_HOME/lib
directory is not on the JVM classpath. Please make sure that it is.
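As a complementary check: a NoSuchMethodError is raised when the class itself was loaded but the requested method is missing from it, so it can also be worth looking at which jars in lib could supply the class named in the trace. The sketch below lists candidate jars by file-name prefix and by contents. The helper names find_jars and jars_with_class, the prefixes searched, and the install path are assumptions chosen for illustration.

```shell
# Assumption: install location taken from the log above.
lib=${NUTCH_HOME:-/cygdrive/c/apache-nutch-1.11}/lib

# Hypothetical helper: list jars in a directory whose file name starts with a prefix.
find_jars() {
  find "$1" -name "$2*.jar" | sort
}

# Hypothetical helper: print each jar in a directory that contains a given class file.
# Relies on 'unzip' being installed.
jars_with_class() {
  for jar in "$1"/*.jar; do
    [ -e "$jar" ] || continue
    if unzip -l "$jar" 2>/dev/null | grep -qF "$2"; then
      printf '%s\n' "$jar"
    fi
  done
}

if [ -d "$lib" ]; then
  # More than one hit for the same library suggests a version clash.
  find_jars "$lib" commons-cli
  find_jars "$lib" hadoop-core
  jars_with_class "$lib" org/apache/commons/cli/OptionBuilder.class
fi
```

If a jar turns up, running `javap -cp <jar> org.apache.commons.cli.OptionBuilder` against it lists the methods that version of the class actually provides.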
Lewis