Hi Jerritt,
> $ bin/crawl -D \
>   C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/seeds.txt \
>   Test Crawl http://localhost:8983/solr/ 2
As far as I can see, there are two issues with the command:
1. The option -D expects a key=value pair to set a property, e.g.
   -D solr.server.url=http://localhost:8983/solr/
2. If a crawl or seed directory path contains white space, the argument
   needs to be passed in quotes. However, better not to use spaces in
   file names. :)
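To see why the quotes matter, here is a small shell sketch (not a real crawl, just an illustration of word splitting): an unquoted "Test Crawl" is split by the shell into two separate arguments, which shifts every following argument by one position.

```shell
# count_args is a hypothetical helper just for this demonstration;
# it prints how many arguments the shell actually passed to it.
count_args() { echo $#; }

count_args Test Crawl        # prints 2 -- split into two arguments
count_args "Test Crawl"      # prints 1 -- quotes keep it as one argument
```

This is exactly what happened in your command: "Test Crawl" became two arguments, so bin/crawl read the Solr URL as the crawl directory.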
Running bin/crawl without arguments shows a command-line help:
$ apache-nutch-1.11/bin/crawl
Usage: crawl [-i|--index] [-D "key=value"] [-w|--wait] <Seed Dir> <Crawl Dir> <Num Rounds>
  -i|--index    Indexes crawl results into a configured indexer
  -D            A Java property to pass to Nutch calls
  -w|--wait     NUMBER[SUFFIX] Time to wait before generating a new segment
                when no URLs are scheduled for fetching. Suffix can be: s for
                second, m for minute, h for hour and d for day. If no suffix
                is specified second is used by default.
  Seed Dir      Directory in which to look for a seeds file
  Crawl Dir     Directory where the crawl/link/segments dirs are saved
  Num Rounds    The number of rounds to run this crawl for
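For what it's worth, the NUMBER[SUFFIX] convention for -w can be sketched like this (an illustration of the convention described in the help text, not Nutch's actual implementation):

```shell
# to_seconds is a hypothetical helper: convert NUMBER[SUFFIX] to seconds.
# Suffixes: s = second, m = minute, h = hour, d = day; none defaults to seconds.
to_seconds() {
  case "$1" in
    *s) echo $(( ${1%s} )) ;;
    *m) echo $(( ${1%m} * 60 )) ;;
    *h) echo $(( ${1%h} * 3600 )) ;;
    *d) echo $(( ${1%d} * 86400 )) ;;
    *)  echo $(( $1 )) ;;          # no suffix: already seconds
  esac
}

to_seconds 30s   # prints 30
to_seconds 5m    # prints 300
to_seconds 2h    # prints 7200
```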
I guess this should do the job for you:
$ bin/crawl -i -D solr.server.url=http://localhost:8983/solr/
.../urls/seeds.txt TestCrawl 2
Cheers,
Sebastian
On 12/26/2015 07:16 PM, Jerritt Pace wrote:
> I have tried a lot of different things, but I can't get Nutch to run a crawl
> command.
>
> I am using Cygwin on Windows 7.
> I have the Java classpath set, and I am getting feedback when I run bin/nutch.
> But the crawl command gives me an error:
> Error running:
>
> /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/bin/nutch
> inject http://localhost:8983/solr//crawldb TestCrawl
> Failed with exit value 127.
> My command is
>
>
> $ bin/crawl -D
> C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/seeds.txt
> Test Crawl http://localhost:8983/solr/ 2
> The full output is:
>
> Injecting seed URLs
> /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/bin/nutch
> inject http://localhost:8983/solr//crawldb TestCrawl
> Injector: starting at 2015-12-26 13:11:12
> Injector: crawlDb: http://localhost:8983/solr/crawldb
> Injector: urlDir: TestCrawl
> Injector: Converting injected urls to crawl db entries.
> Injector: java.lang.IllegalArgumentException: Wrong FS: http://localhost:8983/solr/crawldb, expected: file:///
>   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:643)
>   at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:79)
>   at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:506)
>   at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
>   at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
>   at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:397)
>   at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
>   at org.apache.nutch.crawl.Injector.inject(Injector.java:298)
>   at org.apache.nutch.crawl.Injector.run(Injector.java:379)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.nutch.crawl.Injector.main(Injector.java:369)
>
> Error running:
>
> /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/bin/nutch
> inject http://localhost:8983/solr//crawldb TestCrawl
> Failed with exit value 127.
>
> Any help with this would be much appreciated!!