Hi Gorkem,
I haven't verified this by trying it, but given your configuration the Solr
instance may not be reachable from the Nutch container via
http://localhost:8983/solr/nutch
because localhost inside a container refers to that container itself.
Inside the Docker network, host names are the same as container names, so
http://solr:8983/solr/nutch
might work. Cf. the docker-compose networking documentation:
https://docs.docker.com/compose/networking/
In your docker-compose.yaml there is:
services:
  solr:
    container_name: solr
    image: 'solr:8.5.2'
    ports:
      - '8983:8983'
    ...
  nutch:
    container_name: nutch
    ...
    command: '/root/nutch/bin/crawl -i -D solr.server.url=http://localhost:8983/solr/nutch -s urls crawl 1'
Please try fixing the host name in this Solr URL.
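Untested, but the change I have in mind would look roughly like the sketch
below. It assumes the rest of your docker-compose.yaml stays as it is and that
your Nutch version still reads the Solr URL from the solr.server.url property
passed with -D. The only difference is the host name in the URL, which now
uses the Solr container name instead of localhost.

```yaml
services:
  solr:
    container_name: solr
    image: 'solr:8.5.2'
    ports:
      - '8983:8983'
    # ... remaining solr settings unchanged
  nutch:
    container_name: nutch
    # ... remaining nutch settings unchanged
    # "solr" resolves to the Solr container inside the compose network
    command: '/root/nutch/bin/crawl -i -D solr.server.url=http://solr:8983/solr/nutch -s urls crawl 1'
```

The port mapping '8983:8983' only matters for reaching Solr from the host
machine. Between containers the service is addressed by name and container
port.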
Important: you need to configure the Solr URL in the file
conf/index-writers.xml unless you're using Nutch 1.14 or below. See
https://cwiki.apache.org/confluence/display/NUTCH/NutchTutorial#NutchTutorial-SetupSolrforsearch
In any case it's important to be able to read the logs (stdout/stderr and the
hadoop.log)! I know this isn't trivial when using docker-compose but it will
save you a lot of time when searching for errors.
If you need help here, please let us know. Best start a separate thread in the
Nutch user mailing list.
Best,
Sebastian
On 6/7/21 3:18 PM, lewis john mcgibbney wrote:
I’ll have a look today. You can always use the mailing list as well. Feel
free to post your questions there and we will help you out :)
On Sun, Jun 6, 2021 at 12:43 gokmen.yontem <[email protected]>
wrote:
Hi Lewis,
Sorry to bother you. I've been trying to configure Apache Nutch for
almost 10 days now and I'm about to give up. I saw that you are
contributing to this project and I thought maybe you can help me.
This is how desperate I am :)
Here's my repo if you have time:
https://github.com/gorkemyontem/nutch/blob/main/docker-compose.yml
I'm trying to use docker images so there isn't much on the repo.
This is my current error:
nutch | Indexer: java.lang.RuntimeException: Indexing job did not succeed, job status:FAILED, reason: NA
nutch | at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:150)
nutch | at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:291)
nutch | at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
nutch | at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:300)
People say that schema.xml could be wrong, but I'm using the most up to
date one from here
https://github.com/apache/nutch/blob/master/src/plugin/indexer-solr/schema.xml
Many many thanks!
Best wishes,
Gorkem