Re: Apache Nutch help request for a school project :)
:)

On Thu, Jun 10, 2021 at 7:18 AM gokmen.yontem wrote:

> Lewis, Sebastian
> I can’t thank you enough! Your help is much appreciated.
>
> Next time I'll follow your advice and use the mailing list, which I
> wasn't aware of.
>
> Best wishes,
> Gorkem

--
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc
Re: Apache Nutch help request for a school project :)
Yep Sebastian is absolutely correct. I sent you a pull request.

https://github.com/gorkemyontem/nutch/pull/1

HTH
lewismc

--
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc
Re: Apache Nutch help request for a school project :)
Hi Gorkem,

I haven't verified it by trying, but it may be that, given your
configuration, the Solr instance isn't reachable via

  http://localhost:8983/solr/nutch

Inside the Docker network, host names are the same as container names, so

  http://solr:8983/solr/nutch

might work. Cf. the docker-compose networking documentation:

  https://docs.docker.com/compose/networking/

In your docker-compose.yaml there is:

  services:
    solr:
      container_name: solr
      image: 'solr:8.5.2'
      ports:
        - '8983:8983'
      ...
    nutch:
      container_name: nutch
      ...
      command: '/root/nutch/bin/crawl -i -D solr.server.url=http://localhost:8983/solr/nutch -s urls crawl 1'

Please try fixing the host name in the Solr URL.

Important: you need to configure the Solr URL in the file
conf/index-writers.xml unless you're using Nutch 1.14 or below. See

  https://cwiki.apache.org/confluence/display/NUTCH/NutchTutorial#NutchTutorial-SetupSolrforsearch

In any case, it's important to be able to read the logs (stdout/stderr and
the hadoop.log)! I know this isn't trivial when using docker-compose, but it
will save you a lot of time when searching for errors.

If you need help here, please let us know. It's best to start a separate
thread on the Nutch user mailing list.

Best,
Sebastian
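The host-name point in Sebastian's reply can be checked before rerunning the crawl. What follows is only a minimal sketch in Python, not something taken from the thread or from the pull request. The helper names `with_host` and `tcp_reachable` are hypothetical, and the sketch assumes a Python interpreter is available inside the nutch container, which the thread does not confirm. It only tests whether a TCP connection can be opened. It says nothing about whether the Solr core exists or whether indexing would succeed.

```python
import socket
from urllib.parse import urlsplit, urlunsplit


def with_host(url: str, host: str) -> str:
    """Return url with its host name replaced by host.

    The port, path, and query are kept; any user:password part is dropped.
    """
    parts = urlsplit(url)
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


def tcp_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL names none.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Run from inside the nutch container, `tcp_reachable("http://localhost:8983/solr/nutch")` would be expected to fail if Sebastian's diagnosis holds, while `tcp_reachable(with_host("http://localhost:8983/solr/nutch", "solr"))` checks the container name he suggests.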
Re: Apache Nutch help request for a school project :)
I’ll have a look today. You can always use the mailing list as well. Feel
free to post your questions there and we will help you out :)

On Sun, Jun 6, 2021 at 12:43 gokmen.yontem wrote:

> Hi Lewis,
> Sorry to bother you. I've been trying to configure Apache Nutch for
> almost 10 days now and I'm about to give up. I saw that you are
> contributing to this project and I thought maybe you can help me.
> This is how desperate I am :)
>
> Here's my repo if you have time:
> https://github.com/gorkemyontem/nutch/blob/main/docker-compose.yml
> I'm trying to use docker images so there isn't much in the repo.
>
> This is my current error:
>
> nutch| Indexer: java.lang.RuntimeException: Indexing job did not succeed, job status:FAILED, reason: NA
> nutch|   at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:150)
> nutch|   at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:291)
> nutch|   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> nutch|   at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:300)
>
> People say that schema.xml could be wrong, but I'm using the most
> up-to-date one from here:
>
> https://github.com/apache/nutch/blob/master/src/plugin/indexer-solr/schema.xml
>
> Many many thanks!
> Best wishes,
> Gorkem

--
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc