What version of SOLR and Nutch were you able to get to work?
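
For anyone who finds this thread later: the failure in the trace below comes from SolrJ's CloudSolrClient. When Nutch talks to SolrCloud via -Dsolr.zookeeper.hosts, the client is built from the ZooKeeper address alone, so the collection name embedded in solr.server.url never reaches it, and SolrJ rejects the update with "No collection param specified on request and no default collection has been set." The fix has to happen on the client side (SolrJ exposes setDefaultCollection() for exactly this). As a toy sketch -- illustrative Python, not SolrJ itself -- the collection-resolution logic behaves roughly like:

```python
# Toy model (illustrative only, not SolrJ) of how a SolrCloud client
# decides which collection an update request targets.
def resolve_collection(request_params, default_collection=None):
    # 1. An explicit "collection" parameter on the request wins.
    if "collection" in request_params:
        return request_params["collection"]
    # 2. Otherwise fall back to the client's configured default.
    if default_collection is not None:
        return default_collection
    # 3. With neither, the request is rejected -- the error in the log below.
    raise IOError("No collection param specified on request "
                  "and no default collection has been set.")

# A client whose default collection has been set routes the update fine:
print(resolve_collection({}, default_collection="uc_website"))  # uc_website
```

So unless the index writer either sets a default collection on the client or puts an explicit collection parameter on each request, every update fails the same way the trace shows.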





> On Jun 9, 2017, at 10:24 AM, David Parker <[email protected]> wrote:
> 
> Just to follow up on this, I never did get this to work.  I ended up
> reverting to a standalone Solr instance without authentication, and it
> works.  It would certainly be nice to have this working with SolrCloud and
> ZK, though.
> 
> Thanks!
> 
> On Wed, Jun 7, 2017 at 5:45 PM, David Parker <[email protected]> wrote:
> 
>> I saw that while I was Googling this issue.  That conversation made it
>> sound like this would be fixed in Nutch 1.12, and I'm using 1.13.
>> Shouldn't that fix be in this version?
>> 
>> On Jun 7, 2017 4:32 PM, "Furkan KAMACI" <[email protected]> wrote:
>> 
>>> *PS:* Similar conversation:
>>> http://lucene.472066.n3.nabble.com/Nutch-with-Solrcloud-5-td4248700.html
>>> 
>>> On Wed, Jun 7, 2017 at 9:52 PM, David Parker <[email protected]> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I am attempting to integrate Nutch 1.13 with Solr 6.6 running in cloud
>>>> mode.  I previously had this working fine with Nutch 1.13 and Solr 6.5
>>>> running in stand-alone mode, but now I get an error.  It seems to be an
>>>> issue with the collection not being default.
>>>> 
>>>> Command:
>>>> 
>>>> bin/nutch index -Dsolr.zookeeper.hosts=localhost:9983
>>>> -Dsolr.auth.password=xxxxxxxx -Dsolr.auth.username=xxxxxxxx
>>>> -Dsolr.auth=true -Dsolr.server.url=http://localhost:8983/solr/uc_website
>>>> crawl/crawldb -linkdb crawl/linkdb crawl/segments/20170607135140
>>>> 
>>>> Result in hadoop.log:
>>>> 
>>>> java.lang.Exception: java.io.IOException
>>>>        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>>>>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
>>>> Caused by: java.io.IOException
>>>>        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:234)
>>>>        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:213)
>>>>        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:174)
>>>>        at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:87)
>>>>        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)
>>>>        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)
>>>>        at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)
>>>>        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:422)
>>>>        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:368)
>>>>        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:57)
>>>>        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
>>>>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>>>>        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>>>>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>        at java.lang.Thread.run(Thread.java:745)
>>>> Caused by: org.apache.solr.client.solrj.SolrServerException: No collection param specified on request and no default collection has been set.
>>>>        at org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:556)
>>>>        at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:981)
>>>>        at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:870)
>>>>        at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:806)
>>>>        at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
>>>>        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:210)
>>>>        ... 16 more
>>>> 2017-06-07 14:42:32,305 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
>>>>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
>>>>        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
>>>>        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
>>>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>>        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)
>>>> 
>>>> I think the root of the problem is the line "No collection param
>>>> specified on request and no default collection has been set."
>>>> 
>>>> Any help is greatly appreciated.  Thanks!
>>>> 
>>>> --
>>>> Dave Parker
>>>> Database & Systems Administrator
>>>> Utica College
>>>> Integrated Information Technology Services
>>>> (315) 792-3229
>>>> Registered Linux User #408177
>>>> 
>>> 
>> 
> 
> 
