Evening Madhvi,

I will set this up and debug the clean job. I'll report over on
https://issues.apache.org/jira/browse/NUTCH-2269
Thank you for reporting.

Lewis

On Thu, Aug 18, 2016 at 7:08 AM, <[email protected]> wrote:
>
> From: "Arora, Madhvi" <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc:
> Date: Wed, 17 Aug 2016 13:30:09 +0000
> Subject: Upgrade to Nutch 1.12
>
> Hi,
>
> I wanted to find out how to correct the issue below and would appreciate
> any help.
>
> I am trying to upgrade to Nutch 1.12. I am using Solr 5.3.1. The reasons I
> am upgrading are:
> 1: https crawling
> 2: Boilerplate content extraction through Tika
>
> The only problem I am having so far is an IOException. Please see below. I
> searched and there is an existing JIRA issue,
> NUTCH-2269 <https://issues.apache.org/jira/browse/NUTCH-2269>
>
> [NUTCH-2269] Clean not working after crawl - ASF JIRA
> <https://issues.apache.org/jira/browse/NUTCH-2269>
> issues.apache.org
> It seems like the database on Lucene can only be called crawldb. However,
> a couple of bundled versions we can find online use linkdb for Lucene as
> the default.
>
> I get the same error if I try to clean via the old command:
> bin/nutch solrclean crawl-adc/crawldb http://localhost:8983/solr/nutch
>
> But cleaning through the linkdb worked, as stated in the JIRA issue, i.e.:
> bin/nutch solrclean crawl-adc/linkdb http://localhost:8983/solr/nutch
>
> I just want to know if there is a fix or an alternate way of cleaning,
> whether cleaning via the linkdb might be okay, and what the repercussions
> of cleaning via the linkdb would be.
>
> Exception from logs:
> java.lang.Exception: java.lang.IllegalStateException: Connection pool
> shut down
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529
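For reference, a minimal sketch of the equivalent invocation through the generic clean command in 1.12, assuming the same crawl-adc layout and Solr core as in the report above (CleaningJob runs through Hadoop's ToolRunner, so the Solr URL can be passed as a -D property before the arguments; the job scans the crawldb for entries marked gone and sends delete requests for them):

  # Paths and URL assumed from the report above; adjust to your install.
  bin/nutch clean -Dsolr.server.url=http://localhost:8983/solr/nutch crawl-adc/crawldb

  # Same job, skipping the final Solr commit:
  bin/nutch clean -Dsolr.server.url=http://localhost:8983/solr/nutch crawl-adc/crawldb -noCommit

Until NUTCH-2269 is resolved, this will likely hit the same "Connection pool
shut down" error against the crawldb that the solrclean command produced.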

