Re: Integrating nutch crawl into solr

Rum Raisin Fri, 28 Oct 2011 09:44:42 -0700

Thanks, I resolved it. The problem was a wrongly specified crawldb directory.
The tutorial had it like this... Is this a typo in the tutorial?
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawldb crawldb/linkdb crawldb/segments/*
I changed the first argument from "crawldb" to "crawldb/crawldb" so that the
crawldb, linkdb and segments directories are on the same level, as they are by
default.
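
For anyone hitting the same thing, here is a rough sketch of the fix applied to
the "crawl" directory from the original command. It assumes the crawl was
written with "-dir crawl", so that crawldb, linkdb and segments sit side by
side under crawl/. The helper names check_crawl_layout and solrindex_cmd are
made up for illustration and are not part of Nutch. Passing the parent
directory as the crawldb argument is presumably what produced the missing
"current" error, since Nutch looks for "current" inside the crawldb.

```shell
# Hypothetical helper: confirm the three directories exist under the crawl dir.
check_crawl_layout() {
  for sub in crawldb linkdb segments; do
    if [ ! -d "$1/$sub" ]; then
      echo "missing: $1/$sub" >&2
      return 1
    fi
  done
}

# Hypothetical helper: print the solrindex command with crawldb, linkdb and
# segments all taken from the same crawl directory.
# Arguments: <crawl dir> <solr url>
solrindex_cmd() {
  echo "bin/nutch solrindex $2 $1/crawldb $1/linkdb $1/segments/*"
}

# Example, run from the Nutch install directory (paths are assumptions):
#   check_crawl_layout crawl && eval "$(solrindex_cmd crawl http://127.0.0.1:8983/solr/)"
```

The check only looks at the directory layout; it says nothing about whether
the segments actually contain fetched and parsed data.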


________________________________
From: lewis john mcgibbney <[email protected]>
To: [email protected]; Rum Raisin <[email protected]>
Sent: Friday, October 28, 2011 2:20 AM
Subject: Re: Integrating nutch crawl into solr


Please check your hadoop.log and Solr logs for related clues.

The "current" directory should not be created manually; it should be created
by the Nutch tasks themselves.
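
As a starting point, something like the following could pull the relevant
lines out of a log. This is only a sketch: scan_log is a made-up helper name,
and logs/hadoop.log is assumed to be the log location relative to the Nutch
install directory, which may differ on your setup.

```shell
# Hypothetical helper: list log lines that mention errors or exceptions,
# with line numbers, case-insensitively.
# Usage: scan_log <path-to-log>
scan_log() {
  grep -n -i -E 'error|exception' "$1"
}

# Example (assumed path):
#   scan_log logs/hadoop.log | tail -n 20
```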


On Fri, Oct 28, 2011 at 3:28 AM, Rum Raisin <[email protected]> wrote:

Hi,
>I'm running nutch 1.3 and solr 3.4. Both newly installed.  
>I ran a crawl which seems successful as I can see some data retrieved...
>
>
>bin/nutch crawl urls -dir crawl -depth 3 -topN 20
>
>
>Then I copied the default config/schema.xml file from nutch to solr's 
>example/solr/conf directory. Restarted solr.
>
>Then, to put the crawl data into Solr, I ran the command below...
>
>bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl crawl/linkdb crawl/segments/*
>
>
>It gave me an error about a missing "current" directory, so I manually created 
>that and ran it again.
>The second time there were no errors, but it ran quickly. So I went into 
>my Solr admin panel, and the statistics show maxDocs=0 and numDocs=0.
>I also did a *:* query but got 0 results.
>So it looks like nothing got imported into Solr. I was following the tutorial 
>here: http://wiki.apache.org/nutch/NutchTutorial#A6._Integrate_Solr_with_Nutch
>Help, what am I doing wrong? Why can't I get the Nutch data into Solr? Thanks.


-- 
Lewis
