Hi

I tried to run Nutch 1.3 together with Solr 3.x, following http://wiki.apache.org/nutch/NutchTutorial.

That worked as described, but when I search the index using the Solr admin
interface I always get an empty result.

http://localhost:8983/solr/admin/schema.jsp

Using the Schema Browser, I see entries in several fields (e.g. the url field), but the content field is empty. I looked for a similar problem on the mailing list but didn't find a solution.
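One way to check whether the HTML text reached the index at all is to skip the admin form and query the content field directly over HTTP. This is a minimal sketch, assuming the default Solr 3.x select handler at /solr/select on port 8983 as used above; the function name solr_field_query and the search term are hypothetical, and the term is assumed to be a single word that needs no URL encoding.

```shell
#!/bin/sh
# Hypothetical helper: build a Solr select URL that searches a single field
# and asks for only the url and content fields of the matching documents.
# Assumes the default Solr 3.x select handler at <base>/select.
solr_field_query() {
  base="$1"   # Solr base URL, e.g. http://localhost:8983/solr
  field="$2"  # field to search, e.g. content
  term="$3"   # a single word expected on one of the crawled HTML pages
  printf '%s/select?q=%s:%s&fl=url,content&wt=json\n' "$base" "$field" "$term"
}

# Usage (needs a running Solr instance):
#   curl "$(solr_field_query http://localhost:8983/solr content nutch)"
```

If the query returns hits but no content value, the field may be indexed but declared with stored="false" in schema.xml; Solr only returns stored fields in fl, so a hit on q=content:... is the more reliable signal that the text was indexed.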

Here is what I did:

1.) ./bin/nutch crawl urls -dir crawl -depth 3 -topN 5
2.) Dumped the segment (./bin/nutch readseg -dump crawl/segments/20110916124747 test). The script
     also dumped the content of the web pages. Everything seems to be OK here.
3.) Copied the Nutch schema.xml to the Solr conf directory.
4.) bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
5.) Then tried to search using http://localhost:8983/solr/admin/, but didn't find any HTML content.
     However, if there was a PDF file in the crawl, the PDF content is found.
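Since the dump in step 2 shows the raw page content, it may also be worth checking whether the HTML pages produced any parsed text, because that is what ends up in the index. This is a rough sketch, assuming the readseg dump file starts each record with a "Recno::" line and marks its sections with headers such as "ParseText::" (an assumption about the dump layout; compare with your own test/dump file). The function name count_empty_parsetext is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count records in a readseg dump file whose
# "ParseText::" section contains no non-blank lines.
# Assumes records start with "Recno::" and sections start with "Name::".
count_empty_parsetext() {
  awk '
    /^Recno::/     { if (inpt && !seen) empty++; inpt = 0; seen = 0; next }
    /^ParseText::/ { inpt = 1; seen = 0; next }
    /^[A-Za-z]+::/ { if (inpt && !seen) empty++; inpt = 0; next }
    inpt && NF > 0 { seen = 1 }
    END            { if (inpt && !seen) empty++; print empty + 0 }
  ' "$1"
}

# Usage, with the dump directory from step 2:
#   count_empty_parsetext test/dump
```

A non-zero count for the HTML URLs, while PDFs have text, would point at the parsing step rather than at Solr.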

BTW, with Nutch 1.1 and Solr 1.4.1 everything worked as expected. I could stay with those versions, but I am upgrading from an older Nutch version, and it would be nice to use the newer releases, where Nutch and Solr are better integrated.

Any ideas what might be wrong?

Jann



--

Jann Forrer
Informatikdienste
Universität Zürich
Winterthurerstr. 190
CH-8057 Zürich

oooO   mail:[email protected]
(  )   phone: +41 44 63 56772
 \ (   fax:   +41 44 63 54505
  \_)http://www.id.uzh.ch
