Hi
I tried to run Nutch 1.3 together with Solr 3.x according to
http://wiki.apache.org/nutch/NutchTutorial.
That worked as described, but if I try to search the index using the Solr
admin interface, I always get an empty result.
Using the Schema Browser (http://localhost:8983/solr/admin/schema.jsp), I see
entries in different fields (e.g. the url field), but the content field is
empty. I looked for a similar problem on the mailing list but didn't find a
solution.
Here is what I did:
1.) ./bin/nutch crawl urls -dir crawl -depth 3 -topN 5
2.) Dumped the segment (./bin/nutch readseg -dump crawl/segments/20110916124747 test).
    The dump also contained the content of the web pages, so everything seems to be OK here.
3.) Copied the Nutch schema.xml to the Solr conf directory.
4.) bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
5.) Then tried to search using http://localhost:8983/solr/admin/, but didn't find any HTML content.
However, if there was a PDF file among the crawled pages, its content is found.
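One way to narrow this down is to query Solr's standard /select handler directly instead of going through the admin page. You would then compare the hit count for `*:*` with the hit count for a word that only appears in the HTML text (e.g. `content:<word>`). This is not part of the original setup. It is a minimal Python sketch that only builds the URL and reads the count from a response body you fetch yourself.

A few assumptions apply:
- The base URL is taken from step 4.
- The helper names `select_url` and `num_found` are hypothetical.
- The parsing assumes the JSON layout produced by Solr's `wt=json` response writer.

```python
import json
from urllib.parse import urlencode

# Base URL as used for solrindex in step 4 (assumption: Solr runs there).
SOLR_BASE = "http://127.0.0.1:8983/solr"


def select_url(query: str, fields: str = "url", rows: int = 10) -> str:
    """Build a URL for Solr's /select handler asking for a JSON response."""
    params = {"q": query, "fl": fields, "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/select?{urlencode(params)}"


def num_found(response_body: str) -> int:
    """Return the total hit count from a JSON /select response body."""
    return json.loads(response_body)["response"]["numFound"]


if __name__ == "__main__":
    # Hypothetical probe: all documents vs. documents matching a word in the HTML text.
    print(select_url("*:*"))
    print(select_url("content:zurich"))
```

To compare the two counts, open both URLs in a browser and pass each body to `num_found`. If the first count is non-zero and the second is zero, that points at the HTML text not being indexed into the content field. If both counts are non-zero, the text is indexed and the admin page is the part to look at.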
BTW, with Nutch 1.1 and Solr 1.4.1 everything worked as expected. I could use
these versions, but I am upgrading from an older Nutch version, and it would be
nice to use the newer versions, where Nutch and Solr are better integrated.
Any ideas what might be wrong?
Jann
--
Jann Forrer
Informatikdienste
Universität Zürich
Winterthurerstr. 190
CH-8057 Zürich
mail: [email protected]
phone: +41 44 63 56772
fax: +41 44 63 54505
http://www.id.uzh.ch