solr and nutch confusion...

codegigabyte Mon, 14 Nov 2011 18:57:42 -0800

I just started learning about Nutch and Solr, and I am getting confused about a few things.


I am using Cygwin on Windows XP.


Basically I crawl with this command:

sh nutch crawl urls -dir ../data/jf -topN 1000

So basically this means that each segment will contain 1000 URLs, right?

So I went to the jf folder and saw that there are 2 folders under segments, each with a timestamp as its name.


So theoretically I should have 2000 documents, right? Or wrong?

Then I indexed it to Solr with solrindex.

Running the catch-all query *:* returns a "numFound" of 77.
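
The numFound check can be sketched roughly as follows. This is only an illustration under assumptions: the Solr base URL (http://localhost:8983/solr), the use of the JSON response writer (wt=json), and the helper names build_count_url, extract_num_found and fetch_num_found are all hypothetical and not taken from the setup described here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_count_url(solr_base="http://localhost:8983/solr"):
    """Build a select URL for the catch-all query *:* that returns no rows.

    solr_base is an assumed location; adjust it to the real Solr instance.
    """
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{solr_base}/select?{params}"


def extract_num_found(response_text):
    """Pull the total hit count out of a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]


def fetch_num_found(solr_base="http://localhost:8983/solr"):
    """Query Solr over HTTP and return the number of indexed documents."""
    with urlopen(build_count_url(solr_base)) as resp:
        return extract_num_found(resp.read().decode("utf-8"))
```

Calling fetch_num_found() against a running Solr instance should give the same count that the *:* query reports as "numFound".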

Some of the URLs that I assumed had been crawled were not found in the results.

Can anyone point me in the right direction?
