I just started learning about Nutch and Solr, and I'm starting to get confused about a few issues.

I'm using Cygwin on Windows XP.

Basically, I crawl with this command:

sh nutch crawl urls -dir ../data/jf -topN 1000

So this means that each segment will contain 1000 URLs, right?

I went to the jf folder and saw two folders under segments, each named with a timestamp.

So theoretically I should have 2000 documents, right? Or am I wrong?
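For reference, `-topN` caps how many URLs are selected in each generate/fetch round; it is a maximum, not a quota. So the number of segments times 1000 is a ceiling rather than an expected total. A minimal shell sketch of that arithmetic, assuming each timestamped directory under `segments` corresponds to one round (`max_docs` is a hypothetical helper, not a Nutch command):

```shell
#!/bin/sh
# Hypothetical helper: upper bound on documents a Nutch crawl could produce.
# Assumes each timestamped subdirectory of <crawl_dir>/segments is one
# generate/fetch round, and that -topN capped the URLs selected per round.
max_docs() {
  crawl_dir=$1
  top_n=$2
  # Count the per-round segment directories (one per timestamp).
  # tr strips the padding some wc implementations add.
  segments=$(find "$crawl_dir/segments" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
  # -topN is a per-round ceiling, so this is a maximum, not an expected count.
  echo $((segments * top_n))
}
```

Run as `max_docs ../data/jf 1000`; with two segments it prints 2000. Pages that fail to fetch or are never indexed would make the real count lower than this bound.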

So I indexed it into Solr with solrindex.

The catch-all query *:* returns a "numFound" of 77.

Some of the URLs I expected to have been crawled were not found in the results.

Can anyone point me in the right direction?
