Hey Juan,

I had the same problem. If you check the Nutch logs (in the logs folder of
your Nutch install), you will most likely see solrindex throwing errors on
some of your documents.

For example, the date formats on some of my docs weren't being properly
parsed, so I had to create a patch (here is my bug entry:
https://issues.apache.org/jira/browse/NUTCH-871).

You could be hitting a different error that points to a bug in a different
part of the pipeline, but the logs are the place to start.
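
If it helps, here is a rough sketch of how you could pull the suspicious
lines out of a log file instead of reading it by eye. It is only an
illustration: the function name find_error_lines, the "solr" keyword, and the
ERROR/WARN/Exception markers are my own assumptions about what a failed
document looks like in the log, so adjust them to whatever your log actually
contains.

```python
import re
import sys

# Assumed markers of a problem line; tune these to match your own log output.
ERROR_PATTERN = re.compile(r"\b(ERROR|WARN|Exception)\b")


def find_error_lines(lines, keyword="solr"):
    """Return (line_number, text) pairs for lines that look like errors
    and mention the given keyword (case-insensitive)."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if ERROR_PATTERN.search(line) and keyword.lower() in line.lower():
            hits.append((number, line.rstrip("\n")))
    return hits


# Only runs when a log file path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for number, text in find_error_lines(handle):
            print(f"{number}: {text}")
```

Run it with the path to your log file as the only argument. If the number of
matching lines is close to the number of missing documents, that is a good
hint the indexing errors account for the gap.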

-Max

On Mon, Nov 1, 2010 at 7:23 PM, Juan Felix <[email protected]> wrote:

>
> Hi.
>
> I'm trying to index all the documents using the solrindex command, but for
> some reason it sometimes doesn't index all of them.
>
> For example, the crawl db stats show 75,031 fetched pages, but after
> indexing them into Solr, the number of documents in Solr is 74,827.
>
> Any idea? What about the other 204 pages that are not in Solr?
>
> Thanks
> Juan Felix
>
