Thanks, but I don't see any error messages in either the Nutch or the Solr logs.

> From: [email protected]
> Date: Mon, 1 Nov 2010 22:43:34 -0500
> Subject: Re: nutch solrindex doesn't index all the documents
> To: [email protected]
> 
> Hey Juan,
> 
> I had the same problem.  If you check the nutch logs (in the logs folder in
> nutch) you will most likely see solrindex throwing errors on some of your
> documents.
> 
> For example, the date formats in some of my docs weren't being parsed
> properly, so I had to create a patch (here is my bug entry:
> https://issues.apache.org/jira/browse/NUTCH-871)
> 
> You could be having a different error that demonstrates a bug in a different
> part of the pipeline, but the logs are the place to start.
> 
> -Max
> 
> On Mon, Nov 1, 2010 at 7:23 PM, Juan Felix <[email protected]> wrote:
> 
> >
> > Hi.
> >
> > I'm trying to index all my documents using the solrindex command, but for some
> > reason it sometimes doesn't index all of them.
> >
> > For example, the crawl db stats show 75,031 fetched pages, but
> > after indexing them to Solr, the number of documents in Solr is 74,827.
> >
> > Any ideas? What happened to the other 204 pages that are not in Solr?
> >
> > Thanks
> > Juan Felix
> >
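
The log check suggested in this thread could be done with a small script like the one below. This is only a rough sketch, not anything shipped with Nutch or Solr: the function name `find_problem_lines` is made up for illustration, and it assumes the log is a plain-text file in which each entry carries a level token such as WARN, ERROR or FATAL. The path to the log is passed on the command line, since the exact location depends on the installation.

```python
import re
import sys

# Assumption: log entries contain a level token such as WARN, ERROR or FATAL.
LEVEL = re.compile(r"\b(WARN|ERROR|FATAL)\b")


def find_problem_lines(lines):
    """Return (line_number, text) pairs for lines carrying a WARN/ERROR/FATAL level."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if LEVEL.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def main(path):
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, text in find_problem_lines(handle):
            print(f"{number}: {text}")


if __name__ == "__main__" and len(sys.argv) == 2:
    # The log path is supplied by the caller; no default location is assumed.
    main(sys.argv[1])
```

If this prints nothing for both the Nutch and the Solr logs, the missing documents were probably dropped without a logged error, and the cause has to be looked for elsewhere in the pipeline.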
                                          
