Re: solr 5.2.1, data import issue, shown processed rows doesn't match acturally indexed doc quantity.

John Bickerstaff Mon, 04 Apr 2016 20:51:07 -0700

The first question is whether you have duplicate IDs in your data set.

I ran into the same kind of thing a few months back, freaked out, and spent
a few hours trying to figure it out by adding extra logging to keep track of
every single count at every stage of the process. All the numbers matched
right up until I sent everything to Solr, after which I ended up with fewer
Solr documents than I had "rows" of results.

Then my client told me they knew they had duplicates in the data set based
on the way they "harvest" the data...

I can't explain the difference between the first and second results in SOLR
and of course my situation may not match yours...

But I suggest clearing Solr and starting from scratch.  Get a count at
each stage and (if you're in control of the code, as I was) build something
that checks for duplicates.  A HashSet or HashMap in Java is a handy tool
for this: it holds at most one entry per key, so a repeated ID is easy to
detect.
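
A minimal sketch of that kind of check, assuming the IDs have already been
collected as strings somewhere in your import code. The class name, method
names and sample IDs are made up for illustration. The reason it matters:
Solr replaces an existing document when a new one arrives with the same
uniqueKey value, so the number of distinct IDs, not the number of rows, is
what you should expect *:* to report.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr or the Data Import Handler.
final class DuplicateIdCheck {

    /** Returns each ID that occurs more than once, mapped to how often it occurs. */
    static Map<String, Integer> findDuplicates(List<String> ids) {
        // Count occurrences per ID; LinkedHashMap keeps first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String id : ids) {
            counts.merge(id, 1, Integer::sum);
        }
        // Keep only the IDs that were seen more than once.
        Map<String, Integer> duplicates = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.put(entry.getKey(), entry.getValue());
            }
        }
        return duplicates;
    }

    /** Number of distinct IDs in the list. */
    static int uniqueCount(List<String> ids) {
        return new HashSet<>(ids).size();
    }

    public static void main(String[] args) {
        // Made-up sample IDs standing in for the rows of a real import.
        List<String> ids = Arrays.asList("a1", "b2", "a1", "c3", "b2", "a1");
        System.out.println("rows processed: " + ids.size());
        System.out.println("unique IDs:     " + uniqueCount(ids));
        System.out.println("duplicates:     " + findDuplicates(ids));
    }
}
```

If the unique count matches numFound, duplicates account for the whole gap.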

If you don't have control of the code, you might write some SQL or
something against the original data store that would uncover the presence
of duplicates.  If you're dealing with a "canned" set of data, you might
want to parse it in code and check for duplicates.
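
For the "canned" data case, something along these lines might do. It is
only a sketch and it assumes a layout I am inventing for the example: one
record per line, comma-separated, with the ID in the first column. The
class and method names are hypothetical too; adjust the parsing to whatever
your data really looks like.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical scanner for a line-oriented export; the layout is an assumption.
final class CannedDataIdScan {

    /** Returns the IDs (first comma-separated column) that occur on more than one line. */
    static Set<String> duplicateIds(Reader source) {
        Set<String> seen = new HashSet<>();
        Set<String> repeated = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String id = line.split(",", 2)[0].trim();
                // Set.add returns false when the ID was already present.
                if (!seen.add(id)) {
                    repeated.add(id);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return repeated;
    }
}
```

To run it against a real file, pass in
java.nio.file.Files.newBufferedReader(path) as the Reader.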

If you can reproduce the difference between the first and second run,
you've got more going on than duplicate IDs, but that might still be part
of your problem.

On Mon, Apr 4, 2016 at 9:26 PM, cqlangyi <cqlan...@163.com> wrote:

> hi there,
>
>
> I have Solr 5.2.1. When I do a data import, after the job is done, it
> shows 165,191 rows processed successfully.
>
>
> But when I query with *:*, "numFound" shows only 163,349 docs in the index.
>
>
> When I tried it again, it showed 165,191 rows processed successfully,
> but the *:* query result is now 162,390.
>
>
> no errors in any log,
>
>
> any idea?
>
>
> thank you very much!
>
>
> cq
>
> At 2016-04-05 09:19:48, "Chris Hostetter" <hossman_luc...@fucit.org>
> wrote:
> >
> >: I am not sure how to use "Sort By Function" for Case.
> >:
> >: |10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0|
> >:
> >: Can you tell how to fetch 40 when input is 10.
> >
> >Something like...
> >
> >if(termfreq(f,10),40,if(termfreq(f,14),19,if(termfreq(f,33),17,....)))))))))))
> >
> >But i suspect there may be a much better way to achieve your ultimate goal
> >if you tell us what it is.  what do these fields represent? what makes
> >these numeric values significant? do you know which values are significant
> >when indexing, or do they vary for every query?
> >
> >https://people.apache.org/~hossman/#xyproblem
> >XY Problem
> >
> >Your question appears to be an "XY Problem" ... that is: you are dealing
> >with "X", you are assuming "Y" will help you, and you are asking about "Y"
> >without giving more details about the "X" so that we can understand the
> >full issue.  Perhaps the best solution doesn't involve "Y" at all?
> >See Also: http://www.perlmonks.org/index.pl?node_id=542341
> >
> >
> >
> >
> >-Hoss
> >http://www.lucidworks.com/
>
