RE: maxDoc ten times greater than numDoc

2017-04-13 Thread Markus Jelsma
Thanks, but I am not going to be brave this time :)

I have tried reclaimDeletesWeight on an ordinary index some time ago and it was 
very aggressive even at values only slightly higher than the default. I think setting this 
weight in this situation would be analogous to a forceMerge on every cycle, which 
makes sense.

Thanks,
Markus
 

Re: maxDoc ten times greater than numDoc

2017-04-13 Thread Erick Erickson
If you want to be brave...

Through a clever bit of reflection, the parameters that
TieredMergePolicy uses to decide what segments to reclaim are settable
in solrconfig.xml (undocumented, so use at your own risk). You could
try bumping

reclaimDeletesWeight

in your TieredMergePolicy configuration if you wanted to experiment.
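
For example, something like this in solrconfig.xml (a sketch only, assuming the
Solr 6.x mergePolicyFactory syntax, which passes extra parameters through to the
TieredMergePolicy setters; the 4.0 value is just a number to experiment with, not
a recommendation):

  <indexConfig>
    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
      <!-- defaults to 2.0; higher values bias merging toward segments with many deletes -->
      <double name="reclaimDeletesWeight">4.0</double>
    </mergePolicyFactory>
  </indexConfig>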

There's no good reason not to set your segmentsPerTier; it won't hurt.

 But as you say, you have a solution, so this is just for curiosity's sake.

Best,
Erick


Re: maxDoc ten times greater than numDoc

2017-04-13 Thread Alexandre Rafalovitch
Maybe not every entry got deleted, and a leftover document was holding up the
segment, e.g. an abandoned child or parent record. If, for example, the parent
record has a date field and the child does not, then deleting with a
date-based query may trigger this. I think there was a bug about
abandoned children or something.

This is pure speculation of course.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced



RE: maxDoc ten times greater than numDoc

2017-04-13 Thread Markus Jelsma
I have forced a merge yesterday and went back to one segment.

One indexer program reindexes (most or all entries) every 20 minutes or so. There is 
nothing custom at that particular point. There is no autoCommit; the indexer 
program is responsible for the hard commit, and it is the single source of reindexed 
data.

After one cycle we had two segments, 50% deleted, as expected. This was stable 
for many hours and many cycles. For some reason, I now have 2/3 deleted docs and 
three segments, and this situation is now stable. So the merges do happen, but 
sometimes they don't. When they don't, the size increases (now three segments, 
55 MB). But it appears that the number of segments never decreases, and that is 
what bothers me.

I was about to set segmentsPerTier to two, but then I realized I can also delete 
everything prior to indexing, as opposed to deleting only items older than the 
set I am already about to reindex. This strategy works fine with other 
reindexing programs; they don't suffer this problem.

So it is not solved, but it is not a problem anymore. Thanks all anyway :)
Markus
 

Re: maxDoc ten times greater than numDoc

2017-04-12 Thread Erick Erickson
Yes, this is very strange. My bet: you have something
custom (a setting, indexing code, whatever) that
is getting in the way.

Second possibility (really stretching here): your
merge settings require 10 segments to exist
before merging, and somehow not all the docs in the
segments are replaced. So until you get to the 10th
re-index (and assuming a single segment is
produced per re-index) the older segments aren't
merged. If that were the case I'd expect to see the
number of deleted docs drop back periodically and
then build up again. A real shot in the dark. One way
to test this would be to specify a "segmentsPerTier" of, say,
2 rather than the default 10, see:
https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
If this were the case, then with a setting of 2 I'd expect
your index to have around 50% deleted docs; that would at
least tell us whether we're on the right track.
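
Something like this in solrconfig.xml would do it (a sketch only, assuming the
Solr 6.x mergePolicyFactory syntax; keeping maxMergeAtOnce in step with
segmentsPerTier is just a habit, not a requirement):

  <indexConfig>
    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <!-- both default to 10; 2 makes merges kick in with far fewer segments per tier -->
      <int name="maxMergeAtOnce">2</int>
      <int name="segmentsPerTier">2</int>
    </mergePolicyFactory>
  </indexConfig>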

Take a look at your index on disk. If you're seeing gaps
in the numbering, you are getting merges; they may just
not be happening very often.

And I take it you have no custom code here and you are
doing commits? (Hard commits are all that matter
for merging; it doesn't matter whether openSearcher
is set to true or false.)

I just tried the "techproducts" example as follows:
1> indexed all the sample files with the bin/solr -e techproducts example
2> started re-indexing the sample docs one at a time with post.jar

It took a while, but eventually the original segments got merged away, so
I doubt it's any weirdness with a small index.

Speaking of small indexes, why are you sharding with only
8K docs? Sharding will probably slow things down for such
a small index. This isn't germane to your question though.

Best,
Erick




Re: maxDoc ten times greater than numDoc

2017-04-12 Thread Shawn Heisey
On 4/12/2017 5:11 AM, Markus Jelsma wrote:
> One of our 2 shard collections is rather small and gets all its entries 
> reindexed every 20 minutes or so. Now I just noticed maxDoc is ten times 
> greater than numDoc; a merge is never scheduled, but settings are default. 
> We just overwrite the existing entries, all of them.
>
> Here are the stats:
>
> Last Modified: 12 minutes ago
> Num Docs: 8336
> Max Doc: 82362
> Heap Memory Usage: -1
> Deleted Docs: 74026
> Version: 3125
> Segment Count: 10

This discrepancy would typically mean that when you reindex, you're
indexing MOST of the documents, but not ALL of them, so at least one
document is still not deleted in each older segment.  When segments have
all their documents deleted, they are automatically removed by Lucene,
but if there's even one document NOT deleted, the segment will remain
until it is merged.

There's no information here about how large this core is, but unless the
documents are REALLY enormous, I'm betting that an optimize would happen
quickly.  With a document count this low and an indexing pattern that
results in such a large maxdoc, this might be a good time to go against
general advice and perform an optimize at least once a day.
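
A daily optimize does not need anything fancy; a scheduled job can POST a small
XML message to the core's /update handler (a sketch; whether you add attributes
like waitSearcher is up to you):

  <optimize waitSearcher="false"/>

I believe an optimize=true parameter on the update handler URL does the same thing.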

An alternate idea that would not require optimizes:  If the intent is to
completely rebuild the index, you might want to consider issuing a
"delete all docs by query" before beginning the indexing process.  This
would ensure that none of the previous documents remain.  As long as you
don't do a commit that opens a new searcher before the indexing is
complete, clients won't ever know that everything was deleted.
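
In XML update terms that is just the following, sent at the start of the
indexing run (and then no commit with openSearcher=true until the rebuild is
done):

  <delete><query>*:*</query></delete>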

> This is the config:
>
>   <luceneMatchVersion>6.5.0</luceneMatchVersion>
>   <dataDir>${solr.data.dir:}</dataDir>
>   <directoryFactory name="DirectoryFactory"
>     class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
>
>   <indexConfig>
>     <lockType>${solr.lock.type:native}</lockType>
>     <infoStream>false</infoStream>
>   </indexConfig>
>
>   <updateHandler class="solr.DirectUpdateHandler2">
>     <updateLog>
>       <str name="dir">${solr.ulog.dir:}</str>
>     </updateLog>
>   </updateHandler>

Side issue: This config is missing autoCommit.  You really should have
autoCommit with openSearcher set to false and a maxTime in the
neighborhood of 6.  It goes inside the updateHandler section.  This
won't change the maxDoc issue, but because of the other problems it can
prevent, it is strongly recommended.  It can be omitted if you are
confident that your indexing code is correctly managing hard commits.
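
Something along these lines inside the updateHandler section (a sketch; the
60000 ms maxTime is only an assumed example value, pick whatever interval fits
your indexing cycle):

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <!-- example value, not taken from this thread -->
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>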

Thanks,
Shawn



RE: maxDoc ten times greater than numDoc

2017-04-12 Thread alessandro.benedetti
This may be incorrect, but I think that even if a merge happened and the disk
space was actually released, the deleted docs count would still be there.
What about your index size? Is the index 10 times bigger than expected?

Cheers



--
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io


RE: maxDoc ten times greater than numDoc

2017-04-12 Thread Markus Jelsma
Hello - I know it includes all those deleted/overwritten documents. But having 
89.9% deleted documents is quite unreasonable, so I would expect the 
merge scheduler to kick in at least once in a while. It doesn't with default 
settings, so I am curious what is wrong.

Our large regular search cluster regularly merges segments, but that one 
receives updates and deletes more sparsely. Maybe the scheduler is fooled by the 
way I reindex. Any ideas?

Regards,
Markus

 
 


Re: maxDoc ten times greater than numDoc

2017-04-12 Thread alessandro.benedetti
Hi Markus,
maxDoc includes deletions:

Deleted Docs: 74026 + Num Docs: 8336 = Max Doc: 82362

Cheers



--
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io


maxDoc ten times greater than numDoc

2017-04-12 Thread Markus Jelsma
Hi,

One of our 2 shard collections is rather small and gets all its entries 
reindexed every 20 minutes or so. Now I just noticed that maxDoc is ten times greater 
than numDoc; a merge is never scheduled, but settings are default. We just 
overwrite the existing entries, all of them.

Here are the stats:

Last Modified: 12 minutes ago
Num Docs: 8336
Max Doc: 82362
Heap Memory Usage: -1
Deleted Docs: 74026
Version: 3125
Segment Count: 10

This is the config:

  <luceneMatchVersion>6.5.0</luceneMatchVersion>
  <dataDir>${solr.data.dir:}</dataDir>
  <directoryFactory name="DirectoryFactory"
    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

  <indexConfig>
    <lockType>${solr.lock.type:native}</lockType>
    <infoStream>false</infoStream>
  </indexConfig>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>

Any ideas? Thanks!
Markus