Re: Solr 7.6 optimize index size increase

2020-06-17 Thread Erick Erickson
What Walter said. Although with Solr 7.6, unless you specify maxSegments 
explicitly,
you won’t create segments over the default 5G maximum.

And if you have specified maxSegments in the past, so you have segments over 5G, 
optimize (again, without specifying maxSegments) will do a “singleton merge” on 
them,
i.e. it’ll rewrite each large segment into a single new segment with all the 
deleted data
removed, thus gradually shrinking it. This happens automatically if you delete
documents (an update is a delete + add, so it counts), and you may well have a
significant percentage of deleted docs in your index.

Best,
Erick
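The singleton-merge behavior described above can be illustrated with a small back-of-the-envelope simulation. This is not Lucene’s actual merge-policy code, just the arithmetic as described: segments at or under the 5G ceiling are handled by normal merging (left unchanged here for simplicity), while oversized segments are rewritten one at a time, only if they carry deletes, shrinking by their deleted fraction.

```python
# Rough sketch of the post-Solr-7.5 optimize behavior described above.
# NOT Lucene code -- just the arithmetic, for intuition.

MAX_SEGMENT_GB = 5.0  # default max merged segment size, in GB

def singleton_merge_sizes(segments):
    """segments: list of (size_gb, deleted_fraction) tuples.
    Returns the size of each segment after an optimize issued WITHOUT
    maxSegments: each over-5G segment with deletions is rewritten
    alone ("singleton merge"), dropping its deleted documents."""
    result = []
    for size_gb, deleted_frac in segments:
        if size_gb > MAX_SEGMENT_GB and deleted_frac > 0:
            # singleton merge: same segment, minus deleted docs
            result.append(round(size_gb * (1 - deleted_frac), 2))
        else:
            # untouched here: either small enough for normal merging
            # (simplified away), or oversized with zero deletions
            result.append(size_gb)
    return result

# A 12 GB legacy segment (from an old maxSegments=1 optimize) with 25%
# deleted docs shrinks to 9 GB; a clean 12 GB segment is left alone.
print(singleton_merge_sizes([(12.0, 0.25), (12.0, 0.0), (3.0, 0.5)]))
# -> [9.0, 12.0, 3.0]
```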


Re: Solr 7.6 optimize index size increase

2020-06-17 Thread Walter Underwood
From that short description, you should not be running optimize at all.

Just stop doing it. It doesn’t make that big a difference.

It may take your indexes a few weeks to get back to a normal state after the 
forced merges.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


Re: Solr 7.6 optimize index size increase

2020-06-17 Thread Raveendra Yerraguntla
Thank you David, Walter, Erick.
1. The first time the bloated index was generated, there was no disk-space issue; 
one copy of the index is 1/6 of disk capacity. We ran into disk-capacity problems 
after more than 2 bloated copies.
2. Solr was upgraded from 5.*. In 5.*, more than 5 segments was causing a 
performance issue. Performance in 7.* has not been measured for an increasing 
number of segments; I will plan a performance test to find the optimum number. 
The application does incremental indexing multiple times in a work week.
I will keep you updated on the resolution.
Thanks again

Re: Solr 7.6 optimize index size increase

2020-06-16 Thread Erick Erickson
It Depends (tm).

As of Solr 7.5, optimize is different. See: 
https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/

So, assuming you have _not_ specified maxSegments=1, any very large
segment (near 5G) that has _zero_ deleted documents won’t be merged.

So there are two scenarios:

1> What Walter mentioned. The optimize process runs out of disk space
 and leaves lots of crud around

2> your “older segments” are just max-sized segments with zero deletions.


All that said… do you have demonstrable performance improvements after
optimizing? The very name “optimize” is misleading; of course, who
wouldn’t want an optimized index? In earlier versions of Solr (i.e. 4x),
it made quite a difference. In more recent Solr releases, it’s not as clear-cut.
So before worrying about making optimize work, I’d recommend that
you do some performance tests on optimized and un-optimized indexes.
If there are significant improvements, that’s one thing. Otherwise, it’s
a waste.

Best,
Erick
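One way to act on that advice is to time a representative query set against both an optimized and an un-optimized copy of the index. A minimal, hedged sketch of such a harness follows; the `run_query` callable is a placeholder for whatever actually issues the search (a pysolr client, an HTTP request, etc.), which you would supply.

```python
import statistics
import time

def median_latency_ms(run_query, queries, warmup=2, repeats=5):
    """Time each query `repeats` times after `warmup` throwaway runs
    (to warm caches) and return the median latency in milliseconds.
    `run_query` is whatever callable issues the search."""
    samples = []
    for q in queries:
        for _ in range(warmup):
            run_query(q)  # discarded: cache warming only
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(q)
            samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Usage idea: run once against the optimized core and once against an
# un-optimized copy, and only keep optimizing if the gap is material.
```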



Re: Solr 7.6 optimize index size increase

2020-06-16 Thread Walter Underwood
For a full forced merge (mistakenly named “optimize”), the worst case disk space
is 3X the size of the index. It is common to need 2X the size of the index.

When I worked on Ultraseek Server 20+ years ago, it had the same merge behavior.
I implemented a disk space check that would refuse to merge if there wasn’t 
enough
free space. It would log an error and send an email to the admin.
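An Ultraseek-style guard like that is easy to approximate in front of any forced merge: refuse when free space on the index volume is below a multiple of the index size (3x being the worst case noted above). A sketch, with the volume path and multiplier as assumptions you would tune:

```python
import shutil

def can_force_merge(index_bytes, volume_path="/", factor=3):
    """Return True only if the volume holding the index has at least
    `factor` times the index size free. Worst case for a full forced
    merge is ~3x the index size, so 3 is a conservative default."""
    free = shutil.disk_usage(volume_path).free
    return free >= factor * index_bytes

# The caller would log an error and alert instead of merging:
# if not can_force_merge(index_size, "/var/solr"):
#     log.error("refusing optimize: need 3x index size free")
```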

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Solr 7.6 optimize index size increase

2020-06-16 Thread David Hastings
I can't give you a 100% sure answer, but I've experienced this, and what
"seemed" to happen to me was that the optimize would start, and that will
drive the size up threefold; if you run out of disk space in the process,
the optimize will quit (since it can't optimize) and leave the live index
pieces intact, so now you have the "current" index as well as the
"optimized" fragments.

I can't say for certain that's what you ran into, but we found that if you
get an expanding disk it will keep growing and prevent this from happening;
then the index will contract and the disk will shrink back to only what it
needs. Saved me a lot of headaches, not needing to ever worry about disk
space.


Solr 7.6 optimize index size increase

2020-06-16 Thread Raveendra Yerraguntla

When the optimize command is issued, the expectation after the optimization 
process completes is that the index size either decreases or at most remains 
the same. In a Solr 7.6 cluster with 50-plus shards, when the optimize command 
is issued, some of the shards' transient or older segment files are not deleted. 
This happens randomly across all shards. When unnoticed, these transient files 
fill the disk. Currently it is handled through monitors, but the question is what 
causes the transient/older files to remain. Are there any specific race 
conditions which leave the older files not being deleted?
Any pointers around this will be helpful.
 TIA