Re: Solr Cloud reclaiming disk space from deleted documents

2015-05-04 Thread Rishi Easwaran
Sadly, given the size of our complex, splitting shards and adding more
hardware is not a viable long-term solution.
I guess the options we have are to run optimize regularly and/or become more
aggressive with our merges, proactively, before SolrCloud gets into this
situation.

Thanks,
Rishi.
 

 

 






 


Re: Solr Cloud reclaiming disk space from deleted documents

2015-05-04 Thread Shawn Heisey
On 5/4/2015 4:55 AM, Rishi Easwaran wrote:
> Sadly with the size of our complex, splitting and adding more HW is not a
> viable long term solution.
> I guess the options we have are to run optimize regularly and/or become
> aggressive in our merges proactively even before solr cloud gets into this
> situation.

If you are regularly deleting most of your index, or reindexing large
parts of it, which effectively does the same thing, then regular
optimizes may be required to keep the index size down, although you must
remember that you need enough room for the core to grow in order to
actually complete the optimize.  If the core is 75-90 percent deleted
docs, then you will not need 2x the core size to optimize it, because
the new index will be much smaller.
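
As a rough illustration of the point above, here is a minimal Python sketch that estimates whether an optimize would fit in the free space. It assumes the index size scales linearly with the number of live documents, which is only an approximation, and the function names and the 20 percent safety margin are illustrative choices, not anything Solr provides.

```python
def estimated_optimized_size(index_bytes: float, num_docs: int, max_doc: int) -> float:
    """Rough size of the rewritten index, assuming size scales with live docs."""
    if max_doc <= 0:
        raise ValueError("max_doc must be positive")
    return index_bytes * (num_docs / max_doc)


def optimize_fits(index_bytes, num_docs, max_doc, free_bytes, safety=1.2):
    """True if free space covers the estimated new index plus a safety margin.

    The old segments stay on disk until the optimize finishes, so the new
    index has to fit entirely in the space that is free right now.
    """
    needed = safety * estimated_optimized_size(index_bytes, num_docs, max_doc)
    return free_bytes >= needed


GB = 1024 ** 3
# A core that is 80 percent deleted docs: 100 GB on disk, 20M live of 100M total.
print(optimize_fits(100 * GB, 20_000_000, 100_000_000, free_bytes=30 * GB))  # True
print(optimize_fits(100 * GB, 20_000_000, 100_000_000, free_bytes=10 * GB))  # False
```

With a mostly deleted core the estimate is far below 2x, but it is still only an estimate, so it makes sense to leave some margin.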

Currently, SolrCloud will always optimize the entire collection when you
ask for an optimize on any core, but it will NOT optimize all the
replicas (cores) at the same time.  It will go through the cores that
make up the collection and optimize each one in sequence.  If your
index is sharded and replicated enough, hopefully that will make it
possible for the optimize to complete even though the amount of disk
space available may be low.

We have at least one issue in Jira where users have asked for optimize
to honor distrib=false, which would allow the user to be in complete
control of all optimizing, but so far that hasn't been implemented.  The
volunteers that maintain Solr can only accomplish so much in the limited
time they have available.

Thanks,
Shawn



Re: Solr Cloud reclaiming disk space from deleted documents

2015-05-04 Thread Rishi Easwaran
Thanks Shawn. Yeah, regular optimize might be the route we take if this
becomes a recurring issue.
I remember that in our old multicore deployment, CPU used to spike and the
core almost became unresponsive.

My guess is that with the SolrCloud architecture, any slack from the leader
while it is optimizing is picked up by the replica.
I was searching around for the optimize behaviour of SolrCloud and could not
find much information.

Does anyone have experience running optimize for SolrCloud in a loaded
production env?

Thanks,
Rishi.
 
 

 

 



 


Re: Solr Cloud reclaiming disk space from deleted documents

2015-04-27 Thread Gili Nachum
To prevent it from recurring, you could monitor index size and, once it is
above a certain threshold, add another machine and split the shard between
the existing and new machines.
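
A minimal Python sketch of that monitoring idea, under stated assumptions: the threshold, host, collection and shard names are hypothetical, and the URL only shows the shape of a Collections API SPLITSHARD request. Where the resulting sub-shards end up, and how to move one onto the new machine, depends on your Solr version and is not covered here.

```python
from urllib.parse import urlencode


def needs_split(index_bytes: int, threshold_bytes: int) -> bool:
    """True once the shard's index has grown past the chosen threshold."""
    return index_bytes > threshold_bytes


def split_shard_url(base_url: str, collection: str, shard: str) -> str:
    """Build a Collections API SPLITSHARD request URL (nothing is sent here)."""
    params = urlencode(
        {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    )
    return f"{base_url.rstrip('/')}/admin/collections?{params}"


GB = 1024 ** 3
if needs_split(index_bytes=110 * GB, threshold_bytes=100 * GB):
    # Hypothetical host and names, for illustration only.
    print(split_shard_url("http://localhost:8983/solr", "mycollection", "shard1"))
```

Keep in mind that the split itself needs free disk space on the node that holds the shard, so the threshold should be set well below the point where the disk is nearly full.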






Re: Solr Cloud reclaiming disk space from deleted documents

2015-04-20 Thread Rishi Easwaran
Yeah, I noticed that. Looks like optimize won't work, since some of our disks
are already pretty full.
Any thoughts on increasing/decreasing <mergeFactor>10</mergeFactor> or tuning
ConcurrentMergeScheduler to make Solr do merges faster?


 

 

 






 


Re: Solr Cloud reclaiming disk space from deleted documents

2015-04-20 Thread Rishi Easwaran
So is there anything that can be done from a tuning perspective to recover a
shard that is 75%-90% full, other than getting rid of the index and rebuilding
the data?
Also, to prevent this issue from recurring, it looks like we need to make our
system more aggressive with segment merges by using a lower merge factor.

 
Thanks,
Rishi.

 



 


Re: Solr Cloud reclaiming disk space from deleted documents

2015-04-20 Thread Shawn Heisey
On 4/20/2015 8:44 AM, Rishi Easwaran wrote:
> Yeah I noticed that. Looks like optimize won't work since on some disks we
> are already pretty full.
> Any thoughts on increasing/decreasing <mergeFactor>10</mergeFactor> or
> ConcurrentMergeScheduler to make solr do merges faster.

You don't have to do an optimize to need 2x disk space.  Even normal
merging, if it happens just right, can require the same disk space as a
full optimize.  Normal Solr operation requires that you have enough
space for your index to reach at least double size on occasion.
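
A minimal Python sketch of that rule of thumb: measure the index directory and check that the free space on the same disk is at least as large, so the index could temporarily double. The helper names are made up for illustration, and the index path you pass in is whatever your core's data directory happens to be.

```python
import os
import shutil


def directory_size(path: str) -> int:
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def has_merge_headroom(index_bytes: int, free_bytes: int) -> bool:
    """True if the index could temporarily double without filling the disk."""
    return free_bytes >= index_bytes


def check_index_dir(index_dir: str) -> bool:
    """Apply the headroom rule to an index directory on the local disk."""
    free_bytes = shutil.disk_usage(index_dir).free
    return has_merge_headroom(directory_size(index_dir), free_bytes)
```

On a live index, segment files can disappear between listing and measuring while a merge is running, so a production check would want to tolerate a missing file rather than fail.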

Higher merge factors are better for indexing speed, because merging
happens less frequently.  Lower merge factors are better for query
speed, at least after the merging finishes, because merging happens more
frequently and there are fewer total segments at any given moment. 
During a merge, there is so much I/O that query speed is often
negatively affected.

Thanks,
Shawn



Re: Solr Cloud reclaiming disk space from deleted documents

2015-04-19 Thread Gili Nachum
I assume you don't have much free space available on your disk. Note that
during optimization (merging into a single segment) your shard replica's space
usage may peak at 2x-3x of its normal size until optimization completes.
Is it a problem? Not if optimization occurs over shards serially and your
index is broken into many small shards.






Solr Cloud reclaiming disk space from deleted documents

2015-04-17 Thread Rishi Easwaran
Hi All,

Running into an issue and wanted to see if anyone had some suggestions.
We are seeing this with both solr 4.6 and 4.10.3 code.
We are running an extremely update-heavy application, with millions of writes
and deletes happening to our indexes constantly.  An issue we are seeing is
that SolrCloud is not reclaiming the disk space that could be used for new
inserts by cleaning up deletes.

We used to run optimize periodically with our old multicore setup; not sure if
that works for SolrCloud.

Num Docs:28762340
Max Doc:48079586
Deleted Docs:19317246

Version 1429299216227
Gen 16525463
Size 109.92 GB
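
For reference, the share of deleted documents implied by these numbers can be worked out directly. A minimal Python sketch; the function name is illustrative and the figures are the ones listed above.

```python
def deleted_fraction(max_doc: int, num_docs: int) -> float:
    """Fraction of the index's document slots taken up by deleted documents."""
    if max_doc <= 0:
        raise ValueError("max_doc must be positive")
    return (max_doc - num_docs) / max_doc


NUM_DOCS = 28762340
MAX_DOC = 48079586
DELETED_DOCS = 19317246

# Max Doc counts live plus deleted documents, so the three figures should agree.
assert NUM_DOCS + DELETED_DOCS == MAX_DOC

print(f"{deleted_fraction(MAX_DOC, NUM_DOCS):.1%} of documents are deleted")
```

That works out to roughly 40 percent of the documents in the index being deleted ones that have not yet been merged away.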

In our solrconfig.xml we use the following configs.

<indexConfig>
  <!-- Values here affect all index writers and act as a default unless
       overridden. -->
  <useCompoundFile>false</useCompoundFile>
  <maxBufferedDocs>1000</maxBufferedDocs>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>1</maxFieldLength>

  <mergeFactor>10</mergeFactor>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy"/>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">3</int>
    <int name="maxMergeCount">15</int>
  </mergeScheduler>
  <ramBufferSizeMB>64</ramBufferSizeMB>
</indexConfig>


Any suggestions on which tunables to adjust (mergeFactor, mergeScheduler
thread counts, etc.) would be great.

Thanks,
Rishi.
 


Re: Solr Cloud reclaiming disk space from deleted documents

2015-04-17 Thread Shawn Heisey
On 4/17/2015 2:15 PM, Rishi Easwaran wrote:
> Running into an issue and wanted to see if anyone had some suggestions.
> We are seeing this with both solr 4.6 and 4.10.3 code.
> We are running an extremely update heavy application, with millions of writes
> and deletes happening to our indexes constantly.  An issue we are seeing is
> that solr cloud is not reclaiming the disk space that can be used for new
> inserts, by cleaning up deletes.
>
> We used to run optimize periodically with our old multicore set up, not sure
> if that works for solr cloud.
>
> Num Docs:28762340
> Max Doc:48079586
> Deleted Docs:19317246
>
> Version 1429299216227
> Gen 16525463
> Size 109.92 GB
>
> In our solrconfig.xml we use the following configs.
>
> <indexConfig>
>   <!-- Values here affect all index writers and act as a default unless
>        overridden. -->
>   <useCompoundFile>false</useCompoundFile>
>   <maxBufferedDocs>1000</maxBufferedDocs>
>   <maxMergeDocs>2147483647</maxMergeDocs>
>   <maxFieldLength>1</maxFieldLength>
>
>   <mergeFactor>10</mergeFactor>
>   <mergePolicy class="org.apache.lucene.index.TieredMergePolicy"/>
>   <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>     <int name="maxThreadCount">3</int>
>     <int name="maxMergeCount">15</int>
>   </mergeScheduler>
>   <ramBufferSizeMB>64</ramBufferSizeMB>
> </indexConfig>

This part of my response won't help the issue you wrote about, but it
can affect performance, so I'm going to mention it.  If your indexes are
stored on regular spinning disks, reduce mergeScheduler/maxThreadCount
to 1.  If they are stored on SSD, then a value of 3 is OK.  Spinning
disks cannot do seeks (read/write head moves) fast enough to handle
multiple merging threads properly.  All the seek activity required will
really slow down merging, which is a very bad thing when your indexing
load is high.  SSD disks do not have to seek, so multiple threads are OK
there.

An optimize is the only way to reclaim all of the disk space held by
deleted documents.  Over time, as segments are merged automatically,
deleted doc space will be automatically recovered, but it won't be
perfect, especially as segments are merged multiple times into very
large segments.

If you send an optimize command to a core/collection in SolrCloud, the
entire collection will be optimized ... the cloud will do one shard
replica (core) at a time until the entire collection has been
optimized.  There is no way (currently) to ask it to only optimize a
single core, or to do multiple cores simultaneously, even if they are on
different servers.
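
For anyone who has not sent one before, an optimize is just a request to the update handler. A minimal Python sketch that builds such a request URL; the host and collection name are hypothetical, the helper name is made up, and nothing is actually sent.

```python
from urllib.parse import urlencode


def optimize_url(base_url: str, collection: str, max_segments: int = 1) -> str:
    """Build an update-handler URL that asks Solr to optimize a collection."""
    params = urlencode(
        {
            "optimize": "true",
            "maxSegments": max_segments,  # merge down to this many segments
            "waitSearcher": "false",      # do not block until a new searcher opens
        }
    )
    return f"{base_url.rstrip('/')}/{collection}/update?{params}"


# Hypothetical host and collection, for illustration only.
print(optimize_url("http://localhost:8983/solr", "mycollection"))
```

Issuing a GET on that URL (for example with urllib.request.urlopen) would start the optimize. As described above, in SolrCloud the request ends up covering the whole collection, one core at a time.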

Thanks,
Shawn



Re: Solr Cloud reclaiming disk space from deleted documents

2015-04-17 Thread Rishi Easwaran
Thanks Shawn for the quick reply.
Our indexes are running on SSD, so 3 should be ok.
Any recommendation on bumping it up?

I guess we will have to run optimize for the entire SolrCloud and see if we
can reclaim space.

Thanks,
Rishi. 
 

 

 

 
