One option: first purge all documents before the full re-index, so that you don't need to run optimize at all. This only works if you don't need the existing data to serve queries while the re-index is running.
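Here is a minimal sketch of that purge step. It assumes a Solr instance at a hypothetical `http://localhost:8983/solr` and a hypothetical collection `mycollection`. It uses Solr's JSON update syntax for a delete-by-query on `*:*`, with `commit=true` so the deletes are committed before re-indexing starts. The function only builds the request, so nothing is deleted by accident. Sending it empties the collection until the re-index finishes.

```python
import json
import urllib.request


def build_purge_request(solr_base_url: str, collection: str) -> urllib.request.Request:
    """Build (but do not send) a delete-by-query that removes every document.

    Uses Solr's JSON update format {"delete": {"query": "*:*"}} and
    commit=true so the deletion is committed immediately.
    """
    url = f"{solr_base_url.rstrip('/')}/{collection}/update?commit=true"
    body = json.dumps({"delete": {"query": "*:*"}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host and collection name; replace with your own.
    req = build_purge_request("http://localhost:8983/solr", "mycollection")
    print(req.get_method(), req.full_url)
    # Uncomment to actually purge the collection (irreversible):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status, resp.read().decode())
```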
I think you are running out of space because your 43 million documents may be consuming 30% of total disk space. When you re-index without purging, the old copies stay on disk alongside the new ones, so total disk usage goes to 60%. If you then run optimize, it may need roughly another 60% while it rewrites the index. That brings the total to 120%, which is more than the disk has. The other option is to increase disk space if you want to run optimize at the end.

On Mon, Dec 12, 2016 at 3:36 PM, Michael Joyner <mich...@newsrx.com> wrote:

> We are having an issue with running out of space when trying to do a full
> re-index.
>
> We are indexing with autocommit at 30 minutes.
>
> We have it set to only optimize at the end of an indexing cycle.
>
>
>
> On 12/12/2016 02:43 PM, Erick Erickson wrote:
>
>> First off, optimize is actually rarely necessary. I wouldn't bother
>> unless you have measurements to prove that it's desirable.
>>
>> I would _certainly_ not call optimize every 10M docs. If you must call
>> it at all, call it exactly once when indexing is complete. But see
>> above.
>>
>> As far as the commit, I'd just set the autocommit settings in
>> solrconfig.xml to something "reasonable" and forget it. I usually use
>> time rather than doc count as it's a little more predictable. I often
>> use 60 seconds, but it can be longer. The longer it is, the bigger
>> your tlog will grow and if Solr shuts down forcefully the longer
>> replaying may take. Here's the whole writeup on this topic:
>>
>> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>
>> Running out of space during indexing with about 30% utilization is
>> very odd. My guess is that you're trying to take too much control.
>> Having multiple optimizations going on at once would be a very good
>> way to run out of disk space.
>>
>> And I'm assuming one replica's index per disk or you're reporting
>> aggregate index size per disk when you say 30%. Having three replicas
>> on the same disk each consuming 30% is A Bad Thing.
>>
>> Best,
>> Erick
>>
>> On Mon, Dec 12, 2016 at 8:36 AM, Michael Joyner <mich...@newsrx.com>
>> wrote:
>>
>>> Halp!
>>>
>>> I need to reindex over 43 million documents. When optimized, the
>>> collection is currently < 30% of disk space. We tried it over this
>>> weekend and it ran out of space during the reindexing.
>>>
>>> I'm thinking the best solution for what we are trying to do is to
>>> call commit/optimize every 10,000,000 documents or so and then wait
>>> for the optimize to complete.
>>>
>>> How do I check optimized status via SolrJ for a particular collection?
>>>
>>> Also, is there a way to check free space per shard by collection?
>>>
>>> -Mike
>>>
>>>
>
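A rough way to check the disk-space arithmetic from the reply before starting, and to look at free space on the volume that holds a replica, relates to Mike's second question. This is a sketch under stated assumptions. It encodes the reply's rule of thumb (re-indexing over live data can double the index's footprint, and an optimize can need that much again), not an exact Lucene model. It uses only the Python standard library (`shutil.disk_usage`, `os.walk`), not a Solr API, and the index path is hypothetical. You would run it on each node against each replica's index directory.

```python
import os
import shutil


def peak_usage_fraction(index_fraction, purge_first=False, optimize_at_end=True):
    """Rough peak disk usage (fraction of the volume) for a full re-index.

    Rule of thumb from the thread, not an exact Lucene model:
    - re-indexing over live data keeps old (deleted) copies until merges
      reclaim them, so the index footprint can roughly double;
    - an optimize rewrites the index and may need about that much again.
    """
    during_reindex = index_fraction if purge_first else 2 * index_fraction
    return 2 * during_reindex if optimize_at_end else during_reindex


def directory_size(path):
    """Total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except FileNotFoundError:
                # A segment file can be deleted by a merge while we walk.
                pass
    return total


def reindex_fits(index_dir, purge_first=False, optimize_at_end=True):
    """Estimate whether the volume holding index_dir can absorb the peak."""
    usage = shutil.disk_usage(index_dir)
    index_fraction = directory_size(index_dir) / usage.total
    peak = peak_usage_fraction(index_fraction, purge_first, optimize_at_end)
    # Everything else on the volume (other replicas, logs, tlogs) still counts.
    other_fraction = usage.used / usage.total - index_fraction
    return other_fraction + peak <= 1.0


if __name__ == "__main__":
    # Hypothetical replica index directory; adjust for your install.
    path = "/var/solr/data/mycollection_shard1_replica1/data/index"
    if os.path.isdir(path):
        print("fits:", reindex_fits(path))
```

With the thread's numbers, `peak_usage_fraction(0.30)` gives about 1.2 (the 120% in the reply). Purging first and skipping the optimize keeps the peak near 0.3.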
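Erick's advice to use a time-based autocommit can be applied by editing solrconfig.xml. On Solr versions that have the Config API, it can also be applied with a `set-property` call, which stores the value as an override in `configoverlay.json` rather than changing solrconfig.xml itself. The following is a minimal sketch that builds (but does not send) such a request. The host and collection name are hypothetical, and 60000 ms matches the 60 seconds Erick mentions.

```python
import json
import urllib.request


def build_autocommit_request(solr_base_url, collection, max_time_ms=60_000):
    """Build (not send) a Config API call setting a time-based hard autocommit."""
    url = f"{solr_base_url.rstrip('/')}/{collection}/config"
    body = json.dumps(
        {"set-property": {"updateHandler.autoCommit.maxTime": max_time_ms}}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host and collection name; replace with your own.
    req = build_autocommit_request("http://localhost:8983/solr", "mycollection")
    print(req.get_method(), req.full_url, req.data.decode())
    # To apply it:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status, resp.read().decode())
```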