bq: We are indexing with autocommit at 30 minutes

OK, check the size of your tlogs. What this means is that all the
updates accumulate for 30 minutes in a single tlog. That tlog will be
closed when autocommit happens and a new one opened for the next 30
minutes. The first tlog won't be purged until the second one is closed.
All this is detailed in the link I provided.
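For reference, a time-based hard autocommit in solrconfig.xml looks roughly
like the sketch below (the 60-second value simply echoes the figure Erick
mentions further down the thread; tune it for your own setup). Each hard
commit closes the current tlog and starts a new one, which keeps individual
tlogs small:

    <autoCommit>
      <!-- hard commit every 60 seconds (60000 ms); closes the current tlog -->
      <maxTime>60000</maxTime>
      <!-- don't open a new searcher on hard commit; document visibility is
           handled separately (e.g. by soft commits) -->
      <openSearcher>false</openSearcher>
    </autoCommit>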
If the tlogs are significant in size, this may be the entire problem.

Best,
Erick

On Mon, Dec 12, 2016 at 12:46 PM, Susheel Kumar <susheel2...@gmail.com> wrote:
> One option:
>
> First, you could purge all documents before the full re-index so that you
> don't need to run optimize, unless you need the existing data to serve
> queries at the same time.
>
> I think you are running out of space because your 43 million documents may
> be consuming 30% of total disk space, and when you re-index the total disk
> space usage goes to 60%. If you then run optimize, it may require another
> 60% of disk space, pushing usage to 120%, which causes the out-of-disk-space
> condition.
>
> The other option is to increase disk space if you want to run optimize at
> the end.
>
>
> On Mon, Dec 12, 2016 at 3:36 PM, Michael Joyner <mich...@newsrx.com> wrote:
>
>> We are having an issue with running out of space when trying to do a full
>> re-index.
>>
>> We are indexing with autocommit at 30 minutes.
>>
>> We have it set to only optimize at the end of an indexing cycle.
>>
>>
>> On 12/12/2016 02:43 PM, Erick Erickson wrote:
>>
>>> First off, optimize is actually rarely necessary. I wouldn't bother
>>> unless you have measurements to prove that it's desirable.
>>>
>>> I would _certainly_ not call optimize every 10M docs. If you must call
>>> it at all, call it exactly once when indexing is complete. But see
>>> above.
>>>
>>> As far as the commit, I'd just set the autocommit settings in
>>> solrconfig.xml to something "reasonable" and forget it. I usually use
>>> time rather than doc count as it's a little more predictable. I often
>>> use 60 seconds, but it can be longer. The longer it is, the bigger
>>> your tlog will grow, and if Solr shuts down forcefully the longer
>>> replaying may take. Here's the whole writeup on this topic:
>>>
>>> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>>
>>> Running out of space during indexing with about 30% utilization is
>>> very odd. My guess is that you're trying to take too much control.
>>> Having multiple optimizations going on at once would be a very good
>>> way to run out of disk space.
>>>
>>> And I'm assuming one replica's index per disk, or that you're reporting
>>> aggregate index size per disk when you say 30%. Having three replicas
>>> on the same disk, each consuming 30%, is A Bad Thing.
>>>
>>> Best,
>>> Erick
>>>
>>> On Mon, Dec 12, 2016 at 8:36 AM, Michael Joyner <mich...@newsrx.com>
>>> wrote:
>>>
>>>> Halp!
>>>>
>>>> I need to reindex over 43 million documents. When optimized, the
>>>> collection is currently < 30% of disk space; we tried it over this
>>>> weekend and it ran out of space during the reindexing.
>>>>
>>>> I'm thinking the best solution for what we are trying to do is to call
>>>> commit/optimize every 10,000,000 documents or so and then wait for the
>>>> optimize to complete.
>>>>
>>>> How do I check optimize status via SolrJ for a particular collection?
>>>>
>>>> Also, is there a way to check free space per shard by collection?
>>>>
>>>> -Mike
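On the SolrJ question at the bottom of the thread: there is no single
"is optimized" flag, but the core admin STATUS response exposes the segment
count and index size for each core (in SolrCloud, each shard replica is a
core). An index that has been fully optimized (force-merged) has exactly one
segment. A minimal sketch, assuming the base URL and core name below are
placeholders for your own node and replica, and that the "index",
"segmentCount", and "sizeInBytes" keys are present as the STATUS handler
typically returns them:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;
    import org.apache.solr.client.solrj.response.CoreAdminResponse;
    import org.apache.solr.common.util.NamedList;

    public class OptimizeStatusCheck {
        public static void main(String[] args) throws Exception {
            // Placeholders: one Solr node and one replica (core) of the collection,
            // e.g. "mycollection_shard1_replica1" in SolrCloud.
            String baseUrl = "http://localhost:8983/solr";
            String coreName = "mycollection_shard1_replica1";

            try (SolrClient client = new HttpSolrClient.Builder(baseUrl).build()) {
                // Core admin STATUS for this core.
                CoreAdminResponse status = CoreAdminRequest.getStatus(coreName, client);
                NamedList<Object> core = status.getCoreStatus(coreName);

                // The per-core status carries an "index" section with segment
                // count and on-disk size.
                @SuppressWarnings("unchecked")
                NamedList<Object> index = (NamedList<Object>) core.get("index");
                Number segmentCount = (Number) index.get("segmentCount");
                Number sizeInBytes = (Number) index.get("sizeInBytes");

                // One segment means the optimize (force merge) has completed.
                System.out.println("segments=" + segmentCount
                        + " indexSizeBytes=" + sizeInBytes
                        + (segmentCount.intValue() == 1 ? " (optimized)" : " (not optimized)"));
            }
        }
    }

This does not report free disk space per shard; it only reports each core's
index size, which you would compare against the disk capacity you already
know for that node.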