One option:

If you don't need the existing data to serve queries during the re-index,
you can purge all documents before the full re-index so that you don't need
to run optimize.

I think you are running out of space because your 43 million documents may
be consuming 30% of total disk space, and when you re-index, total disk
usage goes to 60%.  If you then run optimize, it may temporarily need as
much space again (another 60%), bringing the total to 120%, which causes
the out-of-disk-space failure.
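
Here is a rough sketch of that arithmetic in Python. The function name and
the "optimize needs about as much space again" factor are my assumptions for
illustration, not measured Solr behavior; the real overhead depends on your
segments and merge settings.

```python
def peak_disk_usage(index_fraction, purge_first, optimize_at_end):
    """Estimate peak disk usage as a fraction of total disk (illustrative only)."""
    # Without a purge, the old and new copies of the documents coexist
    # on disk while the re-index is running.
    after_reindex = index_fraction if purge_first else 2 * index_fraction
    # Assumption: optimize may temporarily need about as much space again
    # as the index it is rewriting.
    return 2 * after_reindex if optimize_at_end else after_reindex


# Compare the four combinations for an index that is 30% of the disk.
for purge in (False, True):
    for optimize in (False, True):
        peak = peak_disk_usage(0.30, purge, optimize)
        status = "out of space" if peak > 1.0 else "fits"
        print(f"purge_first={purge!s:5} optimize={optimize!s:5} "
              f"peak={peak:.0%} ({status})")
```

Under these assumptions, only the "no purge, then optimize" combination
exceeds 100% of the disk.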

The other option is to increase disk space if you want to run optimize at
the end.
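
If you do keep the optimize, it may be worth checking the headroom on the
node first. This is only a sketch that looks at the local filesystem, not a
Solr API; the index path in the usage note and the default factor of 1.0
are assumptions you would need to adjust.

```python
import os
import shutil


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def has_headroom_for_optimize(index_dir, factor=1.0):
    """Return True if free space is at least factor * current index size.

    factor=1.0 encodes the assumption that optimize may need about as
    much extra space as the index currently occupies.
    """
    needed = factor * dir_size_bytes(index_dir)
    free = shutil.disk_usage(index_dir).free
    return free >= needed
```

You would call it on the node that holds the replica, for example
has_headroom_for_optimize("/var/solr/data/mycollection/data/index"), where
that path is hypothetical, and skip the optimize if it returns False.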


On Mon, Dec 12, 2016 at 3:36 PM, Michael Joyner <mich...@newsrx.com> wrote:

> We are having an issue with running out of space when trying to do a full
> re-index.
>
> We are indexing with autocommit at 30 minutes.
>
> We have it set to only optimize at the end of an indexing cycle.
>
>
>
> On 12/12/2016 02:43 PM, Erick Erickson wrote:
>
>> First off, optimize is actually rarely necessary. I wouldn't bother
>> unless you have measurements to prove that it's desirable.
>>
>> I would _certainly_ not call optimize every 10M docs. If you must call
>> it at all call it exactly once when indexing is complete. But see
>> above.
>>
>> As far as the commit, I'd just set the autocommit settings in
>> solrconfig.xml to something "reasonable" and forget it. I usually use
>> time rather than doc count as it's a little more predictable. I often
>> use 60 seconds, but it can be longer. The longer it is, the bigger
>> your tlog will grow and if Solr shuts down forcefully the longer
>> replaying may take. Here's the whole writeup on this topic:
>>
>> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>
>> Running out of space during indexing with about 30% utilization is
>> very odd. My guess is that you're trying to take too much control.
>> Having multiple optimizations going on at once would be a very good
>> way to run out of disk space.
>>
>> And I'm assuming one replica's index per disk or you're reporting
>> aggregate index size per disk when you say 30%. Having three replicas
>> on the same disk each consuming 30% is A Bad Thing.
>>
>> Best,
>> Erick
>>
>> On Mon, Dec 12, 2016 at 8:36 AM, Michael Joyner <mich...@newsrx.com>
>> wrote:
>>
>>> Halp!
>>>
>>> I need to reindex over 43 million documents. When optimized, the
>>> collection is currently < 30% of disk space. We tried it over this
>>> weekend and it ran out of space during the reindexing.
>>>
>>> I'm thinking the best solution for what we are trying to do is to call
>>> commit/optimize every 10,000,000 documents or so and then wait for the
>>> optimize to complete.
>>>
>>> How to check optimized status via solrj for a particular collection?
>>>
>>> Also, is there a way to check free space per shard by collection?
>>>
>>> -Mike
>>>
>>>
>
