Re: [Q] Faster Atomic Updates - use docValues?

Erick Erickson Tue, 03 Dec 2019 05:21:23 -0800

Do you have empirical evidence that all these parameter changes are doing you 
any good?


The first thing I note is that an 8G heap for a 250M document index is a red flag.
If you’re running on a larger machine, I’d increase that to 16G as a test. I’ve seen
GC start to take up more and more CPU as you get closer to the max heap, sometimes
to the point of having 90% or more of the CPU consumed by GC.

The second thing is you have no searchers being opened. Solr has to keep certain
in-memory structures in place to support Real Time Get, and those only get reclaimed
when a new searcher is opened. Perhaps that’s chewing up memory and getting to a
tipping point.
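
One way to test that theory is to open a searcher periodically during the run and
see whether the slowdown moves. A minimal sketch in Python, assuming Solr listens on
localhost:8983 and the collection is called "suggest" (both are placeholders, not
taken from your setup). It relies only on the standard softCommit and
commit/openSearcher parameters of the /update handler:

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def commit_url(base_url: str, collection: str, soft: bool = True) -> str:
    """Build an /update URL whose commit opens a new searcher."""
    if soft:
        params = {"softCommit": "true"}
    else:
        params = {"commit": "true", "openSearcher": "true"}
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


def open_new_searcher(base_url: str, collection: str) -> int:
    """Send the commit and return the HTTP status code."""
    with urlopen(commit_url(base_url, collection), timeout=300) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical host and collection name; adjust before running.
    print(commit_url("http://localhost:8983/solr", "suggest"))
```

Calling open_new_searcher() every few million updates, or configuring an
autoSoftCommit interval instead, would be enough to see whether heap usage drops.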

Why did you increase ramBufferSizeMB? I’ve rarely found much increase in throughput
over the default 100M. It’s probably not doing much anyway: given your autocommit
limits, unless you fill that full 2G within 100,000 docs or within 2 minutes, it
won’t be used up.
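
To put a number on that, here is a back-of-the-envelope sketch in Python. It takes
the 2000 MB buffer and the 100,000-doc autocommit limit from the quoted config and
asks how large the average document would have to be in the indexing buffer to fill
it before the commit flushes it. The in-buffer size of a document is not the same
as its raw size, so treat the result as an order of magnitude only:

```python
RAM_BUFFER_MB = 2000   # <ramBufferSizeMB> from the quoted config
MAX_DOCS = 100_000     # <autoCommit><maxDocs> from the quoted config


def bytes_per_doc_to_fill(ram_buffer_mb: int, max_docs: int) -> float:
    """Average buffered bytes per doc needed to fill the RAM buffer
    before the maxDocs autocommit flushes it."""
    return ram_buffer_mb * 1024 * 1024 / max_docs


needed = bytes_per_doc_to_fill(RAM_BUFFER_MB, MAX_DOCS)
print(f"{needed / 1024:.1f} KiB per doc")  # prints "20.5 KiB per doc"
```

Auto-suggest documents are usually far smaller than 20 KiB each, so the commit
would normally fire long before the buffer fills.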

The third thing is that you have changed the TieredMergePolicy extensively. When
background merges kick in, they’ll be HUGE. Further, the settings will probably
cause you to have a lot of segments, which is not ideal.

Fourth, why do you think the lookup of the <uniqueKey> has anything to do with
your slowdown? If I’m reading this right, you do atomic updates on 50M docs
_then_ things get slow. If it was a <uniqueKey> lookup I should think it’d
be a problem for the first 50M docs.

So here’s what I’d do:
1> go back to the defaults for TieredMergePolicy and ramBufferSizeMB
2> measure first, tweak later. Analyze your GC logs to see whether
     you’re taking an inordinate amount of time doing GC coincident with
     your slowness. If so, adjust your heap.
3> If it’s not GC, put a profiler on it and find out where, exactly, you’re
     spending your time.
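
For step 2, a rough sketch of what "analyze your GC logs" could look like in Python.
It assumes JDK 9+ unified GC logging with the default uptime decoration, where pause
lines look roughly like
"[12.345s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 512M->128M(8192M) 25.123ms".
Older JVMs log in a different format and would need a different pattern:

```python
import re

# Assumed line shape: "[<uptime>s]... Pause ... <duration>ms"
PAUSE_RE = re.compile(
    r"^\[(?P<uptime>\d+(?:\.\d+)?)s\].*\bPause\b.*?(?P<ms>\d+(?:\.\d+)?)ms\s*$"
)


def gc_pause_fraction(lines, window_s=60.0):
    """Map each window start (seconds of JVM uptime) to the fraction of
    that window spent in stop-the-world GC pauses."""
    totals = {}
    for line in lines:
        m = PAUSE_RE.match(line)
        if not m:
            continue  # not a pause line (concurrent phases, heap stats, ...)
        # A pause is attributed to the window in which its log line appears.
        start = int(float(m.group("uptime")) // window_s) * window_s
        totals[start] = totals.get(start, 0.0) + float(m.group("ms")) / 1000.0
    return {start: paused / window_s for start, paused in sorted(totals.items())}
```

Feed it the log with something like "with open('gc.log') as f: print(gc_pause_fraction(f))"
and compare the windows where the fraction climbs against the time your indexing
slows down.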

Best,
Erick


> We occasionally reindex whole data to our Auto-Suggest corpus. Total
> documents to be indexed are around 250 million while, due to atomic
> updates, total unique documents after full indexing converges to 60
> million.
> 
> We have to atomically index documents to store different names for the same
> product (like "bag" and "bags"), to increase demand and to store the months
> they were searched for in the past. One approach could be to calculate all
> this beforehand and then index normally to Solr (non-atomic).
> 
> Once the atomic updates process has gone over 50 million documents, indexing
> slows to less than a tenth of its initial speed.
> 
> As I have learnt, atomic updates fetch the matching document by
> uniqueKey and then do the normal indexing using the information in the
> fetched document. Is this actually taking time? As the number of documents
> increases, Solr might be taking time to fetch the stored document.
> 
> But shouldn't the fetch by uniqueKey take O(1) time? If this really impacts
> the fetch, can we use docValues for the field id (uniqueKey)? Our field is
> of type string.
> 
> 
> 
> I'm pasting my config lines that may impact this:
> 
> ----------------------------------------------------------------------------------
> 
> -Xmx8g -Xms8g
> 
> <field name="id" type="string" indexed="true" stored="true" required="true"
> omitNorms="false" multiValued="false" />
> <uniqueKey>id</uniqueKey>
> 
> <ramBufferSizeMB>2000</ramBufferSizeMB>
> 
> <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>         <int name="maxMergeAtOnce">50</int>
>         <int name="segmentsPerTier">50</int>
>         <int name="maxMergeAtOnce">150</int>
> </mergePolicyFactory>
> 
> <autoCommit>
>        <maxDocs>100000</maxDocs>
>        <maxTime>120000</maxTime>
>        <openSearcher>false</openSearcher>
> </autoCommit>
> 
> ----------------------------------------------------------------------------------
> 
> 
> 
> A normal indexing that should take less than 1 day actually takes over 5
> days with atomic updates. Any experience or suggestion will help. How do you
> expedite your indexing process, specifically atomic updates? I know this
> might have been asked so many times and I have actually read/implemented
> all of the recommendations. My question is specific to Atomic Updates and
> if something exclusive to Atomic Updates can make it faster.
> 
> 
> -- 
> Regards,
> 
> *Paras Lehana* [65871]
> Development Engineer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
> 
> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> Noida, UP, IN - 201303
> 
> Mob.: +91-9560911996
> Work: 01203916600 | Extn:  *8173*
> 
