Re: Solr server partial update is very slow

Sujay Bawaskar Sun, 12 Nov 2017 21:29:56 -0800

Hi Shawn,

During indexing with partial updates, CPU utilization peaks at 12%. The Solr
JVM heap has a minimum size of 40GB because we use the data-import handler
with SortedMapBackedCache, which uses Java heap during a full import.
Memory utilization is also decent while partial updates are running. The
only problem is that when partial updates run at 700 updates per minute, the
QTime reaches 5 seconds. Could direct partial updates from 200 clients be
making index merging slower? We open 40*200 HTTP connections to Solr with
SolrJ for partial updates (at least 40 partial updates from each process).
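
A rough back-of-the-envelope check of these numbers, as a sketch only. By Little's law, the average number of requests in flight equals the arrival rate multiplied by the time each request spends in the system. Treating the 5-second QTime as the whole per-request latency is an assumption (QTime does not include network or client time), and the function name is made up for illustration.

```python
def requests_in_flight(updates_per_minute: float, latency_seconds: float) -> float:
    """Little's law: average in-flight requests = arrival rate * latency."""
    arrival_rate_per_second = updates_per_minute / 60.0
    return arrival_rate_per_second * latency_seconds


# Figures taken from the message above.
updates_per_minute = 700
qtime_seconds = 5.0  # assumption: QTime approximates total request latency
clients = 200
updates_per_client = 40

in_flight = requests_in_flight(updates_per_minute, qtime_seconds)
total_connections = clients * updates_per_client

print(f"throughput: {updates_per_minute / 60.0:.1f} updates/s")
print(f"average requests in flight: {in_flight:.1f}")
print(f"connections opened: {total_connections}")
```

Under these assumptions roughly 58 updates are in progress at any moment. With CPU at 12%, that would be consistent with requests mostly waiting on something other than CPU, though the numbers alone cannot say what.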


Thanks,
Sujay

On Mon, Nov 13, 2017 at 1:56 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 11/11/2017 8:17 AM, Sujay Bawaskar wrote:
>>
>> Thanks Shawn. It's good to know that openSearcher is not causing any issue.
>>
>> We are fine with a 15-minute softCommit interval. We are using a standalone
>> Solr instance, not SolrCloud. There are 100 cores on this machine,
>> but index ingestion was running for a single core. The total index size is
>> 100GB, of which this core, with 10GB of data, is the largest.
>> The standalone Solr machine is a dedicated instance with 4 CPU cores
>> and 120GB of memory. The Solr JVM is configured with xms=40G and xmx=80G.
>> In this case, partial updates are performed by 200 Solr clients
>> simultaneously.
>>
>
> Looks like I managed to send my previous reply direct instead of to the
> list.  I'm sending this one to the list.
>
> Why is your heap 80GB?  That's *huge*.  With 80GB of the 120GB total used
> by one Java process, you've got about 40GB left to cache the index --
> assuming that this one Solr instance is the only significant program
> running on the server.  40GB to cache a 100GB index might be enough for
> good performance, or it might not be enough.  There are no easy formulas
> for figuring that out.  A heap that size is also likely to experience some
> occasional stop-the-world GC pauses that could take a VERY long time.
>
> My dev Solr server (6.6.2-SNAPSHOT) holds all of the indexes that are spread
> across several servers in production.  That's over 700GB of index data.  This
> server runs with a 28GB heap, and the only reason it's *that* high is
> because I had to increase the heap in order to successfully run some
> data-mining grouping and facet queries.  Normally it works just fine with
> about a 13GB heap.
>
> 200 simultaneous indexing requests seems excessive to me, especially when
> the Solr server only has 4 CPUs.  Indexing several requests at the same
> time is the best way to achieve fast indexing, but if you have too many,
> it could actually get *worse* than indexing with only one thread/process.
>
> Thanks,
> Shawn
>
> --------------------
> For completeness, below is the full text of the thread where I replied
> before:
>
>> On Fri, Nov 10, 2017 at 8:59 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>>
>>     On 11/9/2017 10:25 PM, Sujay Bawaskar wrote:
>>     > We are getting the log below without invoking a commit operation after
>>     > every partial update call. We have configured soft commit and commit
>>     > times as below. With this configuration we are able to perform 800
>>     > partial updates per minute, which I think is very slow. Our index size
>>     > is 10GB for this particular core.
>>     > Is there any configuration we are missing here?
>>     >
>>     > Log:
>>     > 2017-11-10 05:13:33.730 INFO (qtp225493257-38746) [   x:collection]
>>     > o.a.s.s.SolrIndexSearcher Opening [Searcher@7010b1c6[collection] realtime]
>>
>>     This is a *realtime* searcher, for the realtime get handler.  These will
>>     be recreated frequently as you index.  Opening realtime searchers should
>>     be extremely fast and not really affect the system much, and this
>>     happens without any configuration or user action.
>>
>>     The realtime get handler, which is typically accessed as /get, can
>>     retrieve documents that haven't been made accessible to the normal index
>>     searcher.  If this feature were likely to cause performance problems, it
>>     would not be turned on by default.
>>
>>     https://lucene.apache.org/solr/guide/6_6/realtime-get.html
>>
>>     Are you seeing any other frequent logs about opening searchers that
>>     aren't realtime?
>>
>>     > Commit configuration:
>>     > solr.autoCommit.maxTime:1800000
>>     > solr.autoSoftCommit.maxTime:900000
>>
>>     I applaud your restraint here.  We frequently see users that want these
>>     things to happen on intervals measured in seconds, not minutes -- often
>>     as low as *one* second.  That said, I think I would actually decrease
>>     the autoCommit time to 60000, and make sure openSearcher is false.  I
>>     would probably decrease the autoSoftCommit time to 120000 or 300000.
>>
>>     Why would I recommend much shorter intervals than you have configured?
>>     For autoCommit, it comes down to the mantra on the blog post that Erick
>>     gave you:  "Hard commits are about durability."  Half an hour between
>>     hard commits doesn't address durability concerns very well, and a hard
>>     commit that does NOT open a new searcher is very quick.  For
>>     autoSoftCommit, my recommendation is just because fifteen minutes is a
>>     VERY long interval for that, and you really don't need to wait that
>>     long.  Unless your settings are pathological and cause commits to take
>>     an unreasonable amount of time, doing them once every two minutes won't
>>     cause problems.
>>
>>     Those recommendations aren't set in stone.  If you have hard evidence
>>     that you need different values, feel free, but I do think the intervals
>>     should be drastically reduced.
>>
>>     As for why your indexing is slow ... it is very unlikely to be related
>>     to the log message you quoted, or your automatic commit settings.  With
>>     the information provided, I can't give you ANY recommendations -- a lot
>>     more information will be required.
>>
>>     The entire solrconfig.xml would be useful.  You'll need to use a paste
>>     website or a file sharing site, because attachments are typically
>>     stripped by the mailing list.  And here's some information that cannot
>>     be obtained from solrconfig.xml that will be helpful:  Are you running
>>     in cloud mode?  If so, are your indexes sharded?  How much total memory
>>     is in the machine?  Is there one Solr instance on the machine, or
>>     multiple?  Is there other significant software on the machine, like a
>>     webserver or a database server?  How much heap space does each Solr
>>     instance have?  What is the total amount of data being handled by all
>>     Solr instances on the machine?  I'm looking for both a document count
>>     and disk space.  You mentioned that your index is 10GB, but that doesn't
>>     say whether that's the only index on the machine.
>>
>>
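
The commit intervals discussed in the quoted message are all given in milliseconds, which makes them easy to misread. The small sketch below converts the configured values and the suggested ones to minutes. The property names are the ones quoted in the thread; picking 120000 rather than 300000 for the soft commit is an arbitrary choice for the example.

```python
def ms_to_minutes(ms: int) -> float:
    """Convert a Solr maxTime value (milliseconds) to minutes."""
    return ms / 60_000


# Values as configured in the original question.
current = {
    "solr.autoCommit.maxTime": 1_800_000,
    "solr.autoSoftCommit.maxTime": 900_000,
}
# Values suggested in the reply (120000 chosen from "120000 or 300000").
suggested = {
    "solr.autoCommit.maxTime": 60_000,
    "solr.autoSoftCommit.maxTime": 120_000,
}

for name in current:
    before = ms_to_minutes(current[name])
    after = ms_to_minutes(suggested[name])
    print(f"{name}: {before:g} min -> {after:g} min")
```

This shows the hard commit going from 30 minutes to 1 minute and the soft commit from 15 minutes to 2 minutes.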
>
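
Shawn's point above about 200 simultaneous indexing requests on a 4-CPU server can be illustrated with a client that groups atomic updates into batches and sends them through a small, fixed pool of workers. This is a minimal sketch under stated assumptions, not the setup used in the thread: the unique key is assumed to be `id`, the field name `price_i` is hypothetical, `send_batch` is a stand-in that does no network I/O, and the batch size and worker count are starting points to measure against, not recommendations. The `{"set": value}` modifier is Solr's JSON atomic update syntax.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def build_atomic_batch(changes):
    """Turn {doc_id: {field: new_value}} into a JSON array of atomic 'set' updates."""
    docs = []
    for doc_id, fields in changes.items():
        doc = {"id": doc_id}  # assumption: the uniqueKey field is named "id"
        for name, value in fields.items():
            doc[name] = {"set": value}
        docs.append(doc)
    return json.dumps(docs)


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_batch(body):
    # Hypothetical stand-in: real code would POST `body` to the core's
    # /update handler. Here it only counts the documents in the batch.
    return len(json.loads(body))


def index_all(changes, batch_size=100, workers=4):
    """Send all changes in batches, with at most `workers` requests in flight."""
    items = list(changes.items())
    bodies = [build_atomic_batch(dict(chunk)) for chunk in chunked(items, batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_batch, bodies))


changes = {f"doc{i}": {"price_i": i} for i in range(250)}
sent = index_all(changes, batch_size=100, workers=4)
print(sent)
```

With 250 changes and a batch size of 100, this issues three requests instead of 250, and never more than four at once.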


-- 
Thanks,
Sujay P Bawaskar
M:+91-77091 53669
