On 11/11/2017 8:17 AM, Sujay Bawaskar wrote:

Thanks Shawn. It's good to know that OpenSearcher is not causing any issue.

We are good with a 15-minute softCommit interval. We are using a standalone Solr instance, not SolrCloud. There are 100 cores on this machine, but index ingestion was going on for a single core. The total index size is 100GB, of which this core, with 10GB of data, is the largest. The standalone Solr machine is hosted on a dedicated instance with 4 CPU cores and 120GB of memory. The Solr JVM is configured with Xms=40G and Xmx=80G. In this case, partial updates are being performed by 200 Solr clients simultaneously.

Looks like I managed to send my previous reply direct instead of to the list.  I'm sending this one to the list.

Why is your heap 80GB?  That's *huge*.  With 80GB of the 120GB total used by one Java process, you've got about 40GB left to cache the index -- assuming that this one Solr instance is the only significant program running on the server.  40GB to cache a 100GB index might be enough for good performance, or it might not be enough.  There are no easy formulas for figuring that out.  A heap that size is also likely to experience some occasional stop-the-world GC pauses that could take a VERY long time.
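The arithmetic behind that 40GB figure can be sketched as below.  This is only an upper bound using the numbers from this thread; the function name is made up for illustration, and it ignores memory used by the OS itself and by the JVM outside the heap.

```python
def cacheable_fraction(total_ram_gb, max_heap_gb, index_size_gb):
    """Rough upper bound on the share of the index the OS disk cache can hold.

    Assumes Solr is the only significant process on the machine and that
    everything outside the Java heap is available as disk cache.
    """
    page_cache_gb = total_ram_gb - max_heap_gb
    return min(1.0, page_cache_gb / index_size_gb)


# Numbers from this thread: 120GB RAM, 80GB max heap, 100GB of index data.
fraction = cacheable_fraction(120, 80, 100)
print(f"At most {fraction:.0%} of the index fits in the disk cache")
```

Whether that share is enough depends on how much of the index the queries actually touch, which this calculation cannot tell you.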

My dev Solr server (6.6.2-SNAPSHOT) holds all of the indexes that are spread across several servers in production.  That's over 700GB of index data.  This server runs with a 28GB heap, and the only reason it's *that* high is because I had to increase the heap in order to successfully run some data-mining grouping and facet queries.  Normally it works just fine with about a 13GB heap.

200 simultaneous indexing requests seems excessive to me, especially when the Solr server only has 4 CPUs.  Indexing several requests at the same time is the best way to achieve fast indexing, but if you have too many, it could actually get *worse* than indexing with only one thread/process.
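One way to cap the number of in-flight indexing requests on the client side is a small fixed-size thread pool.  The sketch below is only an illustration: send_update is a hypothetical stand-in for whatever call actually sends a partial update to Solr, and the pool size of 8 is an assumed starting point to tune by measurement, not a recommendation.

```python
from concurrent.futures import ThreadPoolExecutor


def send_update(doc):
    """Hypothetical placeholder for one partial-update request to Solr."""
    return doc["id"]


def index_all(docs, max_workers=8):
    # The pool size bounds how many requests run at once, no matter how
    # many documents are queued.  max_workers=8 is an assumption.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send_update, docs))


docs = [{"id": str(i)} for i in range(200)]
sent = index_all(docs)
print(f"sent {len(sent)} updates using at most 8 concurrent requests")
```

Trying a few pool sizes and comparing updates per minute is the only reliable way to find the point where adding more concurrency stops helping.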

Thanks,
Shawn

--------------------
For completeness, below is the full text of the thread where I replied before:

On Fri, Nov 10, 2017 at 8:59 PM, Shawn Heisey <apa...@elyograg.org> wrote:

    On 11/9/2017 10:25 PM, Sujay Bawaskar wrote:
    > We are getting the log below without invoking a commit operation
    > after every partial update call. We have configured the soft commit
    > and commit times as below. With this configuration we are able to
    > perform 800 partial updates per minute, which I think is very slow.
    > Our index size is 10GB for this particular core.
    > Is there any configuration we are missing here?
    >
    > Log:
    > 2017-11-10 05:13:33.730 INFO (qtp225493257-38746) [   x:collection]
    > o.a.s.s.SolrIndexSearcher Opening [Searcher@7010b1c6[collection] realtime]

    This is a *realtime* searcher, for the realtime get handler.  These
    will be recreated frequently as you index.  Opening realtime searchers
    should be extremely fast and not really affect the system much, and
    this happens without any configuration or user action.

    The realtime get handler, which is typically accessed as /get, can
    retrieve documents that haven't been made accessible to the normal
    index searcher.  If this feature were likely to cause performance
    problems, it would not be turned on by default.

    https://lucene.apache.org/solr/guide/6_6/realtime-get.html

    Are you seeing any other frequent logs about opening searchers that
    aren't realtime?

    > Commit configuration:
    > solr.autoCommit.maxTime:1800000
    > solr.autoSoftCommit.maxTime:900000

    I applaud your restraint here.  We frequently see users that want
    these things to happen on intervals measured in seconds, not minutes
    -- often as low as *one* second.  That said, I think I would actually
    decrease the autoCommit time to 60000, and make sure openSearcher is
    false.  I would probably decrease the autoSoftCommit time to 120000 or
    300000.

    Why would I recommend much shorter intervals than you have configured?
    For autoCommit, it comes down to the mantra on the blog post that
    Erick gave you:  "Hard commits are about durability."  Half an hour
    between hard commits doesn't address durability concerns very well,
    and a hard commit that does NOT open a new searcher is very quick.
    For autoSoftCommit, my recommendation is just because fifteen minutes
    is a VERY long interval for that, and you really don't need to wait
    that long.  Unless your settings are pathological and cause commits to
    take an unreasonable amount of time, doing them once every two minutes
    won't cause problems.

    Those recommendations aren't set in stone.  If you have hard evidence
    that you need different values, feel free, but I do think the
    intervals should be drastically reduced.

    As for why your indexing is slow ... it is very unlikely to be related
    to the log message you quoted, or your automatic commit settings.
    With the information provided, I can't give you ANY recommendations --
    a lot more information will be required.

    The entire solrconfig.xml would be useful.  You'll need to use a paste
    website or a file sharing site, because attachments are typically
    stripped by the mailing list.  And here's some information that cannot
    be obtained from solrconfig.xml that will be helpful:  Are you running
    in cloud mode?  If so, are your indexes sharded?  How much total
    memory is in the machine?  Is there one Solr instance on the machine,
    or multiple?  Is there other significant software on the machine, like
    a webserver or a database server?  How much heap space does each Solr
    instance have?  What is the total amount of data being handled by all
    Solr instances on the machine?  I'm looking for both a document count
    and disk space.  You mentioned that your index is 10GB, but that
    doesn't say whether that's the only index on the machine.

