On 11/11/2017 8:17 AM, Sujay Bawaskar wrote:
Thanks Shawn. It's good to know that OpenSearcher is not causing any
issue.
We are good with a 15-minute softCommit interval. We are using a
standalone Solr instance, not SolrCloud. There are 100 cores on
this machine, but index ingestion was going on for a single core. The
total size of the indexes is 100GB, of which this core, with 10GB of
data, is the largest.
The standalone Solr machine is hosted on a dedicated instance with 4 CPU
cores and 120 GB of memory. The Solr JVM is configured with Xms=40G and
Xmx=80G. In this case the partial updates are being performed by 200
Solr clients simultaneously.
Looks like I managed to send my previous reply direct instead of to the
list. I'm sending this one to the list.
Why is your heap 80GB? That's *huge*. With 80GB of the 120GB total
used by one Java process, you've got about 40GB left to cache the index
-- assuming that this one Solr instance is the only significant program
running on the server. 40GB to cache a 100GB index might be enough for
good performance, or it might not be enough. There are no easy formulas
for figuring that out. A heap that size is also likely to experience some
occasional stop-the-world GC pauses that could take a VERY long time.
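To make the memory arithmetic above concrete, here is a minimal Python sketch. The function name is made up for illustration, and it assumes Solr is the only significant process on the machine and ignores what the OS itself needs, so treat the result as an upper bound rather than a sizing formula.

```python
def page_cache_estimate(total_ram_gb, max_heap_gb, index_size_gb):
    """Return (GB left for the OS disk cache, fraction of the index that fits).

    Rough upper bound only: ignores OS overhead and any other processes.
    """
    cache_gb = max(total_ram_gb - max_heap_gb, 0)
    fraction = min(cache_gb / index_size_gb, 1.0) if index_size_gb > 0 else 1.0
    return cache_gb, fraction


# Numbers from this thread: 120GB RAM, 80GB max heap, 100GB of index data.
cache_gb, fraction = page_cache_estimate(120, 80, 100)
print(f"{cache_gb} GB of cache covers {fraction:.0%} of the index")
# prints: 40 GB of cache covers 40% of the index
```

Whether 40% coverage is enough still depends on the access pattern; the sketch only shows where the 40GB figure comes from.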
My dev Solr server (6.6.2-SNAPSHOT) hosts all of the indexes that are
spread across several servers in production. That's over 700GB of index data.
This server runs with a 28GB heap, and the only reason it's *that* high
is because I had to increase the heap in order to successfully run some
data-mining grouping and facet queries. Normally it works just fine
with about a 13GB heap.
200 simultaneous indexing requests seems excessive to me, especially
when the Solr server only has 4 CPUs. Indexing several requests at the
same time is the best way to achieve fast indexing, but if you have too
many, it could actually get *worse* than indexing with only one
thread/process.
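One way to keep the number of simultaneous indexing requests small is to funnel all updates through a fixed-size worker pool. The Python sketch below is only an illustration: `index_all` and `fake_send` are hypothetical names, `fake_send` stands in for whatever call actually posts a batch to Solr, and the pool size of 4 is an assumption to be tuned by measurement, not a value recommended in this thread.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def index_all(batches, send_batch, max_workers=4):
    """Send every batch through a small, fixed-size pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() hands batches to idle workers; at most max_workers run at once.
        return list(pool.map(send_batch, batches))


# Stand-in for a real update request; it only records how many calls overlap.
stats = {"in_flight": 0, "peak": 0}
lock = threading.Lock()


def fake_send(batch):
    with lock:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
    time.sleep(0.01)  # pretend the request takes some time
    with lock:
        stats["in_flight"] -= 1
    return len(batch)


results = index_all([[1, 2], [3], [4, 5, 6]] * 10, fake_send, max_workers=4)
print(sum(results), "documents sent; peak concurrency", stats["peak"])
```

The peak concurrency never exceeds `max_workers`, however many batches are queued, which is the property that matters when the server has only a few CPUs.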
Thanks,
Shawn
--------------------
For completeness, below is the full text of the thread where I replied
before:
On Fri, Nov 10, 2017 at 8:59 PM, Shawn Heisey <apa...@elyograg.org> wrote:
On 11/9/2017 10:25 PM, Sujay Bawaskar wrote:
> We are getting below log without invoking commit operation after every
> partial update call. We have configured soft commit and commit time as
> below. With below configuration we are able to perform 800 partial updates
> per minute which I think is very slow. Our Index size is 10GB for this
> particular core.
> Is there any configuration we are missing here?
>
> Log:
> 2017-11-10 05:13:33.730 INFO (qtp225493257-38746) [ x:collection]
> o.a.s.s.SolrIndexSearcher Opening [Searcher@7010b1c6[collection] realtime]
This is a *realtime* searcher, for the realtime get handler. These will
be recreated frequently as you index. Opening realtime searchers should
be extremely fast and not really affect the system much, and this
happens without any configuration or user action.
The realtime get handler, which is typically accessed as /get, can
retrieve documents that haven't been made accessible to the normal index
searcher. If this feature were likely to cause performance problems, it
would not be turned on by default.
https://lucene.apache.org/solr/guide/6_6/realtime-get.html
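For reference, a realtime get request is just the /get path on the core plus an `id` parameter. The short Python sketch below only builds such a URL; the function name is hypothetical, the host and port are assumed to be Solr's defaults, the core name is taken from the log line above, and `doc-1` is a made-up document id.

```python
from urllib.parse import urlencode


def realtime_get_url(base_url, core, doc_id):
    """Build the URL for a realtime get of one document by its unique key."""
    query = urlencode({"id": doc_id})
    return f"{base_url.rstrip('/')}/{core}/get?{query}"


print(realtime_get_url("http://localhost:8983/solr", "collection", "doc-1"))
# prints: http://localhost:8983/solr/collection/get?id=doc-1
```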
Are you seeing any other frequent logs about opening searchers that
aren't realtime?
> Commit configuration:
> solr.autoCommit.maxTime:1800000
> solr.autoSoftCommit.maxTime:900000
I applaud your restraint here. We frequently see users that want these
things to happen on intervals measured in seconds, not minutes -- often
as low as *one* second. That said, I think I would actually decrease
the autoCommit time to 60000, and make sure openSearcher is false. I
would probably decrease the autoSoftCommit time to 120000 or 300000.
Why would I recommend much shorter intervals than you have configured?
For autoCommit, it comes down to the mantra on the blog post that Erick
gave you: "Hard commits are about durability." Half an hour between
hard commits doesn't address durability concerns very well, and a hard
commit that does NOT open a new searcher is very quick. For
autoSoftCommit, my recommendation is just because fifteen minutes is a
VERY long interval for that, and you really don't need to wait that
long. Unless your settings are pathological and cause commits to take
an unreasonable amount of time, doing them once every two minutes won't
cause problems.
Those recommendations aren't set in stone. If you have hard evidence
that you need different values, feel free, but I do think the intervals
should be drastically reduced.
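Because the maxTime settings are given in milliseconds, the intervals are easier to compare once converted to minutes. The Python sketch below does only that conversion; the helper name is made up for illustration, and the "suggested" values are simply the ones floated above (using the 120000 option for autoSoftCommit), not fixed rules.

```python
def ms_to_minutes(ms):
    """Convert a maxTime value in milliseconds to minutes."""
    return ms / 60_000


current = {
    "solr.autoCommit.maxTime": 1_800_000,
    "solr.autoSoftCommit.maxTime": 900_000,
}
suggested = {
    "solr.autoCommit.maxTime": 60_000,
    "solr.autoSoftCommit.maxTime": 120_000,
}

for name in current:
    before = ms_to_minutes(current[name])
    after = ms_to_minutes(suggested[name])
    print(f"{name}: {before:g} min -> {after:g} min")
# prints:
# solr.autoCommit.maxTime: 30 min -> 1 min
# solr.autoSoftCommit.maxTime: 15 min -> 2 min
```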
As for why your indexing is slow ... it is very unlikely to be related
to the log message you quoted, or your automatic commit settings. With
the information provided, I can't give you ANY recommendations -- a lot
more information will be required.
The entire solrconfig.xml would be useful. You'll need to use a paste
website or a file sharing site; attachments are typically stripped by
the mailing list. And here's some information that cannot be obtained
from solrconfig.xml that will be helpful: Are you running in cloud
mode? If so, are your indexes sharded? How much total memory is in the
machine? Is there one Solr instance on the machine, or multiple? Is
there other significant software on the machine, like a webserver or a
database server? How much heap space does each Solr instance have?
What is the total amount of data being handled by all Solr instances on
the machine? I'm looking for both a document count and disk space. You
mentioned that your index is 10GB, but that doesn't say whether that's
the only index on the machine.