On 9/22/2016 3:27 PM, vsolakhian wrote:
> This is not the cause of the problem though. The disk cache is
> important for queries and overall performance during optimization, but
> once it is done, everything should go back to "normal" (whatever that
> normal is). In our case it is the SOFT COMMIT (that opens a new
> Searcher) that takes 10 times longer AFTER the index was optimized and
> deleted records were removed (and index size went down to 60 GB).
It's difficult to say without hard numbers, and that is complicated by
my very limited understanding of how HDFS gets cached.
"Normal" is achieved only when the relevant data is in the disk cache.
That will most likely not be the case right after an optimize, unless you
have enough caching memory for both the before and after copies of the
index to fit at the same time. Similar performance problems are likely
right after a server reboot.
A soft commit opens a new searcher. When a new searcher is opened, the
*Solr* caches (which are entirely different from the disk cache) look at
their autowarmCount settings. Each cache gathers the top N queries
contained in the cache, up to the autowarmCount number, and proceeds to
execute those queries on the index to create a brand new cache for
the new searcher. The new searcher is not put into place until the
warming is done. The commit will not finish until the new searcher is
in place.
If the data sitting in the OS disk cache when the warming queries run is
not what those queries need, then those queries will be very slow, which
makes the commit take longer.
For better commit times, reduce autowarmCount on your Solr caches. This
will make it more likely that users will notice slow queries, though.
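As an illustration, the cache definitions live in the <query> section of
solrconfig.xml; something like the following keeps autowarming small (the
size and autowarmCount numbers here are only examples, not tuned
recommendations for your index):

```xml
<!-- solrconfig.xml, inside <query> -- illustrative values only -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="16"/>

<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="8"/>
```

With small autowarmCount values, each new searcher replays only that many
entries per cache, so soft commits finish faster at the cost of colder
caches for the first user queries after the commit.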
Good Solr performance with large indexes requires a LOT of memory. The
amount required is usually very surprising to admins.