Hi
I have been profiling SolrCloud while indexing into a sharded,
non-replicated collection, because indexing slows down when the index
files (*.fdt) grow to a couple of GB (the largest is about 3.5GB).
When profiling for a couple of minutes I see that most of the time is
spent in the DirectUpdateHandler2.addDoc method (called about 8000
times). Its time is spent in UpdateLog.lookupVersion,
VersionInfo.getVersionFromIndex and SolrIndexSearcher.lookupId (called
about 6000 times), which in turn spends its time in
AtomicReader.termDocsEnums, which is called about 530.000 times, taking
about 770.000 ms.
Is it true that the reason "AtomicReader.termDocsEnums" is called
530.000/6000 =~ 90 times per "SolrIndexSearcher.lookupId" call is that
I have on average about 90 "term"-files?
Can I do anything to lower this number of "term"-files?
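
For reference, the ratios behind the question above can be checked with a
small Python sketch. The variable names are mine, the figures are the
rounded ones from the profile, and I am assuming that the 770.000 ms is
the cumulative time over all termDocsEnums calls:

```python
# Rounded figures from the profiling run described above.
term_docs_enum_calls = 530_000      # AtomicReader.termDocsEnums invocations
lookup_id_calls = 6_000             # SolrIndexSearcher.lookupId invocations
term_docs_enum_total_ms = 770_000   # assumed: cumulative time over all calls

# How many termDocsEnums calls each lookupId call triggers on average.
calls_per_lookup = term_docs_enum_calls / lookup_id_calls

# Average cost of one termDocsEnums call, and of one lookupId call
# (counting only the time spent inside termDocsEnums).
ms_per_term_docs_enum = term_docs_enum_total_ms / term_docs_enum_calls
ms_per_lookup = term_docs_enum_total_ms / lookup_id_calls

print(f"termDocsEnums calls per lookupId: {calls_per_lookup:.1f}")
print(f"ms per termDocsEnums call:        {ms_per_term_docs_enum:.2f}")
print(f"ms per lookupId call:             {ms_per_lookup:.1f}")
```

So the "about 90" is really closer to 88, and under the assumption above
each lookupId call costs well over 100 ms in termDocsEnums alone.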
I'm running several cores on my SolrCloud instance. Is there any way I
can lower the time spent in each "AtomicReader.termDocsEnums" method
call (this seems to be much faster when I don't have so many documents
in my collection/shard)?
Thanks as always.
Best regards Trym