Toke Eskildsen wrote on 4.9.2017 at 13:38:
> On Mon, 2017-09-04 at 13:21 +0300, Ere Maijala wrote:
>> Thanks for the insight, Yonik. I can confirm that #2 is true. I ran
>>
>> <optimize maxSegments="1" waitSearcher="true"/>
>>
>> and after it completed I was able to retrieve 2000 values in 17ms.
>
> Very interesting. Is this on spinning disks or SSD? Is your index data
> cached in memory? What I am aiming at is if this is primarily a "many
> relatively slow random access"-thing or more due to the way DocValues
> are represented in the segments (the codec).

I indexed a few million new/changed records, and performance is slow again. The upside is that I can test again with a slow server.

It's spinning disks on a SAN, and the full index doesn't fit into memory. I don't see any IO wait, and repeated attempts are just as slow, even though I would have expected the relevant parts to be cached in memory by then. During testing and reporting I've always discarded the very first requests, since they are slower than subsequent repeats because another test index shares the same server. It may be worth noting that while there's no IO wait, CPU usage for Solr's Java process is fairly high, hovering around 100% if I repeat the request in a loop.
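For anyone who wants to repeat the measurement the same way, here is a minimal sketch of that kind of timing loop, which throws away the first few requests before recording anything. The names `time_requests`, `summarize` and `send_request` are made up for illustration; `send_request` stands for whatever callable issues the actual query, and the warm-up count of 2 is just an assumption.

```python
import time
from statistics import median


def time_requests(send_request, repeats=10, warmup=2):
    """Call send_request() warmup + repeats times.

    Returns wall-clock times in milliseconds for the measured calls only;
    the first `warmup` calls are executed but not recorded.
    """
    timings_ms = []
    for i in range(warmup + repeats):
        start = time.perf_counter()
        send_request()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if i >= warmup:
            timings_ms.append(elapsed_ms)
    return timings_ms


def summarize(timings_ms):
    """Reduce a list of timings to min / median / max."""
    return {
        "min": min(timings_ms),
        "median": median(timings_ms),
        "max": max(timings_ms),
    }
```

In practice `send_request` could be a small function that fetches the facet request URL over HTTP and reads the whole response, so that the timing covers the full round trip.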

I took a quick sample with VisualVM, and the top hotspots are:

Columns: self time [%] | self time | self time (CPU) | total time | total time (CPU)

org.apache.solr.search.facet.UnInvertedField.getCounts()
  32.079956 | 7,356 ms (32.1%) | 7,356 ms | 7,655 ms | 7,655 ms
org.apache.lucene.util.PriorityQueue.downHeap()
  30.232546 | 6,932 ms (30.2%) | 6,932 ms | 6,932 ms | 6,932 ms
org.apache.lucene.index.MultiTermsEnum.pushTop()
  11.628195 | 2,666 ms (11.6%) | 2,666 ms | 11,177 ms | 11,177 ms
org.apache.lucene.index.MultiTermsEnum$TermMergeQueue.fillTop()
  9.079571 | 2,082 ms (9.1%) | 2,082 ms | 2,082 ms | 2,082 ms
org.apache.lucene.store.ByteBufferGuard.getBytes()
  4.176216 | 957 ms (4.2%) | 957 ms | 957 ms | 957 ms
org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.next()
  2.6867974 | 616 ms (2.7%) | 616 ms | 616 ms | 616 ms
org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermLeaf()
  1.393562 | 319 ms (1.4%) | 319 ms | 319 ms | 319 ms
org.apache.lucene.util.fst.ByteSequenceOutputs.read()
  1.2111844 | 277 ms (1.2%) | 277 ms | 277 ms | 277 ms

(sorry if that looks bad in the email)

I'm building another index on a higher-end server that can load the full index into memory, and will retest with that. But note that this index has docValues disabled, as facet.method=uif only seems to cause trouble when docValues are enabled.
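To make the test easier to reproduce, a request of roughly this shape can be built as follows. This is only a sketch: the base URL, the collection name and the field name are placeholders rather than the real ones, and `facet_url` is a made-up helper. The Solr parameters themselves (`facet`, `facet.field`, `facet.limit`, `facet.method`, `rows`) are the standard ones, with `facet.limit` set to 2000 to match the number of values retrieved above.

```python
from urllib.parse import urlencode


def facet_url(base_url, collection, field, limit=2000, method="uif"):
    """Build a select URL that asks only for facet values on one field.

    base_url, collection and field are placeholders supplied by the caller.
    """
    params = [
        ("q", "*:*"),
        ("rows", 0),               # no documents, only facet counts
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),
        ("facet.method", method),  # e.g. "uif", "fc" or "enum"
        ("wt", "json"),
    ]
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection and field names, for illustration only.
print(facet_url("http://localhost:8983/solr", "biblio", "author_facet"))
```

Changing the `method` argument makes it straightforward to compare facet methods against the same index.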

--Ere

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland
