So after re-feeding our data with a new boolean field that is true when data exists and false when it doesn't, our search times have gone from an average of about 20s to around 150ms. Pretty amazing change in perf. It seems like https://issues.apache.org/jira/browse/SOLR-5093 might alleviate many people's pain in doing this kind of query (if I have some time I may take a look at it).
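To make the rewrite concrete, here is a minimal sketch of what the cheap query looks like. The boolean field name `has_data` and the `/select` handler path are assumptions, not taken from the thread; the point is that filtering on a pre-computed indexed boolean is a single term lookup, where the old existence-style query had to walk many terms per request.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class HasDataQuery {
    // Build a Solr request that filters on the pre-computed boolean field
    // ("has_data" is a hypothetical name) instead of running an expensive
    // existence/multi-term query at search time.
    static String buildQuery(String userQuery) throws Exception {
        String enc = StandardCharsets.UTF_8.name();
        return "/select?q=" + URLEncoder.encode(userQuery, enc)
                + "&fq=" + URLEncoder.encode("has_data:true", enc);
    }

    public static void main(String[] args) throws Exception {
        // e.g. /select?q=vid%3A86XXX73&fq=has_data%3Atrue
        System.out.println(buildQuery("vid:86XXX73"));
    }
}
```

The work of deciding "does data exist?" moves to index time (when the document is fed), so query time only pays for one posting-list scan.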
Anyway, we are in pretty good shape at this point. The only remaining issue is that the first queries after commits are taking 5-6s. This is caused by the loading of two FieldCaches (one long and one int) that are used for sorting (the uninvert step). I'm suspecting that DocValues will greatly help this load performance?

thanks,
steve

On Wed, Jul 31, 2013 at 4:32 PM, Steven Bower <smb-apa...@alcyon.net> wrote:
> The list of IDs does change relatively frequently, but this doesn't seem
> to have very much impact on the performance of the query as far as I can tell.
>
> Attached are the stacks.
>
> thanks,
>
> steve
>
> On Wed, Jul 31, 2013 at 6:33 AM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
>> On Wed, Jul 31, 2013 at 1:10 AM, Steven Bower <sbo...@alcyon.net> wrote:
>>> not sure what you mean by good hit ratio?
>>
>> I mean such queries are really expensive (even on a cache hit), so if the
>> list of ids changes every time, it never hits the cache and hence executes
>> these heavy queries every time. It's a well-known performance problem.
>>
>>> Here are the stacks...
>>
>> They seem like hotspots, and show index reading, which is reasonable. But I
>> can't see what caused these reads; to get that I need the whole stack of the
>> hot thread.
>>> Name | Time (ms) | Own Time (ms)
>>>
>>> org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(AtomicReaderContext, Bits)  300879  203478
>>> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc()  45539  19
>>> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs()  45519  40
>>> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readVIntBlock(IndexInput, int[], int[], int, boolean)  24352  0
>>> org.apache.lucene.store.DataInput.readVInt()  24352  24352
>>> org.apache.lucene.codecs.lucene41.ForUtil.readBlock(IndexInput, byte[], int[])  21126  14976
>>> org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)  6150  0
>>> java.nio.DirectByteBuffer.get(byte[], int, int)  6150  0
>>> java.nio.Bits.copyToArray(long, Object, long, long, long)  6150  6150
>>> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.docs(Bits, DocsEnum, int)  35342  421
>>> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.decodeMetaData()  34920  27939
>>> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.nextTerm(FieldInfo, BlockTermState)  6980  6980
>>> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next()  14129  1053
>>> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadNextFloorBlock()  5948  261
>>> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()  5686  199
>>> org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)  3606  0
>>> java.nio.DirectByteBuffer.get(byte[], int, int)  3606  0
>>> java.nio.Bits.copyToArray(long, Object, long, long, long)  3606  3606
>>> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readTermsBlock(IndexInput, FieldInfo, BlockTermState)  1879  80
>>> org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)  1798  0
>>> java.nio.DirectByteBuffer.get(byte[], int, int)  1798  0
>>> java.nio.Bits.copyToArray(long, Object, long, long, long)  1798  1798
>>> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.next()  4010  3324
>>> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.nextNonLeaf()  685  685
>>> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()  3117  144
>>> org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)  1861  0
>>> java.nio.DirectByteBuffer.get(byte[], int, int)  1861  0
>>> java.nio.Bits.copyToArray(long, Object, long, long, long)  1861  1861
>>> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readTermsBlock(IndexInput, FieldInfo, BlockTermState)  1090  19
>>> org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)  1070  0
>>> java.nio.DirectByteBuffer.get(byte[], int, int)  1070  0
>>> java.nio.Bits.copyToArray(long, Object, long, long, long)  1070  1070
>>> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.initIndexInput()  20  0
>>> org.apache.lucene.store.ByteBufferIndexInput.clone()  20  0
>>> org.apache.lucene.store.ByteBufferIndexInput.clone()  20  0
>>> org.apache.lucene.store.ByteBufferIndexInput.buildSlice(long, long)  20  0
>>> org.apache.lucene.util.WeakIdentityMap.put(Object, Object)  20  0
>>> org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference.<init>(Object, ReferenceQueue)  20  0
>>> java.lang.System.identityHashCode(Object)  20  20
>>> org.apache.lucene.index.FilteredTermsEnum.docs(Bits, DocsEnum, int)  1485  527
>>> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.docs(Bits, DocsEnum, int)  957  0
>>> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.decodeMetaData()  957  513
>>> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.nextTerm(FieldInfo, BlockTermState)  443  443
>>> org.apache.lucene.index.FilteredTermsEnum.next()  874  324
>>> org.apache.lucene.search.NumericRangeQuery$NumericRangeTermsEnum.accept(BytesRef)  368  0
>>> org.apache.lucene.util.BytesRef$UTF8SortedAsUnicodeComparator.compare(Object, Object)  368  368
>>> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next()  160  0
>>> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadNextFloorBlock()  160  0
>>> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()  160  0
>>> org.apache.lucene.store.ByteBufferIndexInput.readBytes(byte[], int, int)  120  0
>>> org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readTermsBlock(IndexInput, FieldInfo, BlockTermState)  39  0
>>> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekCeil(BytesRef, boolean)  19  0
>>> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()  19  0
>>> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.initIndexInput()  19  0
>>> org.apache.lucene.store.ByteBufferIndexInput.clone()  19  0
>>> org.apache.lucene.store.ByteBufferIndexInput.clone()  19  0
>>> org.apache.lucene.store.ByteBufferIndexInput.buildSlice(long, long)  19  0
>>> org.apache.lucene.util.WeakIdentityMap.put(Object, Object)  19  0
>>> org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference.<init>(Object, ReferenceQueue)  19  0
>>> java.lang.System.identityHashCode(Object)  19  19
>>> org.apache.lucene.util.FixedBitSet.<init>(int)  28  28
>>>
>>> On Tue, Jul 30, 2013 at 4:18 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
>>>> On Tue, Jul 30, 2013 at 12:45 AM, Steven Bower <smb-apa...@alcyon.net> wrote:
>>>>> - Most of my time (98%) is being spent in
>>>>> java.nio.Bits.copyToByteArray(long, Object, long, long) which is being
>>>>
>>>> Steven, please see
>>>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html.
>>>> My benchmarking experience shows that NIO is a turtle, absolutely.
>>>>
>>>> Also, are you sure that fq=(vid:86XXX73 OR vid:86XXX20 ..... has a good hit
>>>> ratio? Otherwise it's a well-known beast.
>>>>
>>>> Could you also show a deeper stack, to make sure what causes the excessive
>>>> reading?
>>>>
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>> Principal Engineer,
>>>> Grid Dynamics
>>>>
>>>> <http://www.griddynamics.com>
>>>> <mkhlud...@griddynamics.com>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Principal Engineer,
>> Grid Dynamics
>>
>> <http://www.griddynamics.com>
>> <mkhlud...@griddynamics.com>
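[Editor's footnote on the hit-ratio discussion: Solr caches each fq in the filterCache keyed on the filter query itself, so an ID-list filter that changes on every request computes a full bitset, stores it, and never reuses it, while also evicting entries that would have been reused. One common mitigation, sketched here with standard Solr local-params syntax (the vid values are just the placeholders from the thread), is to opt the volatile filter out of the cache:]

```
fq={!cache=false}vid:(86XXX73 OR 86XXX20)
```

This leaves stable filters cacheable while the per-request ID list is evaluated without polluting the filterCache.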
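[Editor's footnote on the 5-6s first-query-after-commit cost from FieldCache uninversion: two standard Solr remedies are sketched below. The field names sort_long and sort_int, and their types, are hypothetical stand-ins for the two sort fields mentioned in the thread. docValues stores column values at index time so sorting no longer needs to uninvert (this requires re-indexing), and a newSearcher warming query pays any remaining load cost before the new searcher serves traffic:]

```xml
<!-- schema.xml: hypothetical sort fields; docValues avoids FieldCache uninversion -->
<field name="sort_long" type="tlong" indexed="true" stored="false" docValues="true"/>
<field name="sort_int"  type="tint"  indexed="true" stored="false" docValues="true"/>

<!-- solrconfig.xml: warm the sort before a new searcher goes live -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="sort">sort_long asc, sort_int asc</str></lst>
  </arr>
</listener>
```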