Re: How to make UnInvertedField faster?
On Fri, Oct 21, 2011 at 4:37 PM, Michael McCandless luc...@mikemccandless.com wrote:

> Well... the limitation of DocValues is that it cannot handle more than one value per document (which UnInvertedField can).

you can pack this into one byte[] or use more than one field? I don't see a real limitation here.

simon

> Hopefully we can fix that at some point :)
>
> Mike McCandless
> http://blog.mikemccandless.com
Re: How to make UnInvertedField faster?
On Sat, Oct 22, 2011 at 4:10 AM, Simon Willnauer simon.willna...@googlemail.com wrote:

> On Fri, Oct 21, 2011 at 4:37 PM, Michael McCandless luc...@mikemccandless.com wrote:
>
>> Well... the limitation of DocValues is that it cannot handle more than one value per document (which UnInvertedField can).
>
> you can pack this into one byte[] or use more than one field? I don't see a real limitation here.

Well... not very easily? UnInvertedField (DocTermOrds in Lucene) is the same as DocValues' BYTES_VAR_SORTED. So for an app to do this on top it'd have to handle the term -> ord resolving itself, save that somewhere, then encode the multiple ords into a byte[].

I agree for other simple types (no deref/sorting involved) an app could pack them into its own byte[] that's otherwise opaque to Lucene.

Mike McCandless
http://blog.mikemccandless.com
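To make the extra work Mike describes concrete, here is a rough sketch of what an application might have to do on its own: resolve each term to an ordinal, then pack a document's ordinals into a single opaque byte[]. Nothing here is a Lucene or Solr API. The class name OrdPacker, the sorted String[] used as the term -> ord table, and the delta plus variable-length byte encoding are all assumptions chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical helper (not part of Lucene): packs one document's term ordinals into a byte[]. */
final class OrdPacker {

    /** Resolves terms to ordinals by binary search in an app-maintained sorted term table. */
    static int[] resolveOrds(String[] sortedTerms, String... docTerms) {
        int[] ords = new int[docTerms.length];
        for (int i = 0; i < docTerms.length; i++) {
            int ord = Arrays.binarySearch(sortedTerms, docTerms[i]);
            if (ord < 0) {
                throw new IllegalArgumentException("unknown term: " + docTerms[i]);
            }
            ords[i] = ord;
        }
        Arrays.sort(ords); // ascending order keeps the deltas below non-negative
        return ords;
    }

    /** Encodes ascending ordinals as deltas, 7 data bits per byte, high bit = "more bytes follow". */
    static byte[] encode(int[] sortedOrds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int ord : sortedOrds) {
            int delta = ord - previous;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
            previous = ord;
        }
        return out.toByteArray();
    }

    /** Reverses encode(): reads the deltas back and rebuilds the ordinals. */
    static int[] decode(byte[] packed) {
        int[] ords = new int[packed.length]; // upper bound: every ordinal uses at least one byte
        int count = 0;
        int previous = 0;
        int i = 0;
        while (i < packed.length) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = packed[i++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta;
            ords[count++] = previous;
        }
        return Arrays.copyOf(ords, count);
    }

    public static void main(String[] args) {
        String[] sortedTerms = {"adams", "baker", "clark", "davis"};
        int[] ords = resolveOrds(sortedTerms, "davis", "baker");
        byte[] packed = encode(ords);
        System.out.println(packed.length + " bytes -> " + Arrays.toString(decode(packed)));
    }
}
```

The point of the sketch is that the term table, its persistence, and the encoding all live in application code, which is the burden Mike is pointing at.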
Re: How to make UnInvertedField faster?
In trunk we have a feature called IndexDocValues which basically creates the uninverted structure at index time. You can then simply suck that into memory or even access it on disk directly (RandomAccess). Even if I can't help you right now this is certainly going to help you here. There is no need to uninvert at all anymore in lucene 4.0

simon

On Wed, Oct 19, 2011 at 8:05 PM, Michael Ryan mr...@moreover.com wrote:

> I was wondering if anyone has any ideas for making UnInvertedField.uninvert() faster, or other alternatives for generating facets quickly.
>
> The vast majority of the CPU time for our Solr instances is spent generating UnInvertedFields after each commit. Here's an example of one of our slower fields:
>
> [2011-10-19 17:46:01,055] INFO125974[pool-1-thread-1] - (SolrCore:440) - UnInverted multi-valued field {field=authorCS,memSize=38063628,tindexSize=422652,time=15610,phase1=15584,nTerms=1558514,bigTerms=0,termInstances=4510674,uses=0}
>
> That is from an index with approximately 8 million documents. After each commit, it takes on average about 90 seconds to uninvert all the fields that we facet on.
>
> Any ideas at all would be greatly appreciated.
>
> -Michael
Re: How to make UnInvertedField faster?
Sweet + Very cool!

On Fri, Oct 21, 2011 at 7:50 AM, Simon Willnauer simon.willna...@googlemail.com wrote:

> In trunk we have a feature called IndexDocValues which basically creates the uninverted structure at index time. You can then simply suck that into memory or even access it on disk directly (RandomAccess). Even if I can't help you right now this is certainly going to help you here. There is no need to uninvert at all anymore in lucene 4.0
>
> simon
Re: How to make UnInvertedField faster?
Well... the limitation of DocValues is that it cannot handle more than one value per document (which UnInvertedField can).

Hopefully we can fix that at some point :)

Mike McCandless
http://blog.mikemccandless.com

On Fri, Oct 21, 2011 at 7:50 AM, Simon Willnauer simon.willna...@googlemail.com wrote:

> In trunk we have a feature called IndexDocValues which basically creates the uninverted structure at index time. You can then simply suck that into memory or even access it on disk directly (RandomAccess). Even if I can't help you right now this is certainly going to help you here. There is no need to uninvert at all anymore in lucene 4.0
>
> simon
How to make UnInvertedField faster?
I was wondering if anyone has any ideas for making UnInvertedField.uninvert() faster, or other alternatives for generating facets quickly.

The vast majority of the CPU time for our Solr instances is spent generating UnInvertedFields after each commit. Here's an example of one of our slower fields:

[2011-10-19 17:46:01,055] INFO125974[pool-1-thread-1] - (SolrCore:440) - UnInverted multi-valued field {field=authorCS,memSize=38063628,tindexSize=422652,time=15610,phase1=15584,nTerms=1558514,bigTerms=0,termInstances=4510674,uses=0}

That is from an index with approximately 8 million documents. After each commit, it takes on average about 90 seconds to uninvert all the fields that we facet on.

Any ideas at all would be greatly appreciated.

-Michael
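For readers new to the term, the following is a conceptual sketch of what "uninverting" a multi-valued field means: turning the index's term -> documents postings into a per-document list of term ordinals that faceting can count. It is not Solr's UnInvertedField code. The class name UninvertSketch, the TreeMap standing in for the term dictionary, and the plain int[][] result are assumptions made for illustration. What the sketch does show is that the loop touches every term and every term instance, which is the kind of work the nTerms and termInstances figures in the log line above measure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Conceptual sketch only; Solr's real UnInvertedField uses a far more compact representation. */
final class UninvertSketch {

    /**
     * @param postings term -> ids of the documents containing that term (TreeMap keeps terms sorted)
     * @param maxDoc   number of documents in the index
     * @return for each document, the ordinals of the terms it contains, in ascending order
     */
    static int[][] uninvert(TreeMap<String, int[]> postings, int maxDoc) {
        List<List<Integer>> perDoc = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            perDoc.add(new ArrayList<>());
        }
        int ord = 0; // a term's ordinal is its position in sorted term order
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                perDoc.get(doc).add(ord); // one step per term instance
            }
            ord++;
        }
        int[][] result = new int[maxDoc][];
        for (int doc = 0; doc < maxDoc; doc++) {
            result[doc] = perDoc.get(doc).stream().mapToInt(Integer::intValue).toArray();
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> postings = new TreeMap<>();
        postings.put("adams", new int[] {0, 2});
        postings.put("baker", new int[] {1});
        postings.put("clark", new int[] {0});
        System.out.println(Arrays.deepToString(uninvert(postings, 3)));
    }
}
```

Because this pass has to be repeated whenever a new searcher is opened after a commit, its cost recurs on every commit, which matches the behavior described in the question.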