Re: How to make UnInvertedField faster?

2011-10-22 Thread Simon Willnauer
On Fri, Oct 21, 2011 at 4:37 PM, Michael McCandless
luc...@mikemccandless.com wrote:
> Well... the limitation of DocValues is that it cannot handle more than
> one value per document (which UnInvertedField can).

you can pack this into one byte[] or use more than one field? I don't
see a real limitation here.

simon

> Hopefully we can fix that at some point :)
>
> Mike McCandless
>
> http://blog.mikemccandless.com


Re: How to make UnInvertedField faster?

2011-10-22 Thread Michael McCandless
On Sat, Oct 22, 2011 at 4:10 AM, Simon Willnauer
simon.willna...@googlemail.com wrote:
> On Fri, Oct 21, 2011 at 4:37 PM, Michael McCandless
> luc...@mikemccandless.com wrote:
>> Well... the limitation of DocValues is that it cannot handle more than
>> one value per document (which UnInvertedField can).
>
> you can pack this into one byte[] or use more than one field? I don't
> see a real limitation here.

Well... not very easily?

UnInvertedField (DocTermOrds in Lucene) is the same as DocValues'
BYTES_VAR_SORTED.

So for an app to do this on top it'd have to handle the term -> ord
resolving itself, save that somewhere, then encode the multiple ords
into a byte[].

I agree for other simple types (no deref/sorting involved) an app
could pack them into its own byte[] that's otherwise opaque to Lucene.

Mike McCandless

http://blog.mikemccandless.com
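
The bookkeeping described above (resolve term -> ord, keep that mapping
somewhere, then encode several ords into one byte[] per document) can be
sketched roughly as follows. This is only an illustration under stated
assumptions: it uses no Lucene API at all, the class and method names
(PackedOrds, buildTermDictionary, encodeOrds, decodeOrds, writeVInt) are
hypothetical, and the dictionary is sorted by Java String order rather than
by whatever order Lucene itself would use. The ords are sorted, delta-encoded
and written as variable-length ints, which is one common way to keep such a
blob small; it is not necessarily what UnInvertedField does internally.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene or Solr.
public class PackedOrds {

    /** Sorted, de-duplicated terms; a term's ord is its index in this array. */
    public static String[] buildTermDictionary(List<List<String>> docs) {
        TreeSet<String> sorted = new TreeSet<>();
        for (List<String> values : docs) {
            sorted.addAll(values);
        }
        return sorted.toArray(new String[0]);
    }

    /** Resolves each value to its ord, then packs the sorted ords as delta vInts. */
    public static byte[] encodeOrds(List<String> values, String[] dict) {
        int[] ords = new int[values.size()];
        for (int i = 0; i < ords.length; i++) {
            int ord = Arrays.binarySearch(dict, values.get(i));
            if (ord < 0) {
                throw new IllegalArgumentException("term not in dictionary: " + values.get(i));
            }
            ords[i] = ord;
        }
        Arrays.sort(ords);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : ords) {
            writeVInt(out, ord - prev); // deltas are never negative because ords are sorted
            prev = ord;
        }
        return out.toByteArray();
    }

    /** Reverses encodeOrds: reads delta vInts and returns the ords in ascending order. */
    public static int[] decodeOrds(byte[] packed) {
        int[] ords = new int[packed.length]; // every ord takes at least one byte
        int count = 0;
        int pos = 0;
        int prev = 0;
        while (pos < packed.length) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = packed[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta;
            ords[count++] = prev;
        }
        return Arrays.copyOf(ords, count);
    }

    /** Writes a non-negative int using 7 bits per byte, high bit set on all but the last byte. */
    public static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("smith", "jones"),
            Arrays.asList("adams"),
            Arrays.asList("jones", "adams", "zhou"));
        String[] dict = buildTermDictionary(docs);
        for (List<String> doc : docs) {
            byte[] packed = encodeOrds(doc, dict);
            System.out.println(doc + " -> ords " + Arrays.toString(decodeOrds(packed))
                + " in " + packed.length + " bytes");
        }
    }
}
```

The app would still have to persist the dictionary alongside the index and
rebuild or remap it whenever new terms arrive, which is the "not very easily"
part.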


Re: How to make UnInvertedField faster?

2011-10-21 Thread Simon Willnauer
In trunk we have a feature called IndexDocValues which basically
creates the uninverted structure at index time. You can then simply
suck that into memory or even access it on disk directly
(RandomAccess). Even if I can't help you right now this is certainly
going to help you here. There is no need to uninvert at all anymore in
Lucene 4.0.

simon

On Wed, Oct 19, 2011 at 8:05 PM, Michael Ryan mr...@moreover.com wrote:
> I was wondering if anyone has any ideas for making UnInvertedField.uninvert()
> faster, or other alternatives for generating facets quickly.
>
> The vast majority of the CPU time for our Solr instances is spent generating
> UnInvertedFields after each commit. Here's an example of one of our slower
> fields:
>
> [2011-10-19 17:46:01,055] INFO125974[pool-1-thread-1] - (SolrCore:440) -
> UnInverted multi-valued field
> {field=authorCS,memSize=38063628,tindexSize=422652,
> time=15610,phase1=15584,nTerms=1558514,bigTerms=0,termInstances=4510674,uses=0}
>
> That is from an index with approximately 8 million documents. After each
> commit, it takes on average about 90 seconds to uninvert all the fields that
> we facet on.
>
> Any ideas at all would be greatly appreciated.
>
> -Michael



Re: How to make UnInvertedField faster?

2011-10-21 Thread Jason Rutherglen
Sweet + Very cool!

On Fri, Oct 21, 2011 at 7:50 AM, Simon Willnauer 
simon.willna...@googlemail.com wrote:

> In trunk we have a feature called IndexDocValues which basically
> creates the uninverted structure at index time. You can then simply
> suck that into memory or even access it on disk directly
> (RandomAccess). Even if I can't help you right now this is certainly
> going to help you here. There is no need to uninvert at all anymore in
> lucene 4.0
>
> simon



Re: How to make UnInvertedField faster?

2011-10-21 Thread Michael McCandless
Well... the limitation of DocValues is that it cannot handle more than
one value per document (which UnInvertedField can).

Hopefully we can fix that at some point :)

Mike McCandless

http://blog.mikemccandless.com

On Fri, Oct 21, 2011 at 7:50 AM, Simon Willnauer
simon.willna...@googlemail.com wrote:
> In trunk we have a feature called IndexDocValues which basically
> creates the uninverted structure at index time. You can then simply
> suck that into memory or even access it on disk directly
> (RandomAccess). Even if I can't help you right now this is certainly
> going to help you here. There is no need to uninvert at all anymore in
> lucene 4.0
>
> simon



How to make UnInvertedField faster?

2011-10-19 Thread Michael Ryan
I was wondering if anyone has any ideas for making UnInvertedField.uninvert()
faster, or other alternatives for generating facets quickly.

The vast majority of the CPU time for our Solr instances is spent generating
UnInvertedFields after each commit. Here's an example of one of our slower
fields:

[2011-10-19 17:46:01,055] INFO125974[pool-1-thread-1] - (SolrCore:440) -
UnInverted multi-valued field 
{field=authorCS,memSize=38063628,tindexSize=422652,
time=15610,phase1=15584,nTerms=1558514,bigTerms=0,termInstances=4510674,uses=0}

That is from an index with approximately 8 million documents. After each
commit, it takes on average about 90 seconds to uninvert all the fields that
we facet on.

Any ideas at all would be greatly appreciated.

-Michael