Re: Multi field search with values

2012-03-20 Thread Deb Lucene
Hi group,

Is there any way to index a document based on key-value pairs (key = text,
value = double)? For example, we have the following situation:

document 1
IBM - 0.5
Google - 0.9
Apple - 0.3


document 2
IBM - 0.6
Google - 0.1
Apple - 0.4

Now we need to search using two fields together: the name (e.g. "IBM", "Apple")
and the score (> 0.5 etc.). A typical search query would be:
name == "IBM" & value > 0.5. Previously we experimented with MFQP
(MultiFieldQueryParser) and numeric field queries, but here we need to link
the two fields.

Thanks in advance.
--d
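
To make the "linked fields" requirement concrete, here is a minimal plain-Java sketch of the two possible semantics, using the example documents above. It does not call Lucene at all; the class and method names (LinkedFieldSketch, matchesLinked, matchesUnlinked, doc) are hypothetical and exist only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java model of the requirement; no Lucene API is used here. */
class LinkedFieldSketch {

    /** Linked semantics: the score stored under this exact name must exceed min. */
    static boolean matchesLinked(Map<String, Double> doc, String name, double min) {
        Double score = doc.get(name);
        return score != null && score > min;
    }

    /** Unlinked semantics: the name occurs, and ANY score in the document exceeds min. */
    static boolean matchesUnlinked(Map<String, Double> doc, String name, double min) {
        if (!doc.containsKey(name)) {
            return false;
        }
        for (double score : doc.values()) {
            if (score > min) {
                return true;
            }
        }
        return false;
    }

    /** Builds one of the example documents from the message. */
    static Map<String, Double> doc(double ibm, double google, double apple) {
        Map<String, Double> d = new LinkedHashMap<String, Double>();
        d.put("IBM", ibm);
        d.put("Google", google);
        d.put("Apple", apple);
        return d;
    }

    public static void main(String[] args) {
        Map<String, Double> doc1 = doc(0.5, 0.9, 0.3);
        Map<String, Double> doc2 = doc(0.6, 0.1, 0.4);

        // Query: name == "IBM" & value > 0.5
        System.out.println("linked   doc1: " + matchesLinked(doc1, "IBM", 0.5));
        System.out.println("linked   doc2: " + matchesLinked(doc2, "IBM", 0.5));
        System.out.println("unlinked doc1: " + matchesUnlinked(doc1, "IBM", 0.5));
        System.out.println("unlinked doc2: " + matchesUnlinked(doc2, "IBM", 0.5));
    }
}
```

With linked semantics only document 2 matches (IBM is 0.6 there and exactly 0.5 in document 1), while the unlinked version also accepts document 1 because of Google's 0.9. One possible way to get the linked behaviour, not confirmed anywhere in this thread, would be to index each name as its own numeric field so that the query becomes a range query on the field "IBM".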


Re: suppressing FreqProxPostingsArray

2012-03-20 Thread Ken McCracken

Hi Mike,

Thanks for the response.  We will do some more investigation.  We will  
look to see if there is a clean way to suppress at least the extra 3  
array allocations.


Cheers,

-Ken

On Mar 19, 2012, at 5:32 PM, Michael McCandless wrote:



Hmm, I agree we could be more RAM efficient if the field is DOCS_ONLY.

We shouldn't have to allocate/use docFreqs, lastDocCodes,
lastPositions arrays (3 of the 7); the others are still needed, I
think.
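
As a rough back-of-the-envelope sketch of what skipping those three arrays would save, the following assumes 4-byte ints and one slot per buffered unique term in each array, and ignores array over-allocation on growth and every other indexing buffer. The class name PostingsArrayEstimate and the figure of 1,000,000 buffered terms are hypothetical.

```java
/** Rough estimate of the parallel int[] footprint; not Lucene code. */
class PostingsArrayEstimate {

    static final int BYTES_PER_INT = 4;

    /** Bytes used by `intArrays` parallel int[] arrays with one slot per buffered term. */
    static long bytesFor(long bufferedTerms, int intArrays) {
        return bufferedTerms * intArrays * BYTES_PER_INT;
    }

    public static void main(String[] args) {
        long terms = 1000000L;              // hypothetical number of buffered unique terms
        long all = bytesFor(terms, 7);      // all 7 arrays allocated
        long needed = bytesFor(terms, 4);   // without docFreqs, lastDocCodes, lastPositions

        System.out.println("7 arrays: " + all + " bytes");
        System.out.println("4 arrays: " + needed + " bytes");
        System.out.println("saved:    " + (all - needed) + " bytes");
    }
}
```

Under these assumptions the arrays cost 28 bytes per buffered term with all seven allocated and 16 bytes with four, so the possible saving is 12 bytes per term.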

But, that said, you shouldn't hit OOME, as long as your max heap size
is large enough (and, your IndexWriterConfig's RAMBufferSizeMB is
small enough); Lucene should simply flush a new segment once the
buffered documents are using too much RAM.

Hmm, and as long as you don't index massive documents.  How many UUIDs
per document?


Mike McCandless

http://blog.mikemccandless.com



On Mon, Mar 19, 2012 at 3:29 PM, Ken McCracken wrote:

Hi,

I am using lucene-3.5 and getting an OutOfMemoryError on a large indexing
task of 100M documents.  I am creating an index with 3 UUIDs as separate
field values.  I am using Store.YES on 1 of them and Store.NO on the
others; I am using Index.NOT_ANALYZED_NO_NORMS on all three; explicitly
setting field.setIndexOptions(IndexOptions.DOCS_ONLY); and
indexWriterConfig.setTermIndexInterval(termIndexInterval); to 1024.  I am
trying to index 100M records into my index.

Is there any reason FreqProxTermsWriterPerField.FreqProxPostingsArray needs
to be constructed even though I have the positions etc. suppressed?  It
seems that the reason I get an OutOfMemoryError is that 7 int[] of size
proportional to the number of unique fields are being constructed; however,
at least some of them are probably wasteful given my indexing configuration.


Any help is appreciated.

Thanks,
-Ken

[junit] Error:
   [junit] Exception in thread "Thread-18" java.lang.OutOfMemoryError: Java heap space
   [junit] at org.apache.lucene.index.ParallelPostingsArray.<init>(ParallelPostingsArray.java:35)
   [junit] at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:190)
   [junit] at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:204)
   [junit] at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
   [junit] at org.apache.lucene.index.TermsHashPerField.growParallelPostingsArray(TermsHashPerField.java:137)
   [junit] at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:440)
   [junit] at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:94)
   [junit] at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


