
As I'm also involved in this issue (on Sven's side), I created a
patch that replaces the float array with a map storing the score per
document, so it contains exactly as many entries as the external scoring
file has lines, and no more.
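
To illustrate the idea (this is only a minimal sketch, not the code of the
actual patch; the class and method names below are hypothetical), a sparse
score lookup could look roughly like this, with a default score for documents
that have no line in the external file:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the code attached to SOLR-2583. */
public class SparseScores {
    // One entry per line of the external score file, keyed by document id.
    private final Map<Integer, Float> scoreByDoc = new HashMap<>();
    // Score returned for documents that are not listed in the file.
    private final float defaultScore;

    public SparseScores(float defaultScore) {
        this.defaultScore = defaultScore;
    }

    public void put(int docId, float score) {
        scoreByDoc.put(docId, score);
    }

    public float get(int docId) {
        Float score = scoreByDoc.get(docId);
        return score != null ? score : defaultScore;
    }

    public int size() {
        return scoreByDoc.size();
    }

    /** Payload of a dense float[] with one slot per document (4 bytes each). */
    public static long denseArrayBytes(long numDocs) {
        return numDocs * Float.BYTES;
    }
}
```

For the 8.5 million documents mentioned below, denseArrayBytes gives
34,000,000 bytes per array, which matches the roughly 34 MB Sven observed.
Note that a HashMap entry costs considerably more than 4 bytes, so a map like
this only pays off when the score file covers a small part of the index.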

I created an issue for this: https://issues.apache.org/jira/browse/SOLR-2583

It would be great if someone could have a look at it and comment.

Thanks for your feedback,

On 06/08/2011 12:22 PM, Bohnsack, Sven wrote:
> Hi,
> I can't provide a stack trace, and IMHO it wouldn't provide any useful
> information. But we've made good progress in the analysis.
> We took a deeper look at what happens when an "external file field" request
> is sent to Solr:
> * Solr checks whether there is a file for the requested query, e.g. "trousers"
> * If so, Solr loads the "trousers" file and generates a HashMap entry
> consisting of a FileFloatSource object and a float array whose size is the
> number of documents in the Solr index. Every document matched by the query
> gets the score value provided in the external score file. For
> every(!) other document Solr writes a zero into that float array
> * If Solr does not find a file for the query request, it still
> generates a HashMap entry with score zero for every document
> In our case we have about 8.5 million documents in our index, and each of
> those arrays occupies about 34 MB of heap space. With e.g. 100 different
> queries sorted via external file field, Solr occupies about 3.4 GB
> of heap space.
> The problem might be the use of WeakHashMap [1], which prevents the garbage
> collector from cleaning up unused keys.
> What do you think could be a possible solution for this whole problem?
> (apart from "don't use external file fields" ;)
> Regards
> Sven
> [1]: "A hashtable-based Map implementation with weak keys. An entry in a 
> WeakHashMap will automatically be removed when its key is no longer in 
> ordinary use. More precisely, the presence of a mapping for a given key will 
> not prevent the key from being discarded by the garbage collector, that is, 
> made finalizable, finalized, and then reclaimed. When a key has been 
> discarded its entry is effectively removed from the map, so this class 
> behaves somewhat differently than other Map implementations."
> -----Original Message-----
> From: mtnes...@gmail.com [mailto:mtnes...@gmail.com] On behalf of Simon
> Rosenthal
> Sent: Wednesday, June 8, 2011 03:56
> To: solr-user@lucene.apache.org
> Subject: Re: How to deal with many files using solr external file field
> Can you provide a stack trace for the OOM exception?
> On Tue, Jun 7, 2011 at 4:25 PM, Bohnsack, Sven
> <sven.bohns...@shopping24.de> wrote:
>> Hi all,
>> we're using Solr 1.4 and external file field ([1]) for sorting our
>> search results. We have about 40,000 terms for which we use this sorting
>> option.
>> Currently we're running into massive OutOfMemory problems and we're not
>> quite sure what the cause is. It seems that the garbage collector stops
>> working or some processes are going wild. In any case, Solr allocates
>> more and more RAM until we hit an OutOfMemory exception.
>> We noticed the following:
>> For some terms, java.io.FileNotFoundExceptions appear in the Solr log
>> when Solr tries to load an external file for
>> a term for which there is no such file, e.g. Solr tries to load the
>> external score file for "trousers" but there is none in the
>> /solr/data folder.
>> Question: is it possible that those exceptions are responsible for the
>> OutOfMemory problem, or could it be due to the large(?) number of 40k terms
>> for which we want to sort the result via external file field?
>> I'm looking forward to your answers, suggestions and ideas :)
>> Regards
>> Sven
>> [1]:
>> http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html

Martin Grotzke
