Hi,

as I'm also involved in this issue (on Sven's side), I created a patch that replaces the float array with a map that stores the score per doc, so it contains only as many entries as the external scoring file has lines, and no more.
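To illustrate the approach (this is only a rough sketch of the idea, not the code attached to the issue; the class and member names here are made up):

    import java.util.HashMap;
    import java.util.Map;

    /**
     * Rough sketch: instead of a float[maxDoc] with one slot for every
     * document in the index, keep a sparse map that only holds the docs
     * actually listed in the external score file.
     */
    public class SparseExternalScores {

        /** doc id -> score; one entry per line of the external file */
        private final Map<Integer, Float> scoreByDoc = new HashMap<Integer, Float>();

        /** value returned for docs without an entry in the external file */
        private final float defaultValue;

        public SparseExternalScores(float defaultValue) {
            this.defaultValue = defaultValue;
        }

        public void put(int docId, float score) {
            scoreByDoc.put(docId, score);
        }

        public float getScore(int docId) {
            Float score = scoreByDoc.get(docId);
            return score != null ? score.floatValue() : defaultValue;
        }
    }

The trade-off is the per-entry overhead of boxed keys and values, but for files that cover only a small fraction of the 8.5 million docs this should still be far cheaper than a dense array per file.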
I created an issue for this: https://issues.apache.org/jira/browse/SOLR-2583
It would be great if someone could have a look at it and comment.

Thanks for your feedback,
cheers,
Martin

On 06/08/2011 12:22 PM, Bohnsack, Sven wrote:
> Hi,
>
> I could not provide a stack trace, and IMHO it would not give much useful
> information. But we've made good progress in the analysis.
>
> We took a deeper look at what happens when an "external-file-field" request
> is sent to SOLR:
>
> * SOLR checks whether there is a file for the requested query, e.g. "trousers".
> * If so, SOLR loads the "trousers" file and generates a HashMap entry
>   consisting of a FileFloatSource object and a float array whose size is the
>   number of documents in the SOLR index. Every document matched by the query
>   gets the score value provided in the external score file; for every(!)
>   other document SOLR writes a zero into that float array.
> * If SOLR does not find a file for the query request, it still generates a
>   HashMap entry with score zero for every document.
>
> In our case we have about 8.5 million documents in our index, and one of
> those arrays occupies about 34MB of heap space. With e.g. 100 different
> queries and external file fields used for sorting the results, SOLR
> occupies about 3.4GB of heap space.
>
> The problem might be the use of WeakHashMap [1], which prevents the garbage
> collector from cleaning up unused keys.
>
> What do you think could be a possible solution for this whole problem?
> (except "don't use external file fields" ;)
>
> Regards
> Sven
>
> [1]: "A hashtable-based Map implementation with weak keys. An entry in a
> WeakHashMap will automatically be removed when its key is no longer in
> ordinary use. More precisely, the presence of a mapping for a given key will
> not prevent the key from being discarded by the garbage collector, that is,
> made finalizable, finalized, and then reclaimed. When a key has been
> discarded its entry is effectively removed from the map, so this class
> behaves somewhat differently than other Map implementations."
>
> -----Original Message-----
> From: mtnes...@gmail.com [mailto:mtnes...@gmail.com] On behalf of Simon Rosenthal
> Sent: Wednesday, June 8, 2011 03:56
> To: solr-user@lucene.apache.org
> Subject: Re: How to deal with many files using solr external file field
>
> Can you provide a stack trace for the OOM exception?
>
> On Tue, Jun 7, 2011 at 4:25 PM, Bohnsack, Sven
> <sven.bohns...@shopping24.de> wrote:
>
>> Hi all,
>>
>> we're using Solr 1.4 and the external file field ([1]) for sorting our
>> search results. We have about 40,000 terms for which we use this sorting
>> option. Currently we're running into massive OutOfMemory problems and are
>> not quite sure what the matter is. It seems that the garbage collector
>> stops working or some processes are going wild; in any case, Solr starts
>> to allocate more and more RAM until we hit this OutOfMemory exception.
>>
>> We noticed the following:
>>
>> For some terms one can see java.io.FileNotFoundExceptions in the Solr log
>> when Solr tries to load an external file for a term for which there is no
>> such file, e.g. Solr tries to load the external score file for "trousers"
>> but there is none in the /solr/data folder.
>>
>> Question: is it possible that those exceptions are responsible for the
>> OutOfMemory problem, or could it be due to the large(?) number of 40k
>> terms for which we want to sort the results via external file field?
>>
>> I'm looking forward to your answers, suggestions and ideas :)
>>
>> Regards
>> Sven
>>
>> [1]:
>> http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html

--
Martin Grotzke
http://twitter.com/martin_grotzke
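For anyone following along: the heap numbers Sven quotes above are easy to reproduce with a back-of-the-envelope calculation. This is purely illustrative Java, not Solr code:

    /**
     * Back-of-the-envelope illustration of the heap usage described above:
     * each cached entry holds a float[] sized to the number of documents
     * in the index, regardless of how many lines the external file has.
     */
    public class ExternalScoreFootprint {
        public static void main(String[] args) {
            long docsInIndex = 8500000L;   // documents in the index
            long bytesPerFloat = 4L;       // one float slot per document
            long cachedArrays = 100L;      // e.g. 100 different queries/files

            long bytesPerArray = docsInIndex * bytesPerFloat;  // ~34 MB
            long totalBytes = bytesPerArray * cachedArrays;    // ~3.4 GB

            System.out.printf("per array: ~%.0f MB, total: ~%.1f GB%n",
                    bytesPerArray / 1e6, totalBytes / 1e9);
        }
    }

With the sparse map from the patch, the cost per file is proportional to the number of lines in that file rather than to the size of the index.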