Hi,

I could not provide a stack trace, and IMHO it would not have given much useful
information anyway. But we have made good progress in the analysis.

We took a deeper look at what happens when an "external-file-field" request
is sent to SOLR:

* SOLR checks whether there is a file for the requested query, e.g. "trousers"
* If so, SOLR loads the "trousers" file and generates a HashMap entry
consisting of a FileFloatSource object and a float array whose size equals the
number of documents in the SOLR index. Every document matched by the query
gets the score value provided in the external score file; for every(!) other
document SOLR writes a zero into that float array (see the sketch after this
list)
* If SOLR does not find a file for the query request, it still generates a
HashMap entry with score zero for every document
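
For illustration, here is a minimal sketch of that caching pattern. This is
not the actual Solr 1.4 source; the class and method names are made up, but it
shows how one float array per cached entry is allocated and filled:

import java.util.Map;
import java.util.WeakHashMap;

class ExternalFileFloats {
    // One float[] per cached entry, held in a WeakHashMap (see [1] below).
    private static final Map<Object, float[]> cache =
            new WeakHashMap<Object, float[]>();

    static float[] getFloats(Object key, int maxDoc,
                             Map<Integer, Float> externalScores) {
        float[] vals = cache.get(key);
        if (vals == null) {
            // One 4-byte float per document in the index:
            // 8,500,000 docs * 4 bytes = ~34 MB per entry.
            vals = new float[maxDoc];   // every slot starts out as 0.0f
            for (Map.Entry<Integer, Float> e : externalScores.entrySet()) {
                vals[e.getKey()] = e.getValue(); // scores from the external file
            }
            cache.put(key, vals);
        }
        return vals;
    }
}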

In our case we have about 8.5 million documents in our index, and one of
those arrays occupies about 34 MB of heap space. With e.g. 100 different
queries, each using an external file field for sorting the result, SOLR
occupies about 3.4 GB of heap space.
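
The arithmetic behind those numbers, as a quick sanity check (assuming 4
bytes per float and ignoring object overhead):

public class HeapEstimate {
    public static void main(String[] args) {
        long docs = 8500000L;         // documents in our index
        long perArray = docs * 4L;    // 34,000,000 bytes, ~34 MB per array
        long total = 100L * perArray; // 3,400,000,000 bytes, ~3.4 GB for 100 queries
        System.out.println(perArray + " bytes per array, " + total + " bytes in total");
    }
}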

The problem might be the use of WeakHashMap [1]: entries are only removed
once their keys are no longer strongly referenced, so as long as SOLR holds on
to the keys, the Garbage Collector cannot clean up the unused entries.
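
A tiny standalone demo of that behaviour (an assumed scenario, not Solr
code): as long as the key is strongly referenced somewhere, neither the key
nor the ~34 MB value can be reclaimed:

import java.util.Map;
import java.util.WeakHashMap;

public class WeakHashMapDemo {
    public static void main(String[] args) {
        Map<Object, float[]> cache = new WeakHashMap<Object, float[]>();
        Object key = new Object();           // strong reference, e.g. held by SOLR
        cache.put(key, new float[8500000]);  // ~34 MB value

        System.gc();
        System.out.println(cache.size());    // 1 -- key is still strongly referenced

        key = null;                          // drop the strong reference
        System.gc();                         // collection of weak keys is not guaranteed,
        System.out.println(cache.size());    // but this usually prints 0 now
    }
}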


What do you think could be a possible solution for this whole problem? (apart
from "don't use external file fields" ;)


Regards
Sven


[1]: "A hashtable-based Map implementation with weak keys. An entry in a 
WeakHashMap will automatically be removed when its key is no longer in ordinary 
use. More precisely, the presence of a mapping for a given key will not prevent 
the key from being discarded by the garbage collector, that is, made 
finalizable, finalized, and then reclaimed. When a key has been discarded its 
entry is effectively removed from the map, so this class behaves somewhat 
differently than other Map implementations."

-----Original Message-----
From: mtnes...@gmail.com [mailto:mtnes...@gmail.com] On Behalf Of Simon
Rosenthal
Sent: Wednesday, June 8, 2011 03:56
To: solr-user@lucene.apache.org
Subject: Re: How to deal with many files using solr external file field

Can you provide a stack trace for the OOM exception?

On Tue, Jun 7, 2011 at 4:25 PM, Bohnsack, Sven
<sven.bohns...@shopping24.de> wrote:

> Hi all,
>
> we're using solr 1.4 and external file field ([1]) for sorting our
> search results. We have about 40,000 terms for which we use this sorting
> option.
> Currently we're running into massive OutOfMemory problems and are not
> quite sure what the matter is. It seems that the garbage collector stops
> working or some processes run wild. However, solr starts to allocate
> more and more RAM until we experience this OutOfMemory exception.
>
>
> We noticed the following:
>
> For some terms one can see in the solr log that
> java.io.FileNotFoundExceptions appear when solr tries to load an external
> file for a term for which there is no such file, e.g. solr tries to load
> the external score file for "trousers" but there is none in the
> /solr/data folder.
>
> Question: is it possible that those exceptions are responsible for the
> OutOfMemory problem, or could it be due to the large(?) number of 40,000
> terms for which we want to sort the result via external file field?
>
> I'm looking forward to your answers, suggestions and ideas :)
>
>
> Regards
> Sven
>
>
> [1]:
> http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
>
