Re: Excessive Heap Usage from docValues?

2014-03-20 Thread Toke Eskildsen
On Wed, 2014-03-19 at 22:01 +0100, tradergene wrote:
 I have a Solr index with about 32 million docs.  Each doc is relatively
 small but has multiple dynamic fields that are storing INTs.  The initial
 problem that I had to resolve is that we were running into OOMs (on a 48GB
 heap, 130GB on-disk index).  I narrowed that issue down to Lucene FieldCache
 filling up the heap due to all the dynamic fields.

48GB heap for a 130GB, 32M docs index sounds excessive.  Could you tell
us how many unique fields your searcher uses in total for faceting and
maybe the overall layout of your index? Is this perhaps a case of many
distinct groups of data put in the same index, where the searches are
always within a single group and each group has its own fields for
faceting? Are the fields single- or multi-valued?

- Toke Eskildsen, State and University Library, Denmark




Excessive Heap Usage from docValues?

2014-03-19 Thread tradergene
Hello All,

I'm hoping to get your assistance in debugging what seems like a memory
issue.

I have a Solr index with about 32 million docs.  Each doc is relatively
small but has multiple dynamic fields that are storing INTs.  The initial
problem that I had to resolve is that we were running into OOMs (on a 48GB
heap, 130GB on-disk index).  I narrowed that issue down to Lucene FieldCache
filling up the heap due to all the dynamic fields.  To mitigate this, I
enabled docValues on the schema for many of the dynamicField culprits.  This
dropped the FieldCache down to almost nothing.
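For context, the schema change was presumably along these lines (a sketch only; the field names, suffix pattern, and type are hypothetical, not taken from the message):

```xml
<!-- schema.xml sketch: hypothetical dynamic int fields with docValues enabled -->
<dynamicField name="*_i" type="int" indexed="true" stored="true"
              docValues="true" multiValued="false"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0"
           positionIncrementGap="0"/>
```

With docValues enabled, sorting and faceting on these fields reads column-oriented data from disk instead of building an uninverted FieldCache entry on the heap.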

Now, when re-indexing for docValues functionality, I ran into OOMs as soon
as I reached 12 million of the 32 million documents.  Before enabling
docValues, I was able to load up Solr on a 48GB heap but ran into problems
after enough unique searches occurred (normal FieldCache issue).  Now, with
docValues, a 48GB heap is giving me OOM after 12 million docs indexed.  I
split the collection into 10 shards across 2 nodes (48GB heap each) and was
able to get up to 21 million docs indexed.  I've since spread the 10 shards
across 4 nodes and am hoping that will be enough to get all 32 million docs
indexed.  That is 48GB x 4 of heap, which seems really excessive for an
index that was only 132GB on disk pre-docValues.
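As a rough sanity check on these numbers (a back-of-the-envelope sketch; the per-field cost assumes single-valued int fields, and the real field count is not stated above), a FieldCache entry for an int field costs about 4 bytes per document:

```python
# Back-of-the-envelope FieldCache sizing for single-valued int fields.
# Assumes ~4 bytes (one int32) per document per cached field.
NUM_DOCS = 32_000_000
BYTES_PER_INT = 4

# Heap cost of caching one int field across the whole index.
per_field_mb = NUM_DOCS * BYTES_PER_INT / 2**20
print(f"~{per_field_mb:.0f} MB of heap per cached int field")

# How many such fields it would take to fill a 48 GB heap.
heap_gb = 48
fields_to_fill = heap_gb * 2**30 / (NUM_DOCS * BYTES_PER_INT)
print(f"~{fields_to_fill:.0f} int fields to fill a {heap_gb} GB heap")
```

So on the order of a hundred MB per field: it would take several hundred distinct cached int fields to exhaust a 48GB heap this way, which is why the number of unique dynamic fields matters so much here.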

I would love some thoughts as to whether I'm expecting too much efficiency
with docValues enabled.  I was under the impression that docValues would
increase storage requirements on disk (which it has), but I thought that RAM
usage would go down during indexing as well as during searching (which I
haven't tested yet).

Thanks for any assistance anyone can provide.

Gene



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Excessive-Heap-Usage-from-docValues-tp4125577.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Excessive Heap Usage from docValues?

2014-03-19 Thread Otis Gospodnetic
Hi,

Which type of doc values? See Wiki or reference guide for a list of types.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Mar 19, 2014 5:02 PM, tradergene nos...@krevets.com wrote:
