: There are 14,696,502 documents, we are doing a lot of funky stuff but I'm
: not sure which is most likely to cause an impact. We're sorting on a dynamic
: field there are about 1000 different variants of this field that look like
: "priority_sort_for_<client_id>", which is an integer field. I've heard that
: sorting can have a big impact on memory consumption, could that be it?

Sorting on a field requires that an array of the corresponding type be 
constructed for that field - the size of the array is the size of maxDoc 
(ie: the number of documents in your index, including deleted documents).

If you are using TrieInts, and have an index with no deletions, sorting 
~14.7 million docs on 1000 different int fields will take up about 55GB 
(14,696,502 docs * 1000 fields * 4 bytes per int).
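
To make that arithmetic easy to re-run with other numbers, here is a rough 
sketch. The class and method names are made up for illustration and are not 
part of Solr or Lucene; it assumes one 4-byte int per document per sort 
field, no deleted documents, and it ignores array headers and every other 
data structure.

```java
public class SortCacheEstimate {

    /**
     * Rough lower bound: one fixed-width value per document (maxDoc),
     * per distinct field that is sorted on.
     */
    static long estimateBytes(long maxDoc, int sortFields, int bytesPerValue) {
        return maxDoc * sortFields * bytesPerValue;
    }

    public static void main(String[] args) {
        long maxDoc = 14_696_502L;        // document count from the question
        int sortFields = 1000;            // ~1000 priority_sort_for_<client_id> variants
        int bytesPerInt = Integer.BYTES;  // 4 bytes per int

        long bytes = estimateBytes(maxDoc, sortFields, bytesPerInt);
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);

        System.out.println(bytes + " bytes, roughly " + Math.round(gib) + " GiB");
    }
}
```

That comes to 58,786,008,000 bytes, which is where the ~55GB figure comes 
from when you count in units of 2^30 bytes. In practice only the fields 
that have actually been sorted on get an array, so the real number depends 
on how many of the 1000 variants have been used since the searcher opened.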

That's a minimum just for sorting on those int fields (SortableIntField, 
which keeps a string version of the field value, will be significantly 
bigger), and it doesn't take into consideration any other data structures 
used for searching.

I'm not a GC expert, but based on my limited understanding your graph 
actually seems fine to me .. particularly the part where it says 
you've configured a Max heap of ~122GB of RAM, and that the JVM has 
never spent any time doing ConcurrentMarkSweep.  My uneducated 
understanding of those two numbers is that you've told the JVM it can use 
an ungodly amount of RAM, so it is.  It's done some basic cleanup of 
young gen (ParNew) but because the heap size has never gone above 50GB, 
it hasn't found any reason to actually start a CMS GC to look for dead 
objects in Old Gen that it can clean up.
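
If you want to check the "never run CMS" part directly rather than reading 
it off a graph, something like the sketch below should do it. It uses the 
standard java.lang.management API; the class name is made up for 
illustration. Note that it reports on the JVM it runs in, so it would have 
to run inside the Solr JVM (or you can read the same MBeans remotely over 
JMX) to tell you anything about Solr.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

public class GcCounts {

    /** Maps each collector's name to how many collections it has run so far. */
    static Map<String, Long> collectionCounts() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 if the collector does not report a count
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    public static void main(String[] args) {
        collectionCounts().forEach((name, count) ->
                System.out.println(name + ": " + count + " collections"));
    }
}
```

On a JVM configured with the CMS collector I'd expect to see one entry for 
ParNew and one for ConcurrentMarkSweep; a count of 0 on the latter would 
match what the graph shows.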


(Can someone who understands GC and JVM tuning better than me please 
sanity check me on that?)


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss      ...  Stump The Chump!
