You can also sort on a field by using a function query instead of the
"sort=field+desc" parameter. That avoids building the in-memory sort
array, but the sort itself is slower: a classic speed vs. space
trade-off.
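
A hedged sketch of what that can look like as Solr request parameters
(the field name and client id 42 are made up for illustration; on Solr
releases of this era the usual trick is to make the function value the
score via the "func" query parser, then sort on score):

```
# Ordinary field sort -- builds a per-field in-memory array:
q=*:*&sort=priority_sort_for_42+desc

# Function-query alternative -- score each doc by the field's value,
# then sort on score instead of on the field itself:
q={!func}field(priority_sort_for_42)&sort=score+desc
```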

You'll have to benchmark and decide which you want, and maybe some
fields need the fast sort and some can get away with the slow one.

http://www.lucidimagination.com/search/?q=function+query
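
For scale, the "~55GB" estimate in Hoss's reply below is just maxDoc
times the number of sorted fields times sizeof(int) -- a quick
back-of-the-envelope check (the numbers are taken from the thread):

```python
max_doc = 14_696_502   # docs in the index, deleted docs included
num_fields = 1000      # distinct priority_sort_for_<client_id> fields
bytes_per_int = 4      # one Java int per doc, per sorted field

total_bytes = max_doc * num_fields * bytes_per_int
print(f"{total_bytes / 2**30:.1f} GiB")  # roughly 54.7 GiB, i.e. "~55GB"
```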

On Thu, Sep 30, 2010 at 11:47 AM, Jeff Moss <jm...@heavyobjects.com> wrote:
> I think you've probably nailed it Chris, thanks for that, I think I can get
> by with a different approach than this.
>
> Do you know if I will get the same memory consumption using the
> RandomFieldType vs the TrieInt?
>
> -Jeff
>
> On Thu, Sep 30, 2010 at 12:36 PM, Chris Hostetter
> <hossman_luc...@fucit.org>wrote:
>
>>
>> : There are 14,696,502 documents, we are doing a lot of funky stuff but I'm
>> : not sure which is most likely to cause an impact. We're sorting on a
>> dynamic
>> : field there are about 1000 different variants of this field that look
>> like
>> : "priority_sort_for_<client_id>", which is an integer field. I've heard
>> that
>> : sorting can have a big impact on memory consumption, could that be it?
>>
>> sorting on a field requires that an array of the corresponding type be
>> constructed for that field - the size of the array is maxDoc
>> (ie: the number of documents in your index, including deleted documents).
>>
>> If you are using TrieInts, and have an index with no deletions, sorting
>> ~14.7Mil docs on 1000 diff int fields will take up about ~55GB.
>>
>> That's a minimum just for the sorting of those int fields (SortableIntField,
>> which keeps a string version of the field value, will be significantly
>> bigger) and doesn't take into consideration any other data structures used
>> for searching.
>>
>> I'm not a GC expert, but based on my limited understanding your graph
>> actually seems fine to me .. particularly the part where it says
>> you've configured a max heap of ~122GB of RAM, and it's
>> never spent any time doing ConcurrentMarkSweep.  My uneducated
>> understanding of those two numbers is that you've told the JVM it can use
>> an ungodly amount of RAM, so it is.  It's done some basic cleanup of
>> young gen (ParNew), but because the heap size has never gone above 50GB,
>> it hasn't found any reason to actually start a CMS GC to look for dead
>> objects in Old Gen that it can clean up.
>>
>>
>> (Can someone who understands GC and JVM tuning better than me please
>> sanity check me on that?)
>>
>>
>> -Hoss
>>
>> --
>> http://lucenerevolution.org/  ...  October 7-8, Boston
>> http://bit.ly/stump-hoss      ...  Stump The Chump!
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com
