I think the memory size is roughly (number of groups) * ((size of key) + (a little memory for the bucket that holds the members of that group)). The latter is (I'm guessing here) quite small.
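A back-of-the-envelope sketch of that formula, just to put numbers on it. The 20-byte average key and 40-byte per-group overhead are my guesses for illustration, not measured Solr internals:

```java
// Rough worst-case memory estimate for grouping:
//   total = numGroups * (avgKeyBytes + perGroupOverheadBytes)
// The key size and overhead below are assumed values, not Solr constants.
public class GroupMemoryEstimate {

    static long estimateBytes(long numGroups, long avgKeyBytes, long perGroupOverheadBytes) {
        return numGroups * (avgKeyBytes + perGroupOverheadBytes);
    }

    public static void main(String[] args) {
        long groups = 500_000;    // one group per forum thread
        long avgKeyBytes = 20;    // assumed average group-key length
        long overheadBytes = 40;  // assumed per-group bookkeeping (list entry, object headers)
        long total = estimateBytes(groups, avgKeyBytes, overheadBytes);
        System.out.println(total + " bytes"); // 500,000 * 60 = 30,000,000 bytes, ~30 MB
    }
}
```

So even with all 500.000 groups live at once, this kind of estimate lands in the tens of megabytes, not gigabytes, under those assumptions.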
Sure, you can have all 500.000 groups consume memory, quite easily: q=*:* (OK, that one wouldn't be scored, but you get the idea). Whether the groups are returned or not is not germane; they all have to be counted (Martijn may jump all over _that_). Consider some group X with a low-scoring document in it. When could you _know_ that you don't need to return that group? Unfortunately, not until the very last document is scored, since that document could be a perfect match for the query.

Best
Erick

On Fri, Aug 24, 2012 at 10:11 AM, reikje <reik.sch...@gmail.com> wrote:
> I have a question regarding expected memory consumption when using field
> collapsing with the ngroups parameter. We have indexed a forum with 500.000
> threads. Each thread is a group, so we can have at most 500.000 groups. I read
> somewhere that for each group an org.apache.lucene.util.BytesRef is created
> and added to an ArrayList. What is the content of the byte[] the BytesRef
> is created with? It will help me estimate how much memory is used in the
> worst case, if all groups are returned (which is unlikely).
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/ngroups-question-tp4003093.html
> Sent from the Solr - User mailing list archive at Nabble.com.