Hi!

As far as I know, there currently isn't another way. Unfortunately,
performance degrades badly when there are a lot of unique groups.
I think an issue should be opened to investigate how we can improve this...
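For reference, a request that triggers the slow group count looks roughly
like this (a sketch only; the host, port, and handler path are assumptions
based on a default Solr 3.x install):

```shell
# Assumed default host/port for a stock Solr 3.x install; adjust to your setup.
SOLR_URL="http://localhost:8983/solr/select"

# Grouping on the string "signature" field; group.ngroups=true is the
# parameter that adds the extra per-query cost discussed below.
QUERY="${SOLR_URL}?q=*:*&group=true&group.field=signature&group.ngroups=true&wt=json"
echo "$QUERY"
```

Dropping group.ngroups=true from that URL gives the fast variant, at the
cost of losing the distinct-group total.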

Question: Does Solr have a decent chunk of heap space (-Xmx)? Grouping
requires quite a bit of heap, even without group.ngroups=true.
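If the heap does turn out to be small, it can be raised when starting the
JVM. This is a config sketch only: the 4g value is a placeholder to tune
for your machine and index size, and start.jar assumes the example Jetty
runner shipped with Solr 3.x.

```shell
# Placeholder heap size; tune -Xmx to your machine and index size.
# start.jar is the example Jetty launcher bundled with Solr 3.x.
java -Xmx4g -jar start.jar
```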

Martijn

On 9 December 2011 23:08, Michael Jakl <jakl.mich...@gmail.com> wrote:
> Hi!
>
> On Fri, Dec 9, 2011 at 17:41, Martijn v Groningen
> <martijn.v.gronin...@gmail.com> wrote:
>> On what field type are you grouping and what version of Solr are you
>> using? Grouping by string field is faster.
>
> The field is defined as follows:
> <field name="signature" type="string" indexed="true" stored="true" />
>
> Grouping itself is quite fast; only computing the number of groups
> seems to grow significantly (linearly) with the number of documents.
>
> I was hoping for a faster way to compute the total number of
> distinct documents (or in other terms, the number of distinct values
> in the signature field). Facets came to mind, but as far as I could
> see, they don't offer a total facet count either.
>
> I'm using Solr 3.5 (upgraded from Solr 3.4 without reindexing).
>
> Thanks,
> Michael
>
>> On 9 December 2011 12:46, Michael Jakl <jakl.mich...@gmail.com> wrote:
>>> Hi, I'm using the grouping feature of Solr to return a list of unique
>>> documents together with a count of the duplicates.
>>>
>>> Essentially I use Solr's signature algorithm to create the "signature"
>>> field and use grouping on it.
>>>
>>> To provide good numbers for paging through my result list, I'd like to
>>> compute the total number of documents found (= matches) and the number
>>> of unique documents (= ngroups). Unfortunately, enabling
>>> "group.ngroups" considerably slows down the query (from 500ms to
>>> 23000ms for a result list of roughly 300000 documents).
>>>
>>> Is there a faster way to compute the number of groups (or unique
>>> values in the signature field) in the search result? My Solr instance
>>> currently contains about 50 million documents and around 10% of them
>>> are duplicates.
>>>
>>> Thank you,
>>> Michael
>>
>>
>>
>> --
>> Met vriendelijke groet,
>>
>> Martijn van Groningen



-- 
Met vriendelijke groet,

Martijn van Groningen
