Re: Setting group.ngroups=true considerably slows down queries

2011-12-12 Martijn v Groningen
Hi!

As far as I know, there currently isn't another way. Unfortunately, the
performance degrades badly when there are a lot of unique groups.
I think an issue should be opened to investigate how we can improve this...

Question: does Solr have a decent chunk of heap space (-Xmx)? Grouping
requires quite a lot of heap space, even without group.ngroups=true.

Martijn

On 9 December 2011 23:08, Michael Jakl <jakl.mich...@gmail.com> wrote:
 Hi!

 On Fri, Dec 9, 2011 at 17:41, Martijn v Groningen
 <martijn.v.gronin...@gmail.com> wrote:
 On what field type are you grouping, and what version of Solr are you
 using? Grouping by a string field is faster.

 The field is defined as follows:
 <field name="signature" type="string" indexed="true" stored="true" />

 Grouping itself is quite fast; only computing the number of groups
 seems to grow significantly (linearly) with the number of documents.

 I was hoping for a faster way to compute the total number of
 distinct documents (in other words, the number of distinct values
 in the signature field). Facets came to mind, but as far as I can
 see, they don't report the total number of distinct facet values either.

 I'm using Solr 3.5 (upgraded from Solr 3.4 without reindexing).

 Thanks,
 Michael

 On 9 December 2011 12:46, Michael Jakl <jakl.mich...@gmail.com> wrote:
 Hi, I'm using the grouping feature of Solr to return a list of unique
 documents together with a count of the duplicates.

 Essentially I use Solr's signature algorithm to create the signature
 field and use grouping on it.

 To provide good numbers for paging through my result list, I'd like to
 compute the total number of documents found (= matches) and the number
 of unique documents (= ngroups). Unfortunately, enabling
 group.ngroups considerably slows down the query (from 500ms to
 23000ms for a result list of roughly 30 documents).

 Is there a faster way to compute the number of groups (or unique
 values in the signature field) in the search result? My Solr instance
 currently contains about 50 million documents and around 10% of them
 are duplicates.

 Thank you,
 Michael



 --
 Kind regards,

 Martijn van Groningen



-- 
Kind regards,

Martijn van Groningen


Re: Setting group.ngroups=true considerably slows down queries

2011-12-12 Michael Jakl
Hi!

On Mon, Dec 12, 2011 at 13:57, Martijn v Groningen
<martijn.v.gronin...@gmail.com> wrote:
 As far as I know, there currently isn't another way. Unfortunately, the
 performance degrades badly when there are a lot of unique groups.
 I think an issue should be opened to investigate how we can improve this...

 Question: does Solr have a decent chunk of heap space (-Xmx)? Grouping
 requires quite a lot of heap space, even without group.ngroups=true.

Thanks for answering. The server has been given as much memory as the
machine can afford (without swapping):
  -Xmx21g \
  -Xms4g \

Should I open an issue as a subtask of SOLR-236, even though there is
already a performance-related task (SOLR-2205)?

Cheers,
Michael


Re: Setting group.ngroups=true considerably slows down queries

2011-12-12 Martijn v Groningen
I wouldn't make it a subtask under SOLR-236, because that issue relates to a
completely different implementation that was never committed.
SOLR-2205 is about result grouping in general, and I think it should be closed.
I'd open a new issue for improving the performance of
group.ngroups=true when there are a lot of unique groups.

Martijn

On 12 December 2011 14:32, Michael Jakl <jakl.mich...@gmail.com> wrote:
 Hi!

 On Mon, Dec 12, 2011 at 13:57, Martijn v Groningen
 <martijn.v.gronin...@gmail.com> wrote:
 As far as I know, there currently isn't another way. Unfortunately, the
 performance degrades badly when there are a lot of unique groups.
 I think an issue should be opened to investigate how we can improve this...

 Question: does Solr have a decent chunk of heap space (-Xmx)? Grouping
 requires quite a lot of heap space, even without group.ngroups=true.

 Thanks for answering. The server has been given as much memory as the
 machine can afford (without swapping):
  -Xmx21g \
  -Xms4g \

 Should I open an issue as a subtask of SOLR-236, even though there is
 already a performance-related task (SOLR-2205)?

 Cheers,
 Michael



-- 
Kind regards,

Martijn van Groningen


Setting group.ngroups=true considerably slows down queries

2011-12-09 Michael Jakl
Hi, I'm using the grouping feature of Solr to return a list of unique
documents together with a count of the duplicates.

Essentially I use Solr's signature algorithm to create the signature
field and use grouping on it.
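As a rough client-side illustration of that idea, the sketch below assumes the signature is an MD5 hash over a couple of content fields. The field names `title` and `body` and the helper names are hypothetical, and Solr's own signature processor may hash the field values differently. Documents with identical content get identical signatures, so each group's size is its duplicate count, the number of groups corresponds to ngroups, and the sum of group sizes corresponds to matches.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical field names; a real setup hashes whatever fields the
# signature update processor is configured with.
SIGNATURE_FIELDS: Tuple[str, ...] = ("title", "body")


def signature(doc: Dict[str, str], fields: Tuple[str, ...] = SIGNATURE_FIELDS) -> str:
    """Return an MD5 hex digest over the selected field values.

    This only mimics the idea of Solr's signature field: identical content
    maps to an identical key. It is not byte-compatible with Solr's hash.
    """
    h = hashlib.md5()
    for name in fields:
        h.update(str(doc.get(name, "")).encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") differ
    return h.hexdigest()


def group_by_signature(docs: Iterable[Dict[str, str]]) -> Counter:
    """Map each signature to the number of documents that share it."""
    return Counter(signature(doc) for doc in docs)


if __name__ == "__main__":
    docs = [
        {"title": "Solr grouping", "body": "same text"},
        {"title": "Solr grouping", "body": "same text"},  # duplicate
        {"title": "Faceting", "body": "other text"},
    ]
    groups = group_by_signature(docs)
    print("matches:", sum(groups.values()))  # 3
    print("ngroups:", len(groups))            # 2
```

In Solr the grouping happens server side; the point here is only that counting groups means knowing every distinct signature among the matches.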

To provide good numbers for paging through my result list, I'd like to
compute the total number of documents found (= matches) and the number
of unique documents (= ngroups). Unfortunately, enabling
group.ngroups considerably slows down the query (from 500ms to
23000ms for a result list of roughly 30 documents).
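For reference, here is a minimal sketch of the request described above, using the standard grouping parameters (`group`, `group.field`, `group.ngroups`) and reading `matches` and `ngroups` from the JSON response. The helper names, the query, and the default of 30 rows are illustrative assumptions. Note that with grouping enabled, `rows` counts groups rather than documents.

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlencode


def grouping_params(query: str, field: str = "signature", rows: int = 30,
                    with_ngroups: bool = True) -> Dict[str, str]:
    """Build request parameters for a grouped query (hypothetical helper)."""
    params = {
        "q": query,
        "group": "true",
        "group.field": field,   # group on the signature field
        "rows": str(rows),      # with grouping, this is the number of groups
        "wt": "json",
    }
    if with_ngroups:
        # The expensive part discussed in this thread.
        params["group.ngroups"] = "true"
    return params


def paging_counts(response: Dict[str, Any],
                  field: str = "signature") -> Tuple[int, Optional[int]]:
    """Return (matches, ngroups); ngroups is None if it was not requested."""
    grouped = response["grouped"][field]
    return grouped["matches"], grouped.get("ngroups")


if __name__ == "__main__":
    print(urlencode(grouping_params("text:solr")))
```

The encoded string would be appended to a select handler URL such as `http://localhost:8983/solr/select?` (host and path are assumptions for a default single-core setup). Comparing timings with `with_ngroups=False` isolates the cost of `group.ngroups`.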

Is there a faster way to compute the number of groups (or unique
values in the signature field) in the search result? My Solr instance
currently contains about 50 million documents and around 10% of them
are duplicates.

Thank you,
Michael


Re: Setting group.ngroups=true considerably slows down queries

2011-12-09 Martijn v Groningen
Hi Michael,

On what field type are you grouping, and what version of Solr are you
using? Grouping by a string field is faster.

Martijn

On 9 December 2011 12:46, Michael Jakl <jakl.mich...@gmail.com> wrote:
 Hi, I'm using the grouping feature of Solr to return a list of unique
 documents together with a count of the duplicates.

 Essentially I use Solr's signature algorithm to create the signature
 field and use grouping on it.

 To provide good numbers for paging through my result list, I'd like to
 compute the total number of documents found (= matches) and the number
 of unique documents (= ngroups). Unfortunately, enabling
 group.ngroups considerably slows down the query (from 500ms to
 23000ms for a result list of roughly 30 documents).

 Is there a faster way to compute the number of groups (or unique
 values in the signature field) in the search result? My Solr instance
 currently contains about 50 million documents and around 10% of them
 are duplicates.

 Thank you,
 Michael



-- 
Kind regards,

Martijn van Groningen


Re: Setting group.ngroups=true considerably slows down queries

2011-12-09 Michael Jakl
Hi!

On Fri, Dec 9, 2011 at 17:41, Martijn v Groningen
<martijn.v.gronin...@gmail.com> wrote:
 On what field type are you grouping, and what version of Solr are you
 using? Grouping by a string field is faster.

The field is defined as follows:
<field name="signature" type="string" indexed="true" stored="true" />

Grouping itself is quite fast; only computing the number of groups
seems to grow significantly (linearly) with the number of documents.
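To illustrate why the cost tracks the result set, here is a conceptual model (not Solr's actual collector code, and the helper name is hypothetical). Counting groups means visiting the group value of every matching document and remembering every distinct value seen, so time grows linearly with the number of matches and memory with the number of unique groups. With about 50 million documents of which roughly 10% are duplicates, a broad query could involve on the order of 45 million distinct signatures, assuming each duplicate is one extra copy of another document.

```python
from typing import Iterable, Tuple


def count_groups(group_values: Iterable[str]) -> Tuple[int, int]:
    """Return (matches, ngroups) for the group values of matching documents.

    Conceptual model only: each matching document is visited once, so time
    is linear in the number of matches, and every distinct value is kept in
    a set, so memory grows with the number of unique groups.
    """
    seen = set()
    matches = 0
    for value in group_values:
        matches += 1
        seen.add(value)
    return matches, len(seen)


if __name__ == "__main__":
    print(count_groups(["sig-a", "sig-b", "sig-a", "sig-c"]))  # (4, 3)
```

Counting documents alone needs only the counter; the set of distinct values is what makes ngroups expensive when almost every document forms its own group.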

I was hoping for a faster way to compute the total number of
distinct documents (in other words, the number of distinct values
in the signature field). Facets came to mind, but as far as I can
see, they don't report the total number of distinct facet values either.
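One possible workaround, not one suggested in this thread, is to request every facet value of the signature field and count them on the client. In Solr's faceting parameters, `facet.limit=-1` removes the cap on returned values and `facet.mincount=1` drops values that don't occur in the current result set. With `wt=json`, facet values come back by default as a flat `[value, count, value, count, ...]` list. The helper names below are hypothetical. Because the response carries every distinct signature, with tens of millions of unique values this may well be no faster than group.ngroups, so it is likely only practical when the number of distinct values is modest.

```python
from typing import Any, Dict


def facet_count_params(query: str, field: str = "signature") -> Dict[str, str]:
    """Parameters that ask for every facet value of `field` (hypothetical helper)."""
    return {
        "q": query,
        "rows": "0",             # the documents themselves are not needed
        "facet": "true",
        "facet.field": field,
        "facet.limit": "-1",     # no limit; the default returns only 100 values
        "facet.mincount": "1",   # skip values that don't occur in this result set
        "wt": "json",
    }


def distinct_values_from_facets(response: Dict[str, Any],
                                field: str = "signature") -> int:
    """Count facet values in a JSON response using the default flat layout."""
    flat = response["facet_counts"]["facet_fields"][field]
    # The flat layout alternates value, count, value, count, ...
    return len(flat) // 2
```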

I'm using Solr 3.5 (upgraded from Solr 3.4 without reindexing).

Thanks,
Michael

 On 9 December 2011 12:46, Michael Jakl <jakl.mich...@gmail.com> wrote:
 Hi, I'm using the grouping feature of Solr to return a list of unique
 documents together with a count of the duplicates.

 Essentially I use Solr's signature algorithm to create the signature
 field and use grouping on it.

 To provide good numbers for paging through my result list, I'd like to
 compute the total number of documents found (= matches) and the number
 of unique documents (= ngroups). Unfortunately, enabling
 group.ngroups considerably slows down the query (from 500ms to
 23000ms for a result list of roughly 30 documents).

 Is there a faster way to compute the number of groups (or unique
 values in the signature field) in the search result? My Solr instance
 currently contains about 50 million documents and around 10% of them
 are duplicates.

 Thank you,
 Michael



 --
 Kind regards,

 Martijn van Groningen