Re: Setting group.ngroups=true considerable slows down queries
Hi!

As far as I know, there currently isn't another way. Unfortunately, performance degrades badly when there are a lot of unique groups. I think an issue should be opened to investigate how we can improve this.

Question: does Solr have a decent chunk of heap space (-Xmx)? Grouping requires quite a lot of heap space, even without group.ngroups=true.

Martijn

--
Kind regards,
Martijn van Groningen
Re: Setting group.ngroups=true considerable slows down queries
Hi!

On Mon, Dec 12, 2011 at 13:57, Martijn v Groningen martijn.v.gronin...@gmail.com wrote:
> As far as I know, there currently isn't another way. Unfortunately, performance degrades badly when there are a lot of unique groups. I think an issue should be opened to investigate how we can improve this.
>
> Question: does Solr have a decent chunk of heap space (-Xmx)? Grouping requires quite a lot of heap space, even without group.ngroups=true.

Thanks for answering. The server has been given as much memory as the machine can afford without swapping:

    -Xmx21g \
    -Xms4g \

Should I open an issue as a subtask of SOLR-236, even though there is already a performance-related task (SOLR-2205)?

Cheers,
Michael
Re: Setting group.ngroups=true considerable slows down queries
I wouldn't make it a subtask under SOLR-236, because that issue relates to a completely different implementation that was never committed. SOLR-2205 relates to general result grouping and, I think, should be closed. I'd open a new issue for improving the performance of group.ngroups=true when there are a lot of unique groups.

Martijn

--
Kind regards,
Martijn van Groningen
Setting group.ngroups=true considerable slows down queries
Hi,

I'm using the grouping feature of Solr to return a list of unique documents together with a count of the duplicates. Essentially, I use Solr's signature algorithm to create the signature field and group on it.

To provide good numbers for paging through my result list, I'd like to compute the total number of documents found (= matches) and the number of unique documents (= ngroups). Unfortunately, enabling group.ngroups considerably slows down the query (from 500 ms to 23000 ms for a result list of roughly 30 documents).

Is there a faster way to compute the number of groups (or unique values in the signature field) in the search result?

My Solr instance currently contains about 50 million documents, and around 10% of them are duplicates.

Thank you,
Michael
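For readers following along, the kind of request described in this message can be sketched as below. This is only an illustration: the base URL, the helper name build_grouping_url, and the rows and wt values are assumptions and not taken from the thread. Only group, group.field and group.ngroups correspond to the grouping parameters under discussion.

```python
from urllib.parse import urlencode


def build_grouping_url(base_url, query, rows=30, with_ngroups=True):
    """Build a Solr select URL that groups results on the signature field.

    base_url and this helper are hypothetical; adjust them to your setup.
    """
    params = {
        "q": query,
        "rows": rows,
        "group": "true",              # enable result grouping
        "group.field": "signature",   # field holding the duplicate signature
        # Asking for the number of groups is the expensive part in this thread.
        "group.ngroups": "true" if with_ngroups else "false",
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local Solr endpoint, for illustration only.
    print(build_grouping_url("http://localhost:8983/solr/select", "*:*"))
```

Timing the same query once with with_ngroups=True and once with with_ngroups=False is a simple way to reproduce the difference reported above.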
Re: Setting group.ngroups=true considerable slows down queries
Hi Michael,

On what field type are you grouping, and what version of Solr are you using? Grouping by a string field is faster.

Martijn

--
Kind regards,
Martijn van Groningen
Re: Setting group.ngroups=true considerable slows down queries
Hi!

On Fri, Dec 9, 2011 at 17:41, Martijn v Groningen martijn.v.gronin...@gmail.com wrote:
> On what field type are you grouping, and what version of Solr are you using? Grouping by a string field is faster.

The field is defined as follows:

    <field name="signature" type="string" indexed="true" stored="true" />

Grouping itself is quite fast; only the time to compute the number of groups seems to grow significantly (linearly) with the number of documents.

I was hoping for a faster way to compute the total number of distinct documents (in other words, the number of distinct values in the signature field). Facets came to mind, but as far as I could see, they don't offer a total number of facets either.

I'm using Solr 3.5 (upgraded from Solr 3.4 without reindexing).

Thanks,
Michael
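As a rough mental model of why the group count scales with the number of matching documents, consider the sketch below. It is not Solr's actual implementation, and the function name count_matches_and_groups is made up for illustration. It only shows that counting distinct signature values naively means visiting every match and remembering every distinct value seen, which fits both the linear growth described above and the heap usage mentioned elsewhere in the thread.

```python
def count_matches_and_groups(matching_signatures):
    """Return (matches, ngroups) for an iterable of signature values.

    Conceptual sketch only: one pass over all matches, one set entry
    per distinct signature.
    """
    seen = set()
    matches = 0
    for signature in matching_signatures:
        matches += 1          # every matching document is visited once
        seen.add(signature)   # memory grows with the number of unique groups
    return matches, len(seen)


if __name__ == "__main__":
    # Three matching documents, two of which share a signature.
    print(count_matches_and_groups(["a1", "b2", "a1"]))
```

With roughly 50 million documents of which about 10% are duplicates, a match-all query in this model would have to track tens of millions of distinct values.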