Re: Facet pivot 50.000.000 different values
In case anyone is interested, I solved my problem using the grouping feature:

*query* -- filter query (if any)
*field* -- field that you want to count (in my case field B)

    SolrQuery solrQuery = new SolrQuery(query);
    solrQuery.add("group", "true");
    solrQuery.add("group.field", "B"); // Group by the field
    solrQuery.add("group.ngroups", "true");
    solrQuery.setRows(0);

In the response, *getNGroups()* gives you the total number of groups, that is, the total number of distinct B values.

Cheers,
Carlos.

2013/5/18 Carlos Bonilla carlosbonill...@gmail.com

> Hi Mikhail,
> yes, the thing is that I need to take different queries into account, and that's why I can't use the Terms Component.
>
> Cheers.
>
> 2013/5/17 Mikhail Khludnev mkhlud...@griddynamics.com
>
>> On Fri, May 17, 2013 at 12:47 PM, Carlos Bonilla carlosbonill...@gmail.com wrote:
>>
>>> We only need to calculate how many different B values have more than 1 document but it takes ages
>>
>> Carlos,
>> It's not clear whether you need to take the results of a query into account or just gather statistics from the index. If the latter, you can just enumerate the terms and look at TermsEnum.docFreq(). Am I getting it right?
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Principal Engineer, Grid Dynamics
>> http://www.griddynamics.com
>> mkhlud...@griddynamics.com
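For readers who query Solr over plain HTTP rather than through SolrJ, the same grouping parameters can be passed on the request URL. The following is only a sketch using the JDK alone: the class name, the method name, and the core URL are hypothetical, and `wt=json` is an assumption about the desired response format. It builds the request and does not send it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Sketch: builds a Solr select URL that asks only for the number of groups. */
final class DistinctCountRequest {

    // URL-encode one parameter value; UTF-8 is always available on the JVM.
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * @param coreUrl hypothetical core URL, e.g. http://localhost:8983/solr/collection1
     * @param query   the main query (q); use "*:*" to match all documents
     * @param field   the field whose distinct values should be counted
     */
    static String buildUrl(String coreUrl, String query, String field) {
        return coreUrl + "/select"
                + "?q=" + enc(query)
                + "&group=true"                  // turn result grouping on
                + "&group.field=" + enc(field)   // one group per distinct value
                + "&group.ngroups=true"          // ask for the number of groups
                + "&rows=0"                      // no documents needed, only counts
                + "&wt=json";
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("http://localhost:8983/solr/collection1", "*:*", "B"));
    }
}
```

With a JSON response, the group count should appear as `ngroups` under `grouped` and the field name. Note that this counts all distinct values among the matching documents, not only those with more than one document.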
Re: Facet pivot 50.000.000 different values
Hi Mikhail,
yes, the thing is that I need to take different queries into account, and that's why I can't use the Terms Component.

Cheers.

2013/5/17 Mikhail Khludnev mkhlud...@griddynamics.com

> On Fri, May 17, 2013 at 12:47 PM, Carlos Bonilla carlosbonill...@gmail.com wrote:
>
>> We only need to calculate how many different B values have more than 1 document but it takes ages
>
> Carlos,
> It's not clear whether you need to take the results of a query into account or just gather statistics from the index. If the latter, you can just enumerate the terms and look at TermsEnum.docFreq(). Am I getting it right?
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer, Grid Dynamics
> http://www.griddynamics.com
> mkhlud...@griddynamics.com
Facet pivot 50.000.000 different values
Hi,

To calculate some stats we are using a field B with 50.000.000 different values as a facet pivot in a schema that contains 200.000.000 documents. We only need to calculate how many different B values have more than 1 document, but it takes ages.

Is there a better way or configuration to do this?

Configuration:
Solr 4.2.1
JVM Java 7
Max Java Heap size: 12 GB
8 GB RAM
Dual Core

Many thanks.
Re: Facet pivot 50.000.000 different values
Sorry, 16 GB RAM (not 8).

2013/5/17 Carlos Bonilla carlosbonill...@gmail.com

> Hi,
>
> To calculate some stats we are using a field B with 50.000.000 different values as a facet pivot in a schema that contains 200.000.000 documents. We only need to calculate how many different B values have more than 1 document, but it takes ages.
>
> Is there a better way or configuration to do this?
>
> Configuration:
> Solr 4.2.1
> JVM Java 7
> Max Java Heap size: 12 GB
> 8 GB RAM
> Dual Core
>
> Many thanks.
Re: Facet pivot 50.000.000 different values
On 5/17/2013 2:47 AM, Carlos Bonilla wrote:
> To calculate some stats we are using a field B with 50.000.000 different values as a facet pivot in a schema that contains 200.000.000 documents. We only need to calculate how many different B values have more than 1 document, but it takes ages.
>
> Is there a better way or configuration to do this?
>
> Configuration:
> Solr 4.2.1
> JVM Java 7
> Max Java Heap size: 12 GB
> 8 GB RAM
> Dual Core

You probably don't have enough RAM. With 200 million documents, I would imagine that your index is considerably larger than 4GB in size. With the 16GB of RAM that you mentioned in your other message, this configuration leaves 4GB of RAM for caching after Java manages to allocate the entire 12GB heap - which it will do very quickly with a large index. See the following:

http://wiki.apache.org/solr/SolrPerformanceProblems

I don't know the size of your index. If it is 100GB, then ideally you would want to have at least 112GB of RAM, but you could probably make it work in 64GB.

Thanks,
Shawn
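The memory arithmetic in this reply can be written out as a small sketch. The class and method names are hypothetical, and the model is deliberately simplified: it assumes the heap is fully allocated and ignores memory used by the operating system and other processes.

```java
/** Sketch of the RAM arithmetic from the reply above (all sizes in GB). */
final class CacheHeadroom {

    /** RAM left over for caching once the JVM has allocated its whole heap. */
    static long cacheGb(long totalRamGb, long heapGb) {
        return totalRamGb - heapGb;
    }

    /** RAM needed to hold the full heap and cache the entire index. */
    static long idealRamGb(long heapGb, long indexGb) {
        return heapGb + indexGb;
    }

    public static void main(String[] args) {
        // 16 GB machine with a 12 GB heap, as in the thread.
        System.out.println("Cache headroom: " + cacheGb(16, 12) + " GB");
        // Hypothetical 100 GB index with the same 12 GB heap.
        System.out.println("Ideal RAM: " + idealRamGb(12, 100) + " GB");
    }
}
```

The first figure is the 4GB left for caching, and the second is the 112GB quoted for a 100GB index.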
Re: Facet pivot 50.000.000 different values
On Fri, May 17, 2013 at 12:47 PM, Carlos Bonilla carlosbonill...@gmail.com wrote:

> We only need to calculate how many different B values have more than 1 document but it takes ages

Carlos,
It's not clear whether you need to take the results of a query into account or just gather statistics from the index. If the latter, you can just enumerate the terms and look at TermsEnum.docFreq(). Am I getting it right?

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
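As an illustration of the term-enumeration idea in this message, the counting step might look like the sketch below. It uses only the JDK: the map of term to document frequency is a hypothetical stand-in for the values that Lucene's TermsEnum would yield through docFreq() while iterating over the terms of field B, and the class and method names are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: count the terms whose document frequency is greater than one. */
final class DocFreqCount {

    /**
     * @param docFreqByTerm hypothetical stand-in for enumerating the terms of a
     *                      field and reading each term's document frequency
     */
    static long countTermsWithMoreThanOneDoc(Map<String, Integer> docFreqByTerm) {
        long count = 0;
        for (int docFreq : docFreqByTerm.values()) {
            if (docFreq > 1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the thread.
        Map<String, Integer> sample = new LinkedHashMap<>();
        sample.put("b1", 3);
        sample.put("b2", 1);
        sample.put("b3", 2);
        System.out.println(countTermsWithMoreThanOneDoc(sample));
    }
}
```

This gives index-wide statistics only. It does not take a query into account, and in Lucene the document frequency can still include deleted documents that have not yet been merged away.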