Re: Facet pivot 50.000.000 different values

2013-05-23 Thread Carlos Bonilla
In case anyone is interested, I solved my problem using the grouping
feature:

*query* -- filter query (if any)
*field* -- field that you want to count (in my case field B)

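SolrQuery solrQuery = new SolrQuery(query);
solrQuery.add("group", "true");
solrQuery.add("group.field", "B"); // group by the field to count
solrQuery.add("group.ngroups", "true"); // ask Solr for the number of groups
solrQuery.setRows(0); // only the count is needed, not the documents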

In the response, *getNGroups()* on the group command for B gives the total
number of groups, that is, the number of distinct B values matching the query.
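
For anyone not using SolrJ, the same request can be sketched as plain HTTP parameters. This is a minimal sketch under assumptions: the class name, the host and the core name (localhost:8983, collection1) are made up for illustration, and wt=json is added only to make the response easy to read. The group, group.field, group.ngroups and rows parameters are the ones from the snippet above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the query string for a "count the groups only" request.
class GroupCountQuery {

    static String buildParams(String query, String field) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);               // main query (filter queries could be added as fq)
        params.put("group", "true");          // enable result grouping
        params.put("group.field", field);     // group by the field to count
        params.put("group.ngroups", "true");  // include the number of groups
        params.put("rows", "0");              // no groups returned, only the count
        params.put("wt", "json");             // assumption: JSON response format

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed host and core name; adjust to the actual installation.
        String base = "http://localhost:8983/solr/collection1/select?";
        System.out.println(base + buildParams("*:*", "B"));
    }
}
```

In the JSON response the count should appear as ngroups under the grouped section for field B.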

Cheers,
Carlos.




Re: Facet pivot 50.000.000 different values

2013-05-18 Thread Carlos Bonilla
Hi Mikhail,
Yes, the thing is that I need to take different queries into account, and
that's why I can't use the Terms Component.

Cheers.



Facet pivot 50.000.000 different values

2013-05-17 Thread Carlos Bonilla
Hi,
To calculate some stats we are using a field B with 50.000.000 different
values as facet pivot in a schema that contains 200.000.000 documents. We
only need to calculate how many different B values have more than 1
document, but it takes ages. Is there a better way or configuration
to do this?

Configuration:
Solr 4.2.1
JVM Java 7
Max Java heap size: 12 GB
8 GB RAM
Dual Core

Many thanks.


Re: Facet pivot 50.000.000 different values

2013-05-17 Thread Carlos Bonilla
Sorry, 16 GB RAM (not 8).




Re: Facet pivot 50.000.000 different values

2013-05-17 Thread Shawn Heisey
On 5/17/2013 2:47 AM, Carlos Bonilla wrote:
 To calculate some stats we are using a field B with 50.000.000 different
 values as facet pivot in a schema that contains 200.000.000 documents. We
 only need to calculate how many different B values have more than 1
 document, but it takes ages. Is there a better way or configuration
 to do this?
 
 Configuration:
 Solr 4.2.1
 JVM Java 7
 Max Java Heap size : 12Gb
 8 GB RAM
 Dual Core


You probably don't have enough RAM.  With 200 million documents, I would
imagine that your index is considerably larger than 4GB in size.  With
the 16GB of RAM that you mentioned in your other message, this
configuration leaves 4GB of RAM for caching after Java manages to
allocate the entire 12GB heap - which it will do very quickly with a
large index.

See the following:

http://wiki.apache.org/solr/SolrPerformanceProblems

I don't know the size of your index.  If it is 100GB, then ideally you
would want to have at least 112GB of RAM, but you could probably make it
work in 64GB.
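
The arithmetic behind those numbers can be spelled out in a few lines. This is only a sketch of the rule of thumb used above (RAM left for the OS disk cache is total RAM minus the Java heap; the ideal is enough RAM for the heap plus the whole index). The class and method names are hypothetical, and the 100 GB index size is the example figure from the message, not a measured value.

```java
// Hypothetical helper illustrating the RAM rule of thumb from the message above.
class CacheHeadroom {

    // RAM left for the OS disk cache once the JVM has claimed its full heap.
    static long osCacheGb(long totalRamGb, long heapGb) {
        return totalRamGb - heapGb;
    }

    // RAM needed to hold the heap and cache the entire index.
    static long idealRamGb(long indexGb, long heapGb) {
        return indexGb + heapGb;
    }

    public static void main(String[] args) {
        System.out.println("Cache headroom (GB): " + osCacheGb(16, 12));   // 16 GB RAM, 12 GB heap
        System.out.println("Ideal RAM (GB): " + idealRamGb(100, 12));      // assumed 100 GB index
    }
}
```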

Thanks,
Shawn



Re: Facet pivot 50.000.000 different values

2013-05-17 Thread Mikhail Khludnev
On Fri, May 17, 2013 at 12:47 PM, Carlos Bonilla
carlosbonill...@gmail.com wrote:

 We
 only need to calculate how many different B values have more than 1
 document but it takes ages


Carlos,
It's not clear whether you need to take the results of a query into account or
just gather statistics from the index. If the latter, you can just enumerate the
terms and check TermsEnum.docFreq(). Am I getting it right?
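
A minimal sketch of the counting step described above, using a plain Java map as a stand-in for the term dictionary of field B. With Lucene, the loop would instead walk a TermsEnum and read docFreq() for each term; the class name, method name and sample data here are hypothetical. Note that this counts over the whole index and ignores any query.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: term -> document frequency, as docFreq() would report it.
class DocFreqCount {

    // Counts the terms that occur in more than minDocs documents.
    static long countTermsAbove(Map<String, Integer> docFreqByTerm, int minDocs) {
        long count = 0;
        for (int docFreq : docFreqByTerm.values()) {
            if (docFreq > minDocs) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreqByTerm = new LinkedHashMap<>();
        docFreqByTerm.put("b1", 3); // made-up sample data
        docFreqByTerm.put("b2", 1);
        docFreqByTerm.put("b3", 2);
        System.out.println(countTermsAbove(docFreqByTerm, 1));
    }
}
```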


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com