[jira] Updated: (SOLR-711) SimpleFacets: Performance Boost for Tokenized Fields for smaller DocSet using Term Vectors

Fuad Efendi (JIRA) Tue, 19 Aug 2008 13:04:46 -0700

     [ 
https://issues.apache.org/jira/browse/SOLR-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Fuad Efendi updated SOLR-711:
-----------------------------

    Description: 
From [http://www.nabble.com/SimpleFacets%3A-Performance-Boost-for-Tokenized-Fields-td19033760.html]:

Scenario:
- 10,000,000 documents in the index; 
- 5-10 terms per document; 
- 200,000 unique terms for a tokenized field. 

_Calculating the sizes of 200,000 intersections with the FilterCache (one per 
unique term) is about 100 times slower than traversing the 10 - 20,000 documents 
of a smaller DocSet and counting the frequencies of their terms._

This approach does not apply when the size of the DocSet is close to the total 
number of unique tokens (200,000 in our scenario).
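
The two counting strategies compared above can be sketched in plain Java. This is only an illustration under stated assumptions: in-memory maps stand in for the index, the inverted map plays the role of the per-term sets held in the FilterCache, and the per-document term sets play the role of term vectors. The class and method names (FacetCountSketch, countByIntersection, countByTermVectors, invert) are hypothetical and are not part of Solr or Lucene.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative sketch only: plain collections stand in for the index.
// None of these names are Solr or Lucene APIs.
class FacetCountSketch {

    // Strategy A: one intersection per unique term. The work grows with
    // the number of unique terms, however small the DocSet is.
    static Map<String, Integer> countByIntersection(
            Map<String, Set<Integer>> termToDocs, Set<Integer> docSet) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> e : termToDocs.entrySet()) {
            int size = 0;
            for (Integer doc : docSet) {
                if (e.getValue().contains(doc)) {
                    size++;
                }
            }
            if (size > 0) {
                counts.put(e.getKey(), size);
            }
        }
        return counts;
    }

    // Strategy B: visit only the documents in the DocSet and tally the
    // terms stored for each one. The work grows with the DocSet size
    // times the number of terms per document.
    static Map<String, Integer> countByTermVectors(
            Map<Integer, Set<String>> docToTerms, Set<Integer> docSet) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Integer doc : docSet) {
            Set<String> terms = docToTerms.get(doc);
            if (terms == null) {
                continue; // document has no terms in this field
            }
            for (String term : terms) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Helper: build the term -> documents view from the per-document view.
    static Map<String, Set<Integer>> invert(Map<Integer, Set<String>> docToTerms) {
        Map<String, Set<Integer>> termToDocs = new HashMap<>();
        for (Map.Entry<Integer, Set<String>> e : docToTerms.entrySet()) {
            for (String term : e.getValue()) {
                termToDocs.computeIfAbsent(term, t -> new HashSet<>()).add(e.getKey());
            }
        }
        return termToDocs;
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> docToTerms = Map.of(
                1, Set.of("solr", "facet"),
                2, Set.of("solr"),
                3, Set.of("lucene", "facet"));
        Set<Integer> docSet = Set.of(1, 3); // a "smaller DocSet"

        System.out.println(countByIntersection(invert(docToTerms), docSet));
        System.out.println(countByTermVectors(docToTerms, docSet));
    }
}
```

Both methods return the same counts. The difference is in what they iterate over: the first loops over every unique term, the second over the documents in the DocSet only, which is why the second is attractive when the DocSet is much smaller than the number of unique terms.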

See SimpleFacets:
{code:title=SimpleFacets.java|borderStyle=solid}
public NamedList getFacetTermEnumCounts(
  SolrIndexSearcher searcher, 
  DocSet docs, ...
{code}




  was:
From [url]http://www.nabble.com/SimpleFacets%3A-Performance-Boost-for-Tokenized-Fields-td19033760.html[/url]:

Scenario:
- 10,000,000 documents in the index; 
- 5-10 terms per document; 
- 200,000 unique terms for a tokenized field. 

_Obviously calculating sizes of 200,000 intersections with FilterCache is 100 
times slower than traversing 10 - 20,000 documents for smaller DocSets and 
counting frequencies of Terms._

Not applicable if size of DocSet is close to total number of unique tokens 
(200,000 in our scenario).

See   SimpleFacets:
 {{
public NamedList getFacetTermEnumCounts(
  SolrIndexSearcher searcher, 
  DocSet docs, 
  String field, 
  int offset, 
  int limit, 
  int mincount, 
  boolean missing, 
  boolean sort, 
  String prefix)
throws IOException {...}
}}





trivial formatting

> SimpleFacets: Performance Boost for Tokenized Fields for smaller DocSet using 
> Term Vectors
> ------------------------------------------------------------------------------------------
>
>                 Key: SOLR-711
>                 URL: https://issues.apache.org/jira/browse/SOLR-711
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Fuad Efendi
>             Fix For: 1.4
>
>   Original Estimate: 1680h
>  Remaining Estimate: 1680h

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
