[
https://issues.apache.org/jira/browse/SOLR-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fuad Efendi updated SOLR-711:
-----------------------------
Description:
From
[http://www.nabble.com/SimpleFacets%3A-Performance-Boost-for-Tokenized-Fields-td19033760.html]:
Scenario:
- 10,000,000 documents in the index;
- 5-10 terms per document;
- 200,000 unique terms for a tokenized field.
_Obviously, calculating the sizes of 200,000 intersections with the FilterCache
is 100 times slower than traversing 10 - 20,000 documents for smaller DocSets
and counting the frequencies of Terms._
This is not applicable if the size of the DocSet is close to the total number
of unique tokens (200,000 in our scenario).
See SimpleFacets:
{code:title=SimpleFacets.java|borderStyle=solid}
public NamedList getFacetTermEnumCounts(
SolrIndexSearcher searcher,
DocSet docs, ...
{code}
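To make the comparison concrete, here is a minimal sketch of the two counting strategies in plain Java collections. It does not use the Solr or Lucene APIs: the class FacetCountSketch, its methods, and the in-memory maps standing in for the FilterCache and for per-document term vectors are all hypothetical names chosen for illustration. With the scenario's numbers, the first strategy performs one intersection per unique term (200,000), while the second reads roughly 5-10 terms for each document in the DocSet.

```java
import java.util.*;

// Hypothetical illustration only: in-memory maps stand in for the index.
final class FacetCountSketch {

    // Strategy A: one intersection per unique term (work grows with the number of unique terms).
    static Map<String, Integer> countByTermIntersection(
            Map<String, Set<Integer>> postings, Set<Integer> docSet) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> e : postings.entrySet()) {
            int n = 0;
            for (int doc : docSet) {
                if (e.getValue().contains(doc)) n++;
            }
            if (n > 0) counts.put(e.getKey(), n);
        }
        return counts;
    }

    // Strategy B: traverse the DocSet and count each document's terms
    // (work grows with DocSet size times terms per document).
    static Map<String, Integer> countByDocTraversal(
            Map<Integer, Set<String>> termsByDoc, Set<Integer> docSet) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : docSet) {
            for (String term : termsByDoc.getOrDefault(doc, Collections.emptySet())) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Helper: build term -> documents from document -> terms.
    static Map<String, Set<Integer>> invert(Map<Integer, Set<String>> termsByDoc) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (Map.Entry<Integer, Set<String>> e : termsByDoc.entrySet()) {
            for (String term : e.getValue()) {
                postings.computeIfAbsent(term, t -> new HashSet<>()).add(e.getKey());
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> termsByDoc = new HashMap<>();
        termsByDoc.put(0, new HashSet<>(Arrays.asList("solr", "facet")));
        termsByDoc.put(1, new HashSet<>(Arrays.asList("solr", "cache")));
        termsByDoc.put(2, new HashSet<>(Arrays.asList("lucene")));
        Set<Integer> docSet = new HashSet<>(Arrays.asList(0, 1));

        // Both strategies print {cache=1, facet=1, solr=2}
        System.out.println(countByTermIntersection(invert(termsByDoc), docSet));
        System.out.println(countByDocTraversal(termsByDoc, docSet));
    }
}
```

Both methods return the same counts; they differ only in which dimension drives the cost, which is why the traversal approach is expected to help only when the DocSet is small relative to the number of unique terms.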
was:
From
[url]http://www.nabble.com/SimpleFacets%3A-Performance-Boost-for-Tokenized-Fields-td19033760.html[/url]:
Scenario:
- 10,000,000 documents in the index;
- 5-10 terms per document;
- 200,000 unique terms for a tokenized field.
_Obviously calculating sizes of 200,000 intersections with FilterCache is 100
times slower than traversing 10 - 20,000 documents for smaller DocSets and
counting frequencies of Terms._
Not applicable if size of DocSet is close to total number of unique tokens
(200,000 in our scenario).
See SimpleFacets:
{{
public NamedList getFacetTermEnumCounts(
SolrIndexSearcher searcher,
DocSet docs,
String field,
int offset,
int limit,
int mincount,
boolean missing,
boolean sort,
String prefix)
throws IOException {...}
}}
trivial formatting
> SimpleFacets: Performance Boost for Tokenized Fields for smaller DocSet using
> Term Vectors
> ------------------------------------------------------------------------------------------
>
> Key: SOLR-711
> URL: https://issues.apache.org/jira/browse/SOLR-711
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 1.3
> Reporter: Fuad Efendi
> Fix For: 1.4
>
> Original Estimate: 1680h
> Remaining Estimate: 1680h