Please note, for now, that this problem is not relevant for us anymore,
and we will change our c-field from being of type string (docValue) to
being of type long (docValue). And faceting on huge numbers of long
docValues seems to perform very well - except for
If anyone is following this one, just an update. We are not going to
upgrade to 4.5.1 in order to see if the String facet performance problem
has been fixed. Instead we have made a few hacks around our data so that
we can store the c-field (c_dstr_doc_sto) as long instead
(c_dlng_doc_sto). So
Per,
As you are seeing there are different implementations for calculating
facets for numeric fields and string fields. The numeric fields I believe
are using an int-to-int or long-to-int hashmap to hold the facet counts.
This map grows as values are added to it. The String version uses an int
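
To make the numeric case above concrete, here is a minimal sketch of counting facet values in a hash map that grows as new values are seen. It is an illustration only, not Solr's actual code: the class FacetCountSketch and the method countFacets are hypothetical names, and a plain java.util.HashMap stands in for whatever primitive long-to-int map the real implementation may use.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: NOT Solr's NumericFacets implementation.
// FacetCountSketch and countFacets are hypothetical names.
class FacetCountSketch {

    // Count how often each long value occurs among the matching documents.
    // The map starts empty and gains one entry per distinct value, so its
    // size tracks the number of distinct values actually hit by the query.
    static Map<Long, Integer> countFacets(long[] matchingDocValues) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long value : matchingDocValues) {
            // Insert 1 for a new value, otherwise add 1 to the existing count.
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Six matching rows, three distinct facet values.
        long[] values = {7L, 42L, 7L, 7L, 13L, 42L};
        System.out.println(countFacets(values));
    }
}
```

With a map like this, a query that hits only a handful of rows allocates only a handful of entries, regardless of how many distinct values exist in the whole index.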
Thanks for all the help, guys!
Just to clarify. Everything is working functionality-wise - we have
tests showing that.
It is just that two similar queries (hitting the same number of rows
(only 6 among 12 billion in this example) and resulting in the same
number of facet-groups etc etc) is
Forget about the quoted comment at the bottom below. It is not true. Both
the fast/efficient and the slow/memory-consuming query follow the
getTermCounts-path.
But I have identified another place where they take different paths in
the code. In SimpleFacets.getTermCounts you will find the code
Before lucene 4.5 docvalues were loaded entirely into RAM.
I'm not going to waste time debugging any old code releases here, you
should upgrade to the latest release!
On Wed, Nov 6, 2013 at 4:58 AM, Per Steffensen st...@designware.dk wrote:
Forget about the quoted comment at the bottom below. It
It seems like NumericFacets.getCounts is using the FieldCache. This is
what we wanted to avoid by using doc-values in the first place - because
we have experienced so many times that the FieldCache makes us go OOM.
We were told that if we used doc-values the FieldCache would not be
used. But
On 11/6/13 11:43 AM, Robert Muir wrote:
Before lucene 4.5 docvalues were loaded entirely into RAM.
I'm not going to waste time debugging any old code releases here, you
should upgrade to the latest release!
Ok, thanks!
I do not consider it a bug (just a performance issue), so no debugging
Hi
We have a 6-Solr-node (release 4.4.0) setup with 12 billion small
documents loaded. The documents have the following fields
* a_dlng_doc_sto
* b_dlng_doc_sto
* c_dstr_doc_sto
* timestamp_lng_ind_sto
* d_lng_ind_sto
From schema.xml
dynamicField name=*_dstr_doc_sto type=dstring
Looking at threaddumps
It seems like one of the major differences in what is done for
c_dstr_doc_sto vs a_dlng_doc_sto is in SimpleFacets.getFacetFieldCounts,
where c_dstr_doc_sto takes the getTermCounts-path and a_dlng_doc_sto
takes the getListedTermCounts-path.
String termList
If you are querying on a field, you should index it!
On Tue, Nov 5, 2013 at 5:47 AM, Per Steffensen st...@designware.dk wrote:
Hi
We have a 6-Solr-node (release 4.4.0) setup with 12 billion small documents
loaded. The documents have the following fields
* a_dlng_doc_sto
* b_dlng_doc_sto
*
On 11/5/13 3:30 PM, Robert Muir wrote:
If you are querying on a field, you should index it!
Believe I do that. Query looks like this timestamp_lng_ind_sto:[x TO y]
AND d_lng_ind_sto:(a OR b OR ... OR n) and both timestamp_lng_ind_sto
and d_lng_ind_sto are indexed.
Please elaborate!
I
On Tue, Nov 5, 2013 at 9:42 AM, Per Steffensen st...@designware.dk wrote:
On 11/5/13 3:30 PM, Robert Muir wrote:
If you are querying on a field, you should index it!
Believe I do that. Query looks like this timestamp_lng_ind_sto:[x TO y] AND
d_lng_ind_sto:(a OR b OR ... OR n) and both
H. I was just looking at the DocValues Wiki page. Should I add a bit
about docValuesFormat supporting Disk as a 4.5 plus feature? Currently it
kind of looks like you can do that with 4.2
Or am I off base here? I'm going from CHANGES.txt about LUCENE-5124
Erick
On Tue, Nov 5, 2013 at
On Tue, Nov 5, 2013 at 3:27 PM, Erick Erickson erickerick...@gmail.com wrote:
H. I was just looking at the DocValues Wiki page. Should I add a bit
about docValuesFormat supporting Disk as a 4.5 plus feature? Currently it
kind of looks like you can do that with 4.2
It's in the Solr Ref
Hmmm, what I'm referring to is this bit:
<fieldType name="string_ondisk" class="solr.StrField" docValuesFormat="Disk" />
The docValuesFormat=Disk bit isn't supported until 4.5, which doesn't
seem clear in either place. Feel free to disagree of course :).
On Tue, Nov 5, 2013 at 11:43 AM, Cassandra
On 11/5/2013 11:56 AM, Erick Erickson wrote:
Hmmm, what I'm referring to is this bit:
<fieldType name="string_ondisk" class="solr.StrField" docValuesFormat="Disk" />
The docValuesFormat=Disk bit isn't supported until 4.5, which
doesn't seem clear in either place. Feel free to