I saw some previous threads related to this subject, but for a slightly
different use case, so I am starting a new thread.
For reference, a related thread topic can be found here:
http://www.lucidimagination.com/search/document/2025d6670004838b/date_faceting_and_double_counting#2025d6670004838b
This concerns date faceting producing double counts across adjacent date
facets when a document's time is 'on the cusp', i.e. exactly on the boundary
between two facets.
In fact, I found this problem because I was testing date facets where the gap
is +1SECOND. In this case many, or even all, documents can be counted twice,
because as a general rule in my case milliseconds are set to 0, and there is
'No logic for milliseconds' in the DateMathParser. This behaviour can
occasionally be observed with any date faceting gap, but in the +1SECOND
scenario it is much more likely to occur, because values quantized to whole
seconds are much more likely to land exactly on a facet boundary.
I had a look at the date math with regard to this (in SimpleFacets.java,
getFacetDateCounts()), and I noticed the following line of code (~line 622):
    resInner.add(label, rangeCount(sf,low,high,true,true));
The two 'true' booleans mean: 'include at start of range' *AND* 'include at end
of range'. Any documents that live on the border will match in date.facet[n]
and date.facet[n+1], because of the 'double-sided' inclusive range search.
By convention, a time value of '0' (00:00) belongs to the next period, rather
than the previous, so I changed the *first* boolean to false, and voila! no
more duplications! I believe this will be the case for other gap values, not
just +1SECOND.
As there's no need to read any '[' or '{' because date faceting doesn't
have/need these, the patch couldn't be simpler.
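
To illustrate the effect outside of Solr, here is a minimal, self-contained
sketch. It is not the SimpleFacets code: the class FacetBoundaryDemo and its
rangeCount()/facetCounts() helpers are hypothetical stand-ins that count plain
millisecond timestamps, assuming only that the two booleans mean 'include
start' and 'include end' as described above. It also shows that which boolean
is set to false decides which of the two adjacent facets keeps a document that
sits exactly on the border.

```java
import java.util.ArrayList;
import java.util.List;

public class FacetBoundaryDemo {

    // Hypothetical stand-in for a range count: counts the timestamps that
    // fall between low and high, with each end optionally inclusive.
    static int rangeCount(long[] times, long low, long high,
                          boolean includeLow, boolean includeHigh) {
        int count = 0;
        for (long t : times) {
            boolean aboveLow = includeLow ? t >= low : t > low;
            boolean belowHigh = includeHigh ? t <= high : t < high;
            if (aboveLow && belowHigh) {
                count++;
            }
        }
        return count;
    }

    // Walks from start to end in steps of gap and counts each facet,
    // using the same inclusiveness flags for every facet.
    static List<Integer> facetCounts(long[] times, long start, long end, long gap,
                                     boolean includeLow, boolean includeHigh) {
        List<Integer> counts = new ArrayList<>();
        for (long low = start; low < end; low += gap) {
            counts.add(rangeCount(times, low, low + gap, includeLow, includeHigh));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Four documents whose milliseconds are all 0, so with a one second
        // gap every one of them sits exactly on a facet boundary.
        long[] times = {1000L, 2000L, 2000L, 3000L};

        // Both ends inclusive: border documents are counted in two facets.
        System.out.println(facetCounts(times, 0L, 4000L, 1000L, true, true));
        // prints [1, 3, 3, 1] (total 8 for 4 documents)

        // Start exclusive: a border document stays in the earlier facet.
        System.out.println(facetCounts(times, 0L, 4000L, 1000L, false, true));
        // prints [1, 2, 1, 0] (total 4)

        // End exclusive: a border document moves to the later facet.
        System.out.println(facetCounts(times, 0L, 4000L, 1000L, true, false));
        // prints [0, 1, 2, 1] (total 4)
    }
}
```

Either half-open variant removes the duplicates; they differ only in whether
the border document is reported under the earlier or the later facet.
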
My question to the experts of this code is:
Was this done for a reason - are there any implications somewhere else for
having a Lucene-double-sided-inclusive search?
I can't think of any reason, but perhaps someone knows differently?
If interested parties are in agreement, I can create an issue for it and the
associated fix.
Many thanks,
Peter