I've also seen the suggestion (more from a pure Lucene perspective) of
breaking apart your dates. Remember that the time/space issues are due to
the number of distinct terms. So it's possible (although I haven't tried it)
to index many fewer distinct terms, e.g. by breaking your dates into some
number of fields, say

<year><month><day><hour><minute><second><millisecond>
or maybe
<YearMonthDay><HourMinute><SecondMillisecond>
or
<YearMonthDayHour><MinuteSecond><Millisecond>
or...

and then dance fancy with the queries you generate.
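
To make that a bit more concrete, here is a rough, untested-against-Solr
sketch of one possible split: a day field plus an hour/minute field. The
field names dt_day and dt_hm and both helper functions are made up for
illustration, seconds are simply dropped, and the query string assumes
plain string fields with zero-padded values so that range queries sort
correctly.

```python
from datetime import datetime, timedelta


def split_fields(ts):
    """Split a timestamp into two coarse terms (hypothetical field names)."""
    return {"dt_day": ts.strftime("%Y%m%d"), "dt_hm": ts.strftime("%H%M")}


def range_query(start, end):
    """Build a query string covering start..end (inclusive, minute precision)."""
    if start > end:
        raise ValueError("start must not be after end")
    s, e = split_fields(start), split_fields(end)

    # Same day: one day term plus a single hour/minute range.
    if s["dt_day"] == e["dt_day"]:
        return "dt_day:%s AND dt_hm:[%s TO %s]" % (
            s["dt_day"], s["dt_hm"], e["dt_hm"])

    # Partial first day, from the start time to the end of that day.
    clauses = ["(dt_day:%s AND dt_hm:[%s TO 2359])" % (s["dt_day"], s["dt_hm"])]

    # Whole days strictly between the first and last day, if there are any.
    first_full = (start + timedelta(days=1)).strftime("%Y%m%d")
    last_full = (end - timedelta(days=1)).strftime("%Y%m%d")
    if first_full <= last_full:
        clauses.append("dt_day:[%s TO %s]" % (first_full, last_full))

    # Partial last day, from midnight to the end time.
    clauses.append("(dt_day:%s AND dt_hm:[0000 TO %s])" % (e["dt_day"], e["dt_hm"]))
    return " OR ".join(clauses)


print(range_query(datetime(2008, 10, 1, 4, 0), datetime(2008, 10, 30, 3, 59)))
```

With this particular split, a range query touches at most one term per day
in the span plus at most 1440 hour/minute terms, rather than one term per
distinct timestamp. Whether that actually helps in practice is exactly the
part I haven't measured.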

I haven't tried this myself, but if you do I'd really like to get an idea of
how much it improves your results.
Of course I'm not real sure how to accomplish this in SOLR, but free advice
is definitely worth what you pay for it <GG>

Best
Erick

On Wed, Oct 29, 2008 at 4:47 PM, Alok Dhir <[EMAIL PROTECTED]> wrote:

> Well, no - we don't care so much about the seconds, but hours & minutes are
> indeed crucial.
>
> ---
> Alok K. Dhir
> Symplicity Corporation
> www.symplicity.com
> (703) 351-0200 x 8080
> [EMAIL PROTECTED]
>
> On Oct 29, 2008, at 4:41 PM, Chris Harris wrote:
>
>> Do you need to search down to the minutes and seconds level? If searching
>> by date provides sufficient granularity, for instance, you can normalize
>> all the time-of-day portions of the timestamps to midnight while indexing.
>> (So index any event happening on Oct 01, 2008 as 2008-10-01T00:00:00Z.)
>> That would give Solr many fewer unique timestamp values to go through.
>>
>> On Wed, Oct 29, 2008 at 1:30 PM, Alok Dhir <[EMAIL PROTECTED]> wrote:
>>
>>> Hi -- using solr 1.3 -- roughly 11M docs on a 64 gig 8 core machine.
>>>
>>> Fairly simple schema -- no large text fields, standard request handler.
>>> 4 small facet fields.
>>>
>>> The index is an event log -- a primary search/retrieval requirement is
>>> date range queries.
>>>
>>> A simple query without a date range subquery is ridiculously fast - 2ms.
>>> The same query with a date range takes up to 30s (30,000ms).
>>>
>>> Concrete example, this query just took 18s:
>>>
>>>      instance:client\-csm.symplicity.com AND dt:[2008-10-01T04:00:00Z TO 2008-10-30T03:59:59Z] AND label_facet:"Added to Position"
>>>
>>> The exact same query without the date range took 2ms.
>>>
>>> I saw a thread from Apr 2008 which explains the problem as being due to
>>> too much precision on the DateField type, and the range expansion leading
>>> to far too many elements being checked.  The proposed solution appears to
>>> be a hack where you index date fields as strings and hack together date
>>> functions to generate proper queries/format results.
>>>
>>> Does this remain the recommended solution to this issue?
>>>
>>> Thanks
>>>
>>> ---
>>> Alok K. Dhir
>>> Symplicity Corporation
>>> www.symplicity.com
>>> (703) 351-0200 x 8080
>>> [EMAIL PROTECTED]
>>>
>>>
>>>
>
