Solr 1.4 was the first tagged release with trie fields.

And Solr 1.4+ also includes a 'date' field based on 'trie', just for dates. If your dates are actually going to include hour/minute/second, not just the calendar day, then I'd definitely use Solr's built-in trie date field. That's what it's for: it will do the translation from calendar date-time to integer for you (in both directions), and add trie buckets for fast range querying too.
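
For illustration only, here is a rough Python sketch of the translation described above. It assumes the internal integer is milliseconds since the Unix epoch, and it handles only the whole-second UTC form Solr accepts (e.g. 2010-09-08T11:35:00Z). The function names are hypothetical. The date field does this for you, so you would not normally write it yourself.

```python
from datetime import datetime, timezone

# Whole-second form of Solr's UTC date syntax (fractional seconds not handled here).
SOLR_FMT = "%Y-%m-%dT%H:%M:%SZ"


def solr_date_to_millis(text: str) -> int:
    """Hypothetical helper: '2010-09-08T11:35:00Z' -> milliseconds since the epoch."""
    dt = datetime.strptime(text, SOLR_FMT).replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


def millis_to_solr_date(millis: int) -> str:
    """Hypothetical helper: milliseconds since the epoch -> Solr UTC date string."""
    dt = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    return dt.strftime(SOLR_FMT)


if __name__ == "__main__":
    ms = solr_date_to_millis("2010-09-08T11:35:00Z")
    print(ms)
    print(millis_to_solr_date(ms))  # round-trips to the original string
```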

I was suggesting that just using 'int' might be simpler if you don't need hour/minute/second precision and are just storing year-month-day. If you've got hour-minute-second too, there's no reason not to use Solr's date type, and lots of reasons to do so.

Jonathan

Dennis Gearon wrote:
So now, unlike when 'trie' first came out, Solr has an INT field that IS 'trie', right?

And nothing date/timestamp related has come out since, making 'trie'/INT the 
field of choice for timestamps, right?

Seems like the fastest choice.

I will have to read up on it.

Seems like my original choice to use Unix timestamps for storage in my SQL 
database, rather than native Postgres timestamps, will make everything easier between:
  PHP
  Symfony
  Postgres
  Solr

It's probably going to be a good idea to store two other fields in the search 
index for display, 'date' and 'time'. That is, unless I have the user's 
JavaScript generate the time and date from the Unix timestamp. hmmmmmm.
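
If the Unix timestamp stays the one field you actually query, the display-only 'date' and 'time' values could be derived from it at indexing time, roughly like this. It's a sketch that assumes UTC and arbitrary display formats. The dictionary keys follow the field names mentioned above, and display_fields is a hypothetical helper.

```python
from datetime import datetime, timezone


def display_fields(unix_ts: int) -> dict:
    """Hypothetical helper: derive display-only strings from a Unix timestamp.

    Assumes UTC; a real app would convert to the user's time zone instead.
    """
    dt = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return {
        "timestamp": unix_ts,             # the value you index and range-query
        "date": dt.strftime("%Y-%m-%d"),  # stored for display only
        "time": dt.strftime("%H:%M:%S"),  # stored for display only
    }


if __name__ == "__main__":
    print(display_fields(1283945700))
```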

Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/8/10, Jonathan Rochkind <rochk...@jhu.edu> wrote:

From: Jonathan Rochkind <rochk...@jhu.edu>
Subject: Re: How to import data with a different date format
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Date: Wednesday, September 8, 2010, 11:35 AM
So the standard 'int' field in Solr 1.4 is a "trie based" field, although the example "int" type in the default schema.xml has its precisionStep set to 0, which means it's not really doing "trie" things. If you set precisionStep to something greater than 0, as in the default example "tint" type, then it's really using 'trie' functionality. 'trie' functionality speeds up range queries by putting each value into 'buckets' (my own term), per the precision specified, so Solr has to do less to grab all values within a certain range.
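
A simplified Python model of the 'buckets' idea, assuming non-negative 32-bit values: each value is indexed once per precision level as its high-order bits (value >> shift). Lucene's real encoding also flips the sign bit and prefix-codes the terms, which this sketch leaves out. The precisionStep 0 branch reflects our understanding that Solr then indexes a single full-precision term. trie_terms is a hypothetical name.

```python
def trie_terms(value: int, precision_step: int, bits: int = 32) -> list:
    """Hypothetical helper: (shift, prefix) pairs indexed for one value.

    Simplified model; assumes 0 <= value < 2**bits. With precision_step <= 0
    only the full-precision term is produced, i.e. no extra buckets.
    """
    if precision_step <= 0:
        return [(0, value)]
    # Each coarser level drops another precision_step low-order bits, so
    # nearby values end up sharing the same prefix term (the same bucket).
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]


if __name__ == "__main__":
    print(trie_terms(300, 8))  # [(0, 300), (8, 1), (16, 0), (24, 0)]
    print(trie_terms(300, 0))  # [(0, 300)]
```

With a step of 8, every value from 256 to 511 shares the (8, 1) term, so a range query can match that whole bucket with one term instead of 256.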

That's all tint (a trie with a non-zero precisionStep) does: speed up range queries. Your use case involves range queries, though, so it's worth investigating. If you use a string or other textual type for sorting or range queries, you need to make sure your values sort the way you want them to as strings. But yyyy-mm-dd will.
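
A quick way to check that claim, sketched in Python with three made-up dates: plain string sorting matches chronological order for yyyy-mm-dd, but not for the MM/DD/YYYY form. is_chronological is a hypothetical helper.

```python
from datetime import datetime

# The same three (made-up) days written two ways.
iso = ["2010-09-08", "2009-12-31", "2010-01-15"]
us = ["09/08/2010", "12/31/2009", "01/15/2010"]


def is_chronological(strings, fmt):
    """True if sorting the raw strings gives the same order as sorting by date."""
    by_string = sorted(strings)
    by_date = sorted(strings, key=lambda s: datetime.strptime(s, fmt))
    return by_string == by_date


print(is_chronological(iso, "%Y-%m-%d"))  # True
print(is_chronological(us, "%m/%d/%Y"))   # False
```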

More on trie: 
http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/

I think there probably won't be much of a difference at query time between a non-trie int and a string, although I'm not sure, and it may depend on the nature of your data and queries. Using a trie int will be faster for (and only for) range queries, if you have a lot of data. (There are some cases, depending on the data and the nature of your queries, where the overhead of a non-zero-precision trie may outweigh the hypothetical gain, but generally it's faster.)
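
To see why range queries get faster, here is a simplified Python model of splitting a range across those buckets, loosely following the approach Lucene's trie range queries take. The port and its names (split_range, matches) are ours, not Lucene's API. Full-precision terms are only needed at the ragged edges of the range, and coarse bucket terms cover the middle. It assumes 0 <= lo <= hi < 2**bits.

```python
def split_range(lo: int, hi: int, step: int, bits: int = 32) -> list:
    """Cover the inclusive range [lo, hi] with disjoint (shift, first, last) term ranges."""
    ranges = []
    shift = 0
    while True:
        diff = 1 << (shift + step)          # size of one bucket at the next level
        mask = ((1 << step) - 1) << shift   # the bits handled at this level
        has_lower = (lo & mask) != 0        # lo does not start a next-level bucket
        has_upper = (hi & mask) != mask     # hi does not end a next-level bucket
        next_lo = (lo + diff if has_lower else lo) & ~mask
        next_hi = (hi - diff if has_upper else hi) & ~mask
        if shift + step >= bits or next_lo > next_hi:
            # Nothing left for a coarser level: finish at this precision.
            ranges.append((shift, lo >> shift, hi >> shift))
            return ranges
        if has_lower:
            ranges.append((shift, lo >> shift, (lo | mask) >> shift))
        if has_upper:
            ranges.append((shift, (hi & ~mask) >> shift, hi >> shift))
        lo, hi = next_lo, next_hi
        shift += step


def matches(ranges: list, value: int) -> bool:
    """True if the value's prefix term at some shift falls in that shift's range."""
    return any(first <= (value >> shift) <= last for shift, first, last in ranges)


if __name__ == "__main__":
    print(split_range(5, 300, step=4, bits=16))  # [(0, 5, 15), (0, 288, 300), (4, 1, 17)]
```

For [5, 300] that is 11 + 13 + 17 = 41 terms to look up instead of 296 individual values.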

I don't think there should be any appreciable difference between how long a non-trie int or a string will take to index, at least as far as Solr is concerned. If your app takes longer to prepare one kind of document for Solr than the other, that's another story. An actual trie (non-zero precision) theoretically has indexing-time overhead, since each value is indexed at several precisions, but I doubt it would be noticeable unless you have a really, really lean, mean indexing setup where every microsecond counts.

Jonathan

Dennis Gearon wrote:
I'm doing something similar for dates/times/timestamps. I'm actually trying to find which appointments (date/time from-and-to pairs, i.e. timestamps) have 'now' within their range. It's a fairly simple search:

   What items have a start time BEFORE now, and an end time AFTER now?
My thoughts were to store:
  Unix timestamps as BIGINTs (64-bit)
  "ISO_DATE ISO_TIME" strings

Which is going to be faster:
   1/ Indexing?
   2/ Searching?

How does the 'tint' field mentioned below apply?
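
Assuming the start and end Unix timestamps go into two long (or tlong) fields, hypothetically named start_ts and end_ts here, the "start before now, end after now" search could be built as a filter query like this sketch. Solr 1.4 square-bracket ranges are inclusive, so items starting or ending exactly at now match too. With a Solr date field instead, the NOW keyword can play the same role.

```python
import time
from typing import Optional


def active_at_filter(now: Optional[int] = None,
                     start_field: str = "start_ts",   # hypothetical field name
                     end_field: str = "end_ts") -> str:  # hypothetical field name
    """Build an fq string for items whose [start, end] interval contains `now`.

    Both fields are assumed to hold Unix timestamps (seconds) in a long/tlong field.
    """
    if now is None:
        now = int(time.time())
    # Open-ended inclusive ranges: start <= now AND end >= now.
    return f"{start_field}:[* TO {now}] AND {end_field}:[{now} TO *]"


if __name__ == "__main__":
    print(active_at_filter(1283945700))
```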



Dennis Gearon



--- On Wed, 9/8/10, Jonathan Rochkind <rochk...@jhu.edu> wrote:

From: Jonathan Rochkind <rochk...@jhu.edu>
Subject: Re: How to import data with a different date format
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Date: Wednesday, September 8, 2010, 10:27 AM
Just throwing it out there: I'd consider a different approach for an actual real app, although it might not be easier to get up quickly. (For quickly, yeah, I'd just store it as a string; more on that at the bottom.)

If none of your dates have times, and they're all just full days, I'm not sure you really need the date type at all. Convert the date to a number-of-days-since-epoch integer. (Most languages will have a way to do this, but I don't know about pure XSLT.) Store _that_ in a 1.4 'int' field. On top of that, make it a "tint" (non-zero precisionStep) for faster range queries.
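
A minimal Python sketch of that conversion in both directions, assuming days are counted from 1970-01-01 (the function names are hypothetical):

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def to_epoch_days(d: date) -> int:
    """Whole days between 1970-01-01 and d (negative for earlier dates)."""
    return (d - EPOCH).days


def from_epoch_days(n: int) -> date:
    """Inverse of to_epoch_days, for turning the stored int back into a date."""
    return EPOCH + timedelta(days=n)


if __name__ == "__main__":
    n = to_epoch_days(date(2010, 9, 8))
    print(n, from_epoch_days(n))
```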

But now your actual interface will have to convert from "number of days since epoch" to a displayable date. (And if you allow user input, convert the input to number-of-days-since-epoch before making a range query or fq. But you'd have to do that anyway even with Solr dates; users aren't going to be entering W3CDate raw, I don't think.)
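
Converting user input before querying might look like this sketch. It assumes users type MM/DD/YYYY and that the epoch-day int lives in a field hypothetically named day.

```python
from datetime import date, datetime


def epoch_days(text: str, fmt: str = "%m/%d/%Y") -> int:
    """Parse user input such as '09/08/2010' into days since 1970-01-01."""
    d = datetime.strptime(text, fmt).date()
    return (d - date(1970, 1, 1)).days


def day_range_filter(start_text: str, end_text: str, field: str = "day") -> str:
    """Inclusive range fq over an int field holding epoch days ('day' is hypothetical)."""
    return f"{field}:[{epoch_days(start_text)} TO {epoch_days(end_text)}]"


if __name__ == "__main__":
    print(day_range_filter("09/01/2010", "09/08/2010"))
```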

That is probably the most efficient way to have Solr handle it. Using an actual date field type gives you a lot more precision than you need, which is going to hurt performance on range queries. You can compensate for that with a trie date, sure, but if you don't really need that precision to begin with, why use it? Also, the extra precision can end up doing unexpected things and making it easier to have bugs: for range queries on that high-precision data, you need to make sure your start date has 00:00:00 set and your end date has 23:59:59 set to do what you probably expect. If you aren't going to use the extra precision, it makes everything a lot simpler not to use a date field.
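
If you do use a date field, building a whole-day range the way described above could look like this sketch (UTC assumed, field name hypothetical). If stored values carry milliseconds, an end bound of 23:59:59.999Z would be safer than the plain 23:59:59Z used here.

```python
from datetime import date


def whole_day_range(field: str, first: date, last: date) -> str:
    """Date-field range covering first 00:00:00 through last 23:59:59 (UTC)."""
    start = first.strftime("%Y-%m-%dT00:00:00Z")
    end = last.strftime("%Y-%m-%dT23:59:59Z")
    return f"{field}:[{start} TO {end}]"


if __name__ == "__main__":
    # 'pubdate' is a hypothetical field name.
    print(whole_day_range("pubdate", date(2010, 9, 1), date(2010, 9, 8)))
```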

Alternately, for your "get this done quick" method, yeah, I'd just store it as a string. With a string exactly as you've specified (MM/DD/YYYY), sorting and range queries won't work how you'd want. But if you can make it a string of the format "yyyy/mm/dd" instead (always a four-digit year and a two-digit month and day), then you can even sort and do range queries on your string dates. For the quick and dirty prototype, I'd just do that. In fact, while this might make range queries and sorting _slightly_ slower than if you use an int or a tint, it might really be good enough even for a real app (hey, it's what lots of people did before the trie-based fields existed).
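
Rewriting MM/DD/YYYY into that zero-padded "yyyy/mm/dd" form is a one-liner in most languages. Here is a Python sketch with a hypothetical helper name; it also tolerates unpadded input such as 9/8/2010.

```python
def to_sortable(us_date: str) -> str:
    """Turn 'M/D/YYYY' or 'MM/DD/YYYY' into a zero-padded 'YYYY/MM/DD' string."""
    month, day, year = (int(part) for part in us_date.split("/"))
    return f"{year:04d}/{month:02d}/{day:02d}"


if __name__ == "__main__":
    dates = ["09/08/2010", "12/31/2009", "1/15/2010"]
    # Plain string sorting now gives chronological order.
    print(sorted(to_sortable(d) for d in dates))
```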

Jonathan

Erick Erickson wrote:

I think Markus is spot-on given the fact that you have 2 days. Using a string field is quickest.

However, if you absolutely MUST have functioning dates, there are three options I can think of:
1> can you make your XSLT transform the dates? Confession: I'm XSLT-ignorant.
2> use DIH and DateFormatTransformer, see:
       http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
       You can walk a directory importing all the XML files with FileDataSource.
3> you could write a program to do this manually.
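
For option 3, a small Python sketch that rewrites one field in a Solr <add><doc><field name="..."> file before posting it. The field name passed in is hypothetical (use whatever your date element is called), and midnight UTC is assumed because the source dates have no time.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def fix_dates(add_xml: str, field_name: str) -> str:
    """Rewrite MM/DD/YYYY values of one field into Solr's UTC date format."""
    root = ET.fromstring(add_xml)
    for field in root.iter("field"):
        if field.get("name") == field_name and field.text:
            parsed = datetime.strptime(field.text.strip(), "%m/%d/%Y")
            # No time in the source data, so assume midnight UTC.
            field.text = parsed.strftime("%Y-%m-%dT00:00:00Z")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = ('<add><doc><field name="id">1</field>'
           '<field name="created">09/08/2010</field></doc></add>')
    print(fix_dates(doc, "created"))  # 'created' is a hypothetical field name
```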

But given the time constraints, I suspect your time would be better spent doing the other stuff and just using string as per Markus. I have no clue how SOLR-savvy you are, so pardon me if this is something you already know, but lots of people trip up over the "string" field type, which is NOT tokenized. You usually want "text" unless it's some sort of ID.... So it might be worth it to do some searching earlier rather than later <G>....

Best
Erick

On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma <markus.jel...@buyways.nl> wrote:
No. The DateField [1] will not accept it any other way. You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features.

[1]:
http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html

-----Original message-----
From: Rico Lelina <rlel...@yahoo.com>
Sent: Wed 08-09-2010 17:36
To: solr-user@lucene.apache.org;
Subject: How to import data with a different date format

Hi,

I am attempting to import some of our data into SOLR. I did it the quickest way I know because I literally only have 2 days to import the data and do some queries for a proof-of-concept.

So I have this data in XML format and I wrote a short XSLT script to convert it to the format in solr/example/exampledocs (except I retained the element names, so I had to modify schema.xml in the conf directory). So far so good: the import works and I can search the data. One of my immediate problems is that there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it seems SOLR accepts only full date fields; everything seems to be mandatory, including the Z for Zulu/UTC time, according to the doc. Is there a way to specify the date format?

Thanks very much.
Rico



