Just throwing it out there: I'd consider a different approach for an
actual real app, although it might not be quicker to get up and running.
(For quick, yeah, I'd just store it as a string; more on that at the bottom.)
If none of your dates have times and they're all just full days, I'm not
sure you really need the date type at all.
Convert the date to an integer number of days since the epoch. (Most
languages will have a way to do this, but I don't know about pure XSLT.)
Store _that_ in a 1.4 'int' field. On top of that, make it a "tint"
(non-zero precisionStep) for faster range queries.
But now your actual interface will have to convert from "number of days
since epoch" to a displayable date. (And if you allow user input,
convert the input to number-of-days-since-epoch before making a range
query or fq. But you'd have to do that anyway even with solr dates;
users aren't going to be entering raw W3C dates, I don't think.)
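
A minimal sketch of that conversion in Python, in case it helps. It assumes
the input is MM/DD/YYYY as in the original question and that the epoch is
1970-01-01; the function names are made up for illustration.

```python
from datetime import date, datetime, timedelta

# Assumption: Unix epoch. Any fixed day works as long as you use it everywhere.
EPOCH = date(1970, 1, 1)

def to_epoch_days(text, fmt="%m/%d/%Y"):
    """Parse a date string and return whole days since the epoch."""
    parsed = datetime.strptime(text, fmt).date()
    return (parsed - EPOCH).days

def from_epoch_days(days, fmt="%m/%d/%Y"):
    """Turn a stored day count back into a displayable date string."""
    return (EPOCH + timedelta(days=days)).strftime(fmt)

print(to_epoch_days("09/08/2010"))   # 14860
print(from_epoch_days(14860))        # 09/08/2010
```

The integer is what you'd index and query on; the second function is only
for display.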
That is probably the most efficient way to have solr handle it. Using
an actual date field type gives you a lot more precision than you need,
which is going to hurt performance on range queries. You can compensate
for that with a trie date field, sure, but if you don't really need that
precision to begin with, why use it? Also, the extra precision can end
up doing unexpected things and making it easier to have bugs (for range
queries on that high-precision stuff, you need to make sure your start
date has 00:00:00 set and your end date has 23:59:59 set, to do what you
probably expect). If you aren't going to use the extra precision, it makes
everything a lot simpler to not use a date field.
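
To illustrate the start/end point, here is a rough sketch of building a
whole-day range query against a real date field. The field name "pubdate"
and the helper name are hypothetical, and the input is assumed to be
MM/DD/YYYY.

```python
from datetime import datetime

def solr_day_range(field, start_text, end_text, fmt="%m/%d/%Y"):
    """Build an inclusive range query that covers whole days on both ends."""
    # Start bound is pinned to the first second of the start day.
    start = datetime.strptime(start_text, fmt).strftime("%Y-%m-%dT00:00:00Z")
    # End bound is pinned to the last second of the end day.
    end = datetime.strptime(end_text, fmt).strftime("%Y-%m-%dT23:59:59Z")
    return f"{field}:[{start} TO {end}]"

print(solr_day_range("pubdate", "09/01/2010", "09/08/2010"))
# pubdate:[2010-09-01T00:00:00Z TO 2010-09-08T23:59:59Z]
```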
Alternately, for your "get this done quick" method, yeah, I'd just store
it as a string. With a string exactly as you've specified (MM/DD/YYYY),
sorting and range queries won't work how you'd want. But if you can make
it a string of the format "yyyy/mm/dd" instead (always a four-digit year
and a two-digit month and day), then you can even sort and do range
queries on your string dates.
For the quick and dirty prototype, I'd just do that. In fact, while
this might make range queries and sorting _slightly_ slower than if you
use an int or a tint, this might really be good enough even for a real
app (hey, it's what lots of people did before the trie-based fields
existed).
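
A quick sketch of why the reordered string works, assuming MM/DD/YYYY
input; the function name is just for illustration. Plain string sorting
on the original format groups by month first, while the zero-padded
yyyy/mm/dd form sorts chronologically.

```python
from datetime import datetime

def to_sortable(text, fmt="%m/%d/%Y"):
    """Rewrite a date string as zero-padded yyyy/mm/dd."""
    return datetime.strptime(text, fmt).strftime("%Y/%m/%d")

raw = ["12/01/2009", "01/15/2010", "09/08/2010"]

# Sorting the raw strings puts December 2009 last, which is wrong.
print(sorted(raw))
# ['01/15/2010', '09/08/2010', '12/01/2009']

# Sorting the converted strings gives chronological order.
print(sorted(to_sortable(r) for r in raw))
# ['2009/12/01', '2010/01/15', '2010/09/08']
```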
Jonathan
Erick Erickson wrote:
I think Markus is spot-on given the fact that you have 2 days. Using a
string field is quickest.
However, if you absolutely MUST have functioning dates, there are three
options I can think of:
1> can you make your XSLT transform the dates? Confession: I'm XSLT-ignorant.
2> use DIH and DateFormatTransformer, see:
http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
You can walk a directory importing all the XML files with
FileDataSource.
3> you could write a program to do this manually.
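
For option 3, something along these lines might do as a starting point.
It is only a sketch: it assumes the files use the standard
<add><doc><field name="..."> layout from exampledocs, that the date field
is called "pubdate" (a made-up name), and that the dates are MM/DD/YYYY
with no time part.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

def to_solr_date(text, fmt="%m/%d/%Y"):
    """Convert MM/DD/YYYY to the full UTC form, using midnight as the time."""
    return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%dT00:00:00Z")

def rewrite_dates(xml_text, field_name="pubdate"):
    """Rewrite every matching <field> in an <add> document and return the XML."""
    root = ET.fromstring(xml_text)
    for field in root.iter("field"):
        if field.get("name") == field_name and field.text:
            field.text = to_solr_date(field.text)
    return ET.tostring(root, encoding="unicode")

sample = (
    '<add><doc><field name="id">1</field>'
    '<field name="pubdate">09/08/2010</field></doc></add>'
)
print(rewrite_dates(sample))
```

Run it over each file before posting and the date field should then be
accepted as a normal date.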
But given the time constraints, I suspect your time would be better spent
doing the other stuff and just using string as per Markus. I have no clue
how SOLR-savvy you are, so pardon if this is something you already know. But
lots of people trip up over the "string" field type, which is NOT tokenized.
You usually want "text" unless it's some sort of ID.... So it might be worth
it to do some searching earlier rather than later <G>....
Best
Erick
On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma <markus.jel...@buyways.nl> wrote:
No. The DateField [1] will not accept it any other way. You could, however,
fool your boss and dump your dates in an ordinary string field. But then you
cannot use some of the nice date features.
[1]:
http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html
-----Original message-----
From: Rico Lelina <rlel...@yahoo.com>
Sent: Wed 08-09-2010 17:36
To: solr-user@lucene.apache.org;
Subject: How to import data with a different date format
Hi,
I am attempting to import some of our data into SOLR. I did it the quickest
way I know because I literally only have 2 days to import the data and do
some queries for a proof-of-concept.
So I have this data in XML format and I wrote a short XSLT script to
convert it to the format in solr/example/exampledocs (except I retained the
element names, so I had to modify schema.xml in the conf directory). So far
so good -- the import works and I can search the data. One of my immediate
problems is that there is a date field with the format MM/DD/YYYY. Looking
at schema.xml, it seems SOLR accepts only full date fields -- everything
seems to be mandatory, including the Z for Zulu/UTC time according to the
doc. Is there a way to specify the date format?
Thanks very much.
Rico