I already have the problem of how to store dates/times across different databases, languages, platforms, and frameworks.
Settling on LONGINT/Unix timestamps solves the problem on all fronts. I may even send them to the browser and have the JavaScript convert them to dates/times (maybe ;-). So, it's *nix timestamp or bust!

Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Wed, 9/8/10, Jonathan Rochkind <rochk...@jhu.edu> wrote:

From: Jonathan Rochkind <rochk...@jhu.edu>
Subject: Re: How to import data with a different date format
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Date: Wednesday, September 8, 2010, 3:07 PM

Solr 1.4 was the first tagged release with trie fields.

And Solr 1.4+ also includes a 'date' field based on 'trie' just for dates. If your dates are actually going to include hour/minute/second, not just calendar day-of-month, then I'd definitely use the built-in Solr trie date field. That's what it's for: it will do the translation from calendar date-time to integer for you (in both directions), and add trie buckets for fast range querying too.

I was suggesting that just using 'int' might be simpler if you don't need hour/minute/second precision, but are just storing year-month-day. If you've got hour-minute-second too, there's no reason not to use Solr's date type, and lots of reasons to do so.

Jonathan

Dennis Gearon wrote:

So now, vs. when 'trie' came out, Solr has an INT field that IS 'trie', right?

And nothing date/timestamp related has come out since, making 'trie'/INT the field of choice for timestamps, right?

Seems like the fastest choice.

I will have to read up on it.
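Since the trie date field does the calendar-to-integer translation itself, the application only has to hand Solr a UTC date string. A minimal Python sketch of converting a Unix timestamp (seconds, assumed UTC) to Solr's date format and back; the function names are illustrative, not part of any Solr client library:

```python
from datetime import datetime, timezone

# Solr's DateField expects a UTC ISO 8601 string ending in "Z",
# e.g. 1995-12-31T23:59:59Z. This sketch only handles whole seconds.
SOLR_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def unix_to_solr_date(ts: int) -> str:
    """Convert a Unix timestamp (seconds since 1970-01-01 UTC) to a Solr date string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(SOLR_DATE_FORMAT)


def solr_date_to_unix(value: str) -> int:
    """Parse a whole-second Solr date string back into a Unix timestamp."""
    parsed = datetime.strptime(value, SOLR_DATE_FORMAT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())
```

Keeping the Unix timestamp as the canonical value in Postgres and converting only at the Solr boundary means PHP/Symfony never has to agree with Solr on a date format.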
Seems like my original choice to use Unix timestamps as storage in my SQL database, vs. native Postgres timestamps, will make everything easier between:
  PHP
  Symfony
  Postgres
  Solr

It's probably going to be a good idea to store two other columns in the search index for display, 'date' and 'time'. That is, unless I force the user's JavaScript to generate the time and date from the Unix timestamp. Hmmmmmm.

Dennis Gearon

--- On Wed, 9/8/10, Jonathan Rochkind <rochk...@jhu.edu> wrote:

From: Jonathan Rochkind <rochk...@jhu.edu>
Subject: Re: How to import data with a different date format
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Date: Wednesday, September 8, 2010, 11:35 AM

So the standard 'int' field in Solr 1.4 is a "trie-based" field, although the example "int" type in the default schema.xml has its "precisionStep" set to 0, which means it's not really doing "trie" things. If you set precisionStep to something greater than 0, as in the default example "tint" type, then it's really using 'trie' functionality. 'trie' functionality speeds up range queries by putting each value into 'buckets' (my own term), per the precisionStep specified, so Solr has to do less work to grab all values within a certain range.

That's all tint/non-zero-precision trie does: speed up range queries. Your use case involves range queries though, so it's worth investigating. If you use a string or other textual type for sorting or range queries, you need to make sure your values sort the way you want them to as strings. But yyyy-mm-dd will.
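The 'buckets' idea can be made concrete with a simplified model of what a trie field indexes. This Python sketch assumes non-negative integers (such as post-1970 Unix timestamps) and deliberately ignores Lucene's real term encoding and sign handling; `trie_terms` and `cover_range` are hypothetical helpers. They show that each value is indexed at several precisions, so a range can be matched by a few coarse buckets instead of every distinct value:

```python
from __future__ import annotations


def trie_terms(value: int, precision_step: int, bits: int = 32) -> list[tuple[int, int]]:
    """(shift, prefix) pairs indexed for one non-negative value in this simplified model."""
    if precision_step <= 0:
        # Like precisionStep="0" on the example "int" type: a single full-precision term.
        return [(0, value)]
    # Each level drops another precision_step low-order bits, so coarser
    # prefixes are shared by progressively larger blocks of neighbouring values.
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]


def cover_range(lo: int, hi: int, precision_step: int, bits: int = 32) -> list[tuple[int, int]]:
    """Cover the inclusive range [lo, hi] with aligned buckets, greedily from the left."""
    terms = []
    x = lo
    while x <= hi:
        shift = 0
        # Grow the bucket while it stays aligned at x and ends at or before hi.
        while precision_step > 0 and shift + precision_step < bits:
            size = 1 << (shift + precision_step)
            if x % size != 0 or x + size - 1 > hi:
                break
            shift += precision_step
        terms.append((shift, x >> shift))
        x += 1 << shift
    return terms
```

With a step of 4, the 300 values in `[1, 300]` are covered by 45 bucket terms, and `[0, 255]` by a single one. The extra terms per value are the indexing-time overhead; the shorter term list per query is the range-query speedup.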
More on trie:
http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/

I think there probably won't be much of a difference at query time between non-trie int and string, although I'm not sure, and it may depend on the nature of your data and queries. Using a trie int will be faster for (and only for) range queries, if you have a lot of data. (There are some cases, depending on the data and the nature of your queries, where the overhead of a non-zero-precision trie may outweigh the hypothetical gain, but generally it's faster.)

I don't think there should be any appreciable difference between how long a non-trie int or a string will take to index, at least as far as Solr is concerned. If your app takes longer to prepare one kind of document for Solr than the other, that's another story. An actual trie (non-zero precision) theoretically has indexing-time overhead, but I doubt it would be noticeable unless you have a really, really lean, mean indexing setup where every microsecond counts.

Jonathan

Dennis Gearon wrote:

I'm doing something similar for dates/times/timestamps.

I'm actually trying to find which appointments (date/time from-and-to pairs, i.e. timestamps) have 'now' within their range. A fairly simple search:

   What items have a start time BEFORE now, and an end time AFTER now?

My thoughts were to store:
   Unix timestamp BIGINTs (64 bit)
   "ISO_DATE ISO_TIME" strings

Which is going to be faster:
1/ Indexing?
2/ Searching?

How does the 'tint' field mentioned below apply?
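The "start before now, end after now" search is usually expressed as two open-ended range clauses in a filter query. A minimal Python sketch, assuming the Unix timestamps are indexed in two numeric (ideally trie) fields with the hypothetical names `start_ts` and `end_ts`:

```python
import time
from typing import Optional
from urllib.parse import urlencode


def active_now_fq(now_ts: int, start_field: str = "start_ts", end_field: str = "end_ts") -> str:
    """Filter for documents whose [start, end] interval contains now_ts.

    Square brackets are inclusive bounds; curly braces ({* TO x}) would make them exclusive.
    """
    return f"{start_field}:[* TO {now_ts}] AND {end_field}:[{now_ts} TO *]"


def active_now_params(now_ts: Optional[int] = None) -> str:
    """URL-encoded query string for a Solr /select request using the filter above."""
    if now_ts is None:
        # Round down to the minute so repeated requests produce the same fq string.
        now_ts = int(time.time()) // 60 * 60
    return urlencode({"q": "*:*", "fq": active_now_fq(now_ts)})
```

Rounding "now" lets identical filters be served from Solr's filter cache instead of being recomputed per request. With Solr date fields instead of integers, the same clauses can use Solr's date math, e.g. `start_date:[* TO NOW] AND end_date:[NOW TO *]` (field names again hypothetical).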
Dennis Gearon

--- On Wed, 9/8/10, Jonathan Rochkind <rochk...@jhu.edu> wrote:

From: Jonathan Rochkind <rochk...@jhu.edu>
Subject: Re: How to import data with a different date format
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Date: Wednesday, September 8, 2010, 10:27 AM

Just throwing it out there, I'd consider a different approach for an actual real app, although it might not be easier to get up quickly. (For quickly, yeah, I'd just store it as a string; more on that at the bottom.)

If none of your dates have times, and they're all just full days, I'm not sure you really need the date type at all. Convert the date to a number-of-days-since-epoch integer. (Most languages will have a way to do this, but I don't know about pure XSLT.) Store _that_ in a 1.4 'int' field. On top of that, make it a "tint" (non-zero precision) for faster range queries.

But now your actual interface will have to convert from "number of days since epoch" to a displayable date. (And if you allow user input, convert the input to number-of-days-since-epoch before making a range query or fq, but you'd have to do that anyway even with Solr dates; users aren't going to be entering W3CDate raw, I don't think.)
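The days-since-epoch approach needs a conversion in each direction plus one for user input. A minimal Python sketch using only the standard library; the helper names and the `pub_days` field are hypothetical:

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def to_epoch_days(d: date) -> int:
    """Whole days since 1970-01-01 (negative for earlier dates)."""
    return (d - EPOCH).days


def from_epoch_days(n: int) -> date:
    """Inverse of to_epoch_days, for turning the stored int back into a displayable date."""
    return EPOCH + timedelta(days=n)


def parse_mdy(text: str) -> date:
    """Parse an MM/DD/YYYY string such as '09/08/2010'."""
    month, day, year = (int(part) for part in text.strip().split("/"))
    return date(year, month, day)


def epoch_day_range_fq(field: str, start_mdy: str, end_mdy: str) -> str:
    """Inclusive range filter over an int field holding epoch days."""
    lo = to_epoch_days(parse_mdy(start_mdy))
    hi = to_epoch_days(parse_mdy(end_mdy))
    return f"{field}:[{lo} TO {hi}]"
```

For example, `epoch_day_range_fq("pub_days", "01/01/1970", "09/08/2010")` yields `pub_days:[0 TO 14860]`. Because whole days carry no time of day, the inclusive range covers both end dates completely, with no 00:00:00/23:59:59 boundary handling.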
That is probably the most efficient way to have Solr handle it. Using an actual date field type gives you a lot more precision than you need, which is going to hurt performance on range queries. You can compensate for that with a trie date, sure, but if you don't really need that precision to begin with, why use it? The extra precision can also do unexpected things and make bugs easier to introduce (for range queries on that high-precision data, you need to make sure your start date has 00:00:00 set and your end date has 23:59:59 set to get what you probably expect). If you aren't going to use the extra precision, not using a date field makes everything a lot simpler.

Alternately, for your "get this done quick" method, yeah, I'd just store it as a string. With a string exactly as you've specified, sorting and range queries won't work how you'd want. But if you can make it a string of the format "yyyy/mm/dd" instead (always a four-digit year and a two-digit month and day), then you can even sort and do range queries on your string dates. For the quick-and-dirty prototype, I'd just do that. In fact, while this might make range queries and sorting _slightly_ slower than if you used an int or a tint, it might really be good enough even for a real app (hey, it's what lots of people did before the trie-based fields existed).
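The quick-and-dirty string option only works if string order matches date order. A minimal Python sketch of rewriting MM/DD/YYYY values into the zero-padded yyyy/mm/dd form (the helper name is hypothetical), which also shows why the raw strings do not sort chronologically:

```python
from datetime import date


def mdy_to_sortable(text: str) -> str:
    """'9/8/2010' or '09/08/2010' -> '2010/09/08'.

    Year first, then zero-padded month and day, so plain string order
    matches chronological order.
    """
    month, day, year = (int(part) for part in text.strip().split("/"))
    date(year, month, day)  # raises ValueError for impossible dates such as 02/30/2010
    return f"{year:04d}/{month:02d}/{day:02d}"


raw = ["12/01/2009", "9/8/2010", "01/15/2010"]
# Raw strings sort as 01/15/2010, 12/01/2009, 9/8/2010: January 2010 lands before December 2009.
converted = sorted(mdy_to_sortable(s) for s in raw)
# Converted strings sort as 2009/12/01, 2010/01/15, 2010/09/08.
```

The same property makes string range checks work: a converted value falls between two converted bounds exactly when the date falls between the two dates.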
Jonathan

Erick Erickson wrote:

I think Markus is spot-on given the fact that you have 2 days. Using a string field is quickest.

However, if you absolutely MUST have functioning dates, there are three options I can think of:
1> Can you make your XSLT transform the dates? Confession: I'm XSLT-ignorant.
2> Use DIH and DateFormatTransformer, see:
   http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
   You can walk a directory importing all the XML files with FileDataSource.
3> You could write a program to do this manually.

But given the time constraints, I suspect your time would be better spent doing the other stuff and just using string as per Markus. I have no clue how Solr-savvy you are, so pardon me if this is something you already know, but lots of people trip up over the "string" field type, which is NOT tokenized. You usually want "text" unless it's some sort of ID.... So it might be worth it to do some searching earlier rather than later <G>....

Best
Erick

On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma <markus.jel...@buyways.nl> wrote:

No. The DateField [1] will not accept it any other way.
You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features.

[1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html

-----Original message-----
From: Rico Lelina <rlel...@yahoo.com>
Sent: Wed 08-09-2010 17:36
To: solr-user@lucene.apache.org;
Subject: How to import data with a different date format

Hi,

I am attempting to import some of our data into Solr. I did it the quickest way I know because I literally only have 2 days to import the data and do some queries for a proof-of-concept.

So I have this data in XML format and I wrote a short XSLT script to convert it to the format in solr/example/exampledocs (except I retained the element names, so I had to modify schema.xml in the conf directory). So far so good: the import works and I can search the data. One of my immediate problems is that there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it seems Solr accepts only full date fields; everything seems to be mandatory, including the Z for Zulu/UTC time according to the doc. Is there a way to specify the date format?

Thanks very much.
Rico
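Erick's third option, writing a program that fixes the dates before posting, could look like the following minimal Python sketch. It assumes the documents are already in the solr/example/exampledocs `<add><doc><field name="...">` shape, that `pub_date` (a hypothetical name) is the field holding MM/DD/YYYY values, and that midnight UTC is an acceptable time of day, since the source data has none:

```python
import xml.etree.ElementTree as ET
from datetime import datetime

DATE_FIELDS = {"pub_date"}  # hypothetical: names of fields holding MM/DD/YYYY values


def mdy_to_solr(value: str) -> str:
    """'09/08/2010' -> '2010-09-08T00:00:00Z' (midnight UTC assumed; the source has no time)."""
    return datetime.strptime(value.strip(), "%m/%d/%Y").strftime("%Y-%m-%dT00:00:00Z")


def convert_dates(xml_text: str) -> str:
    """Rewrite the date fields of an <add><doc>... update message into Solr's DateField format."""
    root = ET.fromstring(xml_text)
    for field in root.iter("field"):
        if field.get("name") in DATE_FIELDS and field.text:
            field.text = mdy_to_solr(field.text)
    return ET.tostring(root, encoding="unicode")
```

Run over each file produced by the XSLT step, this lets the schema use a real date field instead of a string, at the cost of one extra pass before posting the files to Solr.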