I already have the issue of how to store timestamps across different
databases, languages, platforms, and frameworks.

Settling on LONGINT/unix timestamp solves the problem on all fronts.

I may even send them to the browser and have the JavaScript convert them to
date/times (maybe ;-)

So, it's *nix timestamp or bust!
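
A minimal sketch of what that round trip could look like, in Python purely
for illustration (the stack in this thread is PHP/Symfony/Postgres/Solr).
It assumes timestamps are stored as whole seconds since the Unix epoch in
UTC. The field names start_ts and end_ts are hypothetical, and the helper
names are made up for this example. The string form is the full UTC format
with the trailing Z discussed below, and the query uses ordinary Solr range
syntax for "start before now, end after now".

```python
from datetime import datetime, timezone


def epoch_to_solr_date(epoch_seconds: int) -> str:
    """Format Unix epoch seconds as a full UTC date string with trailing Z."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def solr_date_to_epoch(solr_date: str) -> int:
    """Parse a UTC date string of the same form back into epoch seconds."""
    dt = datetime.strptime(solr_date, "%Y-%m-%dT%H:%M:%SZ")
    return int(dt.replace(tzinfo=timezone.utc).timestamp())


def active_now_query(now_epoch: int,
                     start_field: str = "start_ts",   # hypothetical field name
                     end_field: str = "end_ts") -> str:  # hypothetical field name
    """Build a range query: start time <= now AND end time >= now."""
    return (f"{start_field}:[* TO {now_epoch}] AND "
            f"{end_field}:[{now_epoch} TO *]")


if __name__ == "__main__":
    now = 1283958420  # 2010-09-08 15:07:00 UTC
    print(epoch_to_solr_date(now))
    print(active_now_query(now))
```

With integer fields the query needs no date parsing on the Solr side, and
the display conversion can happen wherever is convenient (server or browser).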

Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/8/10, Jonathan Rochkind <rochk...@jhu.edu> wrote:

> From: Jonathan Rochkind <rochk...@jhu.edu>
> Subject: Re: How to import data with a different date format
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Date: Wednesday, September 8, 2010, 3:07 PM
> Solr 1.4 was the first tagged release with trie fields.
>
> And Solr 1.4+ also includes a 'date' field based on 'trie' just for
> dates.  If your dates are actually going to include hour/minute/second,
> not just calendar day-of-month, then I'd definitely use the built in
> solr trie date field, that's what it's for, will do the translation from
> calendar date-time to integer for you (in both directions), and add trie
> buckets for fast range querying too.
>
> I was suggesting that just using 'int' might be simpler if you don't
> need hour/minute/second precision, but are just storing year-month-day.
> If you've got hour-minute-second too, no reason not to use Solr's date
> type, and lots of reasons to do so.
> 
> Jonathan
> 
> Dennis Gearon wrote:
> > So now, vs when 'trie' came out, Solr has an INT field that IS
> > 'trie', right?
> >
> > And nothing date/timestamp related has come out since, making
> > 'trie'/INT the field of choice for timestamps, right?
> >
> > Seems like the fastest choice.
> >
> > I will have to read up on it.
> >
> > Seems like my original choice to use unix timestamp as storage in my
> > SQL database, vs native Postgres timestamp, will make everything
> > easier between:
> >   PHP
> >   Symfony
> >   Postgres
> >   Solr
> >
> > It's probably going to be a good idea to store two other columns in
> > the search index for display, 'date', 'time'. That is, unless I force
> > the user's javascript to generate the time and date from the unix
> > timestamp. hmmmmmm.
> >
> > Dennis Gearon
> >
> > --- On Wed, 9/8/10, Jonathan Rochkind <rochk...@jhu.edu> wrote:
> >
> >> From: Jonathan Rochkind <rochk...@jhu.edu>
> >> Subject: Re: How to import data with a different date format
> >> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> >> Date: Wednesday, September 8, 2010, 11:35 AM
> >> So the standard 'int' field in Solr 1.4 is a "trie based" field,
> >> although the example "int" type in the default schema.xml has its
> >> "precisionStep" set to 0, which means it's not really doing "trie"
> >> things. If you set the precisionStep to something greater than 0, as
> >> in the default example "tint" type, then it's really using 'trie'
> >> functionality.  'trie' functionality speeds up range queries by
> >> putting each value into 'buckets' (my own term), per the precision
> >> specified, so solr has to do less to grab all values within a
> >> certain range.
> >>
> >> That's all tint/non-zero-precision-trie does, speed up range
> >> queries. Your use case involves range queries though, so it's worth
> >> investigating.  If you use a string or other textual type for
> >> sorting or range queries, you need to make sure your values sort the
> >> way you want them to as strings. But yyyy-mm-dd will.
> >>
> >> More on trie:
> >> http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/
> >>
> >> I think there probably won't be much of a difference at query time
> >> between non-trie int and string, although I'm not sure, and it may
> >> depend on the nature of your data and queries.  Using a trie int
> >> will be faster for (and only for) range queries, if you have a lot
> >> of data. (There are some cases, depending on the data and the nature
> >> of your queries, where the overhead of a non-zero-precision trie may
> >> outweigh the hypothetical gain, but generally it's faster).
> >>
> >> I don't think there should be any appreciable difference between how
> >> long a non-trie int or a string will take to index -- at least as
> >> far as solr is concerned, if your app preparing the documents for
> >> solr takes longer to prepare one than another, that's another story.
> >> An actual trie (non-zero-precision) theoretically has indexing-time
> >> overhead, but I doubt it would be noticeable, unless you have a
> >> really really lean mean indexing setup where every microsecond
> >> counts.
> >>
> >> Jonathan
> >>
> >> Dennis Gearon wrote:
> >>
> >>> I'm doing something similar for dates/times/timestamps.
> >>>
> >>> I'm actually trying to do "'now' is within the range of what
> >>> appointments" (date/time from and to combos, i.e. timestamps).
> >>>
> >>> Fairly simple search of:
> >>>
> >>>    What items have a start time BEFORE now, and an end time AFTER
> >>>    now?
> >>>
> >>> My thoughts were to store:
> >>>   unix time stamp BIGINTS (64 bit)
> >>>   "ISO_DATE ISO_TIME" strings
> >>>
> >>> Which is going to be faster:
> >>>    1/ Indexing?
> >>>    2/ Searching?
> >>>
> >>> How does the 'tint' field mentioned below apply?
> >>>
> >>> Dennis Gearon
> >>>
> >>> --- On Wed, 9/8/10, Jonathan Rochkind <rochk...@jhu.edu> wrote:
> >>>
> >>>> From: Jonathan Rochkind <rochk...@jhu.edu>
> >>>> Subject: Re: How to import data with a different date format
> >>>> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> >>>> Date: Wednesday, September 8, 2010, 10:27 AM
> >>>> Just throwing it out there, I'd consider a different approach for
> >>>> an actual real app, although it might not be easier to get up
> >>>> quickly. (For quickly, yeah, I'd just store it as a string, more
> >>>> on that at bottom).
> >>>>
> >>>> If none of your dates have times, they're all just full days, I'm
> >>>> not sure you really need the date type at all. Convert the date to
> >>>> number-of-days since epoch integer.  (Most languages will have a
> >>>> way to do this, but I don't know about pure XSLT).  Store _that_
> >>>> in a 1.4 'int' field.  On top of that, make it a "tint" (precision
> >>>> non-zero) for faster range queries.
> >>>>
> >>>> But now your actual interface will have to convert from "number of
> >>>> days since epoch" to a displayable date. (And if you allow user
> >>>> input, convert the input to number-of-days-since-epoch before
> >>>> making a range query or fq, but you'd have to do that anyway even
> >>>> with solr dates, users aren't going to be entering W3CDate raw, I
> >>>> don't think).
> >>>>
> >>>> That is probably the most efficient way to have solr handle it --
> >>>> using an actual date field type gives you a lot more precision
> >>>> than you need, which is going to hurt performance on range
> >>>> queries. Which you can compensate for with trie date sure, but if
> >>>> you don't really need that precision to begin with, why use it?
> >>>> Also the extra precision can end up doing unexpected things and
> >>>> making it easier to have bugs (range queries on that high
> >>>> precision stuff, you need to make sure your start date has
> >>>> 00:00:00 set and your end date has 23:59:59 set, to do what you
> >>>> probably expect). If you aren't going to use the extra precision,
> >>>> it makes everything a lot simpler to not use a date field.
> >>>>
> >>>> Alternately, for your "get this done quick" method, yeah, I'd just
> >>>> store it as a string. With a string exactly as you've specified,
> >>>> sorting and range queries won't work how you'd want.  But if you
> >>>> can make it a string of the format "yyyy/mm/dd" instead (always
> >>>> two-digit month and day), then you can even sort and do range
> >>>> queries on your string dates. For the quick and dirty prototype,
> >>>> I'd just do that.  In fact, while this might make range queries
> >>>> and sorting _slightly_ slower than if you use an int or a tint,
> >>>> this might really be good enough even for a real app (hey, it's
> >>>> what lots of people did before the trie-based fields existed).
> >>>>
> >>>> Jonathan
> >>>>
> >>>> Erick Erickson wrote:
> >>>>
> >>>>> I think Markus is spot-on given the fact that you have 2 days.
> >>>>> Using a string field is quickest.
> >>>>>
> >>>>> However, if you absolutely MUST have functioning dates, there are
> >>>>> three options I can think of:
> >>>>> 1> can you make your XSLT transform the dates? Confession; I'm
> >>>>>    XSLT-ignorant
> >>>>> 2> use DIH and DateFormatTransformer, see:
> >>>>>    http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
> >>>>>    you can walk a directory importing all the XML files with
> >>>>>    FileDataSource.
> >>>>> 3> you could write a program to do this manually.
> >>>>>
> >>>>> But given the time constraints, I suspect your time would be
> >>>>> better spent doing the other stuff and just using string as per
> >>>>> Markus. I have no clue how SOLR-savvy you are, so pardon if this
> >>>>> is something you already know. But lots of people trip up over
> >>>>> the "string" field type, which is NOT tokenized. You usually want
> >>>>> "text" unless it's some sort of ID.... So it might be worth it to
> >>>>> do some searching earlier rather than later <G>....
> >>>>>
> >>>>> Best
> >>>>> Erick
> >>>>>
> >>>>> On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma <markus.jel...@buyways.nl> wrote:
> >>>>>
> >>>>>> No. The DateField [1] will not accept it any other way. You
> >>>>>> could, however, fool your boss and dump your dates in an
> >>>>>> ordinary string field. But then you cannot use some of the nice
> >>>>>> date features.
> >>>>>>
> >>>>>> [1]:
> >>>>>> http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html
> >>>>>>
> >>>>>> -----Original message-----
> >>>>>> From: Rico Lelina <rlel...@yahoo.com>
> >>>>>> Sent: Wed 08-09-2010 17:36
> >>>>>> To: solr-user@lucene.apache.org;
> >>>>>> Subject: How to import data with a different date format
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I am attempting to import some of our data into SOLR. I did it
> >>>>>> the quickest way I know because I literally only have 2 days to
> >>>>>> import the data and do some queries for a proof-of-concept.
> >>>>>>
> >>>>>> So I have this data in XML format and I wrote a short XSLT
> >>>>>> script to convert it to the format in solr/example/exampledocs
> >>>>>> (except I retained the element names so I had to modify
> >>>>>> schema.xml in the conf directory). So far so good -- the import
> >>>>>> works and I can search the data. One of my immediate problems is
> >>>>>> that there is a date field with the format MM/DD/YYYY. Looking
> >>>>>> at schema.xml, it seems SOLR accepts only full date fields --
> >>>>>> everything seems to be mandatory including the Z for Zulu/UTC
> >>>>>> time according to the doc. Is there a way to specify the date
> >>>>>> format?
> >>>>>>
> >>>>>> Thanks very much.
> >>>>>> Rico
