Re: Dates, Times and Timezones

2008-02-04 Thread Chris Hostetter

:   One great problem we are having to integrate solr with plone is that
: plone can have dates and times in different timezones, and each user can query
: the data in its own timezone. So we would be really interested in being able
: to put date/time data on solr with a timezone and specifying the timezone of a
: query so we get perfect results. I saw somewhere that part of this support is
: going to be in 1.3, is that right? And how is it going to work?

I'm not sure which part of this you think will be in 1.3 ... I
don't know of any new timezone-related work in the trunk.

Solr specifically tries to be as agnostic about timezones as possible ...
when interacting with Solr all dates should be in UTC.  If your
application is getting/giving dates from/to users who have configured
timezone preferences then the parsing/formatting when interacting with the
user should be aware of their preferred timezone -- but you should always
be in UTC when dealing with Solr.
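
A minimal sketch of that split, in Python, assuming the client holds
timezone-aware datetimes. The function names and the fixed -03:00 offset
are made up for illustration; a real application would look up the
user's configured zone instead.

```python
from datetime import datetime, timedelta, timezone

# Solr's date syntax: ISO 8601 in UTC with a trailing "Z".
SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_solr_utc(local_dt):
    """Turn a timezone-aware datetime into a UTC string for Solr."""
    if local_dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return local_dt.astimezone(timezone.utc).strftime(SOLR_FORMAT)


def from_solr_utc(solr_date, user_tz):
    """Parse a Solr UTC string and shift it into the user's timezone."""
    utc_dt = datetime.strptime(solr_date, SOLR_FORMAT)
    return utc_dt.replace(tzinfo=timezone.utc).astimezone(user_tz)


# Hypothetical user preference: a fixed UTC-03:00 offset.
user_tz = timezone(timedelta(hours=-3))
local = datetime(2008, 2, 4, 21, 30, 0, tzinfo=user_tz)

indexed = to_solr_utc(local)                 # what gets sent to Solr
displayed = from_solr_utc(indexed, user_tz)  # what the user sees again
```

Only the two edge functions know about the user's zone; everything that
touches Solr sees UTC.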

The one place where it would *definitely* make sense to make Solr aware of
timezones would be in dealing with DateMath -- when you round by DAY
Solr currently does that in UTC, even though that's probably not what
matters to you -- but in every other aspect of date processing Solr should
work fine provided you transform your dates to UTC before putting them in
the index.
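
One possible client-side workaround for the DAY rounding case, sketched
in Python: compute the boundaries of the user's local day yourself and
send them to Solr as explicit UTC values. The function name and the
fixed offset are assumptions for the example, and a fixed offset ignores
daylight saving changes.

```python
from datetime import datetime, timedelta, timezone

SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def local_day_bounds_utc(instant, user_tz):
    """Return (start, end) of the user's local day as Solr UTC strings.

    `instant` must be timezone-aware. The end bound is the start of the
    next local day, so treat it as exclusive when building a query.
    """
    local = instant.astimezone(user_tz)
    start_local = local.replace(hour=0, minute=0, second=0, microsecond=0)
    end_local = start_local + timedelta(days=1)
    return (
        start_local.astimezone(timezone.utc).strftime(SOLR_FORMAT),
        end_local.astimezone(timezone.utc).strftime(SOLR_FORMAT),
    )


# 01:00 UTC on Feb 5 is still Feb 4 for a user at UTC-03:00.
user_tz = timezone(timedelta(hours=-3))
instant = datetime(2008, 2, 5, 1, 0, 0, tzinfo=timezone.utc)
start, end = local_day_bounds_utc(instant, user_tz)
```

Rounding the same instant by DAY in UTC would start at
2008-02-05T00:00:00Z, which is the wrong calendar day for this user.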

There have been some other discussions about adding options to DateField to
make it more flexible in parsing dates in other formats (which might
include timezone support) but: 1) I don't know of anyone who has actually
started on this; 2) it would only affect the input to Solr ... values
would still be indexed in UTC so they could be compared with any other
date (regardless of its format/timezone) and search clients would still
need to know the current user's preferred TZ to make sure to query/display
those dates appropriately.



-Hoss



Dates, Times and Timezones

2008-02-01 Thread Leonardo Santagada
	One great problem we are having to integrate solr with plone is that
plone can have dates and times in different timezones, and each user
can query the data in its own timezone. So we would be really
interested in being able to put date/time data on solr with a timezone
and specifying the timezone of a query so we get perfect results. I
saw somewhere that part of this support is going to be in 1.3, is that
right? And how is it going to work?


[]'s
--
Leonardo Santagada





dates times

2007-05-10 Thread Brian Whitman
After writing my third parser in my third scripting language in as many
months to go from unix timestamps to Solr Time (8601) I have to
ask: shouldn't the date/time field type be more resilient? I assume
there's a good reason that it's 8601 internally, but certainly it
would be excellent for Solr to transcode different types into Solr Time.
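
For reference, the client-side conversion being described is short in a
language with a date library. This is only a sketch in Python with
made-up function names, not anything Solr ships:

```python
from datetime import datetime, timezone

SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def unix_to_solr(seconds):
    """Seconds since the epoch -> Solr's UTC ISO 8601 string."""
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dt.strftime(SOLR_FORMAT)


def solr_to_unix(solr_date):
    """Solr's UTC ISO 8601 string -> whole seconds since the epoch."""
    dt = datetime.strptime(solr_date, SOLR_FORMAT)
    return int(dt.replace(tzinfo=timezone.utc).timestamp())


stamp = unix_to_solr(1178801643)
```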


My main problem (as a normal Solr end user) is that it's hard to do  
math directly on 8601 dates or really parse them without specific  
packages. My XSL 2.0 parsers don't even like it without some  
massaging (forget about XSL 1.0.) UNIX time (seconds since the epoch)  
is super easy, as are sortable delimitable strings like  
20070510125403.


I'm not advocating replacing 8601 as the known good Solr Time, just  
that some leeway be given in the parser to accept unix time or  
something else and the conversion to 8601 happens internally. And a  
further dream is to have a strftime formatter in solrconfig for the  
query response, so I can always have my date fields come back as May  
10th, 2007, 12:58pm.


-Brian









Re: dates times

2007-05-10 Thread Mike Klaas

On 5/10/07, Brian Whitman [EMAIL PROTECTED] wrote:

After writing my 3rd parser in my third scripting language in so many
months to go from unix timestamps to Solr Time (8601) I have to
ask: shouldn't the date/time field type be more resilient? I assume
there's a good reason that it's 8601 internally, but certainly it
would be excellent for Solr to transcode different types into Solr Time.

My main problem (as a normal Solr end user) is that it's hard to do
math directly on 8601 dates or really parse them without specific
packages. My XSL 2.0 parsers don't even like it without some
massaging (forget about XSL 1.0.) UNIX time (seconds since the epoch)
is super easy, as are sortable delimitable strings like
20070510125403.


I'm not sure what delimitable means, but Solr datetimes _are_
essentially sortable inverse-magnitude like the above, with a few
punctuation symbols thrown in.  I have no XSLT-fu, but is it not
possible to do regexp-replace s/[TZ:-]//g on the solrdate to get the
above?
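
The same substitution written out in Python, as a sketch (the function
name is made up). It assumes the Solr date has no fractional seconds; a
value with milliseconds would keep its "." and digits.

```python
import re


def solr_to_compact(solr_date):
    """Drop the T, Z, ':' and '-' punctuation from a Solr date string."""
    return re.sub(r"[TZ:-]", "", solr_date)


compact = solr_to_compact("2007-05-10T12:54:03Z")
```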


I'm not advocating replacing 8601 as the known good Solr Time, just
that some leeway be given in the parser to accept unix time or
something else and the conversion to 8601 happens internally. And a
further dream is to have a strftime formatter in solrconfig for the
query response, so I can always have my date fields come back as May
10th, 2007, 12:58pm.


Those are interesting ideas and it probably would not be difficult to
create a patch if you were interested, but I'm curious:  What about
XSL makes what seems to me an elementary string-processing task so
difficult?

regards
-Mike


RE: dates times

2007-05-10 Thread Binkley, Peter
You can get at some of this functionality in the built-in xslt 1.0
engine (Xalan) by using the e-xslt date-time extensions: see
http://exslt.org/date/index.html, and for Xalan's implementation see
http://xml.apache.org/xalan-j/extensionslib.html#exslt . There are some
examples here:
http://www-128.ibm.com/developerworks/library/x-exslt.html . I haven't
tried this in Solr but I don't think there's any reason why it wouldn't
work; I've used it in other Xalan-J environments, notably Cocoon. 

Peter

-Original Message-
From: Brian Whitman [mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 10, 2007 11:49 AM
To: solr-user@lucene.apache.org
Subject: Re: dates & times


 Those are interesting ideas and it probably would not be difficult to 
 create a patch if you were interested, but I'm curious:  What about 
 XSL makes what seems to me an elementary string-processing task so 
 difficult?


Well, XSL 1.0 (which is the one that comes for free with Solr/java)
doesn't handle dates and times at all. XSL 2.0 handles it well enough,
but it's only supported through a GPL jar, which we can't distribute.

It's more than string processing, anyway. I would want to convert the
Solr Time 2007-03-15T00:41:5:2Z to March 15th, 2007 in a web app.
I'd also like to say 'Posted 3 days ago.' In my vision of things, that
work is done on Solr's side. (The former case with a strftime type
formatter in solrconfig, the latter by having strftime return the day
number this year.)
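
To make the two wished-for outputs concrete, here is roughly what a
client has to do today, sketched in Python. The function names are
invented for the example, the sample timestamp is a well-formed stand-in
rather than the value quoted above, and the month name comes from the
process locale (English by default).

```python
from datetime import datetime, timezone

SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def ordinal(n):
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 23 -> '23rd'."""
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "%d%s" % (n, suffix)


def pretty_date(solr_date):
    """Format a Solr date as e.g. 'March 15th, 2007'."""
    dt = datetime.strptime(solr_date, SOLR_FORMAT)
    return "%s %s, %d" % (dt.strftime("%B"), ordinal(dt.day), dt.year)


def posted_ago(solr_date, now):
    """Whole days between a Solr date and `now` (timezone-aware)."""
    dt = datetime.strptime(solr_date, SOLR_FORMAT)
    days = (now - dt.replace(tzinfo=timezone.utc)).days
    if days <= 0:
        return "Posted today"
    return "Posted %d %s ago" % (days, "day" if days == 1 else "days")


now = datetime(2007, 3, 18, 12, 0, 0, tzinfo=timezone.utc)
label = pretty_date("2007-03-15T00:41:00Z")
age = posted_ago("2007-03-15T00:41:00Z", now)
```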







Re: dates times

2007-05-10 Thread Brian Whitman

You can get at some of this functionality in the built-in xslt 1.0
engine (Xalan) by using the e-xslt date-time extensions: see
http://exslt.org/date/index.html, and for Xalan's implementation see
http://xml.apache.org/xalan-j/extensionslib.html#exslt .


The exslt stuff looks good, thanks! I'll have to try it out. That's  
only one direction though, still want parsing of unix timestamp-like  
formats into the indexer on doc adds and updates.


Just FYI, the license for the exslt stuff seems OK w/ the APL:
http://lists.fourthought.com/pipermail/exslt-manage/2004-June/000603.html
So if it works out we might want to put the date/time xsl stuff in
the solr distribution in lieu of shipping with an XSL 2.0 processor.















--
http://variogr.am/
[EMAIL PROTECTED]





Re: dates times

2007-05-10 Thread Chris Hostetter

: It's more than string processing, anyway. I would want to convert the
: Solr Time 2007-03-15T00:41:5:2Z to March 15th, 2007 in a web app.
: I'd also like to say 'Posted 3 days ago.' In my vision of things,
: that work is done on Solr's side. (The former case with a strftime
: type formatter in solrconfig, the latter by having strftime return
: the day number this year.)

One of the early architecture/design principles of the Solr search
APIs was: compute secondary info about a result if it's more
efficient or easier to compute in Solr than it would be for a client to do
it -- DocSet caches, facet counts, and sorting/pagination being
great examples of things where Solr can do less work to get the same
info out of raw data than a client app would, because of its low-level
access to the data and because of how much data would need to go over the
wire for the client to do the same computation. ... that's largely just a
little bit of historic trivia, however; Solr has a lot of features now which
might not hold up to that yardstick, but I mention it only to clarify one
of the reasons Solr didn't have more configurable date formatting to
start with.

it has been on the TaskList since the start of incubation however...

  * a DateTime field (or Query Parser extension) that allows flexible
input for easier human entered queries
  * allow alternate format for date output to ease client creation of
date objects?

One of the reasons I don't think anyone has tackled them yet is that
it's hard to get a holistic view of a solution, because there are really
several loosely related problems with date formatting issues:

The first is a discussion of the internal format and what resolution the
dates are stored at in the index itself.  if you *know* that you never
plan on querying with anything more fine-grained than day resolution,
storing your dates with only day resolution can make your index a lot
smaller (and make date searches a lot faster).  with the current DateField
the same performance benefits can be achieved by rounding your dates
before indexing them, but if we were to make it a config option on
DateField itself to automatically round, we would need to take this info
into account when parsing updates -- should the client be expected to know
what precision each date field uses?  do they send dates expressed using
the internal format, or as fully qualified times?  is it an
error/warning to attempt to index more datetime precision than a field
supports?
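
Rounding before indexing, as a small Python sketch. The function name and
the set of units are illustrative only; the point is that every document
indexed on the same day ends up with the same term.

```python
from datetime import datetime

SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# Which fields to zero out for each (hypothetical) rounding unit.
_ZEROED = {
    "DAY": dict(hour=0, minute=0, second=0),
    "HOUR": dict(minute=0, second=0),
    "MINUTE": dict(second=0),
}


def round_down(solr_date, unit):
    """Truncate a Solr UTC date string to the start of the given unit."""
    dt = datetime.strptime(solr_date, SOLR_FORMAT)
    return dt.replace(**_ZEROED[unit]).strftime(SOLR_FORMAT)


by_day = round_down("2007-05-10T12:54:03Z", "DAY")
by_hour = round_down("2007-05-10T12:54:03Z", "HOUR")
```

The truncation happens in UTC, so it shares the caveat about UTC day
boundaries versus a user's local day.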

The second is a discussion of external format (which seems to be what
you are mostly discussing).  the most trivial way to address this would be
options on the ResponseWriters that allow them to be configured with
DateFormatter Strings they would use to process any date they return .. but
that raises questions about the QueryParsing aspect as well ... should
date formatting be a property of the response, or a property of the
request, such that both input and output formats are identical?

Third is how the discussions of the internal format and the external
format shouldn't be treated as completely independent.  it's tempting to say
that there will be a clean abstraction between the two, that all client
interaction will be done using configured external formatter(s) to create
internal java Date objects, which will then be translated back to Strings
by an internal formatter for the purpose of indexing (and querying), but
what happens when a query expresses a date range too precise for the
granularity expressed by the internal format? do we match
nothing/everything? ... what if the indexed granularity is *more* precise
than the query granularity .. how do we know if a range query between March
6, 2007 and May 10, 2007 on a field that stores millisecond granularity
is supposed to go from the first millisecond of each day or the last?
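
With absolute dates the client has to pick one reading of that range and
spell it out. A Python sketch of one reading (first millisecond of the
first day through last millisecond of the last day); the field name
"pubdate" and the function names are hypothetical:

```python
from datetime import date, datetime, time, timedelta


def format_ms(dt):
    """Solr UTC date string with explicit milliseconds."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + ".%03dZ" % (dt.microsecond // 1000)


def inclusive_day_range(field, first_day, last_day):
    """Build an inclusive range query covering whole UTC days."""
    start = datetime.combine(first_day, time.min)
    # Last millisecond of last_day: step to the next midnight, then back 1 ms.
    end = (datetime.combine(last_day + timedelta(days=1), time.min)
           - timedelta(milliseconds=1))
    return "%s:[%s TO %s]" % (field, format_ms(start), format_ms(end))


query = inclusive_day_range("pubdate", date(2007, 3, 6), date(2007, 5, 10))
```

A client that wanted the other reading would change only the `end`
computation, which is the kind of decision Solr currently leaves to the
caller.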



Questions like these are why I'm glad Solr currently keeps it simple and
makes people deal in absolutes .. less room for confusion  :)


-Hoss



Re: dates times

2007-05-10 Thread Brian Whitman

On May 10, 2007, at 2:30 PM, Chris Hostetter wrote:
Questions like these are why I'm glad Solr currently keeps it simple and
makes people deal in absolutes .. less room for confusion  :)


I get all that, thanks for the great explanation.

I imagine most of my problems can be solved with a custom field  
analyzer (converting other date strings to 8601 during indexing) and  
then XSL on the select?q= end (which we do anyway.) In other words,  
leaving core solr absolute with an option for different date  
analyzers. I see the need to not clutter it up, especially at this  
stage.


What would, say, a filter that converted unix timestamps to 8601  
before indexing as a solr.DateField look like? Is that a custom  
filter, or a tokenizer?







Re: dates times

2007-05-10 Thread Yonik Seeley

On 5/10/07, Brian Whitman [EMAIL PROTECTED] wrote:

On May 10, 2007, at 2:30 PM, Chris Hostetter wrote:
 Questions like these are why I'm glad Solr currently keeps it
 simple and
 makes people deal in absolutes .. less room for confusion  :)

I get all that, thanks for the great explanation.

I imagine most of my problems can be solved with a custom field
analyzer (converting other date strings to 8601 during indexing) and
then XSL on the select?q= end (which we do anyway.) In other words,
leaving core solr absolute with an option for different date
analyzers. I see the need to not clutter it up, especially at this
stage.

What would, say, a filter that converted unix timestamps to 8601
before indexing as a solr.DateField look like? Is that a custom
filter, or a tokenizer?


That would be a custom filter, which is currently only supported by
text fields, so the XML output would be <str> instead of <date> (if
that matters to you).

One could also just store seconds or milliseconds in an int or long
field.  That's fine for small devel teams, but not ideal since it's a
bit less expressive.

The right approach for more flexible date parsing is probably to add
more functionality to the date field and configure via optional
attributes.

-Yonik


Re: dates times

2007-05-10 Thread Chris Hostetter

: The right approach for more flexible date parsing is probably to add
: more functionality to the date field and configure via optional
: attributes.

Adding configuration options to DateField seems like it might ultimately
be the right choice for changing the *internal* format, but assuming we
want to keep the internal representation of DateField fixed and
unconfigurable for the time being and address the various *external*
formatting issues, I imagine the simplest way to tackle this (in a way
that is consistent with the other datatypes) would be...

1) change DateField to support Analyzers.  that way you could have
separate analyzers for indexing vs querying just like a text field (so you
could for example send Solr seconds since epoch when indexing, and
query using MM/DD/)

The Analyzers used would be responsible for producing Tokens which match
what values the current DateField.toInternal() already considers legal
(either a DateMath string or an iso8601 string).

(In general a DateTranslatingTokenFilter class would be a pretty cool
addition to Lucene; it could take as constructor args two DateFormatters
(one for parsing the incoming tokens, and one for formatting the outgoing
tokens) and a boolean indicating whether its job was to replace matching
tokens or inject duplicate tokens in the same position ... maybe another
option indicating whether incoming Tokens that can't be parsed should be
stripped or passed through ... the idea being that for something like
DateField you would use KeywordTokenizer along with an instance of this to
parse whatever format you wanted -- but when parsing generic text you
might have several of these TokenFilters configured with different
DateFormatters so if they see a Token in the text that matches a known
DateFormat they could inject the name of the month, or the day of the week
into the text at the same position.)
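
A language-neutral sketch of that filter idea, written in Python rather
than against the Lucene TokenFilter API. Nothing here exists in Lucene or
Solr; the function name, parameters, and the list-of-lists stand-in for
token positions are all invented to show the replace/inject and
strip/pass-through options.

```python
from datetime import datetime


def translate_date_tokens(tokens, in_format, out_format,
                          replace=True, drop_unparseable=False):
    """Translate tokens that parse as dates from one format to another.

    Returns one list per token position. With replace=False the
    translated token is injected alongside the original at the same
    position instead of replacing it.
    """
    positions = []
    for token in tokens:
        try:
            parsed = datetime.strptime(token, in_format)
        except ValueError:
            # Not a date in the expected format: strip or pass through.
            if not drop_unparseable:
                positions.append([token])
            continue
        translated = parsed.strftime(out_format)
        positions.append([translated] if replace else [token, translated])
    return positions


# DateField-style use: a single keyword token, replaced outright.
indexed = translate_date_tokens(["05/10/2007"], "%m/%d/%Y",
                                "%Y-%m-%dT%H:%M:%SZ")

# Generic-text use: inject the day of the week next to the date.
enriched = translate_date_tokens(["released", "05/10/2007"], "%m/%d/%Y",
                                 "%A", replace=False)
```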


2) add options to the various QueryResponseWriters to control which format
they use when writing fields out ... in the case of XmlResponseWriter it
would still produce a <date> tag, but the value would be formatted
according to the configuration.


-Hoss



Re: dates times

2007-05-10 Thread Ryan McKinley


(In general a DateTranslatingTokenFilter class would be a pretty cool
addition to Lucene, it could take as constructor args two DateFormatters (one
for parsing the incoming tokens, and one for formatting the outgoing


If this happens, it would be nice (perhaps overkill) to have a chronic 
input filter:


http://chronic.rubyforge.org/

the java port:
https://jchronic.dev.java.net/

---

brian, for a quick easy solution, if you find working with unix
timestamps easier, perhaps you just want to put in dates as a
SortableLongField and deal with the formatting that way.


RE: dates times

2007-05-10 Thread Binkley, Peter
Regarding Hoss's points about the internal format, resolution of
date-times, etc.: maybe a good starting point would be to implement the
date-time algorithms of XML Schema
(http://www.w3.org/TR/xmlschema-2/#isoformats), where these behaviors
are spelled out in reasonably precise terms. There must be code
somewhere that Solr could steal to help with this. This would mesh well
with XSLT 2.0, and presumably other modern XML environments.

peter
