Re: Dates, Times and Timezones
: One great problem we are having to integrate solr with plone is that
: plone can have dates and times in different timezones, and each user can query
: the data in its own timezone. So we would be really interested in being able
: to put date/time data on solr with a timezone and specifying the timezone of a
: query so we get perfect results. I saw somewhere that part of this support is
: going to be in 1.3, is that right? And how is it going to work?

I'm not sure which part of this you think will be in 1.3 ... I don't know of any new timezone-related work in the trunk.

Solr specifically tries to be as agnostic about timezones as possible ... when interacting with Solr, all dates should be in UTC. If your application is getting/giving dates from/to users who have configured timezone preferences, then the parsing/formatting when interacting with the user should be aware of their preferred timezone -- but you should always be in UTC when dealing with Solr.

The one place where it would *definitely* make sense to make Solr aware of timezones would be in dealing with DateMath -- when you round by DAY, Solr currently does that in UTC, even though that's probably not what matters to you -- but every other aspect of date processing in Solr should work fine provided you transform your dates before putting them in the index.

There have been some other discussions about adding options to DateField to make it more flexible in parsing dates in other formats (which might include timezone support), but:

1) I don't know of anyone who has actually started on this;
2) it would only affect the input to Solr ... values would still be indexed in UTC so they could be compared with any other date (regardless of its format/timezone), and search clients would still need to know the current user's preferred TZ to make sure to query/display those dates appropriately.

-Hoss
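A minimal client-side sketch of the approach described above, assuming a Python client: the user's wall-clock time is converted to UTC before it reaches Solr, and converted back for display. The function names and the fixed UTC-3 offset in the test are illustrative assumptions, not part of Solr or Plone. `local_day_bounds` shows one way to work around DAY rounding happening in UTC, by computing the user's local day boundaries on the client.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo

# Second-resolution UTC format accepted by Solr date fields.
SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_solr_utc(local_naive: datetime, user_tz: tzinfo) -> str:
    """Interpret a naive datetime as wall-clock time in user_tz, return a UTC Solr date."""
    aware = local_naive.replace(tzinfo=user_tz)
    return aware.astimezone(timezone.utc).strftime(SOLR_FORMAT)


def from_solr_utc(solr_date: str, user_tz: tzinfo) -> datetime:
    """Parse a UTC Solr date and return it as an aware datetime in user_tz."""
    utc = datetime.strptime(solr_date, SOLR_FORMAT).replace(tzinfo=timezone.utc)
    return utc.astimezone(user_tz)


def local_day_bounds(day: date, user_tz: tzinfo) -> tuple:
    """Return (start, exclusive end) of one calendar day in user_tz, both as UTC Solr dates."""
    start = datetime(day.year, day.month, day.day)
    return (to_solr_utc(start, user_tz),
            to_solr_utc(start + timedelta(days=1), user_tz))
```

Any `tzinfo` works here; on Python 3.9+ a named zone from `zoneinfo.ZoneInfo` could be passed instead of a fixed offset, provided time zone data is installed.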
Dates, Times and Timezones
One great problem we are having to integrate solr with plone is that plone can have dates and times in different timezones, and each user can query the data in its own timezone. So we would be really interested in being able to put date/time data on solr with a timezone and specifying the timezone of a query so we get perfect results.

I saw somewhere that part of this support is going to be in 1.3, is that right? And how is it going to work?

[]'s
--
Leonardo Santagada
dates times
After writing my 3rd parser in my third scripting language in as many months to go from unix timestamps to Solr Time (8601) I have to ask: shouldn't the date/time field type be more resilient? I assume there's a good reason that it's 8601 internally, but certainly it would be excellent for Solr to transcode different types into Solr Time.

My main problem (as a normal Solr end user) is that it's hard to do math directly on 8601 dates or really parse them without specific packages. My XSL 2.0 parsers don't even like it without some massaging (forget about XSL 1.0.) UNIX time (seconds since the epoch) is super easy, as are sortable delimitable strings like 20070510125403.

I'm not advocating replacing 8601 as the known good Solr Time, just that some leeway be given in the parser to accept unix time or something else, with the conversion to 8601 happening internally.

And a further dream is to have a strftime formatter in solrconfig for the query response, so I can always have my date fields come back as May 10th, 2007, 12:58pm.

-Brian
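For reference, the conversions in question can be sketched in a few lines, assuming a Python client and second-resolution UTC values. The function names are illustrative, and the compact string is assumed to already be in UTC.

```python
from datetime import datetime, timezone

# Second-resolution UTC format accepted by Solr date fields.
SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def unix_to_solr(seconds: int) -> str:
    """Seconds since the epoch -> Solr date string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(SOLR_FORMAT)


def solr_to_unix(solr_date: str) -> int:
    """Solr date string (UTC) -> seconds since the epoch."""
    parsed = datetime.strptime(solr_date, SOLR_FORMAT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def compact_to_solr(stamp: str) -> str:
    """14-digit string such as 20070510125403 (assumed UTC) -> Solr date string."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").strftime(SOLR_FORMAT)
```

Date math is then easiest on the integer side: convert with `solr_to_unix`, add or subtract seconds, and convert back with `unix_to_solr`.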
Re: dates times
On 5/10/07, Brian Whitman [EMAIL PROTECTED] wrote:
: After writing my 3rd parser in my third scripting language in as many
: months to go from unix timestamps to Solr Time (8601) I have to ask:
: shouldn't the date/time field type be more resilient? I assume there's
: a good reason that it's 8601 internally, but certainly it would be
: excellent for Solr to transcode different types into Solr Time.
:
: My main problem (as a normal Solr end user) is that it's hard to do
: math directly on 8601 dates or really parse them without specific
: packages. My XSL 2.0 parsers don't even like it without some massaging
: (forget about XSL 1.0.) UNIX time (seconds since the epoch) is super
: easy, as are sortable delimitable strings like 20070510125403.

I'm not sure what delimitable means, but Solr datetimes _are_ essentially sortable inverse-magnitude like the above, with a few punctuation symbols thrown in. I have no XSLT-fu, but is it not possible to do regexp-replace s/[TZ:-]//g on the solrdate to get the above?

: I'm not advocating replacing 8601 as the known good Solr Time, just
: that some leeway be given in the parser to accept unix time or
: something else, with the conversion to 8601 happening internally.
:
: And a further dream is to have a strftime formatter in solrconfig for
: the query response, so I can always have my date fields come back as
: May 10th, 2007, 12:58pm.

Those are interesting ideas and it probably would not be difficult to create a patch if you were interested, but I'm curious: what about XSL makes what seems to me an elementary string-processing task so difficult?

regards
-Mike
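The regexp-replace suggested above can be illustrated outside XSLT. This is a sketch in Python using the same character class; the function name is an assumption for illustration.

```python
import re


def solr_to_compact(solr_date: str) -> str:
    """Strip the T, Z, ':' and '-' punctuation, the same as s/[TZ:-]//g."""
    return re.sub(r"[TZ:-]", "", solr_date)


# Both forms sort lexicographically in chronological order, because the
# fields run from most significant (year) to least significant (second).
dates = ["2007-05-10T12:54:03Z", "2006-12-31T23:59:59Z", "2007-05-10T01:00:00Z"]
compact_sorted = sorted(solr_to_compact(d) for d in dates)
```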
RE: dates times
You can get at some of this functionality in the built-in XSLT 1.0 engine (Xalan) by using the EXSLT date-time extensions: see http://exslt.org/date/index.html, and for Xalan's implementation see http://xml.apache.org/xalan-j/extensionslib.html#exslt . There are some examples here: http://www-128.ibm.com/developerworks/library/x-exslt.html .

I haven't tried this in Solr but I don't think there's any reason why it wouldn't work; I've used it in other Xalan-J environments, notably Cocoon.

Peter

-Original Message-
From: Brian Whitman [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 10, 2007 11:49 AM
To: solr-user@lucene.apache.org
Subject: Re: dates times

: Those are interesting ideas and it probably would not be difficult to
: create a patch if you were interested, but I'm curious: what about XSL
: makes what seems to me an elementary string-processing task so difficult?

Well, XSL 1.0 (which is the one that comes for free with Solr/java) doesn't handle dates and times at all. XSL 2.0 handles it well enough, but it's only supported through a GPL jar, which we can't distribute.

It's more than string processing, anyway. I would want to convert the Solr Time 2007-03-15T00:41:5:2Z to March 15th, 2007 in a web app. I'd also like to say 'Posted 3 days ago.' In my vision of things, that work is done on Solr's side. (The former case with a strftime type formatter in solrconfig, the latter by having strftime return the day number this year.)
Re: dates times
: You can get at some of this functionality in the built-in XSLT 1.0
: engine (Xalan) by using the EXSLT date-time extensions: see
: http://exslt.org/date/index.html, and for Xalan's implementation see
: http://xml.apache.org/xalan-j/extensionslib.html#exslt .

The exslt stuff looks good, thanks! I'll have to try it out. That's only one direction though; I still want parsing of unix timestamp-like formats in the indexer on doc adds and updates.

Just FYI, the license for the exslt stuff seems OK w/ the APL: http://lists.fourthought.com/pipermail/exslt-manage/2004-June/000603.html

So if it works out we might want to put the date/time xsl stuff in the solr distribution in lieu of shipping with an XSL 2.0 processor.

--
http://variogr.am/
[EMAIL PROTECTED]
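For comparison, the two display cases under discussion ("March 15th, 2007" and "Posted 3 days ago") can be sketched on the client side as follows. This assumes a Python web app rather than XSLT, a well-formed second-resolution Solr date, and illustrative function names that are not part of Solr.

```python
from datetime import datetime, timezone

SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
# Spelled out explicitly so the output does not depend on the process locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def ordinal(n: int) -> str:
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 23 -> '23rd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "%d%s" % (n, suffix)


def format_long(solr_date: str) -> str:
    """Render a Solr date as e.g. 'March 15th, 2007'."""
    parsed = datetime.strptime(solr_date, SOLR_FORMAT)
    return "%s %s, %d" % (MONTHS[parsed.month - 1], ordinal(parsed.day), parsed.year)


def posted_ago(solr_date: str, now: datetime) -> str:
    """Render the age of a Solr date in whole days, relative to an aware 'now'."""
    parsed = datetime.strptime(solr_date, SOLR_FORMAT).replace(tzinfo=timezone.utc)
    days = (now - parsed).days
    if days <= 0:
        return "Posted today"
    return "Posted 1 day ago" if days == 1 else "Posted %d days ago" % days
```

Passing `now` explicitly keeps the function testable; a caller would normally supply `datetime.now(timezone.utc)`.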
Re: dates times
: It's more than string processing, anyway. I would want to convert the
: Solr Time 2007-03-15T00:41:5:2Z to March 15th, 2007 in a web app.
: I'd also like to say 'Posted 3 days ago.' In my vision of things,
: that work is done on Solr's side. (The former case with a strftime
: type formatter in solrconfig, the latter by having strftime return
: the day number this year.)

One of the early architecture/design principles of the Solr search APIs was to compute secondary info about a result if it's more efficient or easier to compute in Solr than it would be for a client to do it -- DocSet caches, facet counts, and sorting/pagination being great examples of things where Solr can do less work to get the same info out of raw data than a client app would, because of its low-level access to the data, and because of how much data would need to go over the wire for the client to do the same computation.

... that's largely just a little bit of historical trivia, however. Solr has a lot of features now which might not hold up to that yardstick, but I mention it only to clarify one of the reasons Solr didn't have more configurable date formatting to start with. It has been on the TaskList since the start of incubation, however...

* a DateTime field (or Query Parser extension) that allows flexible input for easier human entered queries
* allow alternate format for date output to ease client creation of date objects?

One of the reasons I don't think anyone has tackled them yet is that it's hard to get a holistic view of a solution, because there are really several loosely related problems with date formatting:

The first is a discussion of the internal format and what resolution the dates are stored at in the index itself. If you *know* that you never plan on querying with anything more fine-grained than day resolution, storing your dates with only day resolution can make your index a lot smaller (and make date searches a lot faster). With the current DateField the same performance benefits can be achieved by rounding your dates before indexing them, but if we were to make it a config option on DateField itself to automatically round, we would need to take this info into account when parsing updates -- should the client be expected to know what precision each date field uses? Do they send dates expressed using the internal format, or as fully qualified times? Is it an error/warning to attempt to index more datetime precision than a field supports?

The second is a discussion of external format (which seems to be what you are mostly discussing). The most trivial way to address this would be options on the ResponseWriters that allow them to be configured with DateFormatter strings they would use to process any date they return .. but that raises questions about the QueryParsing aspect as well ... should date formatting be a property of the response, or a property of the request, such that both input and output formats are identical?

Third is that the discussions of the internal format and the external format shouldn't be treated as completely independent. It's tempting to say that there will be a clean abstraction between the two, that all client interaction will be done using configured external formatter(s) to create internal java Date objects, which will then be translated back to Strings by an internal formatter for the purpose of indexing (and querying). But what happens when a query expresses a date range too precise for the granularity expressed by the internal format? Do we match nothing/everything? ... What if the indexed granularity is *more* precise than the query granularity .. how do we know if a range query between March 6, 2007 and May 10, 2007 on a field that stores millisecond granularity is supposed to go from the first millisecond of each day or the last?

Questions like these are why I'm glad Solr currently keeps it simple and makes people deal in absolutes .. less room for confusion :)

-Hoss
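Two of the points above lend themselves to a short sketch, assuming a Python client: rounding a date to day resolution before indexing, and spelling out the range endpoints so that "March 6 to May 10" is unambiguous. The function names and the `timestamp` field name in the test are hypothetical; whether the last day is included is the caller's decision, which is exactly the ambiguity described.

```python
from datetime import date, datetime


def round_to_day(solr_date: str) -> str:
    """Drop the time-of-day so every value indexed for a given day is identical."""
    parsed = datetime.strptime(solr_date, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.strftime("%Y-%m-%dT00:00:00Z")


def explicit_range(field: str, first: date, last: date, include_last_day: bool = True) -> str:
    """Build a range clause with absolute UTC endpoints.

    The start is always the first instant of the first day. The end is either
    the last millisecond of the last day or its first instant, as requested.
    """
    start = first.strftime("%Y-%m-%dT00:00:00Z")
    if include_last_day:
        end = last.strftime("%Y-%m-%dT23:59:59.999Z")
    else:
        end = last.strftime("%Y-%m-%dT00:00:00Z")
    return "%s:[%s TO %s]" % (field, start, end)
```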
Re: dates times
On May 10, 2007, at 2:30 PM, Chris Hostetter wrote:
: Questions like these are why I'm glad Solr currently keeps it simple
: and makes people deal in absolutes .. less room for confusion :)

I get all that, thanks for the great explanation. I imagine most of my problems can be solved with a custom field analyzer (converting other date strings to 8601 during indexing) and then XSL on the select?q= end (which we do anyway.) In other words, leaving core solr absolute with an option for different date analyzers. I see the need to not clutter it up, especially at this stage.

What would, say, a filter that converted unix timestamps to 8601 before indexing as a solr.DateField look like? Is that a custom filter, or a tokenizer?
Re: dates times
On 5/10/07, Brian Whitman [EMAIL PROTECTED] wrote:
: On May 10, 2007, at 2:30 PM, Chris Hostetter wrote:
: : Questions like these are why I'm glad Solr currently keeps it simple
: : and makes people deal in absolutes .. less room for confusion :)
:
: I get all that, thanks for the great explanation. I imagine most of my
: problems can be solved with a custom field analyzer (converting other
: date strings to 8601 during indexing) and then XSL on the select?q=
: end (which we do anyway.) In other words, leaving core solr absolute
: with an option for different date analyzers. I see the need to not
: clutter it up, especially at this stage.
:
: What would, say, a filter that converted unix timestamps to 8601
: before indexing as a solr.DateField look like? Is that a custom
: filter, or a tokenizer?

That would be a custom filter, which is currently only supported by text fields, so the XML output would be <str> instead of <date> (if that matters to you).

One could also just store seconds or milliseconds in an int or long field. That's fine for small devel teams, but not ideal since it's a bit less expressive.

The right approach for more flexible date parsing is probably to add more functionality to the date field and configure it via optional attributes.

-Yonik
Re: dates times
: The right approach for more flexible date parsing is probably to add
: more functionality to the date field and configure it via optional
: attributes.

Adding configuration options to DateField seems like it might ultimately be the right choice for changing the *internal* format, but assuming we want to keep the internal representation of DateField fixed and unconfigurable for the time being and address the various *external* formatting issues, I imagine the simplest way to tackle this (in a way that is consistent with the other datatypes) would be...

1) Change DateField to support Analyzers. That way you could have separate analyzers for indexing vs querying, just like a text field (so you could, for example, send Solr seconds since epoch when indexing, and query using MM/DD/). The Analyzers used would be responsible for producing Tokens which match the values the current DateField.toInternal() already considers legal (either a DateMath string or an iso8601 string).

(In general a DateTranslatingTokenFilter class would be a pretty cool addition to Lucene. It could take as constructor args two DateFormatters (one for parsing the incoming tokens, and one for formatting the outgoing tokens) and a boolean indicating whether its job was to replace matching tokens or inject duplicate tokens in the same position ... maybe another option indicating whether incoming Tokens that can't be parsed should be stripped or passed through ... the idea being that for something like DateField you would use KeywordTokenizer along with an instance of this to parse whatever format you wanted -- but when parsing generic text you might have several of these TokenFilters configured with different DateFormatters, so if they see a Token in the text that matches a known DateFormat they could inject the name of the month, or the day of the week, into the text at the same position.)

2) Add options to the various QueryResponseWriters to control which format they use when writing fields out ... in the case of XmlResponseWriter it would still produce a <date> tag, but the value would be formatted according to the configuration.

-Hoss
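The behaviour proposed for a DateTranslatingTokenFilter can be modelled without any Lucene code. The sketch below is a plain Python function over a list of string tokens, not a Lucene TokenFilter; the function name, parameter names, and the use of strftime-style format strings in place of Java DateFormatters are all assumptions made for illustration.

```python
from datetime import datetime


def translate_date_tokens(tokens, in_format, out_format, replace=True, keep_unparsed=True):
    """Return (position, token) pairs after translating tokens that parse as dates.

    replace=True swaps a matching token for its translated form;
    replace=False keeps the original and injects the translation at the same position.
    keep_unparsed controls whether tokens that fail to parse are passed through or stripped.
    """
    out = []
    for position, token in enumerate(tokens):
        try:
            parsed = datetime.strptime(token, in_format)
        except ValueError:
            # Not a date in the configured input format.
            if keep_unparsed:
                out.append((position, token))
            continue
        if not replace:
            out.append((position, token))
        out.append((position, parsed.strftime(out_format)))
    return out
```

With a single keyword-style token this behaves like the DateField case described above; with a list of words it behaves like the generic-text case, where extra tokens share a position with the original.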
Re: dates times
: (In general a DateTranslatingTokenFilter class would be a pretty cool
: addition to Lucene. It could take as constructor args two DateFormatters
: (one for parsing the incoming tokens, and one for formatting the outgoing

If this happens, it would be nice (perhaps overkill) to have a chronic input filter: http://chronic.rubyforge.org/

the java port: https://jchronic.dev.java.net/

---

brian, for a quick and easy solution, if you find working with unix timestamps easier, perhaps you just want to put in dates as a SortableLongField and deal with the formatting that way.
RE: dates times
Regarding Hoss's points about the internal format, resolution of date-times, etc.: maybe a good starting point would be to implement the date-time algorithms of XML Schema (http://www.w3.org/TR/xmlschema-2/#isoformats), where these behaviors are spelled out in reasonably precise terms. There must be code somewhere that Solr could steal to help with this. This would mesh well with XSLT 2.0, and presumably other modern XML environments.

peter

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 10, 2007 12:30 PM
To: solr-user@lucene.apache.org
Subject: Re: dates times