Re: Solr, operating systems and globalization
> On 18-Oct-07, at 11:43 AM, Chris Hostetter wrote:
>
> > : This is easy--I always convert dates to UTC. Doubly important since several
> > : of our servers operate in different timezones.
> > :
> > : Less easy is changing Solr's interpretation of NOW in DateMath to be UTC.
> > : What is the correct way to go about this?
> >
> > You lost me there ... Dates in Java have no concept of timezone, they are absolute moments in the space/time continuum. timezones only affect the parsing/formatting of dates. NOW is whenever Solr parses the string, and when Solr then formats that Date as a string, it formats it in UTC.
>
> Ah, that is good. So if:
>
> $ date
> Thu Oct 18 12:07:42 PDT 2007
>
> Then NOW in Solr will be the absolute date Thu Oct 18 04:07:42 2007 (which is the current time in UTC)?
>
> > i'm guessing you are referring to the notion of rounding down to the nearest day (or anything of less granularity) ... this is currently hardcoded to be done relative to UTC -- but as I mention, this is the type of thing where ideally Solr would have a setting to let you specify which timezone the rounding should be relative to.
>
> I'm not sure this is desirable. If your users are all over the world, you'd ideally want to round to _their_ timezone, but I don't see how this is realistic.

We had the same general issue just a few months ago. We can generate reports on things like SCM commit activity for a given day. Larger customers have users in multiple timezones, so which timezone should the report use?

I wrote a blog post about it at http://blog.krugle.com/?p=267, but the short answer is that ultimately we decided to use UTC for all times (server, report, API, and UI) as the least heinous of the various options.

-- Ken

--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
If you can't find it, you can't fix it
Re: Solr, operating systems and globalization
: Ah, that is good. So if:
:
: $ date
: Thu Oct 18 12:07:42 PDT 2007
:
: Then NOW in Solr will be the absolute date Thu Oct 18 04:07:42 2007 (which is
: the current time in UTC)?

first off: PDT is only 7 hours off UTC, and it is behind UTC, so the UTC time is 7 hours later than the PDT time, not earlier.

second: i'm going to get a little bit pedantic...

NOW is now ... it's an abstract DateTime instance, a point in the one-dimensional space representing linear time. TimeZones are an artificial concept that exist only in the perspective of an observer who places a coordinate system (with an origin) in that dimension.

When you try to express an abstract DateTime instance in an email (or in an HTTP response) it stops being an abstract moment in time, and becomes a string representation of a DateTime instance relative to that coordinate system. If the string representation includes a TimeZone declaration (and as much precision as is measurable in your universe, but for now let's gloss over that and assume milliseconds is a quantum unit and all string representations include millisecond precision), then that string representation is unambiguous. (just as referring to a point in space using coordinates requires you to have an origin and a unit of distance in order for it to be unambiguous).

Thu Oct 18 12:07:42 PDT 2007 and Thu Oct 18 19:07:42 UTC 2007 are both unambiguous string representations of the same moment in time. they happen to be relative to different origins, but the information about the difference in their origins is included in their string representation.

Date objects in Java represent abstract moments in time, and Solr uses those abstract objects when doing all of its date-based calculations. when it's necessary to know about the coordinate system (in order to represent as a string, or to do rounding or math) Solr uses the UTC coordinate system.

: I'm not sure this is desirable. If your user's are all over the world, you'd
: ideally want to round to _their_ timezone, but I don't see how this is
: realistic.

hence the reason it's not implemented yet :)

in theory, we should at least allow the schema to specify what the normal TimeZone and Locale are for a date field ... and then let clients specify alternatives per request ... this would only affect the computation of info (and perhaps the string representations accepted from clients or returned to clients). the string representations stored in the physical index should always be in UTC.

-Hoss
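To make the "one moment, many string representations" point concrete, here is a minimal Java sketch. It is not Solr code: the class name `SameInstantDemo`, its helper methods, and the use of the `America/Los_Angeles` zone to stand in for "PDT" are assumptions made for illustration. It builds the instant from the `date` output quoted above and renders it from two different origins.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

// Hypothetical demo class; not part of Solr.
final class SameInstantDemo {
    // Pattern chosen to resemble the output of the Unix `date` command.
    private static final DateTimeFormatter DATE_CMD =
            DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss zzz yyyy", Locale.US);

    /** Formats one absolute instant as seen from the given time zone. */
    static String format(Instant instant, String zoneId) {
        return DATE_CMD.format(instant.atZone(ZoneId.of(zoneId)));
    }

    /** The moment from the thread: 12:07:42 on 18 Oct 2007, US Pacific time. */
    static Instant exampleInstant() {
        return ZonedDateTime.of(2007, 10, 18, 12, 7, 42, 0,
                ZoneId.of("America/Los_Angeles")).toInstant();
    }

    public static void main(String[] args) {
        Instant moment = exampleInstant();
        // java.util.Date stores only milliseconds since the epoch, no zone.
        Date legacy = Date.from(moment);
        System.out.println(legacy.getTime() == moment.toEpochMilli());
        // Same moment, two coordinate systems, two different strings.
        System.out.println(format(moment, "America/Los_Angeles"));
        System.out.println(format(moment, "UTC"));
    }
}
```

The two formatted strings differ by seven hours of wall-clock time, yet both are produced from a single `Instant`; only the formatter knows about zones.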
Re: Solr, operating systems and globalization
: This is exactly the scenario. Ideally what I'd like to achieve is for
: Solrsharp to discover the culture settings from the targeted Solr instance
: and set the client in appropriate position.

well ... my point is there shouldn't be any cultural settings on the targeted Solr server that the client needs to know about. the communication between the server and any clients should always be in a fixed format independent of culture.

Any (hypothetical) culture specific settings the server has to have might affect the functionality, but shouldn't affect the communication (ie: for the purposes of date rounding/faceting the Solr server might be configured to know what timezone to use for rounding to the nearest day, or what Locale to use to compute the first day of the week, but when returning that info to clients it should still be stringified in an absolute format (UTC)).

: multi-lingual systems across different JVM and OS platforms. If it *were*
: the case that different underlying system stacks affected solr in such a
: way, Solrsharp should follow the server's lead.

if that were the case, the server would be buggy and should be fixed :)

i don't know much about C#, but i can't really think of a lot of cases where client APIs really need to be very multi-cultural aware ... typically culture/locale type settings relate to parsing and formatting of datatypes (ie: how to stringify a number, how to convert a date to/from a string, etc...).

when client code is taking input and sending it to solr it's dealing with native objects and stringifying them into the canonical format Solr wants -- independent of culture. when client code is reading data back from Solr and returning it, it needs to parse those strings from the canonical form and return them as native objects.

The only culture that SolrSharp should need to worry about is the InvariantCulture you described ... right?

-Hoss
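The round trip between native objects and a canonical string can be sketched in Java as follows. This is an illustration only, not SolrSharp or Solr client code: `CanonicalDateDemo` and its methods are hypothetical names, and the sketch assumes the canonical form is the ISO 8601 UTC style (for example `2007-10-18T19:07:42Z`). The conversion never consults the default locale or default time zone, which `main` changes on purpose.

```java
import java.time.Instant;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical client-side helper; not part of any Solr client library.
final class CanonicalDateDemo {
    /** Native object -> canonical UTC string, independent of locale and zone. */
    static String toCanonical(Date date) {
        return date.toInstant().toString();
    }

    /** Canonical UTC string -> native object. */
    static Date fromCanonical(String text) {
        return Date.from(Instant.parse(text));
    }

    public static void main(String[] args) {
        // Simulate a client machine with European culture settings.
        Locale.setDefault(Locale.GERMANY);
        TimeZone.setDefault(TimeZone.getTimeZone("Europe/Berlin"));

        Date moment = fromCanonical("2007-10-18T19:07:42Z");
        // The canonical string survives the round trip unchanged.
        System.out.println(toCanonical(moment));
    }
}
```

Locale-aware formatting would still be appropriate when showing the value to an end user; the point of the sketch is only that it stays out of the wire format.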
Re: Solr, operating systems and globalization
OK, this simplifies things greatly. For C#, the proper culture setting for interaction with Solr should be Invariant.

Basically, the primary requirement for Solrsharp is to be culturally consistent with the targeted Solr server to ensure proper data-type formatting. Since Solr is culturally agnostic, Solrsharp should be so as well.

Thanks for the clarification.

On 10/17/07, Chris Hostetter [EMAIL PROTECTED] wrote:

> : This is exactly the scenario. Ideally what I'd like to achieve is for
> : Solrsharp to discover the culture settings from the targeted Solr instance
> : and set the client in appropriate position.
>
> well ... my point is there shouldn't be any cultural settings on the targeted Solr server that the client needs to know about. the communication between the server and any clients should always be in a fixed format independent of culture.
>
> Any (hypothetical) culture specific settings the server has to have might affect the functionality, but shouldn't affect the communication (ie: for the purposes of date rounding/faceting the Solr server might be configured to know what timezone to use for rounding to the nearest day, or what Locale to use to compute the first day of the week, but when returning that info to clients it should still be stringified in an absolute format (UTC)).
>
> : multi-lingual systems across different JVM and OS platforms. If it *were*
> : the case that different underlying system stacks affected solr in such a
> : way, Solrsharp should follow the server's lead.
>
> if that were the case, the server would be buggy and should be fixed :)
>
> i don't know much about C#, but i can't really think of a lot of cases where client APIs really need to be very multi-cultural aware ... typically culture/locale type settings relate to parsing and formatting of datatypes (ie: how to stringify a number, how to convert a date to/from a string, etc...).
>
> when client code is taking input and sending it to solr it's dealing with native objects and stringifying them into the canonical format Solr wants -- independent of culture. when client code is reading data back from Solr and returning it, it needs to parse those strings from the canonical form and return them as native objects.
>
> The only culture that SolrSharp should need to worry about is the InvariantCulture you described ... right?
>
> -Hoss
Re: Solr, operating systems and globalization
: This is easy--I always convert dates to UTC. Doubly important since several
: of our servers operate in different timezones.
:
: Less easy is changing Solr's interpretation of NOW in DateMath to be UTC.
: What is the correct way to go about this?

You lost me there ... Dates in Java have no concept of timezone, they are absolute moments in the space/time continuum. timezones only affect the parsing/formatting of dates. NOW is whenever Solr parses the string, and when Solr then formats that Date as a string, it formats it in UTC.

i'm guessing you are referring to the notion of rounding down to the nearest day (or anything of less granularity) ... this is currently hardcoded to be done relative to UTC -- but as I mention, this is the type of thing where ideally Solr would have a setting to let you specify which timezone the rounding should be relative to.

-Hoss
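Why the rounding timezone matters can be seen in a small Java sketch. This is not Solr's DateMath implementation; `DayRoundingDemo`, `floorToDay`, and the sample instant are assumptions chosen for illustration. The same instant is rounded down to the start of "its" day, once relative to UTC and once relative to US Pacific time.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.temporal.ChronoUnit;

// Hypothetical demo class; not Solr's DateMath code.
final class DayRoundingDemo {
    /** Rounds an instant down to the start of its day as observed in the given zone. */
    static Instant floorToDay(Instant instant, ZoneId zone) {
        return instant.atZone(zone).truncatedTo(ChronoUnit.DAYS).toInstant();
    }

    public static void main(String[] args) {
        // 02:30 UTC on Oct 19 is still the evening of Oct 18 in US Pacific time.
        Instant now = Instant.parse("2007-10-19T02:30:00Z");
        System.out.println(floorToDay(now, ZoneId.of("UTC")));
        System.out.println(floorToDay(now, ZoneId.of("America/Los_Angeles")));
    }
}
```

The two results are different instants, 17 hours apart (midnight UTC on Oct 19 versus midnight Pacific on Oct 18), so a per-field or per-request timezone setting of the kind described above would change which documents fall into a given "day".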
Re: Solr, operating systems and globalization
On 18-Oct-07, at 11:43 AM, Chris Hostetter wrote:

> : This is easy--I always convert dates to UTC. Doubly important since several
> : of our servers operate in different timezones.
> :
> : Less easy is changing Solr's interpretation of NOW in DateMath to be UTC.
> : What is the correct way to go about this?
>
> You lost me there ... Dates in Java have no concept of timezone, they are absolute moments in the space/time continuum. timezones only affect the parsing/formatting of dates. NOW is whenever Solr parses the string, and when Solr then formats that Date as a string, it formats it in UTC.

Ah, that is good. So if:

$ date
Thu Oct 18 12:07:42 PDT 2007

Then NOW in Solr will be the absolute date Thu Oct 18 04:07:42 2007 (which is the current time in UTC)?

> i'm guessing you are referring to the notion of rounding down to the nearest day (or anything of less granularity) ... this is currently hardcoded to be done relative to UTC -- but as I mention, this is the type of thing where ideally Solr would have a setting to let you specify which timezone the rounding should be relative to.

I'm not sure this is desirable. If your users are all over the world, you'd ideally want to round to _their_ timezone, but I don't see how this is realistic.

thanks,
-Mike
Re: Solr, operating systems and globalization
: However, SolrSharp culture settings should be reflective and consistent with
: the solr server instance's culture. This leads to my question: does Solr
: control its culture language settings through the various language
: components that can be incorporated, or does the underlying OS have a say in
: how that data is treated?

As a general rule:

1) Solr (the server) should operate as culture- and locale-agnostic as possible.
2) Solr Clients that want to act culturally appropriate should explicitly translate from local formats to the absolute concepts that they send to the server. (ala: the absolute unambiguous date format)

Ideally you should be able to take a Solr install from one box, move it to another JVM on a different OS in a different timezone with different Locale settings and everything will keep working the same.

(I think once upon a time i argued that Solr should assume the charencoding of the local JVM, and wiser people than me pointed out that was bad).

There may be exceptions to this -- but those exceptions should be in cases where:

a) the person configuring Solr is in complete control; and
b) the exception is prudent because doing the work in the client would require more complexity.

Analysis is a good example of this: we don't make the clients analyze the text according to the native language customs -- we let the person creating the schema.xml specify what the Analysis should be.

As i recall, the issue that prompted this email had to do with C# and the various cultural ways to specify a floating point number: 1,234 vs 1.234 (comma vs period). this is the kind of thing that should be translated in clients to the canonical floating point representation.

... by which i mean: the one the solr server uses :)

*IF* Solr has the behavior where setting the JVM locale to something random makes Solr assume floats should be in the comma format, then i would consider that a Bug in Solr ... Solr should always be consistent.

-Hoss
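The comma-versus-period ambiguity, and the client-side translation to one canonical form, can be sketched in Java. This is an illustration under assumptions, not SolrSharp or Solr code: `CanonicalFloatDemo` and its methods are hypothetical, and the sketch assumes the canonical representation uses a period as the decimal separator, which is what Java's `Double.toString` always produces.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical client-side helper; not part of any Solr client library.
final class CanonicalFloatDemo {
    /** Parses user input written according to the user's own locale. */
    static double parseLocal(String text, Locale locale) {
        try {
            return NumberFormat.getInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number in " + locale + ": " + text, e);
        }
    }

    /** Locale-independent form: Double.toString always uses '.' as the decimal separator. */
    static String toCanonical(double value) {
        return Double.toString(value);
    }

    public static void main(String[] args) {
        // In a German locale the comma is the decimal separator ...
        double german = parseLocal("1,234", Locale.GERMANY);
        // ... while in a US locale the same text uses the comma for grouping.
        double us = parseLocal("1,234", Locale.US);
        System.out.println(toCanonical(german));
        System.out.println(toCanonical(us));
    }
}
```

The identical input text yields two values that differ by a factor of a thousand, which is why the interpretation has to happen in the client, where the user's culture is known, before anything is sent to the server.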
Re: Solr, operating systems and globalization
Thanks for the comments Hoss. More notes embedded below...

On 10/17/07, Chris Hostetter [EMAIL PROTECTED] wrote:

> : However, SolrSharp culture settings should be reflective and consistent with
> : the solr server instance's culture. This leads to my question: does Solr
> : control its culture language settings through the various language
> : components that can be incorporated, or does the underlying OS have a say in
> : how that data is treated?
>
> As a general rule:
>
> 1) Solr (the server) should operate as culture- and locale-agnostic as possible.
> 2) Solr Clients that want to act culturally appropriate should explicitly translate from local formats to the absolute concepts that they send to the server. (ala: the absolute unambiguous date format)
>
> Ideally you should be able to take a Solr install from one box, move it to another JVM on a different OS in a different timezone with different Locale settings and everything will keep working the same.

I fully understand that approach. Going back to C#/Windows, this is known as an Invariant culture setting, which we're incorporating into Solrsharp (along with configurable culture settings as appropriate.)

> (I think once upon a time i argued that Solr should assume the charencoding of the local JVM, and wiser people than me pointed out that was bad).
>
> There may be exceptions to this -- but those exceptions should be in cases where: a) the person configuring Solr is in complete control; and b) the exception is prudent because doing the work in the client would require more complexity.
>
> Analysis is a good example of this: we don't make the clients analyze the text according to the native language customs -- we let the person creating the schema.xml specify what the Analysis should be.
>
> As i recall, the issue that prompted this email had to do with C# and the various cultural ways to specify a floating point number: 1,234 vs 1.234 (comma vs period). this is the kind of thing that should be translated in clients to the canonical floating point representation.
>
> ... by which i mean: the one the solr server uses :)

This is exactly the scenario. Ideally what I'd like to achieve is for Solrsharp to discover the culture settings from the targeted Solr instance and set the client in appropriate position.

> *IF* Solr has the behavior where setting the JVM locale to something random makes Solr assume floats should be in the comma format, then i would consider that a Bug in Solr ... Solr should always be consistent.

This would be an interesting discovery exercise for those who deal with multi-lingual systems across different JVM and OS platforms. If it *were* the case that different underlying system stacks affected solr in such a way, Solrsharp should follow the server's lead.

> -Hoss
Solr, operating systems and globalization
We discovered and verified an issue in SolrSharp whereby indexing and searching can be disrupted if Windows globalization culture settings are not taken into consideration. For example, European cultures format numeric and date values differently from US/English cultures. The resolution for this type of issue is to control the culture settings explicitly so that index data is formatted correctly.

However, SolrSharp culture settings should reflect and be consistent with the solr server instance's culture. This leads to my question: does Solr control its culture language settings through the various language components that can be incorporated, or does the underlying OS have a say in how that data is treated?

Some education on this would be greatly appreciated.

cheers,
jeff r.