Re: Solr, operating systems and globalization

2007-10-19 Thread Ken Krugler

On 18-Oct-07, at 11:43 AM, Chris Hostetter wrote:

: This is easy--I always convert dates to UTC.  Doubly important since several
: of our servers operate in different timezones.
:
: Less easy is changing Solr's interpretation of NOW in DateMath to be UTC.
: What is the correct way to go about this?

You lost me there ... Dates in java have no concept of timezone, they
are absolute moments in the space/time continuum.  timezones only affect
the parsing/formatting of dates.  NOW is whenever Solr parses the string,
and when Solr then formats that Date as a string, it formats it in UTC.


Ah, that is good.  So if:

$ date
Thu Oct 18 12:07:42 PDT 2007

Then NOW in Solr will be the absolute date Thu Oct 18 04:07:42 2007 
(which is the current time in UTC)?



i'm guessing you are referring to the notion of rounding down to the
nearest day (or anything of less granularity) ... this is currently
hardcoded to be done relative to UTC -- but as I mention, this is the type of
thing where ideally Solr would have a setting to let you specify which
timezone the rounding should be relative to.


I'm not sure this is desirable.  If your users are all over the
world, you'd ideally want to round to _their_ timezone, but I don't
see how this is realistic.


We had the same general issue just a few months ago. We can generate
reports on things like SCM commit activity for a given day. Larger
customers have users in multiple timezones, so which timezone should
define the "day"?


I wrote a blog post about it at http://blog.krugle.com/?p=267, but 
the short answer is that ultimately we decided to use UTC for all 
times (server, report, API, and UI) as the least heinous of the 
various options.


-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
If you can't find it, you can't fix it


Re: Solr, operating systems and globalization

2007-10-19 Thread Chris Hostetter
: Ah, that is good.  So if:
: 
: $ date
: Thu Oct 18 12:07:42 PDT 2007
: 
: Then NOW in Solr will be the absolute date Thu Oct 18 04:07:42 2007 (which is
: the current time in UTC)?

first off: PDT is only 7 hours off UTC

Second: i'm going to get a little bit pedantic...

NOW is now .. it's an abstract DateTime instance, a point in the
one-dimensional space representing linear time.  TimeZones are an
artificial concept that exists only in the perspective of an observer who
places a coordinate system (with an origin) in that dimension.

When you try to express an abstract DateTime instance in an email (or in
an HTTP response) it stops being an abstract moment in time, and becomes a
string representation of a DateTime instance relative to that coordinate
system.  If the string representation includes a TimeZone declaration (and as
much precision as is measurable in your universe, but for now let's gloss
over that and assume milliseconds is a quantum unit and all string
representations include millisecond precision), then that string
representation is unambiguous. (just as referring to a point in space using
coordinates requires you to have an origin and a unit of distance in
order for it to be unambiguous).

Thu Oct 18 12:07:42 PDT 2007 and Thu Oct 18 19:07:42 UTC 2007 are
both unambiguous string representations of the same moment in time.  they
happen to be relative to different origins, but the information about the
difference in their origins is included in their string representation.

Date objects in java represent abstract moments in time, and Solr uses
those abstract objects when doing all of its date-based calculations.
when it's necessary to know about the coordinate system (in order to
represent as a string, or to do rounding or math) Solr uses the UTC
coordinate system.
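
A minimal Java sketch of that idea, not Solr's own code. It uses java.time, which postdates this thread; the class and method names are made up for illustration, and the zone ID America/Los_Angeles is an assumption standing in for the PDT of the `date` output quoted above. It renders one absolute instant in two coordinate systems. With PDT seven hours behind UTC, the UTC rendering falls at 19:07:42.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Illustrative only: class and method names are hypothetical, not part of Solr.
public class SameInstantDemo {

    /** The wall-clock reading from the thread, taken in an assumed Pacific zone. */
    public static Instant momentFromThread() {
        return ZonedDateTime
                .of(2007, 10, 18, 12, 7, 42, 0, ZoneId.of("America/Los_Angeles"))
                .toInstant();
    }

    /** Renders an absolute instant relative to the given zone, offset included. */
    public static String render(Instant instant, ZoneId zone) {
        return DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(instant.atZone(zone));
    }

    public static void main(String[] args) {
        Instant now = momentFromThread();

        // Two unambiguous strings, one moment in time.
        System.out.println(render(now, ZoneId.of("America/Los_Angeles")));
        System.out.println(render(now, ZoneOffset.UTC));

        // java.util.Date holds only milliseconds since the epoch; no zone is stored.
        Date legacy = Date.from(now);
        System.out.println(legacy.getTime() == now.toEpochMilli());
    }
}
```

The zone only enters the picture inside render(); the Instant and the Date never change.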

: I'm not sure this is desirable.  If your user's are all over the world, you'd
: ideally want to round to _their_ timezone, but I don't see how this is
: realistic.

hence the reason it's not implemented yet :)

in theory, we should at least allow the schema to specify what the
normal TimeZone and Locale are for a date field ... and then let clients
specify alternatives per request ... this would only affect the computation
of info (and perhaps the string representations accepted from clients or
returned to clients).  the string representations stored in the physical
index should always be in UTC.


-Hoss



Re: Solr, operating systems and globalization

2007-10-18 Thread Chris Hostetter

: This is exactly the scenario.  Ideally what I'd like to achieve is for
: Solrsharp to discover the culture settings from the targeted Solr instance
: and set the client in appropriate position.

well ... my point is there shouldn't be any cultural settings on the
targeted Solr server that the client needs to know about.

the communication between the server and any clients should always be in a
fixed format independent of culture.  Any (hypothetical) culture specific
settings the server has to have might affect the functionality, but
shouldn't affect the communication (ie: for the purposes of date
rounding/faceting the Solr server might be configured to know which
timezone to use when rounding to the nearest day, or which Locale to use
to compute the first day of the week, but when returning that info
to clients it should still be stringified in an absolute format (UTC)).
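
As a rough illustration of the hypothetical setting described above (Solr has no such option per this thread), the Java sketch below computes "start of the week" using a configurable zone and Locale, yet still returns an absolute instant that prints in UTC. The class name, method name and sample instant are assumptions chosen for the example.

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;
import java.time.temporal.TemporalAdjusters;
import java.time.temporal.WeekFields;
import java.util.Locale;

// Hypothetical helper, not a Solr API.
public class WeekStartDemo {

    /** Rounds down to midnight on the locale's first day of the week, as seen from the zone. */
    public static Instant startOfWeek(Instant instant, ZoneId zone, Locale locale) {
        DayOfWeek firstDay = WeekFields.of(locale).getFirstDayOfWeek();
        return instant.atZone(zone)
                .truncatedTo(ChronoUnit.DAYS)
                .with(TemporalAdjusters.previousOrSame(firstDay))
                .toInstant();
    }

    public static void main(String[] args) {
        Instant thursday = Instant.parse("2007-10-18T19:07:42Z");

        // The Locale changes the computation (Sunday vs Monday week start) ...
        Instant us = startOfWeek(thursday, ZoneOffset.UTC, Locale.US);
        Instant fr = startOfWeek(thursday, ZoneOffset.UTC, Locale.FRANCE);

        // ... but what goes back to the client is still an absolute UTC string.
        System.out.println(us); // 2007-10-14T00:00:00Z
        System.out.println(fr); // 2007-10-15T00:00:00Z
    }
}
```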

: multi-lingual systems across different JVM and OS platforms.  If it *were*
: the case that different underlying system stacks affected solr in such a
: way, Solrsharp should follow the server's lead.

if that were the case, the server would be buggy and should be fixed :)

i don't know much about C#, but i can't really think of a lot of cases
where client APIs really need to be very multi-cultural aware ...
typically culture/locale type settings relate to parsing and formatting
of datatypes (ie: how to stringify a number, how to convert a date to/from
a string, etc...).  when client code is taking input and sending it to
solr it's dealing with native objects and stringifying them into the
canonical format Solr wants -- independent of culture.  when client code
is reading data back from Solr and returning it, it needs to parse those
strings from the canonical form and return them as native objects.
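
A sketch of that round trip in Java, offered as an illustration rather than as any client library's actual code. It assumes the canonical wire format for dates is the ISO 8601 UTC form (for example 2007-10-18T19:07:42Z); the class and method names are hypothetical.

```java
import java.time.Instant;
import java.util.Date;

// Hypothetical client-side helpers; the names are not from SolrSharp or Solr.
public class CanonicalDateDemo {

    /** Native object to canonical string: always UTC, never the default locale or zone. */
    public static String toWire(Date date) {
        return date.toInstant().toString();
    }

    /** Canonical string back to a native object. */
    public static Date fromWire(String text) {
        return Date.from(Instant.parse(text));
    }

    public static void main(String[] args) {
        Date original = fromWire("2007-10-18T19:07:42Z");
        String wire = toWire(original);

        System.out.println(wire);
        // The round trip is lossless regardless of the machine's culture settings.
        System.out.println(fromWire(wire).equals(original));
    }
}
```

Neither method consults the JVM's default Locale or TimeZone, which is the Java analogue of pinning a C# client to the invariant culture.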

The only culture that SolrSharp should need to worry about is the 
InvariantCulture you described ... right?



-Hoss



Re: Solr, operating systems and globalization

2007-10-18 Thread Jeff Rodenburg
OK, this simplifies things greatly.  For C#, the proper culture setting for
interaction with Solr should be Invariant.

Basically, the primary requirement for Solrsharp is to be
culturally-consistent with the targeted Solr server to ensure proper
data-type formatting.  Since Solr is culturally-agnostic, Solrsharp should
be so as well.

Thanks for the clarification.

On 10/17/07, Chris Hostetter [EMAIL PROTECTED] wrote:


 : This is exactly the scenario.  Ideally what I'd like to achieve is for
 : Solrsharp to discover the culture settings from the targeted Solr instance
 : and set the client in appropriate position.

 well ... my point is there shouldn't be any cultural settings on the
 targeted Solr server that the client needs to know about.

 the communication between the server and any clients should always be in a
 fixed format independent of culture.  Any (hypothetical) culture specific
 settings the server has to have might affect the functionality, but
 shouldn't affect the communication (ie: for the purposes of date
 rounding/faceting the Solr server might be configured to know which
 timezone to use when rounding to the nearest day, or which Locale to use
 to compute the first day of the week, but when returning that info
 to clients it should still be stringified in an absolute format (UTC)).



 : multi-lingual systems across different JVM and OS platforms.  If it *were*
 : the case that different underlying system stacks affected solr in such a
 : way, Solrsharp should follow the server's lead.

 if that were the case, the server would be buggy and should be fixed :)

 i don't know much about C#, but i can't really think of a lot of cases
 where client APIs really need to be very multi-cultural aware ...
 typically culture/locale type settings relate to parsing and formatting
 of datatypes (ie: how to stringify a number, how to convert a date to/from
 a string, etc...).  when client code is taking input and sending it to
 solr it's dealing with native objects and stringifying them into the
 canonical format Solr wants -- independent of culture.  when client code
 is reading data back from Solr and returning it, it needs to parse those
 strings from the canonical form and return them as native objects.

 The only culture that SolrSharp should need to worry about is the
 InvariantCulture you described ... right?



 -Hoss




Re: Solr, operating systems and globalization

2007-10-18 Thread Chris Hostetter
: This is easy--I always convert dates to UTC.  Doubly important since several
: of our servers operate in different timezones.
: 
: Less easy is changing Solr's interpretation of NOW in DateMath to be UTC.
: What is the correct way to go about this?

You lost me there ... Dates in java have no concept of timezone, they
are absolute moments in the space/time continuum.  timezones only affect
the parsing/formatting of dates.  NOW is whenever Solr parses the string,
and when Solr then formats that Date as a string, it formats it in UTC.

i'm guessing you are referring to the notion of rounding down to the
nearest day (or anything of less granularity) ... this is currently
hardcoded to be done relative to UTC -- but as I mention, this is the type of
thing where ideally Solr would have a setting to let you specify which
timezone the rounding should be relative to.
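
To show why the choice of timezone matters for rounding, here is a small Java sketch (java.time, not Solr's DateMath implementation). The helper name startOfDay and the sample zone America/Los_Angeles are assumptions for illustration. The same instant rounds down to two different "midnights" depending on the zone used.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical helper; Solr itself rounds relative to UTC per the thread.
public class DayRoundingDemo {

    /** Rounds an instant down to midnight as observed from the given zone. */
    public static Instant startOfDay(Instant instant, ZoneId zone) {
        return instant.atZone(zone).truncatedTo(ChronoUnit.DAYS).toInstant();
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2007-10-18T19:07:42Z");

        // Rounding relative to UTC.
        System.out.println(startOfDay(now, ZoneOffset.UTC)); // 2007-10-18T00:00:00Z

        // Rounding relative to a Pacific observer: midnight there is 07:00 UTC.
        System.out.println(startOfDay(now, ZoneId.of("America/Los_Angeles"))); // 2007-10-18T07:00:00Z
    }
}
```

Both results are absolute instants printed in UTC; only the rounding boundary moved.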



-Hoss



Re: Solr, operating systems and globalization

2007-10-18 Thread Mike Klaas

On 18-Oct-07, at 11:43 AM, Chris Hostetter wrote:

: This is easy--I always convert dates to UTC.  Doubly important since several
: of our servers operate in different timezones.
:
: Less easy is changing Solr's interpretation of NOW in DateMath to be UTC.
: What is the correct way to go about this?

You lost me there ... Dates in java have no concept of timezone, they
are absolute moments in the space/time continuum.  timezones only affect
the parsing/formatting of dates.  NOW is whenever Solr parses the string,
and when Solr then formats that Date as a string, it formats it in UTC.


Ah, that is good.  So if:

$ date
Thu Oct 18 12:07:42 PDT 2007

Then NOW in Solr will be the absolute date Thu Oct 18 04:07:42 2007  
(which is the current time in UTC)?



i'm guessing you are referring to the notion of rounding down to the
nearest day (or anything of less granularity) ... this is currently
hardcoded to be done relative to UTC -- but as I mention, this is the type of
thing where ideally Solr would have a setting to let you specify which
timezone the rounding should be relative to.


I'm not sure this is desirable.  If your users are all over the
world, you'd ideally want to round to _their_ timezone, but I don't
see how this is realistic.


thanks,
-Mike


Re: Solr, operating systems and globalization

2007-10-17 Thread Chris Hostetter

: However, SolrSharp culture settings should be reflective and consistent with
: the solr server instance's culture.  This leads to my question: does Solr
: control its culture and language settings through the various language
: components that can be incorporated, or does the underlying OS have a say in
: how that data is treated?

As a general rule:
  1) Solr (the server) should operate as culturally and locally agnostic as 
possible.
  2) Solr Clients that want to act culturally appropriate should 
 explicitly translate from local formats to absolute concepts that 
 it sends to the server.  (ala: the absolute unambiguous date format)

Ideally you should be able to take a Solr install from one box, move it to 
another JVM on a different OS in a different timezone with different 
Locale settings and everything will keep working the same.

(I think once upon a time i argued that Solr should assume the
char encoding of the local JVM, and wiser people than me pointed out that
was bad).

There may be exceptions to this -- but those exceptions should be in cases 
where: a) the person configuring Solr is in complete control; and b) the
exception is prudent because doing the work in the client would require 
more complexity.  Analysis is a good example of this: we don't make the 
clients analyze the text according to the native language customs -- we 
let the person creating the schema.xml specify what the Analysis should 
be.

As i recall, the issue that prompted this email had to do with C# and the
various cultural ways to specify a floating point number: 1,234 vs 1.234 
(comma vs period).  this is the kind of thing that should be translated in 
clients to the canonical floating point representation. ... by which i 
mean: the one the solr server uses :)

*IF* Solr has the behavior where setting the JVM locale to something random
makes Solr assume floats should be in the comma format, then i would
consider that a Bug in Solr ... Solr should always be consistent.
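
The comma-versus-period problem can be reproduced in a few lines of Java. This is only a sketch: it assumes the canonical form the server expects uses a period as the decimal separator (what Double.toString produces), and the class and method names are invented for the example.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical client-side helpers, not taken from Solr or SolrSharp.
public class FloatFormatDemo {

    /** Locale-sensitive rendering, suitable for showing a value to an end user. */
    public static String display(double value, Locale locale) {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        format.setGroupingUsed(false);
        return format.format(value);
    }

    /** Culture-independent rendering, suitable for sending to the server. */
    public static String canonical(double value) {
        return Double.toString(value);
    }

    /** Parses user input according to the user's own locale. */
    public static double parseLocal(String text, Locale locale) {
        try {
            return NumberFormat.getNumberInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number in " + locale + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(display(1.234, Locale.GERMANY)); // 1,234
        System.out.println(display(1.234, Locale.US));      // 1.234

        // The same text means different numbers in different cultures ...
        System.out.println(parseLocal("1,234", Locale.GERMANY)); // 1.234
        System.out.println(parseLocal("1,234", Locale.US));      // 1234.0

        // ... so the client normalizes before anything reaches the server.
        System.out.println(canonical(parseLocal("1,234", Locale.GERMANY))); // 1.234
    }
}
```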

-Hoss



Re: Solr, operating systems and globalization

2007-10-17 Thread Jeff Rodenburg
Thanks for the comments Hoss.  More notes embedded below...

On 10/17/07, Chris Hostetter [EMAIL PROTECTED] wrote:


 : However, SolrSharp culture settings should be reflective and consistent with
 : the solr server instance's culture.  This leads to my question: does Solr
 : control its culture and language settings through the various language
 : components that can be incorporated, or does the underlying OS have a say in
 : how that data is treated?

 As a general rule:
   1) Solr (the server) should operate as culturally and locally agnostic
 as possible.
   2) Solr Clients that want to act culturally appropriate should
  explicitly translate from local formats to absolute concepts that
  it sends to the server.  (ala: the absolute unambiguous date format)

 Ideally you should be able to take a Solr install from one box, move it to
 another JVM on a different OS in a different timezone with different
 Locale settings and everything will keep working the same.


I fully understand that approach.  Going back to C#/Windows, this is known
as an Invariant culture setting, which we're incorporating into Solrsharp
(along with configurable culture settings as appropriate.)

 (I think once upon a time i argued that Solr should assume the
 char encoding of the local JVM, and wiser people than me pointed out that
 was bad).

 There may be exceptions to this -- but those exceptions should be in cases
 where: a) the person configuring Solr is in complete control; and b) the
 exception is prudent because doing the work in the client would require
 more complexity.  Analysis is a good example of this: we don't make the
 clients analyze the text according to the native language customs -- we
 let the person creating the schema.xml specify what the Analysis should
 be.

 As i recall, the issue that prompted this email had to do with C# and the
 various cultural ways to specify a floating point number: 1,234 vs 1.234
 (comma vs period).  this is the kind of thing that should be translated in
 clients to the canonical floating point representation. ... by which i
 mean: the one the solr server uses :)


This is exactly the scenario.  Ideally what I'd like to achieve is for
Solrsharp to discover the culture settings from the targeted Solr instance
and set the client in appropriate position.

 *IF* Solr has the behavior where setting the JVM locale to something random
 makes Solr assume floats should be in the comma format, then i would
 consider that a Bug in Solr ... Solr should always be consistent.


This would be an interesting discovery exercise for those who deal with
multi-lingual systems across different JVM and OS platforms.  If it *were*
the case that different underlying system stacks affected solr in such a
way, Solrsharp should follow the server's lead.

-Hoss




Solr, operating systems and globalization

2007-10-12 Thread Jeff Rodenburg
We discovered and verified an issue in SolrSharp whereby indexing and
searching can be disrupted if Windows globalization and culture settings
are not taken into consideration.  For example, European cultures format
numeric and date values differently from US/English cultures.  The
resolution for this type of issue is to specifically control the culture
settings so that index data is formatted correctly.

However, SolrSharp culture settings should be reflective and consistent with
the solr server instance's culture.  This leads to my question: does Solr
control its culture and language settings through the various language
components that can be incorporated, or does the underlying OS have a say in
how that data is treated?

Some education on this would be greatly appreciated.

cheers,
jeff r.