Re: [DISCUSS] Expanding Arrow interval type metadata, changing Java memory representation

2017-11-11 Thread Wes McKinney
Thanks all. Can we scope the work that would be required to implement the described change (making interval a 64-bit integer for the DAY_TIME variety, and adding unit metadata)? I suppose mainly Dremio would be moderately disrupted by this. On Wed, Nov 8, 2017 at 6:05 PM, Jacques Nadeau

Re: [DISCUSS] Expanding Arrow interval type metadata, changing Java memory representation

2017-11-08 Thread Jacques Nadeau
My previous analysis (if I recall correctly; I think I did it on the similar Parquet PR) was basically that no system truly supported N fields for all operations (Postgres was closest, I believe). Some basic operations would maintain them but they would quickly not behave differently (e.g. 24 hours is

Re: [DISCUSS] Expanding Arrow interval type metadata, changing Java memory representation

2017-11-08 Thread Julian Hyde
I have argued before on this list, and still believe, that you should represent an interval as you would a number. If intervals are 64-bit signed, then sure, use the 64-bit integer representation; if you were to allow intervals with fixed precision and scale, then use the same representation as

Re: [DISCUSS] Expanding Arrow interval type metadata, changing Java memory representation

2017-11-08 Thread Wes McKinney
Makes sense. The key question is whether the data is represented as a single 64-bit integer or as effectively a C struct (struct { int32_t days; int32_t milliseconds; }). The struct representation cannot accommodate higher resolution units like microseconds and nanoseconds. From my perspective,
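
As a rough illustration of the resolution point above, the following sketch checks whether one day's worth of a sub-day unit fits in a signed 32-bit field, and how many days a single signed 64-bit count could span. The unit names and helper functions are hypothetical, chosen for this example only; they are not Arrow metadata or API.

```python
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

# Number of sub-day units in one 24-hour day.
# The unit names are illustrative assumptions, not Arrow metadata values.
UNITS_PER_DAY = {
    "millisecond": 86_400 * 10**3,
    "microsecond": 86_400 * 10**6,
    "nanosecond": 86_400 * 10**9,
}


def fits_in_int32_time_field(unit):
    """True if the largest within-day offset in `unit` fits in a signed int32."""
    return UNITS_PER_DAY[unit] - 1 <= INT32_MAX


def max_days_in_int64(unit):
    """Whole days representable by one signed 64-bit count of `unit`."""
    return INT64_MAX // UNITS_PER_DAY[unit]


for unit in UNITS_PER_DAY:
    print(unit, fits_in_int32_time_field(unit), max_days_in_int64(unit))
```

Under these assumptions only the millisecond field fits in 32 bits, while a single 64-bit nanosecond count still spans a little over 106,000 days.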

Re: [DISCUSS] Expanding Arrow interval type metadata, changing Java memory representation

2017-11-08 Thread Julian Hyde
I don't know many examples of interval being used in the real world. But here's the kind of thing: the policy is that an offer is open for 60 hours, so if the offer is made to a particular customer at 12:34pm on Sunday, you want to compute that it ends at 12:34am on Wednesday. The interval "60
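
The 60-hour offer example can be checked with a short sketch using Python's standard `datetime` module. The specific calendar date is an assumption (any Sunday would do), and naive datetimes are used, so time zones and daylight saving are deliberately ignored.

```python
from datetime import datetime, timedelta

# Assumed date: 2017-11-12 was a Sunday; the offer is made at 12:34pm.
offer_made = datetime(2017, 11, 12, 12, 34)

# The interval "60 hours" treated as an absolute duration.
offer_ends = offer_made + timedelta(hours=60)

print(offer_made.strftime("%A %H:%M"))
print(offer_ends.strftime("%A %H:%M"))
```

Here the interval is applied as a plain duration, which is the "convert to an absolute timedelta" reading discussed elsewhere in the thread.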

Re: [DISCUSS] Expanding Arrow interval type metadata, changing Java memory representation

2017-11-08 Thread Wes McKinney
Pleading ignorance on use of the SQL interval type, my prior would be that many algorithms would first convert the interval components into an absolute timedelta. Is that not the case? My preference right now would be to have a single Interval type, where the DAY_TIME type actually contains an

Re: [DISCUSS] Expanding Arrow interval type metadata, changing Java memory representation

2017-11-08 Thread Jacques Nadeau
I'm all for moving interval to the new definition. I think we should avoid introducing a timedelta type until it is really important. We need several users demanding a type before we should implement it. Otherwise, we have huge amounts of type bloat (which means nothing will fully implement the

Re: [DISCUSS] Expanding Arrow interval type metadata, changing Java memory representation

2017-11-04 Thread Wes McKinney
It seems like we don't have enough input on this topic to make a decision right now. I placed the JIRA ARROW-352 in the 0.9.0 milestone, but we really should try to get this done soon so that downstream users are not blocked on using Arrow to send around interval data. - Wes On Fri, Oct 20, 2017

Re: [DISCUSS] Expanding Arrow interval type metadata, changing Java memory representation

2017-10-19 Thread Li Jin
+1 on this one. My reason is that this makes timestamp/interval calculation faster, i.e., "timestamp + interval < timestamp" should be faster without dealing with two components in the interval. Although I am not quite sure about the rationale behind the two-component representation, which seems to be what
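
A minimal sketch of the comparison described above, assuming millisecond timestamps and a day of exactly 24 hours (no daylight-saving adjustment). The function names are hypothetical and only contrast the two layouts; they say nothing about how Arrow or any engine implements this.

```python
MS_PER_DAY = 86_400_000  # assumes every day is exactly 24 hours


def add_single(ts_ms, interval_ms):
    """Single 64-bit interval: one integer addition."""
    return ts_ms + interval_ms


def add_day_time(ts_ms, days, millis):
    """Two-component interval: the days field must be scaled before adding."""
    return ts_ms + days * MS_PER_DAY + millis


ts = 1_508_284_800_000        # an arbitrary timestamp in epoch milliseconds
deadline = ts + 3 * MS_PER_DAY

# "timestamp + interval < timestamp" under each representation.
print(add_single(ts, 2 * MS_PER_DAY + 500) < deadline)
print(add_day_time(ts, 2, 500) < deadline)
```

Both forms give the same answer here; the difference is the extra multiply and add per value in the two-component case.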

[DISCUSS] Expanding Arrow interval type metadata, changing Java memory representation

2017-10-18 Thread Wes McKinney
I opened this patch over 2 months ago to add some additional metadata for intervals: https://github.com/apache/arrow/pull/920 Java supports a two-component DAY_TIME interval type as a combo of days and milliseconds: