[jira] [Commented] (HIVE-3910) Create a new DATE datatype

2013-06-24 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692613#comment-13692613
 ] 

Jason Dere commented on HIVE-3910:
--

Ok, got the date conversion working without the need for the Joda Time library. 
 I'll remove the Joda dependency in my next version of the patch for HIVE-4055.

> Create a new DATE datatype
> --
>
> Key: HIVE-3910
> URL: https://issues.apache.org/jira/browse/HIVE-3910
> Project: Hive
>  Issue Type: Task
>Reporter: Namit Jain
> Attachments: HIVE-3910.1.patch
>
>
> It might be useful to have a DATE datatype along with timestamp.
> This can only store the day (possibly number of days from 1970-01-01,
> and would thus give space savings in binary format).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3910) Create a new DATE datatype

2013-06-24 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692223#comment-13692223
 ] 

Thejas M Nair commented on HIVE-3910:
-

[~appodictic] I agree that adding new libraries to Hive that need to get 
shipped to nodes can cause conflicts with UDFs. But I don't think avoiding new 
libraries in Hive or Hadoop is the right solution to this problem. We should 
consider using an Ant/Maven package-renaming utility to rename the packages of 
the dependencies, like one of the tools mentioned here - 
http://java.dzone.com/articles/tools-renaming-package .




[jira] [Commented] (HIVE-3910) Create a new DATE datatype

2013-06-24 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692196#comment-13692196
 ] 

Jason Dere commented on HIVE-3910:
--

When converting from a Date (millis since epoch) to DateWritable (days since 
epoch), I was trying to do the following:
  Determine the year/month/day of the Date, in the local time zone. 
  Based on the year/month/day, calculate the days since epoch. 
I was also trying to do the opposite when generating a Date value from the 
days-since-epoch value. Ideally you would want to be able to generate a Date 
value in one place, and have the DateWritable still show the same 
year/month/day even if it is later processed in a different time zone. 

I was actually having trouble doing this using the standard Java date 
libraries; there seemed to be some issues related to daylight saving time which 
were messing up the conversions I was doing. Joda actually seemed to be the 
only way to get it to work correctly.  I can take another look at trying it 
using the standard date lib - if I can get it working then I'll remove the Joda 
dependency. 
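
For illustration only, here is a minimal sketch of how the two steps above 
could be done with java.util.TimeZone alone. It is not the code from the 
HIVE-4055 patch: the class name DateDays, both method names, and the explicit 
TimeZone parameter are assumptions made for this example, and it assumes a 
Java 8+ JDK for Math.floorDiv. The idea is to shift the UTC millis by the 
zone's offset at that instant, so that integer division by one day yields the 
local calendar day.

```java
import java.util.TimeZone;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; names are illustrative, not taken from any Hive patch. */
final class DateDays {
    private static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1);

    /** Days since 1970-01-01 for the calendar day that millisUtc falls on in tz. */
    static int millisToDays(long millisUtc, TimeZone tz) {
        // Shift into "local millis" using the offset in effect at that instant.
        long millisLocal = millisUtc + tz.getOffset(millisUtc);
        // floorDiv keeps instants before 1970 on the correct (negative) day.
        return (int) Math.floorDiv(millisLocal, MILLIS_PER_DAY);
    }

    /** UTC millis of local midnight in tz for the given days-since-epoch value. */
    static long daysToMillis(int days, TimeZone tz) {
        long millisLocal = days * MILLIS_PER_DAY;
        // getOffset expects a UTC instant, so estimate it first, then look up
        // the offset again at that estimate in case a DST change lies between.
        long estimateUtc = millisLocal - tz.getOffset(millisLocal);
        return millisLocal - tz.getOffset(estimateUtc);
    }

    public static void main(String[] args) {
        TimeZone la = TimeZone.getTimeZone("America/Los_Angeles");
        long instant = 1372039200000L; // 2013-06-24T02:00:00Z
        System.out.println("UTC day: " + millisToDays(instant, TimeZone.getTimeZone("UTC")));
        System.out.println("LA day:  " + millisToDays(instant, la));
    }
}
```

The same instant maps to different day numbers in different zones, which is 
the point of the exercise: once the day number is stored, the year/month/day 
no longer depends on where the value is read. Zones whose DST change happens 
exactly at local midnight would need extra care that this sketch does not 
attempt.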



[jira] [Commented] (HIVE-3910) Create a new DATE datatype

2013-06-22 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691289#comment-13691289
 ] 

Edward Capriolo commented on HIVE-3910:
---

Do we really want to add Joda Time to the classpath and ship this jar out with 
every job just because it's "better"? You have to remember that every jar we 
ship with Hive has to be put on the distributed cache, and if anyone else has a 
Joda Time jar in use by a UDF they can run into conflicts.



[jira] [Commented] (HIVE-3910) Create a new DATE datatype

2013-06-21 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690936#comment-13690936
 ] 

Jason Dere commented on HIVE-3910:
--

Spoke to Ashutosh, and since HIVE-4055 had the original patch, it makes more 
sense to continue using that ticket for this work.  I'll move this patch over 
to HIVE-4055 and close this one out.



[jira] [Commented] (HIVE-3910) Create a new DATE datatype

2013-06-14 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683993#comment-13683993
 ] 

Jason Dere commented on HIVE-3910:
--

I've started taking a look at making these changes to Sun Rui's patch from 
HIVE-4055, will post a patch when I'm done.



[jira] [Commented] (HIVE-3910) Create a new DATE datatype

2013-06-05 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676626#comment-13676626
 ] 

Jason Dere commented on HIVE-3910:
--

HIVE-4055 already has a patch with an initial implementation of a DATE type, 
which has already done quite a bit of the work for DATE support. I took a look 
at it and have a few proposed additions:

1. Use Joda Time rather than java.sql.Date
The existing patch uses java.sql.Date as the underlying data type (based on 
java.util.Date).  Thejas proposed using the Joda Time library as this is 
supposed to be a better datetime implementation, and is also used by Pig for 
datetime handling.  It does not appear that Joda Time is currently used by Hive 
and so this would need to be pulled in as a dependent library.

2. Storage of DATE values
In the existing patch, DateWritable writes out a long value (8 bytes) 
representing seconds since the Unix epoch.  As mentioned in HIVE-3910, since 
DATE is in days, we could reduce the storage space by instead storing a 4-byte 
integer value representing days since some epoch (1970? 4713 BC?). The range of 
dates that we can represent with such an integer would be +/- 2 billion days, 
or roughly 5.8M years.
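
As a rough check of the numbers above, the following sketch compares the two 
encodings and computes the representable range. The class and method names are 
made up for this example, and java.nio.ByteBuffer stands in for whatever the 
Writable would actually use to serialize the value.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch; not the serialization code of any Hive patch. */
final class DateStorageSketch {
    /** Encodes days since epoch as a 4-byte big-endian int. */
    static byte[] encodeDays(int daysSinceEpoch) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(daysSinceEpoch).array();
    }

    /** Encodes seconds since epoch as an 8-byte big-endian long. */
    static byte[] encodeSeconds(long secondsSinceEpoch) {
        return ByteBuffer.allocate(Long.BYTES).putLong(secondsSinceEpoch).array();
    }

    /** Approximate number of years reachable on each side of the epoch. */
    static double maxYears() {
        // 365.2425 is the mean length of a Gregorian year in days.
        return Integer.MAX_VALUE / 365.2425;
    }
}
```

Here encodeDays(15880) (2013-06-24) takes 4 bytes, the equivalent seconds value 
takes 8, and maxYears() comes out a little under 5.9 million.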

3. Considerations for Hive vectorization support
I talked to some folks who are working on Hive vectorization (HIVE-4160), and 
in the interest of vectorization support they want the date type to be 
represented as primitive values.  They are proposing that DateWritable hold 
the integer value (rather than a Date value), which would still be usable for 
comparison operations; these would be the most common operations on date types 
(group-by, sorting).  If an actual Date value is required, then 
DateWritable.get() would generate a Date object based on the days-since-epoch 
integer value.
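
A minimal sketch of that shape, under stated assumptions: DaysHolder is a 
hypothetical stand-in for DateWritable (it does not implement Hadoop's Writable 
interface, to stay self-contained), and get() relies on java.time from a Java 
8+ JDK, which is not what the patch under discussion uses.

```java
import java.sql.Date;
import java.time.LocalDate;

/** Hypothetical stand-in for a DateWritable backed by a primitive int. */
final class DaysHolder implements Comparable<DaysHolder> {
    private final int daysSinceEpoch;

    DaysHolder(int daysSinceEpoch) {
        this.daysSinceEpoch = daysSinceEpoch;
    }

    /** The primitive value that sorting and group-by can work on directly. */
    int getDays() {
        return daysSinceEpoch;
    }

    /** Builds a java.sql.Date on demand; no Date object is kept around. */
    Date get() {
        return Date.valueOf(LocalDate.ofEpochDay(daysSinceEpoch));
    }

    @Override
    public int compareTo(DaysHolder other) {
        // Chronological order is simply integer order on the day count.
        return Integer.compare(daysSinceEpoch, other.daysSinceEpoch);
    }
}
```

Comparisons never touch a Date object, so a vectorized operator could work on a 
plain int array of day counts and only materialize Date values at the edges.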

4. SQL syntax compliance
The existing patch creates date values using a DATE() UDF - DATE('2013-01-01'). 
The SQL standard actually has syntax to represent a date literal - DATE 
'2013-01-01'.  The Hive grammar would need to be extended to support the SQL 
syntax.

5. Operations on DATE types
The SQL standard (section 6.14) looks like it just supports DATE operations 
involving the INTERVAL type:
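    <datetime value expression> ::=
        <datetime term>
      | <interval value expression> <plus sign> <datetime term>
      | <datetime value expression> <plus sign> <interval term>
      | <datetime value expression> <minus sign> <interval term>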

There is currently no interval type support in Hive. Support for the interval 
type will be added as a later item.

6. Compatibility with other types
The existing patch allows a lot of implicit conversion to/from other types 
(numeric, string).  It does appear that TIMESTAMP has set a bit of a precedent 
in allowing a lot of implicit type conversion.  However, given the limited 
operations with other types described above from the SQL standard, I would 
propose limiting the amount of implicit conversion that is allowed.  There are 
UDFs that the user can use to convert DATE into numeric/string values, which 
can then be used in arithmetic or aggregation functions.  
