[jira] [Commented] (HIVE-3910) Create a new DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692613#comment-13692613 ]

Jason Dere commented on HIVE-3910:
----------------------------------

Ok, got the date conversion working without the need for the Joda Time library. I'll remove the Joda dependency in my next version of the patch for HIVE-4055.

> Create a new DATE datatype
> --------------------------
>
>                 Key: HIVE-3910
>                 URL: https://issues.apache.org/jira/browse/HIVE-3910
>             Project: Hive
>          Issue Type: Task
>            Reporter: Namit Jain
>         Attachments: HIVE-3910.1.patch
>
>
> It might be useful to have a DATE datatype along with timestamp.
> This can only store the day (possibly number of days from 1970-01-01,
> and would thus give space savings in binary format).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3910) Create a new DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692223#comment-13692223 ]

Thejas M Nair commented on HIVE-3910:
-------------------------------------

[~appodictic] I agree that adding new libraries to Hive that need to get shipped to nodes can cause conflicts with UDFs. But I don't think that avoiding new libraries in Hive or Hadoop is the right solution to this problem. We should consider using an ant/maven package renaming utility to rename the packages of the dependencies, like one of the tools mentioned here: http://java.dzone.com/articles/tools-renaming-package
[jira] [Commented] (HIVE-3910) Create a new DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692196#comment-13692196 ]

Jason Dere commented on HIVE-3910:
----------------------------------

When converting from a Date (millis since epoch) to DateWritable (days since epoch), I was trying to do the following:

1. Determine the year/month/day of the Date, in the local time zone.
2. Based on the year/month/day, calculate the days since epoch.

I also wanted to do the opposite when generating a Date value from the days-since-epoch value. Ideally you would want to be able to generate a Date value in one place, and have the DateWritable still show the same year/month/day even if it is later processed in a different time zone.

I was actually having trouble doing this using the standard Java date libraries; there seemed to be some issues related to daylight saving time which were messing up the conversions I was doing. Joda actually seemed to be the only way to get it to work correctly. I can take another look at trying it using the standard date lib. If I can get it working then I'll remove the Joda dependency.
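For readers following the conversion described above, here is a minimal sketch of a millis-to-days conversion that uses only the standard Java library. It is not the code from the HIVE-4055 patch: the class name EpochDays and both method names are hypothetical, and the time zone is passed in explicitly so the behavior is reproducible. The idea is to shift the instant by the zone's UTC offset at that instant (which includes any daylight saving adjustment) before dividing by the length of a day. The reverse direction goes through GregorianCalendar, so dates before its 1582 Julian cutover would follow the Julian calendar.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

/** Hypothetical helper for illustration; not taken from any Hive patch. */
final class EpochDays {
    private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    private EpochDays() {}

    /** Days since 1970-01-01 for the calendar day that millis falls on in tz. */
    static int millisToDays(long millis, TimeZone tz) {
        // getOffset(long) returns the zone's offset from UTC at that instant,
        // including any daylight saving adjustment in effect.
        long shifted = millis + tz.getOffset(millis);
        // floorDiv rounds toward negative infinity, so instants before the
        // epoch land on the correct (negative) day number.
        return Math.toIntExact(Math.floorDiv(shifted, MILLIS_PER_DAY));
    }

    /** Millis of local midnight in tz for the given day number. */
    static long daysToMillis(int days, TimeZone tz) {
        Calendar cal = new GregorianCalendar(tz);
        cal.clear();                         // unset all fields, including time of day
        cal.set(1970, Calendar.JANUARY, 1);  // the epoch day in local calendar fields
        cal.add(Calendar.DAY_OF_MONTH, days);
        return cal.getTimeInMillis();
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        TimeZone la = TimeZone.getTimeZone("America/Los_Angeles");
        long instant = 1357005600000L; // 2013-01-01T02:00:00Z

        System.out.println(millisToDays(instant, utc)); // 15706 (2013-01-01)
        System.out.println(millisToDays(instant, la));  // 15705 (still 2012-12-31 locally)
        // Round trip across a daylight saving transition day (2013-03-10).
        System.out.println(millisToDays(daysToMillis(15774, la), la)); // 15774
    }
}
```

The same instant maps to different day numbers in different zones, which is why the year/month/day has to be fixed at the point where the value is created rather than recomputed downstream.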
[jira] [Commented] (HIVE-3910) Create a new DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691289#comment-13691289 ]

Edward Capriolo commented on HIVE-3910:
---------------------------------------

Do we really want to add Joda Time to the classpath and ship this jar out with every job just because it's "better"? You have to remember that every jar we ship with Hive has to be put on the distributed cache, and if anyone else has a Joda Time in use by a UDF they can run into conflicts.
[jira] [Commented] (HIVE-3910) Create a new DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690936#comment-13690936 ]

Jason Dere commented on HIVE-3910:
----------------------------------

Spoke to Ashutosh, and since HIVE-4055 had the original patch, it makes more sense to continue using that ticket for this work. I'll move this patch over to HIVE-4055 and close this one out.
[jira] [Commented] (HIVE-3910) Create a new DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683993#comment-13683993 ]

Jason Dere commented on HIVE-3910:
----------------------------------

I've started taking a look at making these changes to Sun Rui's patch from HIVE-4055, will post a patch when I'm done.
[jira] [Commented] (HIVE-3910) Create a new DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676626#comment-13676626 ]

Jason Dere commented on HIVE-3910:
----------------------------------

HIVE-4055 already has a patch with an initial implementation of a DATE type, which has already done quite a bit of the work for DATE support. I took a look at it and have a few proposed additions:

1. Use Joda Time rather than java.sql.Date
The existing patch uses java.sql.Date as the underlying data type (based on java.util.Date). Thejas proposed using the Joda Time library, as this is supposed to be a better datetime implementation and is also used by Pig for datetime handling. It does not appear that Joda Time is currently used by Hive, so it would need to be pulled in as a dependent library.

2. Storage of DATE values
In the existing patch, DateWritable writes out a long value (8 bytes) representing seconds since the Unix epoch. As mentioned in HIVE-3910, since DATE is in days, we could reduce the storage space by instead storing a 4-byte integer value representing days since some epoch (1970? 4713 BC?). The range of dates that we can represent with such an integer representation would be roughly +/- 2 billion days, or about 5.8M years in either direction.

3. Considerations for Hive vectorization support
I have been talking to some folks who are concerned about Hive vectorization (HIVE-4160); in the interest of vectorization support, they want the date type to be represented as primitive values. They are proposing that DateWritable hold the integer value (rather than a Date value). The integer would still be usable for comparison operations, which would be the most common operations on date types (group-by, sorting). If an actual Date value is required, then DateWritable.get() will generate a Date object based on the days-since-epoch integer value.

4. SQL syntax compliance
The existing patch creates date values using a DATE() UDF: DATE('2013-01-01'). The SQL standard actually has syntax to represent a date literal: DATE '2013-01-01'. The Hive grammar would need to be extended to support the SQL syntax.

5. Operations on DATE types
The SQL standard (section 6.14) looks like it only supports DATE operations involving the INTERVAL type:

    <datetime value expression> ::=
          <datetime term>
        | <interval value expression> <plus sign> <datetime term>
        | <datetime value expression> <plus sign> <interval term>
        | <datetime value expression> <minus sign> <interval term>

There is currently no interval type support in Hive. Support for the interval type will be added as a later item.

6. Compatibility with other types
The existing patch allows a lot of implicit conversion to/from other types (numeric, string). It does appear that TIMESTAMP has set a bit of a precedent in allowing a lot of implicit type conversion. However, given the limited operations with other types described above from the SQL standard, I would propose limiting the amount of implicit conversion that is allowed. There are UDFs that the user can use to convert DATE into numeric/string values, which can then be used in arithmetic or aggregation functions.
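The storage and vectorization points above (items 2 and 3) can be illustrated with a small sketch. DaysValue is a hypothetical stand-in, not Hive's DateWritable, and it uses java.nio.ByteBuffer instead of Hadoop's Writable interface so that it stays self-contained. It shows a value held as a primitive int, written as exactly 4 bytes, and compared without building a Date object. The year estimate assumes an average Gregorian year of 365.2425 days.

```java
import java.nio.ByteBuffer;

/** Hypothetical days-since-epoch holder for illustration; not Hive's DateWritable. */
final class DaysValue implements Comparable<DaysValue> {
    private final int daysSinceEpoch;

    DaysValue(int daysSinceEpoch) {
        this.daysSinceEpoch = daysSinceEpoch;
    }

    int getDays() {
        return daysSinceEpoch;
    }

    /** Serializes the value as a fixed 4-byte big-endian integer. */
    byte[] toBytes() {
        return ByteBuffer.allocate(Integer.BYTES).putInt(daysSinceEpoch).array();
    }

    /** Reads back a value written by toBytes(). */
    static DaysValue fromBytes(byte[] bytes) {
        return new DaysValue(ByteBuffer.wrap(bytes).getInt());
    }

    /** Ordering is a plain int comparison, which is all that sorting and group-by need. */
    @Override
    public int compareTo(DaysValue other) {
        return Integer.compare(daysSinceEpoch, other.daysSinceEpoch);
    }

    /** Approximate number of Gregorian years representable on each side of the epoch. */
    static double yearsPerSide() {
        return Integer.MAX_VALUE / 365.2425;
    }

    public static void main(String[] args) {
        DaysValue newYear = new DaysValue(15706); // 2013-01-01 as days since 1970-01-01
        DaysValue beforeEpoch = new DaysValue(-1); // 1969-12-31

        System.out.println(newYear.toBytes().length);                    // 4
        System.out.println(fromBytes(newYear.toBytes()).getDays());      // 15706
        System.out.println(beforeEpoch.compareTo(newYear) < 0);          // true
        System.out.println(yearsPerSide());
    }
}
```

Compared with an 8-byte long per value, the fixed 4-byte encoding halves the raw storage for the column, and the int ordering matches calendar ordering for any choice of epoch.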