[jira] [Resolved] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
[ https://issues.apache.org/jira/browse/HIVE-25277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HIVE-25277. - Fix Version/s: 4.0.0 Hadoop Flags: Reviewed Resolution: Fixed > Slow Hive partition deletion for Cloud object stores with expensive ListFiles > - > > Key: HIVE-25277 > URL: https://issues.apache.org/jira/browse/HIVE-25277 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore > Affects Versions: All Versions > Reporter: Zhou Fang > Assignee: Zhou Fang > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > Deleting a Hive partition is slow when using a Cloud object store as the > warehouse, because ListFiles is expensive there. A root cause is that the > recursive parent directory deletion is very inefficient: there are many duplicated > calls to isEmpty (which calls ListFiles internally). This fix sorts the parents > to delete by path depth, and always processes the deepest one first > (e.g., a/b/c is always handled before a/b). As a result, each parent path needs > to be checked only once. -- This message was sent by Atlassian Jira (v8.3.4#803005)
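The deepest-first cleanup described above can be sketched as follows. This is a hypothetical, simplified illustration, not the actual HMSHandler code: directory contents are faked with an in-memory map, and `isEmpty` stands in for the expensive ListFiles-backed check. A depth-ordered priority queue plus a seen-set guarantees each path is list-checked at most once.

```java
import java.util.*;

// Hypothetical sketch of the deepest-first parent cleanup: candidate parent
// dirs sit in a max-heap keyed by path depth, so a/b/c is always handled
// before a/b, and a seen-set ensures each path is checked at most once.
public class ParentCleanupSketch {
  static int listCalls = 0;                       // counts expensive ListFiles-style checks
  static Map<String, Set<String>> children = new HashMap<>();

  static boolean isEmpty(String p) {              // stand-in for a ListFiles-backed check
    listCalls++;
    return children.getOrDefault(p, Set.of()).isEmpty();
  }

  static String parent(String p) {
    int i = p.lastIndexOf('/');
    return i < 0 ? null : p.substring(0, i);
  }

  static int cleanup(Collection<String> parents) {
    PriorityQueue<String> byDepthDesc = new PriorityQueue<>(
        Comparator.comparingInt((String p) -> p.split("/").length).reversed());
    Set<String> queued = new HashSet<>(parents);  // duplicates collapse here
    byDepthDesc.addAll(queued);
    int deleted = 0;
    while (!byDepthDesc.isEmpty()) {
      String p = byDepthDesc.poll();
      if (!isEmpty(p)) continue;                  // non-empty: keep it, stop ascending
      String pp = parent(p);
      if (pp != null) {
        Set<String> siblings = children.get(pp);
        if (siblings != null) siblings.remove(p); // "delete" the now-empty dir
        if (queued.add(pp)) byDepthDesc.add(pp);  // consider its parent exactly once
      }
      deleted++;
    }
    return deleted;
  }

  public static void main(String[] args) {
    // Toy namespace after the partition dirs under a/b were dropped.
    children.put("a", new HashSet<>(Set.of("a/b")));
    children.put("a/b", new HashSet<>());
    int deleted = cleanup(List.of("a/b", "a/b")); // duplicate parents collapse
    System.out.println(deleted + " dirs deleted, " + listCalls + " list calls");
  }
}
```

Because the deepest path is always processed first, a parent is only examined after all of its deletable children are gone, which removes the repeated isEmpty calls the issue describes.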
[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
[ https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646670&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646670 ] ASF GitHub Bot logged work on HIVE-25277: - Author: ASF GitHub Bot Created on: 05/Sep/21 05:02 Start Date: 05/Sep/21 05:02 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #2421: URL: https://github.com/apache/hive/pull/2421#issuecomment-913087234 Merged. Thanks! BTW if you want this in other branches, please open backport PRs against them accordingly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 646670) Time Spent: 4.5h (was: 4h 20m)
[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
[ https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646669&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646669 ] ASF GitHub Bot logged work on HIVE-25277: - Author: ASF GitHub Bot Created on: 05/Sep/21 05:01 Start Date: 05/Sep/21 05:01 Worklog Time Spent: 10m Work Description: sunchao merged pull request #2421: URL: https://github.com/apache/hive/pull/2421 Issue Time Tracking --- Worklog Id: (was: 646669) Time Spent: 4h 20m (was: 4h 10m)
[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
[ https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646668&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646668 ] ASF GitHub Bot logged work on HIVE-25277: - Author: ASF GitHub Bot Created on: 05/Sep/21 04:57 Start Date: 05/Sep/21 04:57 Worklog Time Spent: 10m Work Description: coufon commented on pull request #2421: URL: https://github.com/apache/hive/pull/2421#issuecomment-913086678 > Thanks Zhou Fang . LGTM with two nits. Thank you Chao for the helpful comments. Issue Time Tracking --- Worklog Id: (was: 646668) Time Spent: 4h 10m (was: 4h)
[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
[ https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=64=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-64 ] ASF GitHub Bot logged work on HIVE-25277: - Author: ASF GitHub Bot Created on: 05/Sep/21 04:55 Start Date: 05/Sep/21 04:55 Worklog Time Spent: 10m Work Description: coufon commented on a change in pull request #2421: URL: https://github.com/apache/hive/pull/2421#discussion_r702365874 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java ## @@ -5156,14 +5155,40 @@ public boolean drop_partition(final String db_name, final String tbl_name, null); } - private static class PathAndPartValSize { -PathAndPartValSize(Path path, int partValSize) { - this.path = path; - this.partValSize = partValSize; +/** Stores a path and its size. */ +private static class PathAndDepth implements Comparable<PathAndDepth> { + + final Path path; + final int depth; + + public PathAndDepth(Path path, int depth) { +this.path = path; +this.depth = depth; + } + + @Override + public int hashCode() { +return Objects.hash(path.hashCode(), depth); + } + + @Override + public boolean equals(Object o) { +if (o == this) { + return true; +} +if (!(o instanceof PathAndDepth)) { + return false; +} +PathAndDepth p = (PathAndDepth) o; +return path.equals(p.path) && depth == p.depth; Review comment: Done. Issue Time Tracking --- Worklog Id: (was: 64) Time Spent: 4h (was: 3h 50m)
[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
[ https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646665&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646665 ] ASF GitHub Bot logged work on HIVE-25277: - Author: ASF GitHub Bot Created on: 05/Sep/21 04:52 Start Date: 05/Sep/21 04:52 Worklog Time Spent: 10m Work Description: coufon commented on a change in pull request #2421: URL: https://github.com/apache/hive/pull/2421#discussion_r702365624 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java ## @@ -5156,14 +5155,40 @@ public boolean drop_partition(final String db_name, final String tbl_name, null); } - private static class PathAndPartValSize { -PathAndPartValSize(Path path, int partValSize) { - this.path = path; - this.partValSize = partValSize; +/** Stores a path and its size. */ +private static class PathAndDepth implements Comparable<PathAndDepth> { + Review comment: Done. Issue Time Tracking --- Worklog Id: (was: 646665) Time Spent: 3h 50m (was: 3h 40m)
[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles
[ https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646663 ] ASF GitHub Bot logged work on HIVE-25277: - Author: ASF GitHub Bot Created on: 05/Sep/21 04:43 Start Date: 05/Sep/21 04:43 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #2421: URL: https://github.com/apache/hive/pull/2421#discussion_r70236 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java ## @@ -5156,14 +5155,40 @@ public boolean drop_partition(final String db_name, final String tbl_name, null); } - private static class PathAndPartValSize { -PathAndPartValSize(Path path, int partValSize) { - this.path = path; - this.partValSize = partValSize; +/** Stores a path and its size. */ +private static class PathAndDepth implements Comparable<PathAndDepth> { + + final Path path; + final int depth; + + public PathAndDepth(Path path, int depth) { +this.path = path; +this.depth = depth; + } + + @Override + public int hashCode() { +return Objects.hash(path.hashCode(), depth); + } + + @Override + public boolean equals(Object o) { +if (o == this) { + return true; +} +if (!(o instanceof PathAndDepth)) { + return false; +} +PathAndDepth p = (PathAndDepth) o; +return path.equals(p.path) && depth == p.depth;

Review comment: we can just use the auto-generated `equals` method and should handle null here too:

```java
@Override
public boolean equals(Object o) {
  if (this == o) return true;
  if (o == null || getClass() != o.getClass()) return false;
  PathAndDepth that = (PathAndDepth) o;
  return depth == that.depth && Objects.equals(path, that.path);
}
```

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java ## @@ -5156,14 +5155,40 @@ public boolean drop_partition(final String db_name, final String tbl_name, null); } - private static class PathAndPartValSize { -PathAndPartValSize(Path path, int partValSize) { - this.path = path; - this.partValSize = partValSize; +/** Stores a path and its size. */ +private static class PathAndDepth implements Comparable<PathAndDepth> { +

Review comment: nit: extra empty line

Issue Time Tracking --- Worklog Id: (was: 646663) Time Spent: 3h 40m (was: 3.5h)
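The reviewer's point matters for the dedup behavior the fix relies on: if candidate parents are held in a hash-based set, the value class must implement equals and hashCode consistently (including the null case) or duplicate entries will not collapse. A minimal standalone sketch with hypothetical names (PathKey is illustrative, not the Hive class):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class mirroring the PathAndDepth pattern under review:
// equals handles null and type mismatch, hashCode uses the same fields as
// equals, so equal values collapse to one entry in a HashSet.
public class PathKey {
  final String path;
  final int depth;

  public PathKey(String path, int depth) {
    this.path = path;
    this.depth = depth;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    PathKey that = (PathKey) o;
    return depth == that.depth && Objects.equals(path, that.path);
  }

  @Override
  public int hashCode() {
    return Objects.hash(path, depth);  // must agree with equals
  }

  public static void main(String[] args) {
    Set<PathKey> seen = new HashSet<>();
    seen.add(new PathKey("a/b", 2));
    seen.add(new PathKey("a/b", 2));  // equal value: not added again
    System.out.println(seen.size());  // prints 1
  }
}
```

Note that `Objects.hash(path, depth)` already calls `path.hashCode()` internally, which is why the review preferred it over hashing `path.hashCode()` by hand.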
[jira] [Work logged] (HIVE-24944) When the default engine of the hiveserver is MR and the tez engine is set by the client, the client TEZ progress log cannot be printed normally
[ https://issues.apache.org/jira/browse/HIVE-24944?focusedWorklogId=646654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646654 ] ASF GitHub Bot logged work on HIVE-24944: - Author: ASF GitHub Bot Created on: 05/Sep/21 00:10 Start Date: 05/Sep/21 00:10 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #2204: URL: https://github.com/apache/hive/pull/2204 Issue Time Tracking --- Worklog Id: (was: 646654) Time Spent: 1h 50m (was: 1h 40m) > When the default engine of the hiveserver is MR and the tez engine is set by > the client, the client TEZ progress log cannot be printed normally > --- > > Key: HIVE-24944 > URL: https://issues.apache.org/jira/browse/HIVE-24944 > Project: Hive > Issue Type: Bug > Components: Tez > Affects Versions: 3.1.0, 4.0.0 > Reporter: ZhangQiDong > Assignee: ZhangQiDong > Priority: Major > Labels: pull-request-available > Attachments: HIVE-24944.001.patch, HIVE-24944.002.patch > > Time Spent: 1h 50m > Remaining Estimate: 0h > > The HiveServer is configured with MR as the default execution engine. When the client > sets hive.execution.engine=tez, the client cannot print the Tez progress log.
[jira] [Work logged] (HIVE-25293) Alter partitioned table with "cascade" option create too many columns records.
[ https://issues.apache.org/jira/browse/HIVE-25293?focusedWorklogId=646653&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646653 ] ASF GitHub Bot logged work on HIVE-25293: - Author: ASF GitHub Bot Created on: 05/Sep/21 00:10 Start Date: 05/Sep/21 00:10 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #2436: URL: https://github.com/apache/hive/pull/2436 Issue Time Tracking --- Worklog Id: (was: 646653) Time Spent: 1h 10m (was: 1h) > Alter partitioned table with "cascade" option creates too many columns records. > -- > > Key: HIVE-25293 > URL: https://issues.apache.org/jira/browse/HIVE-25293 > Project: Hive > Issue Type: Improvement > Components: Metastore > Affects Versions: 2.3.3, 3.1.2 > Reporter: yongtaoliao > Assignee: yongtaoliao > Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > When altering a partitioned table with the "cascade" option, all partitions are supposed > to be updated. Currently, a CD_ID will be created for each partition, > associated with a set of Columns, which causes a large amount of > redundant data in the metadata database. > The following DDL statements can reproduce this scenario: > > {code:java} > create table test_table (f1 int) partitioned by (p string); > alter table test_table add partition(p='a'); > alter table test_table add partition(p='b'); > alter table test_table add partition(p='c'); > alter table test_table add columns (f2 int) cascade;{code} > All partitions use the table's `CD_ID` before adding columns, while each > partition uses its own `CD_ID` after adding columns. 
> > My proposal is that all partitions should use the same `CD_ID` when the table is > altered with the "cascade" option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25296) Replace parquet-hadoop-bundle dependency with the actual parquet modules
[ https://issues.apache.org/jira/browse/HIVE-25296?focusedWorklogId=646655&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646655 ] ASF GitHub Bot logged work on HIVE-25296: - Author: ASF GitHub Bot Created on: 05/Sep/21 00:10 Start Date: 05/Sep/21 00:10 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #2288: URL: https://github.com/apache/hive/pull/2288#issuecomment-913058810 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. Issue Time Tracking --- Worklog Id: (was: 646655) Time Spent: 0.5h (was: 20m) > Replace parquet-hadoop-bundle dependency with the actual parquet modules > > > Key: HIVE-25296 > URL: https://issues.apache.org/jira/browse/HIVE-25296 > Project: Hive > Issue Type: Improvement > Reporter: Stamatis Zampetakis > Assignee: Stamatis Zampetakis > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The parquet-hadoop-bundle is not a real dependency but a mere packaging > of three parquet modules to create an uber jar. The Parquet community > created this artificial module at the request of HIVE-5783, but the > benefits, if any, are unclear. > On the contrary, using the uber dependency has some drawbacks: > * Parquet source code cannot be attached easily in IDEs, which makes debugging > sessions cumbersome. > * Finding Hive's concrete dependencies on Parquet is not possible just by > inspecting the pom files. 
> * Extra maintenance cost for the Parquet community adding additional > verification steps during a release. > The goal of this JIRA is to replace the uber dependency with concrete > dependencies to the respective modules: > * parquet-common > * parquet-column > * parquet-hadoop -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25233) Removing deprecated unix_timestamp UDF
[ https://issues.apache.org/jira/browse/HIVE-25233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Sharma resolved HIVE-25233. -- Resolution: Invalid > Removing deprecated unix_timestamp UDF > -- > > Key: HIVE-25233 > URL: https://issues.apache.org/jira/browse/HIVE-25233 > Project: Hive > Issue Type: Task > Components: UDF > Affects Versions: All Versions > Reporter: Ashish Sharma > Assignee: Ashish Sharma > Priority: Trivial > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Description > The unix_timestamp() UDF was deprecated as part of > https://issues.apache.org/jira/browse/HIVE-10728. Internally, > GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls > to_utc_timestamp() for unix_timestamp(string date) & unix_timestamp(string > date, string pattern). > unix_timestamp() => CURRENT_TIMESTAMP > unix_timestamp(string date) => to_unix_timestamp() > unix_timestamp(string date, string pattern) => to_unix_timestamp() > We should clean up unix_timestamp() and point it to to_unix_timestamp() > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work stopped] (HIVE-25233) Removing deprecated unix_timestamp UDF
[ https://issues.apache.org/jira/browse/HIVE-25233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25233 stopped by Ashish Sharma.
[jira] [Comment Edited] (HIVE-25499) select unix_timestamp(dt) from table and select unix_timestamp(constant date) are different
[ https://issues.apache.org/jira/browse/HIVE-25499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409933#comment-17409933 ] Ashish Sharma edited comment on HIVE-25499 at 9/4/21, 10:41 AM: query - create table testdate(dt date); insert into testdate values('0001-12-30'); select * from testdate; select unix_timestamp(dt) from testdate; select unix_timestamp('0001-12-30', 'yyyy-MM-dd'); output - PREHOOK: query: create table testdate(dt date) PREHOOK: type: CREATETABLE PREHOOK: Output: database:default PREHOOK: Output: default@testdate POSTHOOK: query: create table testdate(dt date) POSTHOOK: type: CREATETABLE POSTHOOK: Output: database:default POSTHOOK: Output: default@testdate PREHOOK: query: insert into testdate values('0001-12-30') PREHOOK: type: QUERY PREHOOK: Input: _dummy_database@_dummy_table PREHOOK: Output: default@testdate POSTHOOK: query: insert into testdate values('0001-12-30') POSTHOOK: type: QUERY POSTHOOK: Input: _dummy_database@_dummy_table POSTHOOK: Output: default@testdate POSTHOOK: Lineage: testdate.dt SCRIPT [] PREHOOK: query: select * from testdate PREHOOK: type: QUERY PREHOOK: Input: default@testdate POSTHOOK: query: select * from testdate POSTHOOK: type: QUERY POSTHOOK: Input: default@testdate 0001-12-30 PREHOOK: query: select unix_timestamp(dt) from testdate PREHOOK: type: QUERY PREHOOK: Input: default@testdate POSTHOOK: query: select unix_timestamp(dt) from testdate POSTHOOK: type: QUERY POSTHOOK: Input: default@testdate *-62104205222* PREHOOK: query: select unix_timestamp('0001-12-30', 'yyyy-MM-dd') PREHOOK: type: QUERY PREHOOK: Input: _dummy_database@_dummy_table POSTHOOK: query: select unix_timestamp('0001-12-30', 'yyyy-MM-dd') POSTHOOK: type: QUERY POSTHOOK: Input: _dummy_database@_dummy_table *-62104205222* > select unix_timestamp(dt) from table and select unix_timestamp(constant date) > are different > > > Key: HIVE-25499 > URL: https://issues.apache.org/jira/browse/HIVE-25499 > Project: Hive > Issue Type: Bug > Affects Versions: 3.1.2 > Reporter: zhaolong > Assignee: Ashish Sharma > Priority: Major > > I found that select unix_timestamp(date column) from table and select > unix_timestamp(constant date) are different in 3.1.2, for example: > create table testdate(dt date); > insert into testdate values('0001-12-30'); > select * from testdate; --> 0001-12-30 > select unix_timestamp(dt) from testdate; --> -62104233600 > select unix_timestamp('0001-12-30', 'yyyy-MM-dd'); --> -62104406400 > -62104233600 is different from -62104406400. > > and the converted timestamp values are: > select from_unixtime(-62104233600); --> 0002-01-01 00:00:00 ; -62104233600 is the > select unix_timestamp(date column) from table value, where the date is 0001-12-30. > select from_unixtime(-62104406400); --> 0001-12-30 00:00:00 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
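The two reported epoch values above differ by exactly 172800 seconds (two days), which is characteristic of the proleptic-Gregorian vs hybrid Julian/Gregorian calendar gap at year 1. The following standalone sketch, which assumes nothing about Hive's internals, reproduces both numbers with plain JDK classes (UTC throughout):

```java
import java.time.LocalDate;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Reproduces the two epoch values from the bug report with plain JDK classes.
// java.time.LocalDate uses the proleptic Gregorian calendar; the legacy
// GregorianCalendar (default cutover) treats dates before 1582 as Julian.
public class CalendarGapDemo {
  public static void main(String[] args) {
    // Proleptic Gregorian: what a java.time-based code path computes.
    long gregorian = LocalDate.of(1, 12, 30).toEpochDay() * 86400L;

    // Hybrid Julian/Gregorian: what a legacy java.util-based code path computes.
    Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    cal.clear();
    cal.set(1, Calendar.DECEMBER, 30);  // year 1 AD, Dec 30, midnight UTC
    long hybrid = cal.getTimeInMillis() / 1000L;

    System.out.println(gregorian);          // -62104233600
    System.out.println(hybrid);             // -62104406400
    System.out.println(hybrid - gregorian); // -172800: exactly two days
  }
}
```

So a code path that parses the date via the old java.util calendar and one that goes through java.time will disagree by two days for year-1 dates, matching the -62104233600 vs -62104406400 discrepancy in the report.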
[jira] [Comment Edited] (HIVE-25499) select unix_timestamp(dt) from table and select unix_timestamp(constant date) are different
[ https://issues.apache.org/jira/browse/HIVE-25499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409933#comment-17409933 ] Ashish Sharma edited comment on HIVE-25499 at 9/4/21, 10:41 AM: query - create table testdate(dt date); insert into testdate values('0001-12-30'); select * from testdate; select unix_timestamp(dt) from testdate; select unix_timestamp('0001-12-30', '-MM-dd'); output - PREHOOK: query: create table testdate(dt date) PREHOOK: type: CREATETABLE PREHOOK: Output: database:default PREHOOK: Output: default@testdate POSTHOOK: query: create table testdate(dt date) POSTHOOK: type: CREATETABLE POSTHOOK: Output: database:default POSTHOOK: Output: default@testdate PREHOOK: query: insert into testdate values('0001-12-30') PREHOOK: type: QUERY PREHOOK: Input: _dummy_database@_dummy_table PREHOOK: Output: default@testdate POSTHOOK: query: insert into testdate values('0001-12-30') POSTHOOK: type: QUERY POSTHOOK: Input: _dummy_database@_dummy_table POSTHOOK: Output: default@testdate POSTHOOK: Lineage: testdate.dt SCRIPT [] PREHOOK: query: select * from testdate PREHOOK: type: QUERY PREHOOK: Input: default@testdate POSTHOOK: query: select * from testdate POSTHOOK: type: QUERY POSTHOOK: Input: default@testdate 0001-12-30 PREHOOK: query: select unix_timestamp(dt) from testdate PREHOOK: type: QUERY PREHOOK: Input: default@testdate POSTHOOK: query: select unix_timestamp(dt) from testdate POSTHOOK: type: QUERY POSTHOOK: Input: default@testdate -62104205222 PREHOOK: query: select unix_timestamp('0001-12-30', '-MM-dd') PREHOOK: type: QUERY PREHOOK: Input: _dummy_database@_dummy_table POSTHOOK: query: select unix_timestamp('0001-12-30', '-MM-dd') POSTHOOK: type: QUERY POSTHOOK: Input: _dummy_database@_dummy_table -62104205222 was (Author: ashish-kumar-sharma): query - create table testdate(dt date); insert into testdate values('0001-12-30'); select * from testdate; select unix_timestamp(dt) from testdate; select unix_timestamp('0001-12-30', 
'yyyy-MM-dd');

output -
PREHOOK: query: create table testdate(dt date)
PREHOOK: type: CREATETABLE
PREHOOK: Output: database:default
PREHOOK: Output: default@testdate
POSTHOOK: query: create table testdate(dt date)
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: database:default
POSTHOOK: Output: default@testdate
PREHOOK: query: insert into testdate values('0001-12-30')
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
PREHOOK: Output: default@testdate
POSTHOOK: query: insert into testdate values('0001-12-30')
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
POSTHOOK: Output: default@testdate
POSTHOOK: Lineage: testdate.dt SCRIPT []
PREHOOK: query: select * from testdate
PREHOOK: type: QUERY
PREHOOK: Input: default@testdate
A masked pattern was here
POSTHOOK: query: select * from testdate
POSTHOOK: type: QUERY
POSTHOOK: Input: default@testdate
A masked pattern was here
0001-12-30
PREHOOK: query: select unix_timestamp(dt) from testdate
PREHOOK: type: QUERY
PREHOOK: Input: default@testdate
A masked pattern was here
POSTHOOK: query: select unix_timestamp(dt) from testdate
POSTHOOK: type: QUERY
POSTHOOK: Input: default@testdate
A masked pattern was here
-62104205222
PREHOOK: query: select unix_timestamp('0001-12-30', 'yyyy-MM-dd')
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
A masked pattern was here
POSTHOOK: query: select unix_timestamp('0001-12-30', 'yyyy-MM-dd')
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
A masked pattern was here
-62104205222

> select unix_timestamp(dt) from table and select unix_timestamp(constant date)
> are different
> -
>
>                 Key: HIVE-25499
>                 URL: https://issues.apache.org/jira/browse/HIVE-25499
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.2
>            Reporter: zhaolong
>            Assignee: Ashish Sharma
>            Priority: Major
>
> I found that select unix_timestamp(date column) from table and select
> unix_timestamp(constant date) give different results in 3.1.2, for example:
> create table testdate(dt date);
> insert into testdate values('0001-12-30');
> select * from testdate; --> 0001-12-30
> select unix_timestamp(dt) from testdate; --> -62104233600
> select unix_timestamp('0001-12-30', 'yyyy-MM-dd'); --> -62104406400
> -62104233600 differs from -62104406400.
>
> The converted timestamp values are:
> select from_unixtime(-62104233600); --> 0002-01-01 00:00:00 (-62104233600 is
> the unix_timestamp(date column) value for date 0001-12-30).
> select from_unixtime(-62104406400); --> 0001-12-30 00:00:00
>

-- This message was sent by Atlassian Jira (v8.3.4#803005)
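The two epoch values reported above differ by exactly two days, which is the Julian/Gregorian calendar offset around year 1. A quick arithmetic check (plain Java; the literals are the values taken from the report, the class and method names are mine):

```java
public class EpochGap {
    // Both values are copied from the issue report above.
    static long gapSeconds() {
        long viaDateColumn = -62104233600L; // unix_timestamp(dt) from testdate
        long viaConstant   = -62104406400L; // unix_timestamp('0001-12-30', 'yyyy-MM-dd')
        return viaDateColumn - viaConstant;
    }

    public static void main(String[] args) {
        System.out.println(gapSeconds());         // 172800 seconds
        System.out.println(gapSeconds() / 86400); // = 2 days
    }
}
```

A constant two-day gap for such an ancient date is a strong hint that the two code paths use different calendar systems rather than different time zones.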
[jira] [Resolved] (HIVE-25499) select unix_timestamp(dt) from table and select unix_timestamp(constant date) are different
[ https://issues.apache.org/jira/browse/HIVE-25499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Sharma resolved HIVE-25499.
--
Resolution: Fixed

query -
create table testdate(dt date);
insert into testdate values('0001-12-30');
select * from testdate;
select unix_timestamp(dt) from testdate;
select unix_timestamp('0001-12-30', 'yyyy-MM-dd');

output -
PREHOOK: query: create table testdate(dt date)
PREHOOK: type: CREATETABLE
PREHOOK: Output: database:default
PREHOOK: Output: default@testdate
POSTHOOK: query: create table testdate(dt date)
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: database:default
POSTHOOK: Output: default@testdate
PREHOOK: query: insert into testdate values('0001-12-30')
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
PREHOOK: Output: default@testdate
POSTHOOK: query: insert into testdate values('0001-12-30')
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
POSTHOOK: Output: default@testdate
POSTHOOK: Lineage: testdate.dt SCRIPT []
PREHOOK: query: select * from testdate
PREHOOK: type: QUERY
PREHOOK: Input: default@testdate
A masked pattern was here
POSTHOOK: query: select * from testdate
POSTHOOK: type: QUERY
POSTHOOK: Input: default@testdate
A masked pattern was here
0001-12-30
PREHOOK: query: select unix_timestamp(dt) from testdate
PREHOOK: type: QUERY
PREHOOK: Input: default@testdate
A masked pattern was here
POSTHOOK: query: select unix_timestamp(dt) from testdate
POSTHOOK: type: QUERY
POSTHOOK: Input: default@testdate
A masked pattern was here
-62104205222
PREHOOK: query: select unix_timestamp('0001-12-30', 'yyyy-MM-dd')
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
A masked pattern was here
POSTHOOK: query: select unix_timestamp('0001-12-30', 'yyyy-MM-dd')
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
A masked pattern was here
-62104205222
[jira] [Assigned] (HIVE-25499) select unix_timestamp(dt) from table and select unix_timestamp(constant date) are different
[ https://issues.apache.org/jira/browse/HIVE-25499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Sharma reassigned HIVE-25499:

Assignee: Ashish Sharma
[jira] [Commented] (HIVE-25499) select unix_timestamp(dt) from table and select unix_timestamp(constant date) are different
[ https://issues.apache.org/jira/browse/HIVE-25499?focusedCommentId=17409902#comment-17409902 ]

zhaolong commented on HIVE-25499:
-

It looks like the date column is converted to a timestamp with:

LocalDate localDate = LocalDate.of(0001, 12, 30);
long localTime = localDate.atStartOfDay().toEpochSecond(ZoneOffset.of("+0"));
System.out.println(localTime);

while the constant date is converted with:

String date = "0001-12-30";
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
Date date1 = sdf.parse(date);
System.out.println(date1.getTime()/1000);
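The two conversion paths from the comment above can be reproduced in a self-contained sketch (class and method names are mine; time zone pinned to UTC in both paths, which is an assumption). java.time uses the proleptic Gregorian calendar, while SimpleDateFormat's backing GregorianCalendar switches to the Julian calendar for dates before the 1582 cutover, so the same string yields two different epoch seconds:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.TimeZone;

public class UnixTimestampPaths {
    // Path taken for the date column: java.time, proleptic Gregorian calendar.
    static long viaJavaTime() {
        return LocalDate.of(1, 12, 30)
                .atStartOfDay()
                .toEpochSecond(ZoneOffset.UTC);
    }

    // Path taken for the constant: SimpleDateFormat, hybrid Julian/Gregorian calendar.
    static long viaSimpleDateFormat() {
        try {
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
            sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
            return sdf.parse("0001-12-30").getTime() / 1000L;
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaJavaTime());         // -62104233600
        System.out.println(viaSimpleDateFormat()); // -62104406400
    }
}
```

For year 1 the Julian calendar is two days behind the proleptic Gregorian one, so the SimpleDateFormat path lands 172800 seconds earlier, exactly the mismatch reported in this issue.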