[jira] [Resolved] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-09-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved HIVE-25277.
-
Fix Version/s: 4.0.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> -
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Zhou Fang
>Assignee: Zhou Fang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store as the
> warehouse, because ListFiles is expensive there. A root cause is that the
> recursive parent dir deletion is very inefficient: there are many duplicated
> calls to isEmpty (which calls ListFiles at the end). This fix sorts the parents
> to delete by path depth and always processes the deepest one first
> (e.g., a/b/c is always handled before a/b). As a result, each parent path
> needs to be checked only once.
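
To make the ordering idea concrete, here is a minimal, self-contained sketch of deepest-first parent-dir deletion. The names are hypothetical: PathAndDepth mirrors the class in the patch quoted below, and Warehouse.isEmpty stands in for the expensive ListFiles-backed check; this is not the actual HMSHandler code.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

public class ParentDirCleaner {

  /** Hypothetical stand-in for org.apache.hadoop.fs.Path plus its depth. */
  record PathAndDepth(String path, int depth) {}

  /** Hypothetical stand-in for the warehouse; isEmpty issues the costly ListFiles. */
  interface Warehouse {
    boolean isEmpty(String path);
    void delete(String path);
  }

  static void deleteParentDirs(Warehouse wh, Set<PathAndDepth> parents) {
    // Deepest-first queue: a/b/c is always polled before a/b.
    PriorityQueue<PathAndDepth> queue = new PriorityQueue<>(
        Comparator.comparingInt(PathAndDepth::depth).reversed());
    queue.addAll(parents);

    Set<String> seen = new HashSet<>();
    while (!queue.isEmpty()) {
      PathAndDepth p = queue.poll();
      if (!seen.add(p.path())) {
        continue; // each parent path is checked at most once
      }
      if (wh.isEmpty(p.path())) {
        wh.delete(p.path());
        // Deleting this dir may have emptied its parent; enqueue the parent.
        int cut = p.path().lastIndexOf('/');
        if (cut > 0 && p.depth() > 1) {
          queue.add(new PathAndDepth(p.path().substring(0, cut), p.depth() - 1));
        }
      }
    }
  }
}
```

Because deeper paths are drained first, by the time a/b is checked its deletable children are already gone, so a single isEmpty call per path suffices.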



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646670&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646670
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 05/Sep/21 05:02
Start Date: 05/Sep/21 05:02
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #2421:
URL: https://github.com/apache/hive/pull/2421#issuecomment-913087234


   Merged. Thanks!
   
   BTW if you want this in other branches, please open backport PRs against 
them accordingly. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 646670)
Time Spent: 4.5h  (was: 4h 20m)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> -
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Zhou Fang
>Assignee: Zhou Fang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store as the
> warehouse, because ListFiles is expensive there. A root cause is that the
> recursive parent dir deletion is very inefficient: there are many duplicated
> calls to isEmpty (which calls ListFiles at the end). This fix sorts the parents
> to delete by path depth and always processes the deepest one first
> (e.g., a/b/c is always handled before a/b). As a result, each parent path
> needs to be checked only once.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646669&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646669
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 05/Sep/21 05:01
Start Date: 05/Sep/21 05:01
Worklog Time Spent: 10m 
  Work Description: sunchao merged pull request #2421:
URL: https://github.com/apache/hive/pull/2421


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 646669)
Time Spent: 4h 20m  (was: 4h 10m)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> -
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Zhou Fang
>Assignee: Zhou Fang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store as the
> warehouse, because ListFiles is expensive there. A root cause is that the
> recursive parent dir deletion is very inefficient: there are many duplicated
> calls to isEmpty (which calls ListFiles at the end). This fix sorts the parents
> to delete by path depth and always processes the deepest one first
> (e.g., a/b/c is always handled before a/b). As a result, each parent path
> needs to be checked only once.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646668&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646668
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 05/Sep/21 04:57
Start Date: 05/Sep/21 04:57
Worklog Time Spent: 10m 
  Work Description: coufon commented on pull request #2421:
URL: https://github.com/apache/hive/pull/2421#issuecomment-913086678


   > Thanks Zhou Fang. LGTM with two nits.
   
   Thank you Chao for the helpful comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 646668)
Time Spent: 4h 10m  (was: 4h)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> -
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Zhou Fang
>Assignee: Zhou Fang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store as the
> warehouse, because ListFiles is expensive there. A root cause is that the
> recursive parent dir deletion is very inefficient: there are many duplicated
> calls to isEmpty (which calls ListFiles at the end). This fix sorts the parents
> to delete by path depth and always processes the deepest one first
> (e.g., a/b/c is always handled before a/b). As a result, each parent path
> needs to be checked only once.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=64&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-64
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 05/Sep/21 04:55
Start Date: 05/Sep/21 04:55
Worklog Time Spent: 10m 
  Work Description: coufon commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r702365874



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5156,14 +5155,40 @@ public boolean drop_partition(final String db_name, 
final String tbl_name,
 null);
   }
 
-  private static class PathAndPartValSize {
-PathAndPartValSize(Path path, int partValSize) {
-  this.path = path;
-  this.partValSize = partValSize;
+/** Stores a path and its depth. */
+private static class PathAndDepth implements Comparable<PathAndDepth> {
+
+  final Path path;
+  final int depth;
+
+  public PathAndDepth(Path path, int depth) {
+this.path = path;
+this.depth = depth;
+  }
+
+  @Override
+  public int hashCode() {
+return Objects.hash(path.hashCode(), depth);
+  }
+
+  @Override
+  public boolean equals(Object o) {
+if (o == this) {
+  return true;
+}
+if (!(o instanceof PathAndDepth)) {
+  return false;
+}
+PathAndDepth p = (PathAndDepth) o;
+return path.equals(p.path) && depth == p.depth;

Review comment:
   Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 64)
Time Spent: 4h  (was: 3h 50m)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> -
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Zhou Fang
>Assignee: Zhou Fang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store as the
> warehouse, because ListFiles is expensive there. A root cause is that the
> recursive parent dir deletion is very inefficient: there are many duplicated
> calls to isEmpty (which calls ListFiles at the end). This fix sorts the parents
> to delete by path depth and always processes the deepest one first
> (e.g., a/b/c is always handled before a/b). As a result, each parent path
> needs to be checked only once.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646665&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646665
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 05/Sep/21 04:52
Start Date: 05/Sep/21 04:52
Worklog Time Spent: 10m 
  Work Description: coufon commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r702365624



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5156,14 +5155,40 @@ public boolean drop_partition(final String db_name, 
final String tbl_name,
 null);
   }
 
-  private static class PathAndPartValSize {
-PathAndPartValSize(Path path, int partValSize) {
-  this.path = path;
-  this.partValSize = partValSize;
+/** Stores a path and its depth. */
+private static class PathAndDepth implements Comparable<PathAndDepth> {
+

Review comment:
   Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 646665)
Time Spent: 3h 50m  (was: 3h 40m)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> -
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Zhou Fang
>Assignee: Zhou Fang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when use a Cloud object store as the 
> warehouse for which ListFiles is expensive. A root cause is that the 
> recursive parent dir deletion is very inefficient: there are many duplicated 
> calls to isEmpty (ListFiles is called at the end). This fix sorts the parents 
> to delete according to the path size, and always processes the longest one 
> (e.g., a/b/c is always before a/b). As a result, each parent path is only 
> needed to be checked once.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646663
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 05/Sep/21 04:43
Start Date: 05/Sep/21 04:43
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r70236



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5156,14 +5155,40 @@ public boolean drop_partition(final String db_name, 
final String tbl_name,
 null);
   }
 
-  private static class PathAndPartValSize {
-PathAndPartValSize(Path path, int partValSize) {
-  this.path = path;
-  this.partValSize = partValSize;
+/** Stores a path and its depth. */
+private static class PathAndDepth implements Comparable<PathAndDepth> {
+
+  final Path path;
+  final int depth;
+
+  public PathAndDepth(Path path, int depth) {
+this.path = path;
+this.depth = depth;
+  }
+
+  @Override
+  public int hashCode() {
+return Objects.hash(path.hashCode(), depth);
+  }
+
+  @Override
+  public boolean equals(Object o) {
+if (o == this) {
+  return true;
+}
+if (!(o instanceof PathAndDepth)) {
+  return false;
+}
+PathAndDepth p = (PathAndDepth) o;
+return path.equals(p.path) && depth == p.depth;

Review comment:
   we can just use the auto-generated `equals` method, which handles null
here too:
   ```java
 @Override
 public boolean equals(Object o) {
   if (this == o) return true;
   if (o == null || getClass() != o.getClass()) return false;
   PathAndDepth that = (PathAndDepth) o;
   return depth == that.depth && Objects.equals(path, that.path);
 }
   ```
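
For completeness, the auto-generated `hashCode` that pairs with this `equals` would hash the same two fields (a hedged sketch; not part of the quoted review):

```java
@Override
public int hashCode() {
  // Hash the Path object directly; Objects.hash handles null fields.
  return Objects.hash(path, depth);
}
```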

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5156,14 +5155,40 @@ public boolean drop_partition(final String db_name, 
final String tbl_name,
 null);
   }
 
-  private static class PathAndPartValSize {
-PathAndPartValSize(Path path, int partValSize) {
-  this.path = path;
-  this.partValSize = partValSize;
+/** Stores a path and its depth. */
+private static class PathAndDepth implements Comparable<PathAndDepth> {
+

Review comment:
   nit: extra empty line




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 646663)
Time Spent: 3h 40m  (was: 3.5h)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> -
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Zhou Fang
>Assignee: Zhou Fang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store as the
> warehouse, because ListFiles is expensive there. A root cause is that the
> recursive parent dir deletion is very inefficient: there are many duplicated
> calls to isEmpty (which calls ListFiles at the end). This fix sorts the parents
> to delete by path depth and always processes the deepest one first
> (e.g., a/b/c is always handled before a/b). As a result, each parent path
> needs to be checked only once.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24944) When the default engine of the hiveserver is MR and the tez engine is set by the client, the client TEZ progress log cannot be printed normally

2021-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24944?focusedWorklogId=646654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646654
 ]

ASF GitHub Bot logged work on HIVE-24944:
-

Author: ASF GitHub Bot
Created on: 05/Sep/21 00:10
Start Date: 05/Sep/21 00:10
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2204:
URL: https://github.com/apache/hive/pull/2204


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 646654)
Time Spent: 1h 50m  (was: 1h 40m)

> When the default engine of the hiveserver is MR and the tez engine is set by 
> the client, the client TEZ progress log cannot be printed normally
> ---
>
> Key: HIVE-24944
> URL: https://issues.apache.org/jira/browse/HIVE-24944
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0, 4.0.0
>Reporter: ZhangQiDong
>Assignee: ZhangQiDong
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24944.001.patch, HIVE-24944.002.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The HiveServer's default execution engine is MR. When the client sets
> hive.execution.engine=tez, the client cannot print the Tez progress log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25293) Alter partitioned table with "cascade" option create too many columns records.

2021-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25293?focusedWorklogId=646653&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646653
 ]

ASF GitHub Bot logged work on HIVE-25293:
-

Author: ASF GitHub Bot
Created on: 05/Sep/21 00:10
Start Date: 05/Sep/21 00:10
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2436:
URL: https://github.com/apache/hive/pull/2436


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 646653)
Time Spent: 1h 10m  (was: 1h)

> Alter partitioned table with "cascade" option create too many columns records.
> --
>
> Key: HIVE-25293
> URL: https://issues.apache.org/jira/browse/HIVE-25293
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.3.3, 3.1.2
>Reporter: yongtaoliao
>Assignee: yongtaoliao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When altering a partitioned table with the "cascade" option, all partitions are
> supposed to be updated. Currently, a CD_ID is created for each partition,
> each associated with its own set of columns, which causes a large amount of
> redundant data in the metastore database.
> The following DDL statements can reproduce this scenario:
>  
> {code:java}
> create table test_table (f1 int) partitioned by (p string);
> alter table test_table add partition(p='a');
> alter table test_table add partition(p='b');
> alter table test_table add partition(p='c');
> alter table test_table add columns (f2 int) cascade;{code}
> All partitions use the table's `CD_ID` before adding columns, while each
> partition uses its own `CD_ID` after adding columns.
>  
> My proposal is that all partitions should use the same `CD_ID` when the table
> is altered with the "cascade" option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25296) Replace parquet-hadoop-bundle dependency with the actual parquet modules

2021-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25296?focusedWorklogId=646655&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646655
 ]

ASF GitHub Bot logged work on HIVE-25296:
-

Author: ASF GitHub Bot
Created on: 05/Sep/21 00:10
Start Date: 05/Sep/21 00:10
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #2288:
URL: https://github.com/apache/hive/pull/2288#issuecomment-913058810


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 646655)
Time Spent: 0.5h  (was: 20m)

> Replace parquet-hadoop-bundle dependency with the actual parquet modules
> 
>
> Key: HIVE-25296
> URL: https://issues.apache.org/jira/browse/HIVE-25296
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The parquet-hadoop-bundle is not a real dependency but a mere packaging
> of three parquet modules to create an uber jar. The Parquet community
> created this artificial module on demand for HIVE-5783, but the
> benefits, if any, are unclear.
> On the contrary, using the uber dependency has some drawbacks:
> * Parquet source code cannot be attached easily in IDEs, which makes debugging
> sessions cumbersome.
> * Finding the concrete dependencies on Parquet is not possible just by
> inspecting the pom files.
> * Extra maintenance cost for the Parquet community, which must add additional
> verification steps during a release.
> The goal of this JIRA is to replace the uber dependency with concrete
> dependencies on the respective modules:
> * parquet-common
> * parquet-column
> * parquet-hadoop



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25233) Removing deprecated unix_timestamp UDF

2021-09-04 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma resolved HIVE-25233.
--
Resolution: Invalid

> Removing deprecated unix_timestamp UDF
> --
>
> Key: HIVE-25233
> URL: https://issues.apache.org/jira/browse/HIVE-25233
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: All Versions
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Description
> The unix_timestamp() UDF was deprecated as part of
> https://issues.apache.org/jira/browse/HIVE-10728. The internal
> GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls
> to_unix_timestamp() for unix_timestamp(string date) and unix_timestamp(string
> date, string pattern).
> unix_timestamp()   => CURRENT_TIMESTAMP
> unix_timestamp(string date) => to_unix_timestamp()
> unix_timestamp(string date, string pattern) => to_unix_timestamp()
> We should clean up unix_timestamp() and point it to to_unix_timestamp()
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work stopped] (HIVE-25233) Removing deprecated unix_timestamp UDF

2021-09-04 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25233 stopped by Ashish Sharma.

> Removing deprecated unix_timestamp UDF
> --
>
> Key: HIVE-25233
> URL: https://issues.apache.org/jira/browse/HIVE-25233
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: All Versions
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Description
> The unix_timestamp() UDF was deprecated as part of
> https://issues.apache.org/jira/browse/HIVE-10728. The internal
> GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls
> to_unix_timestamp() for unix_timestamp(string date) and unix_timestamp(string
> date, string pattern).
> unix_timestamp()   => CURRENT_TIMESTAMP
> unix_timestamp(string date) => to_unix_timestamp()
> unix_timestamp(string date, string pattern) => to_unix_timestamp()
> We should clean up unix_timestamp() and point it to to_unix_timestamp()
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-25499) select unix_timestamp(dt) from table and select unix_timestamp(constant date) are different

2021-09-04 Thread Ashish Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409933#comment-17409933
 ] 

Ashish Sharma edited comment on HIVE-25499 at 9/4/21, 10:41 AM:


query - 
create table testdate(dt date);
insert into testdate values('0001-12-30');
select * from testdate;
select unix_timestamp(dt) from testdate;
select unix_timestamp('0001-12-30', 'yyyy-MM-dd');

output - 
PREHOOK: query: create table testdate(dt date)
PREHOOK: type: CREATETABLE
PREHOOK: Output: database:default
PREHOOK: Output: default@testdate
POSTHOOK: query: create table testdate(dt date)
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: database:default
POSTHOOK: Output: default@testdate
PREHOOK: query: insert into testdate values('0001-12-30')
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
PREHOOK: Output: default@testdate
POSTHOOK: query: insert into testdate values('0001-12-30')
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
POSTHOOK: Output: default@testdate
POSTHOOK: Lineage: testdate.dt SCRIPT []
PREHOOK: query: select * from testdate
PREHOOK: type: QUERY
PREHOOK: Input: default@testdate
POSTHOOK: query: select * from testdate
POSTHOOK: type: QUERY
POSTHOOK: Input: default@testdate
0001-12-30
PREHOOK: query: select unix_timestamp(dt) from testdate
PREHOOK: type: QUERY
PREHOOK: Input: default@testdate
POSTHOOK: query: select unix_timestamp(dt) from testdate
POSTHOOK: type: QUERY
POSTHOOK: Input: default@testdate
*-62104205222*
PREHOOK: query: select unix_timestamp('0001-12-30', 'yyyy-MM-dd')
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
POSTHOOK: query: select unix_timestamp('0001-12-30', 'yyyy-MM-dd')
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
*-62104205222*



was (Author: ashish-kumar-sharma):
query - 
create table testdate(dt date);
insert into testdate values('0001-12-30');
select * from testdate;
select unix_timestamp(dt) from testdate;
select unix_timestamp('0001-12-30', 'yyyy-MM-dd');

output - 
PREHOOK: query: create table testdate(dt date)
PREHOOK: type: CREATETABLE
PREHOOK: Output: database:default
PREHOOK: Output: default@testdate
POSTHOOK: query: create table testdate(dt date)
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: database:default
POSTHOOK: Output: default@testdate
PREHOOK: query: insert into testdate values('0001-12-30')
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
PREHOOK: Output: default@testdate
POSTHOOK: query: insert into testdate values('0001-12-30')
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
POSTHOOK: Output: default@testdate
POSTHOOK: Lineage: testdate.dt SCRIPT []
PREHOOK: query: select * from testdate
PREHOOK: type: QUERY
PREHOOK: Input: default@testdate
POSTHOOK: query: select * from testdate
POSTHOOK: type: QUERY
POSTHOOK: Input: default@testdate
0001-12-30
PREHOOK: query: select unix_timestamp(dt) from testdate
PREHOOK: type: QUERY
PREHOOK: Input: default@testdate
POSTHOOK: query: select unix_timestamp(dt) from testdate
POSTHOOK: type: QUERY
POSTHOOK: Input: default@testdate
-62104205222
PREHOOK: query: select unix_timestamp('0001-12-30', 'yyyy-MM-dd')
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
POSTHOOK: query: select unix_timestamp('0001-12-30', 'yyyy-MM-dd')
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
-62104205222


> select unix_timestamp(dt) from table and select unix_timestamp(constant date) 
>  are different
> 
>
> Key: HIVE-25499
> URL: https://issues.apache.org/jira/browse/HIVE-25499
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: zhaolong
>Assignee: Ashish Sharma
>Priority: Major
>
> I found that select unix_timestamp(date column) from table and select
> unix_timestamp(constant date) are different in 3.1.2, for example:
> create table testdate(dt date);
> insert into testdate values('0001-12-30');
> select * from testdate; --> 0001-12-30
> select unix_timestamp(dt) from testdate; --> -62104233600
> select unix_timestamp('0001-12-30', 'yyyy-MM-dd'); --> -62104406400
> -62104233600 is different from -62104406400.
>  
> and the converted timestamp values are:
> select from_unixtime(-62104233600); --> 0002-01-01 00:00:00, where -62104233600
> is the unix_timestamp(date column) value for the row whose date is 0001-12-30.
> select from_unixtime(-62104406400); --> 0001-12-30 00:00:00
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-25499) select unix_timestamp(dt) from table and select unix_timestamp(constant date) are different

2021-09-04 Thread Ashish Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409933#comment-17409933
 ] 

Ashish Sharma edited comment on HIVE-25499 at 9/4/21, 10:41 AM:


query - 
create table testdate(dt date);
insert into testdate values('0001-12-30');
select * from testdate;
select unix_timestamp(dt) from testdate;
select unix_timestamp('0001-12-30', 'yyyy-MM-dd');

output - 
PREHOOK: query: create table testdate(dt date)
PREHOOK: type: CREATETABLE
PREHOOK: Output: database:default
PREHOOK: Output: default@testdate
POSTHOOK: query: create table testdate(dt date)
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: database:default
POSTHOOK: Output: default@testdate
PREHOOK: query: insert into testdate values('0001-12-30')
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
PREHOOK: Output: default@testdate
POSTHOOK: query: insert into testdate values('0001-12-30')
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
POSTHOOK: Output: default@testdate
POSTHOOK: Lineage: testdate.dt SCRIPT []
PREHOOK: query: select * from testdate
PREHOOK: type: QUERY
PREHOOK: Input: default@testdate
POSTHOOK: query: select * from testdate
POSTHOOK: type: QUERY
POSTHOOK: Input: default@testdate
0001-12-30
PREHOOK: query: select unix_timestamp(dt) from testdate
PREHOOK: type: QUERY
PREHOOK: Input: default@testdate
POSTHOOK: query: select unix_timestamp(dt) from testdate
POSTHOOK: type: QUERY
POSTHOOK: Input: default@testdate
-62104205222
PREHOOK: query: select unix_timestamp('0001-12-30', 'yyyy-MM-dd')
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
POSTHOOK: query: select unix_timestamp('0001-12-30', 'yyyy-MM-dd')
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
-62104205222



was (Author: ashish-kumar-sharma):
query - 
create table testdate(dt date);
insert into testdate values('0001-12-30');
select * from testdate;
select unix_timestamp(dt) from testdate;
select unix_timestamp('0001-12-30', 'yyyy-MM-dd');

output - 
PREHOOK: query: create table testdate(dt date)
PREHOOK: type: CREATETABLE
PREHOOK: Output: database:default
PREHOOK: Output: default@testdate
POSTHOOK: query: create table testdate(dt date)
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: database:default
POSTHOOK: Output: default@testdate
PREHOOK: query: insert into testdate values('0001-12-30')
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
PREHOOK: Output: default@testdate
POSTHOOK: query: insert into testdate values('0001-12-30')
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
POSTHOOK: Output: default@testdate
POSTHOOK: Lineage: testdate.dt SCRIPT []
PREHOOK: query: select * from testdate
PREHOOK: type: QUERY
PREHOOK: Input: default@testdate
 A masked pattern was here 
POSTHOOK: query: select * from testdate
POSTHOOK: type: QUERY
POSTHOOK: Input: default@testdate
 A masked pattern was here 
0001-12-30
PREHOOK: query: select unix_timestamp(dt) from testdate
PREHOOK: type: QUERY
PREHOOK: Input: default@testdate
 A masked pattern was here 
POSTHOOK: query: select unix_timestamp(dt) from testdate
POSTHOOK: type: QUERY
POSTHOOK: Input: default@testdate
 A masked pattern was here 
-62104205222
PREHOOK: query: select unix_timestamp('0001-12-30', 'yyyy-MM-dd')
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
 A masked pattern was here 
POSTHOOK: query: select unix_timestamp('0001-12-30', 'yyyy-MM-dd')
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
 A masked pattern was here 
-62104205222


> select unix_timestamp(dt) from table and select unix_timestamp(constant date) 
>  are different
> 
>
> Key: HIVE-25499
> URL: https://issues.apache.org/jira/browse/HIVE-25499
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: zhaolong
>Assignee: Ashish Sharma
>Priority: Major
>
> I found that select unix_timestamp(date column) from table and select
> unix_timestamp(constant date) are different in 3.1.2, for example:
> create table testdate(dt date);
> insert into testdate values('0001-12-30');
> select * from testdate; --> 0001-12-30
> select unix_timestamp(dt) from testdate; --> -62104233600
> select unix_timestamp('0001-12-30', 'yyyy-MM-dd'); --> -62104406400
> -62104233600 is different from -62104406400.
>  
> and the converted timestamp values are:
> select from_unixtime(-62104233600); --> 0002-01-01 00:00:00, where -62104233600
> is the unix_timestamp(date column) value for the row whose date is 0001-12-30.
> select from_unixtime(-62104406400); --> 0001-12-30 00:00:00
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25499) select unix_timestamp(dt) from table and select unix_timestamp(constant date) are different

2021-09-04 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma resolved HIVE-25499.
--
Resolution: Fixed

query - 
create table testdate(dt date);
insert into testdate values('0001-12-30');
select * from testdate;
select unix_timestamp(dt) from testdate;
select unix_timestamp('0001-12-30', 'yyyy-MM-dd');

output - 
PREHOOK: query: create table testdate(dt date)
PREHOOK: type: CREATETABLE
PREHOOK: Output: database:default
PREHOOK: Output: default@testdate
POSTHOOK: query: create table testdate(dt date)
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: database:default
POSTHOOK: Output: default@testdate
PREHOOK: query: insert into testdate values('0001-12-30')
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
PREHOOK: Output: default@testdate
POSTHOOK: query: insert into testdate values('0001-12-30')
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
POSTHOOK: Output: default@testdate
POSTHOOK: Lineage: testdate.dt SCRIPT []
PREHOOK: query: select * from testdate
PREHOOK: type: QUERY
PREHOOK: Input: default@testdate
 A masked pattern was here 
POSTHOOK: query: select * from testdate
POSTHOOK: type: QUERY
POSTHOOK: Input: default@testdate
 A masked pattern was here 
0001-12-30
PREHOOK: query: select unix_timestamp(dt) from testdate
PREHOOK: type: QUERY
PREHOOK: Input: default@testdate
 A masked pattern was here 
POSTHOOK: query: select unix_timestamp(dt) from testdate
POSTHOOK: type: QUERY
POSTHOOK: Input: default@testdate
 A masked pattern was here 
-62104205222
PREHOOK: query: select unix_timestamp('0001-12-30', 'yyyy-MM-dd')
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
 A masked pattern was here 
POSTHOOK: query: select unix_timestamp('0001-12-30', 'yyyy-MM-dd')
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
 A masked pattern was here 
-62104205222


> select unix_timestamp(dt) from table and select unix_timestamp(constant date) 
>  are different
> 
>
> Key: HIVE-25499
> URL: https://issues.apache.org/jira/browse/HIVE-25499
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: zhaolong
>Assignee: Ashish Sharma
>Priority: Major
>
> I found that select unix_timestamp(date column) from table and select
> unix_timestamp(constant date) are different in 3.1.2, for example:
> create table testdate(dt date);
> insert into testdate values('0001-12-30');
> select * from testdate; --> 0001-12-30
> select unix_timestamp(dt) from testdate; --> -62104233600
> select unix_timestamp('0001-12-30', 'yyyy-MM-dd'); --> -62104406400
> -62104233600 is different from -62104406400.
>  
> and the converted timestamp values are:
> select from_unixtime(-62104233600); --> 0002-01-01 00:00:00, where -62104233600
> is the unix_timestamp(date column) value for the row whose date is 0001-12-30.
> select from_unixtime(-62104406400); --> 0001-12-30 00:00:00
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25499) select unix_timestamp(dt) from table and select unix_timestamp(constant date) are different

2021-09-04 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma reassigned HIVE-25499:


Assignee: Ashish Sharma

> select unix_timestamp(dt) from table and select unix_timestamp(constant date) 
>  are different
> 
>
> Key: HIVE-25499
> URL: https://issues.apache.org/jira/browse/HIVE-25499
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: zhaolong
>Assignee: Ashish Sharma
>Priority: Major
>
> I found that select unix_timestamp(date column) from table and select
> unix_timestamp(constant date) are different in 3.1.2, for example:
> create table testdate(dt date);
> insert into testdate values('0001-12-30');
> select * from testdate; --> 0001-12-30
> select unix_timestamp(dt) from testdate; --> -62104233600
> select unix_timestamp('0001-12-30', 'yyyy-MM-dd'); --> -62104406400
> -62104233600 is different from -62104406400.
>  
> and the converted timestamp values are:
> select from_unixtime(-62104233600); --> 0002-01-01 00:00:00, where -62104233600
> is the unix_timestamp(date column) value for the row whose date is 0001-12-30.
> select from_unixtime(-62104406400); --> 0001-12-30 00:00:00
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25499) select unix_timestamp(dt) from table and select unix_timestamp(constant date) are different

2021-09-04 Thread zhaolong (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409902#comment-17409902
 ] 

zhaolong commented on HIVE-25499:
-

It looks like the date column is converted to a timestamp using:

LocalDate localDate = LocalDate.of(0001, 12, 30);
long localTime = localDate.atStartOfDay().toEpochSecond(ZoneOffset.of("+0"));
System.out.println(localTime);

and the constant date is converted to a timestamp using:

String date = "0001-12-30";
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
Date date1 = sdf.parse(date);
System.out.println(date1.getTime()/1000);
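
The two results differ by exactly 172,800 seconds (two days), which matches the divergence between java.time's proleptic Gregorian (ISO) calendar and SimpleDateFormat's hybrid Julian/Gregorian calendar for dates before 1582. A minimal self-contained comparison (assuming UTC; illustrative only, not taken from the Hive code):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.TimeZone;

public class EpochComparison {
  public static void main(String[] args) throws ParseException {
    // java.time: proleptic Gregorian (ISO) calendar.
    long proleptic = LocalDate.of(1, 12, 30)
        .atStartOfDay()
        .toEpochSecond(ZoneOffset.UTC);

    // SimpleDateFormat: hybrid Julian/Gregorian calendar before 1582.
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
    sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
    long hybrid = sdf.parse("0001-12-30").getTime() / 1000;

    System.out.println(proleptic);          // -62104233600
    System.out.println(hybrid);             // -62104406400
    System.out.println(proleptic - hybrid); // 172800 (two days)
  }
}
```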

> select unix_timestamp(dt) from table and select unix_timestamp(constant date) 
>  are different
> 
>
> Key: HIVE-25499
> URL: https://issues.apache.org/jira/browse/HIVE-25499
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: zhaolong
>Priority: Major
>
> I found that select unix_timestamp(date column) from table and select
> unix_timestamp(constant date) are different in 3.1.2, for example:
> create table testdate(dt date);
> insert into testdate values('0001-12-30');
> select * from testdate; --> 0001-12-30
> select unix_timestamp(dt) from testdate; --> -62104233600
> select unix_timestamp('0001-12-30', 'yyyy-MM-dd'); --> -62104406400
> -62104233600 is different from -62104406400.
>  
> and the converted timestamp values are:
> select from_unixtime(-62104233600); --> 0002-01-01 00:00:00, where -62104233600
> is the unix_timestamp(date column) value for the row whose date is 0001-12-30.
> select from_unixtime(-62104406400); --> 0001-12-30 00:00:00
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)