[jira] [Work logged] (HIVE-25750) Beeline: Creating a standalone tarball by isolating dependencies

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25750?focusedWorklogId=735164&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-735164
 ]

ASF GitHub Bot logged work on HIVE-25750:
-

Author: ASF GitHub Bot
Created on: 02/Mar/22 07:57
Start Date: 02/Mar/22 07:57
Worklog Time Spent: 10m 
  Work Description: achennagiri closed pull request #3043:
URL: https://github.com/apache/hive/pull/3043


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 735164)
Time Spent: 4h  (was: 3h 50m)

> Beeline: Creating a standalone tarball by isolating dependencies
> 
>
> Key: HIVE-25750
> URL: https://issues.apache.org/jira/browse/HIVE-25750
> Project: Hive
>  Issue Type: Bug
>Reporter: Abhay
>Assignee: Abhay
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> The code to create a standalone Beeline tarball was added as part of this 
> ticket: https://issues.apache.org/jira/browse/HIVE-24348. However, a bug was 
> reported for the case where Beeline is installed on a host without Hadoop 
> installed: the Beeline script complains of missing dependencies when it is run.
> The ask in this ticket is to fix that bug.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-25641) Adding columns to a multi-partition Hive table reports Partition not found

2022-03-01 Thread chenruotao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499949#comment-17499949
 ] 

chenruotao commented on HIVE-25641:
---

Enable debug logging to see the detailed logs.

> Adding columns to a multi-partition Hive table reports Partition not found
> -
>
> Key: HIVE-25641
> URL: https://issues.apache.org/jira/browse/HIVE-25641
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.1
>Reporter: zhusijie
>Priority: Major
> Attachments: image-2021-10-25-17-32-32-207.png
>
>
> Running ALTER TABLE cf_rds.cf_rds_jxd_clue_basic_di ADD COLUMNS (channel bigint 
> COMMENT '渠道号') CASCADE;
> reports: Partition not found
> !image-2021-10-25-17-32-32-207.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25997) Fix release source packaging

2022-03-01 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-25997:
--
Summary: Fix release source packaging  (was: Fix release source packagin)

> Fix release source packaging
> 
>
> Key: HIVE-25997
> URL: https://issues.apache.org/jira/browse/HIVE-25997
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0-alpha-1
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> The generated source package is not compiling with:
> {code:java}
> mvn clean install -DskipTests {code}
> We should fix that for the release



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-25997) Fix release source packagin

2022-03-01 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-25997:
-


> Fix release source packagin
> ---
>
> Key: HIVE-25997
> URL: https://issues.apache.org/jira/browse/HIVE-25997
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0-alpha-1
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> The generated source package is not compiling with:
> {code:java}
> mvn clean install -DskipTests {code}
> We should fix that for the release



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25996) Backport HIVE-21498 and HIVE-25098 to fix CVE-2020-13949

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25996?focusedWorklogId=735145&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-735145
 ]

ASF GitHub Bot logged work on HIVE-25996:
-

Author: ASF GitHub Bot
Created on: 02/Mar/22 07:04
Start Date: 02/Mar/22 07:04
Worklog Time Spent: 10m 
  Work Description: wangyum opened a new pull request #3066:
URL: https://github.com/apache/hive/pull/3066


   ### What changes were proposed in this pull request?
   
   Backport HIVE-21498 and HIVE-25098 to branch-2.3.
   
   ### Why are the changes needed?
   
   Fix CVE-2020-13949 and make it easier for downstream projects to upgrade their Thrift 
version.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Local test.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 735145)
Remaining Estimate: 0h
Time Spent: 10m

> Backport HIVE-21498 and HIVE-25098 to fix CVE-2020-13949
> 
>
> Key: HIVE-25996
> URL: https://issues.apache.org/jira/browse/HIVE-25996
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.9
>Reporter: Yuming Wang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25996) Backport HIVE-21498 and HIVE-25098 to fix CVE-2020-13949

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25996:
--
Labels: pull-request-available  (was: )

> Backport HIVE-21498 and HIVE-25098 to fix CVE-2020-13949
> 
>
> Key: HIVE-25996
> URL: https://issues.apache.org/jira/browse/HIVE-25996
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.9
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-25995) Build from source distribution archive fails

2022-03-01 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499893#comment-17499893
 ] 

Ayush Saxena commented on HIVE-25995:
-

The problem lies in the src.xml 

Adding these two entries 
{noformat}
udf/**/*
parser/**/*
{noformat}
in this include block:
[https://github.com/apache/hive/blob/master/packaging/src/main/assembly/src.xml#L108-L109]
solves two of the issues.

Standalone-metastore has entries like:
{noformat}
standalone-metastore/metastore-common/**/*
standalone-metastore/metastore-server/**/*
{noformat}
This misses the pom.xml and other files directly under standalone-metastore, so either 
we add an explicit entry for the pom.xml, or we include the entire 
standalone-metastore directory. As of now metastore-tools is also not included; 
not sure if that is deliberate, given this include structure.

I added the udf and parser entries and changed the standalone-metastore entries to 
include the complete directory. The build passes after that, but I feel the 
intent of the script is somewhat to prevent {{metastore-tools}} 
from being copied (not sure why). If that is the case, then we have to explicitly 
add the pom and other relevant files, like the LICENSE & NOTICE present in 
standalone-metastore.

> Build from source distribution archive fails
> 
>
> Key: HIVE-25995
> URL: https://issues.apache.org/jira/browse/HIVE-25995
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Stamatis Zampetakis
>Priority: Blocker
>
> The source distribution archive, apache-hive-4.0.0-SNAPSHOT-src.tar.gz, can 
> be produced by running:
> {code:bash}
> mvn clean package -DskipTests -Pdist
> {code}
> The file is generated under:
> {noformat}
> packaging/target/apache-hive-4.0.0-SNAPSHOT-src.tar.gz
> {noformat}
> The source distribution archive/package 
> [should|https://www.apache.org/legal/release-policy.html#source-packages] 
> allow anyone who downloads it to build and test Hive.
> At the moment, on commit 
> [b63dab11d229abac59a4ef5e141d8d9b28037c8b|https://github.com/apache/hive/commit/b63dab11d229abac59a4ef5e141d8d9b28037c8b],
>  if someone produces the source package and extracts the contents of the 
> archive, it is not possible to build Hive.
> Both {{mvn install}} and {{mvn package}} commands fail when they are executed 
> inside the directory extracted from the archive.
> {noformat}
> mvn clean install -DskipTests
> mvn clean package -DskipTests
> {noformat}
> The error is shown below:
> {noformat}
> [INFO] Scanning for projects...
> [ERROR] [ERROR] Some problems were encountered while processing the POMs:
> [ERROR] Child module 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/parser of 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not 
> exist @ 
> [ERROR] Child module 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/udf of 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not 
> exist @ 
> [ERROR] Child module 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/standalone-metastore/pom.xml
>  of /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not 
> exist @ 
>  @ 
> [ERROR] The build could not read 1 project -> [Help 1]
> [ERROR]   
> [ERROR]   The project org.apache.hive:hive:4.0.0-SNAPSHOT 
> (/home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml) has 3 errors
> [ERROR] Child module 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/parser of 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not exist
> [ERROR] Child module 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/udf of 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not exist
> [ERROR] Child module 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/standalone-metastore/pom.xml
>  of /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not 
> exist
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-24949) Fail to rename a partition with customized catalog

2022-03-01 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499836#comment-17499836
 ] 

Zhihua Deng commented on HIVE-24949:


Merged to master. Thank you [~yufeigu] and [~kgyrtkirk] for the report and 
review!

> Fail to rename a partition with customized catalog
> --
>
> Key: HIVE-24949
> URL: https://issues.apache.org/jira/browse/HIVE-24949
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.2
>Reporter: Yufei Gu
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> How to trigger it? 
>  1. Create a customized catalog in HMS 3.x
>  2. Run a sql like "ALTER TABLE student PARTITION (age='12') RENAME TO 
> PARTITION (age='15');"
> Error message:
> {code:java}
> spark-sql> ALTER TABLE student PARTITION (age='12') RENAME TO PARTITION 
> (age='15');
> Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
> rename partition. Unable to alter partition because table or database does 
> not exist.;
> {code}
> We can fix it by replacing {{DEFAULT_CATALOG_NAME}} with {{catName}} in the 
> following 
> line([https://github.com/apache/hive/blob/a00621b49657da3d14bcb20bf9e49fea10dc9273/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L604])
> {code:java}
>  Table tbl = msdb.getTable(DEFAULT_CATALOG_NAME, dbname, name, null);
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-25987) Incorrectly formatted pom.xml error in Beeline

2022-03-01 Thread Abhay (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhay resolved HIVE-25987.
--
Fix Version/s: All Versions
 Assignee: Abhay
   Resolution: Invalid

> Incorrectly formatted pom.xml error in Beeline
> --
>
> Key: HIVE-25987
> URL: https://issues.apache.org/jira/browse/HIVE-25987
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Abhay
>Assignee: Abhay
>Priority: Major
> Fix For: All Versions
>
>
> After applying the patch [https://github.com/apache/hive/pull/3043] for 
> HIVE-25750, the precommit tests have started complaining with:
> *!!! incorrectly formatted pom.xmls detected; see above!*
> The code built fine locally and the pre-commit tests had passed. We need to 
> investigate further why this was not caught earlier, but the pom.xml file 
> needs to be fixed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-24949) Fail to rename a partition with customized catalog

2022-03-01 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-24949.

Resolution: Fixed

> Fail to rename a partition with customized catalog
> --
>
> Key: HIVE-24949
> URL: https://issues.apache.org/jira/browse/HIVE-24949
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.2
>Reporter: Yufei Gu
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> How to trigger it? 
>  1. Create a customized catalog in HMS 3.x
>  2. Run a sql like "ALTER TABLE student PARTITION (age='12') RENAME TO 
> PARTITION (age='15');"
> Error message:
> {code:java}
> spark-sql> ALTER TABLE student PARTITION (age='12') RENAME TO PARTITION 
> (age='15');
> Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
> rename partition. Unable to alter partition because table or database does 
> not exist.;
> {code}
> We can fix it by replacing {{DEFAULT_CATALOG_NAME}} with {{catName}} in the 
> following 
> line([https://github.com/apache/hive/blob/a00621b49657da3d14bcb20bf9e49fea10dc9273/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L604])
> {code:java}
>  Table tbl = msdb.getTable(DEFAULT_CATALOG_NAME, dbname, name, null);
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-24949) Fail to rename a partition with customized catalog

2022-03-01 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-24949:
--

Assignee: Zhihua Deng

> Fail to rename a partition with customized catalog
> --
>
> Key: HIVE-24949
> URL: https://issues.apache.org/jira/browse/HIVE-24949
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.2
>Reporter: Yufei Gu
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> How to trigger it? 
>  1. Create a customized catalog in HMS 3.x
>  2. Run a sql like "ALTER TABLE student PARTITION (age='12') RENAME TO 
> PARTITION (age='15');"
> Error message:
> {code:java}
> spark-sql> ALTER TABLE student PARTITION (age='12') RENAME TO PARTITION 
> (age='15');
> Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
> rename partition. Unable to alter partition because table or database does 
> not exist.;
> {code}
> We can fix it by replacing {{DEFAULT_CATALOG_NAME}} with {{catName}} in the 
> following 
> line([https://github.com/apache/hive/blob/a00621b49657da3d14bcb20bf9e49fea10dc9273/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L604])
> {code:java}
>  Table tbl = msdb.getTable(DEFAULT_CATALOG_NAME, dbname, name, null);
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-24949) Fail to rename a partition with customized catalog

2022-03-01 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-24949:
---
Fix Version/s: 4.0.0

> Fail to rename a partition with customized catalog
> --
>
> Key: HIVE-24949
> URL: https://issues.apache.org/jira/browse/HIVE-24949
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.2
>Reporter: Yufei Gu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> How to trigger it? 
>  1. Create a customized catalog in HMS 3.x
>  2. Run a sql like "ALTER TABLE student PARTITION (age='12') RENAME TO 
> PARTITION (age='15');"
> Error message:
> {code:java}
> spark-sql> ALTER TABLE student PARTITION (age='12') RENAME TO PARTITION 
> (age='15');
> Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
> rename partition. Unable to alter partition because table or database does 
> not exist.;
> {code}
> We can fix it by replacing {{DEFAULT_CATALOG_NAME}} with {{catName}} in the 
> following 
> line([https://github.com/apache/hive/blob/a00621b49657da3d14bcb20bf9e49fea10dc9273/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L604])
> {code:java}
>  Table tbl = msdb.getTable(DEFAULT_CATALOG_NAME, dbname, name, null);
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-24949) Fail to rename a partition with customized catalog

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24949?focusedWorklogId=735080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-735080
 ]

ASF GitHub Bot logged work on HIVE-24949:
-

Author: ASF GitHub Bot
Created on: 02/Mar/22 01:43
Start Date: 02/Mar/22 01:43
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 merged pull request #2910:
URL: https://github.com/apache/hive/pull/2910


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 735080)
Time Spent: 50m  (was: 40m)

> Fail to rename a partition with customized catalog
> --
>
> Key: HIVE-24949
> URL: https://issues.apache.org/jira/browse/HIVE-24949
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.2
>Reporter: Yufei Gu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> How to trigger it? 
>  1. Create a customized catalog in HMS 3.x
>  2. Run a sql like "ALTER TABLE student PARTITION (age='12') RENAME TO 
> PARTITION (age='15');"
> Error message:
> {code:java}
> spark-sql> ALTER TABLE student PARTITION (age='12') RENAME TO PARTITION 
> (age='15');
> Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
> rename partition. Unable to alter partition because table or database does 
> not exist.;
> {code}
> We can fix it by replacing {{DEFAULT_CATALOG_NAME}} with {{catName}} in the 
> following 
> line([https://github.com/apache/hive/blob/a00621b49657da3d14bcb20bf9e49fea10dc9273/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L604])
> {code:java}
>  Table tbl = msdb.getTable(DEFAULT_CATALOG_NAME, dbname, name, null);
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (HIVE-25987) Incorrectly formatted pom.xml error in Beeline

2022-03-01 Thread Abhay (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499832#comment-17499832
 ] 

Abhay edited comment on HIVE-25987 at 3/2/22, 1:42 AM:
---

[~kgyrtkirk]  I had to close the [PR|https://github.com/apache/hive/pull/2824]  
in question and create a new one to be merged which ran the precommit tests in 
the recent past but still before this merge was fixed. But the point is taken. 
Thank you for clarifying this :)

[~zabetak] I agree. Any thoughts on where this could be enforced locally? Maven 
enforcer?


was (Author: achennagiri):
[~kgyrtkirk]  I had to close the [PR|https://github.com/apache/hive/pull/2824]  
in question and create a new one to be merged which ran the precommit tests in 
the recent past but still before this merge was fixed. But the point is taken. 
Thank you for clarifying this :)

[~zabetak] I agree.

> Incorrectly formatted pom.xml error in Beeline
> --
>
> Key: HIVE-25987
> URL: https://issues.apache.org/jira/browse/HIVE-25987
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Abhay
>Priority: Major
>
> After applying the patch [https://github.com/apache/hive/pull/3043] for 
> HIVE-25750, the precommit tests have started complaining with:
> *!!! incorrectly formatted pom.xmls detected; see above!*
> The code built fine locally and the pre-commit tests had passed. We need to 
> investigate further why this was not caught earlier, but the pom.xml file 
> needs to be fixed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-25987) Incorrectly formatted pom.xml error in Beeline

2022-03-01 Thread Abhay (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499832#comment-17499832
 ] 

Abhay commented on HIVE-25987:
--

[~kgyrtkirk]  I had to close the [PR|https://github.com/apache/hive/pull/2824]  
in question and create a new one to be merged which ran the precommit tests in 
the recent past but still before this merge was fixed. But the point is taken. 
Thank you for clarifying this :)

[~zabetak] I agree.

> Incorrectly formatted pom.xml error in Beeline
> --
>
> Key: HIVE-25987
> URL: https://issues.apache.org/jira/browse/HIVE-25987
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Abhay
>Priority: Major
>
> After applying the patch [https://github.com/apache/hive/pull/3043] for 
> HIVE-25750, the precommit tests have started complaining with:
> *!!! incorrectly formatted pom.xmls detected; see above!*
> The code built fine locally and the pre-commit tests had passed. We need to 
> investigate further why this was not caught earlier, but the pom.xml file 
> needs to be fixed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25872) Skip tracking of alterDatabase events for replication specific properties.

2022-03-01 Thread Haymant Mangla (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haymant Mangla updated HIVE-25872:
--
Summary: Skip tracking of alterDatabase events for replication specific 
properties.  (was: Skip tracking of alterDatabase events for replication 
specific property (repl.last.id).)

> Skip tracking of alterDatabase events for replication specific properties.
> --
>
> Key: HIVE-25872
> URL: https://issues.apache.org/jira/browse/HIVE-25872
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-25665) Checkstyle LGPL files must not be in the release sources/binaries

2022-03-01 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-25665.
---
Resolution: Fixed

Pushed to master.

Thanks for the review [~kgyrtkirk]!

> Checkstyle LGPL files must not be in the release sources/binaries
> -
>
> Key: HIVE-25665
> URL: https://issues.apache.org/jira/browse/HIVE-25665
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Stamatis Zampetakis
>Assignee: Peter Vary
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As discussed in the [dev 
> list|https://lists.apache.org/thread/r13e3236aa72a070b3267ed95f7cb3b45d3c4783fd4ca35f5376b1a35@%3cdev.hive.apache.org%3e]
>  LGPL files must not be present in the Apache released sources/binaries.
> The following files must not be present in the release:
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/checkstyle/checkstyle-noframes-sorted.xsl
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/storage-api/checkstyle/checkstyle-noframes-sorted.xsl
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/standalone-metastore/checkstyle/checkstyle-noframes-sorted.xsl
> There may be other checkstyle LGPL files in the repo. All these should either 
> be removed entirely from the repository or selectively excluded from the 
> release.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25665) Checkstyle LGPL files must not be in the release sources/binaries

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25665?focusedWorklogId=734985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734985
 ]

ASF GitHub Bot logged work on HIVE-25665:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 22:16
Start Date: 01/Mar/22 22:16
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #3063:
URL: https://github.com/apache/hive/pull/3063


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734985)
Time Spent: 0.5h  (was: 20m)

> Checkstyle LGPL files must not be in the release sources/binaries
> -
>
> Key: HIVE-25665
> URL: https://issues.apache.org/jira/browse/HIVE-25665
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Stamatis Zampetakis
>Assignee: Peter Vary
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As discussed in the [dev 
> list|https://lists.apache.org/thread/r13e3236aa72a070b3267ed95f7cb3b45d3c4783fd4ca35f5376b1a35@%3cdev.hive.apache.org%3e]
>  LGPL files must not be present in the Apache released sources/binaries.
> The following files must not be present in the release:
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/checkstyle/checkstyle-noframes-sorted.xsl
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/storage-api/checkstyle/checkstyle-noframes-sorted.xsl
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/standalone-metastore/checkstyle/checkstyle-noframes-sorted.xsl
> There may be other checkstyle LGPL files in the repo. All these should either 
> be removed entirely from the repository or selectively excluded from the 
> release.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25995) Build from source distribution archive fails

2022-03-01 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-25995:
---
Priority: Blocker  (was: Major)

> Build from source distribution archive fails
> 
>
> Key: HIVE-25995
> URL: https://issues.apache.org/jira/browse/HIVE-25995
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Stamatis Zampetakis
>Priority: Blocker
>
> The source distribution archive, apache-hive-4.0.0-SNAPSHOT-src.tar.gz, can 
> be produced by running:
> {code:bash}
> mvn clean package -DskipTests -Pdist
> {code}
> The file is generated under:
> {noformat}
> packaging/target/apache-hive-4.0.0-SNAPSHOT-src.tar.gz
> {noformat}
> The source distribution archive/package 
> [should|https://www.apache.org/legal/release-policy.html#source-packages] 
> allow anyone who downloads it to build and test Hive.
> At the moment, on commit 
> [b63dab11d229abac59a4ef5e141d8d9b28037c8b|https://github.com/apache/hive/commit/b63dab11d229abac59a4ef5e141d8d9b28037c8b],
>  if someone produces the source package and extracts the contents of the 
> archive, it is not possible to build Hive.
> Both {{mvn install}} and {{mvn package}} commands fail when they are executed 
> inside the directory extracted from the archive.
> {noformat}
> mvn clean install -DskipTests
> mvn clean package -DskipTests
> {noformat}
> The error is shown below:
> {noformat}
> [INFO] Scanning for projects...
> [ERROR] [ERROR] Some problems were encountered while processing the POMs:
> [ERROR] Child module 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/parser of 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not 
> exist @ 
> [ERROR] Child module 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/udf of 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not 
> exist @ 
> [ERROR] Child module 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/standalone-metastore/pom.xml
>  of /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not 
> exist @ 
>  @ 
> [ERROR] The build could not read 1 project -> [Help 1]
> [ERROR]   
> [ERROR]   The project org.apache.hive:hive:4.0.0-SNAPSHOT 
> (/home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml) has 3 errors
> [ERROR] Child module 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/parser of 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not exist
> [ERROR] Child module 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/udf of 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not exist
> [ERROR] Child module 
> /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/standalone-metastore/pom.xml
>  of /home/stamatis/Downloads/apache-hive-4.0.0-SNAPSHOT-src/pom.xml does not 
> exist
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25575) Add support for JWT authentication in HTTP mode

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25575?focusedWorklogId=734884&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734884
 ]

ASF GitHub Bot logged work on HIVE-25575:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 19:32
Start Date: 01/Mar/22 19:32
Worklog Time Spent: 10m 
  Work Description: hsnusonic commented on a change in pull request #3006:
URL: https://github.com/apache/hive/pull/3006#discussion_r817058633



##
File path: 
service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpServlet.java
##
@@ -213,6 +221,8 @@ protected void doPost(HttpServletRequest request, 
HttpServletResponse response)
 } else {
   clientUserName = doKerberosAuth(request);
 }
+  } else if (authType.isEnabled(HiveAuthConstants.AuthTypes.JWT) && 
hasJWT(request)) {

Review comment:
   We now support multiple authentication mechanisms at a time. If SAML and JWT are both 
enabled, the bearer token could belong to either of them, so we need to 
distinguish which one should be used.

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4127,7 +4127,8 @@ private static void populateLlapDaemonVarsSet(Set<String> 
llapDaemonVarsSetLocal
 "  (Use with property 
hive.server2.custom.authentication.class)\n" +
 "  PAM: Pluggable authentication module\n" +
 "  NOSASL:  Raw transport\n" +
-"  SAML: SAML 2.0 compliant authentication. This is only supported in 
http transport mode."),
+"  SAML: SAML 2.0 compliant authentication. This is only supported in 
http transport mode\n" +
+"  JWT: JWT based authentication, JWT needs to contain the user as 
subject"),

Review comment:
   Yes, it is. I will add a supplement here.

##
File path: 
service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpServlet.java
##
@@ -264,8 +274,7 @@ protected void doPost(HttpServletRequest request, 
HttpServletResponse response)
 LOG.info("Cookie added for clientUserName " + clientUserName);

Review comment:
   This is interesting. It seems the same scenario applies to all 
other authentication types. HS2 will keep an authenticated user's access for 
the max age of the cookie, one day by default. I'm not sure if we need to change 
how we set the cookie's max age. If we do, maybe change it for all the 
authentication types in another ticket?

##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -804,6 +811,34 @@ protected boolean requestIsAborted(final HttpRequest 
request) {
 return httpClientBuilder.build();
   }
 
+  private String getJWT() {
+String jwtCredential = getJWTStringFromSession();
+if (jwtCredential == null) {
+  jwtCredential = getJWTStringFromEnv();
+}
+return jwtCredential;
+  }
+
+  private String getJWTStringFromEnv() {
+String jwtCredential = System.getenv(JdbcConnectionParams.AUTH_JWT_ENV);
+if (jwtCredential == null) {
+  LOG.debug("No JWT is specified in env variable {}", 
JdbcConnectionParams.AUTH_JWT_ENV);
+} else {
+  LOG.debug("Fetched JWT from the env.");
+}
+return jwtCredential;
+  }
+
+  private String getJWTStringFromSession() {
+String jwtCredential = 
sessConfMap.get(JdbcConnectionParams.AUTH_TYPE_JWT_KEY);
+if (jwtCredential == null) {
+  LOG.debug("No JWT is specified in connection string.");
+} else {
+  LOG.debug("Fetched JWT from the session.");

Review comment:
   I assumed we should treat the token like a password, so I chose not to print 
it out. Thoughts?

##
File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
##
@@ -804,6 +811,34 @@ protected boolean requestIsAborted(final HttpRequest 
request) {
 return httpClientBuilder.build();
   }
 
+  private String getJWT() {
+String jwtCredential = getJWTStringFromSession();
+if (jwtCredential == null) {

Review comment:
   If the JWT is passed explicitly in the connection string, we should just 
use it no matter what it looks like, right? Do you think we should check it on 
the driver side? For example, 
`jdbc:hive2:///default;transportMode=http;httpPath=cliservice;jwt=;`.
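   For reference, a minimal sketch of what passing a JWT through the connection string could look like from a plain JDBC client; the host, httpPath and token values below are placeholders, not taken from this ticket:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;

public class JwtJdbcSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical host, httpPath and token values for illustration only.
    String token = "<jwt-token>";
    String url = "jdbc:hive2://hs2-host:10001/default;transportMode=http;"
        + "httpPath=cliservice;jwt=" + token;
    // Requires the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) on the classpath.
    try (Connection conn = DriverManager.getConnection(url)) {
      System.out.println("Connected: " + !conn.isClosed());
    }
  }
}
{code}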




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734884)
Time Spent: 1.5h  (was: 1h 20m)

> Add support for JWT authentication in HTTP mode
> ---
>
> Key: HIVE-25575
> URL: https://issues.apache.org/jira/browse/HIVE-25575
> Project: 

[jira] [Work logged] (HIVE-25988) CreateTableEvent should have database object as one of the hive privilege object.

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25988?focusedWorklogId=734866&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734866
 ]

ASF GitHub Bot logged work on HIVE-25988:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 19:00
Start Date: 01/Mar/22 19:00
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #3057:
URL: https://github.com/apache/hive/pull/3057#discussion_r817056805



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/events/CreateTableEvent.java
##
@@ -56,16 +57,28 @@ public HiveMetaStoreAuthzInfo getAuthzContext() {
 return ret;
   }
 
-  private List<HivePrivilegeObject> getInputHObjs() { return Collections.emptyList(); }
+  private List<HivePrivilegeObject> getInputHObjs() {
+    List<HivePrivilegeObject> ret   = new ArrayList<>();
+    PreCreateTableEvent       event = (PreCreateTableEvent) preEventContext;
+    Table                     table = event.getTable();
+    String                    uri   = getSdLocation(table.getSd());
+

Review comment:
   Shouldn't we also add the table itself to this set? Looks like we are 
adding just the location check




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734866)
Time Spent: 20m  (was: 10m)

> CreateTableEvent should have database object as one of the hive privilege 
> object.
> -
>
> Key: HIVE-25988
> URL: https://issues.apache.org/jira/browse/HIVE-25988
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The CreateTableEvent in HMS should have a database object as one of the 
> HivePrivilegeObjects so that it is consistent with HS2's CreateTable event.
> Also, we need to move the DFS_URI object into the InputList so that this is 
> also consistent with HS2's behavior.
> Having the database object among the create table event's Hive privilege objects 
> helps to determine whether a user has the right permissions to create a table in a 
> particular database via Ranger/Sentry.
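A rough sketch of what adding the database (and DFS URI) objects to the event's input privilege objects might look like; the helper, constructor choice and null checks are assumptions based on the authorizer API, not the actual patch:

{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hive.metastore.api.Table;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HivePrivilegeObject;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HivePrivilegeObject.HivePrivilegeObjectType;

class CreateTableInputsSketch {
  // Hypothetical helper: build the input privilege objects for a pre-create-table
  // event so that they contain the parent database as well as the table location URI.
  static List<HivePrivilegeObject> buildInputs(Table table) {
    List<HivePrivilegeObject> inputs = new ArrayList<>();
    inputs.add(new HivePrivilegeObject(HivePrivilegeObjectType.DATABASE,
        table.getDbName(), table.getDbName()));
    if (table.getSd() != null && table.getSd().getLocation() != null) {
      inputs.add(new HivePrivilegeObject(HivePrivilegeObjectType.DFS_URI,
          null, table.getSd().getLocation()));
    }
    return inputs;
  }
}
{code}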



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734771
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 16:53
Start Date: 01/Mar/22 16:53
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816959131



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -8128,13 +8128,9 @@ private FileSinkDesc createFileSinkDesc(String dest, 
TableDesc table_desc,
   // Some non-native tables might be partitioned without partition spec 
information being present in the Table object
   HiveStorageHandler storageHandler = dest_tab.getStorageHandler();
   if (storageHandler != null && storageHandler.alwaysUnpartitioned()) {
-List<PartitionTransformSpec> nonNativePartSpecs = storageHandler.getPartitionTransformSpec(dest_tab);
-if (dpCtx == null && nonNativePartSpecs != null && 
!nonNativePartSpecs.isEmpty()) {
-  verifyDynamicPartitionEnabled(conf, qb, dest);
-  Map<String, String> partSpec = new LinkedHashMap<>();
-  nonNativePartSpecs.forEach(ps -> partSpec.put(ps.getColumnName(), 
null));
-  dpCtx = new DynamicPartitionCtx(partSpec, 
conf.getVar(HiveConf.ConfVars.DEFAULTPARTITIONNAME),
-  
conf.getIntVar(HiveConf.ConfVars.DYNAMICPARTITIONMAXPARTSPERNODE));
+DynamicPartitionCtx nonNativeDpCtx = 
storageHandler.createDPContext(conf, dest_tab);
+if (dpCtx == null && nonNativeDpCtx != null) {

Review comment:
   That's right. It's because Hive doesn't know about partition columns in 
Iceberg tables, so it doesn't bother creating a dynamic partitioning context at 
all.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734771)
Time Spent: 5h  (was: 4h 50m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.
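As a point of reference, a small sketch of computing an Iceberg bucket value directly through Iceberg's transform API (the same transform the proposed iceberg_bucket UDF wraps); the input value and bucket count are illustrative only:

{code:java}
import org.apache.iceberg.transforms.Transform;
import org.apache.iceberg.transforms.Transforms;
import org.apache.iceberg.types.Types;

class BucketTransformSketch {
  public static void main(String[] args) {
    // Iceberg's bucket transform hashes a value into one of N buckets; ordering
    // records by this value clusters rows belonging to the same bucket together.
    Transform<String, Integer> bucket = Transforms.bucket(Types.StringType.get(), 5);
    Integer bucketId = bucket.apply("A bucket full of ice!");
    System.out.println("bucket = " + bucketId);
  }
}
{code}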



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734768&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734768
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 16:50
Start Date: 01/Mar/22 16:50
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816956834



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
##
@@ -521,34 +570,45 @@ private void inferSortPositions(Operator fsParent,
   }
 }
 
-    public ReduceSinkOperator getReduceSinkOp(List<Integer> partitionPositions,
-        List<Integer> sortPositions, List<Integer> sortOrder, List<Integer> sortNullOrder,
-        ArrayList<ColumnInfo> allCols, ArrayList<ExprNodeDesc> bucketColumns, int numBuckets,
-        Operator<? extends OperatorDesc> parent, AcidUtils.Operation writeType) throws SemanticException {
+    public ReduceSinkOperator getReduceSinkOp(List<Integer> partitionPositions, List<Integer> sortPositions,
+        List<Function<List<ExprNodeDesc>, ExprNodeDesc>> customSortExprs, List<Integer> sortOrder,
+        List<Integer> sortNullOrder, ArrayList<ColumnInfo> allCols, ArrayList<ExprNodeDesc> bucketColumns,
+        int numBuckets, Operator<? extends OperatorDesc> parent, AcidUtils.Operation writeType) {
+
+  // Order of KEY columns, if custom sort is present partition and bucket 
columns are disregarded:
+  // 0) Custom sort expressions
+  //  1) Partition columns
+  //  2) Bucket number column

Review comment:
   This one's deliberate. Wanted to emphasise it's either 0+3 or 1,2+3




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734768)
Time Spent: 4h 50m  (was: 4h 40m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734765
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 16:48
Start Date: 01/Mar/22 16:48
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816955075



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
##
@@ -585,6 +585,13 @@
 system.registerGenericUDF(GenericUDFMaskShowFirstN.UDF_NAME, 
GenericUDFMaskShowFirstN.class);
 system.registerGenericUDF(GenericUDFMaskShowLastN.UDF_NAME, 
GenericUDFMaskShowLastN.class);
 system.registerGenericUDF(GenericUDFMaskHash.UDF_NAME, 
GenericUDFMaskHash.class);
+
+try {
+  system.registerGenericUDF("iceberg_bucket",

Review comment:
   It's easier to deploy like this, as the this way iceberg_bucket will be 
a built-in function (which it is). I've seen some problems with the 
availability of UDFs on LLAP for example..




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734765)
Time Spent: 4h 40m  (was: 4.5h)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734762&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734762
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 16:47
Start Date: 01/Mar/22 16:47
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816953758



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -307,6 +328,41 @@ public boolean supportsPartitionTransform() {
 }).collect(Collectors.toList());
   }
 
+  @Override
+  public DynamicPartitionCtx createDPContext(HiveConf hiveConf, 
org.apache.hadoop.hive.ql.metadata.Table hmsTable)
+  throws SemanticException {
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = IcebergTableUtil.getTable(conf, tableDesc.getProperties());
+if (table.spec().isUnpartitioned()) {
+  return null;
+}
+// Iceberg currently doesn't have publicly accessible partition transform 
information, hence use above string parse
+    List<PartitionTransformSpec> partitionTransformSpecs = getPartitionTransformSpec(hmsTable);
+
+DynamicPartitionCtx dpCtx = new DynamicPartitionCtx(new LinkedHashMap<>(),
+hiveConf.getVar(HiveConf.ConfVars.DEFAULTPARTITIONNAME),
+hiveConf.getIntVar(HiveConf.ConfVars.DYNAMICPARTITIONMAXPARTSPERNODE));
+    List<Function<List<ExprNodeDesc>, ExprNodeDesc>> customSortExprs = new LinkedList<>();
+dpCtx.setCustomSortExpressions(customSortExprs);
+
+    Map<String, Integer> fieldOrderMap = new HashMap<>();
+    List<Types.NestedField> fields = table.schema().columns();
+for (int i = 0; i < fields.size(); ++i) {
+  fieldOrderMap.put(fields.get(i).name(), i);
+}
+
+for (PartitionTransformSpec spec : partitionTransformSpecs) {
+  int order = fieldOrderMap.get(spec.getColumnName());

Review comment:
   Sure, why not? Those expressions would just have the same source column.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734762)
Time Spent: 4.5h  (was: 4h 20m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734760
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 16:44
Start Date: 01/Mar/22 16:44
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816951333



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
##
@@ -280,6 +283,21 @@ default boolean supportsPartitionTransform() {
 return null;
   }
 
+  /**
+   * Creates a DynamicPartitionCtx instance that will be set up by the 
storage handler itself. Useful for non-native
+   * tables where partitions are not handled by Hive, and sorting is required 
in a custom way before writing the table.

Review comment:
   fixing javadoc




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734760)
Time Spent: 4h 20m  (was: 4h 10m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734759&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734759
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 16:42
Start Date: 01/Mar/22 16:42
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816949566



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/GenericUDFIcebergBucket.java
##
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.nio.ByteBuffer;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
+import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantIntObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.DoubleWritable;
+import org.apache.hadoop.io.FloatWritable;
+import org.apache.hadoop.io.IntWritable;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.iceberg.transforms.Transform;
+import org.apache.iceberg.transforms.Transforms;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+/**
+ * GenericUDFIcebergBucket - UDF that wraps around Iceberg's bucket transform 
function
+ */
+@Description(name = "iceberg_bucket",
+value = "_FUNC_(value, bucketCount) - " +
+"Returns the bucket value calculated by Iceberg bucket transform 
function ",
+extended = "Example:\n  > SELECT _FUNC_('A bucket full of ice!', 5);\n  4")
+//@VectorizedExpressions({StringLength.class})
+public class GenericUDFIcebergBucket extends GenericUDF {
+  private final IntWritable result = new IntWritable();
+  private transient PrimitiveObjectInspector argumentOI;
+  private transient PrimitiveObjectInspectorConverter.StringConverter 
stringConverter;
+  private transient PrimitiveObjectInspectorConverter.BinaryConverter 
binaryConverter;
+  private transient PrimitiveObjectInspectorConverter.IntConverter 
intConverter;
+  private transient PrimitiveObjectInspectorConverter.LongConverter 
longConverter;
+  private transient PrimitiveObjectInspectorConverter.HiveDecimalConverter 
decimalConverter;
+  private transient PrimitiveObjectInspectorConverter.FloatConverter 
floatConverter;
+  private transient PrimitiveObjectInspectorConverter.DoubleConverter 
doubleConverter;
+  private transient Type.PrimitiveType icebergType;
+  private int numBuckets = -1;
+
+  @Override
+  public ObjectInspector initialize(ObjectInspector[] arguments) throws 
UDFArgumentException {
+if (arguments.length != 2) {
+  throw new UDFArgumentLengthException(
+  "ICEBERG_BUCKET requires 2 argument, got " + arguments.length);
+}
+
+if (arguments[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
+  throw new UDFArgumentException(
+  "ICEBERG_BUCKET first argument takes primitive types, got " + 
argumentOI.getTypeName());
+}
+argumentOI = (PrimitiveObjectInspector) arguments[0];
+
+PrimitiveObjectInspector.PrimitiveCategory inputType = 
argumentOI.getPrimitiveCategory();
+ObjectInspector outputOI = null;
+switch (inputType) {
+  case CHAR:
+ 

[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734758=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734758
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 16:41
Start Date: 01/Mar/22 16:41
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816948553



##
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicPartitionCtx.java
##
@@ -50,6 +52,14 @@
   private String defaultPartName; // default partition name in case of null or 
empty value
   private int maxPartsPerNode;// maximum dynamic partitions created per 
mapper/reducer
   private Pattern whiteListPattern;
+  /**
+   * Expressions describing a custom way of sorting the table before write. 
Expressions can reference simple
+   * column descriptions or a tree of expressions containing more columns and 
UDFs.
+   * Can be useful for custom bucket/hash sorting.
+   * A custom expression should be a lambda that is given the original column 
description expressions as per read
+   * schema and returns a single expression. Example for simply just 
referencing column 3: cols -> cols.get(3).clone()
+   */
+  private transient List<Function<List<ExprNodeDesc>, ExprNodeDesc>> 
customSortExpressions;

Review comment:
   Yeah, I was wondering if it's okay if these expressions don't get to the 
executors. But I got my answer - Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734758)
Time Spent: 4h  (was: 3h 50m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.
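
For illustration only, a minimal, self-contained sketch (not part of the patch; class and variable names are made up) of how a per-record bucket value can be computed with Iceberg's bucket transform so that records can be sorted by it before writing, assuming the Transforms.bucket(Type, int) API that the PR's UDF also relies on:

import org.apache.iceberg.transforms.Transform;
import org.apache.iceberg.transforms.Transforms;
import org.apache.iceberg.types.Types;

public class BucketValueSketch {
  public static void main(String[] args) {
    int numBuckets = 5;
    // The same bucket transform Iceberg applies when partitioning a string column into 5 buckets.
    Transform<CharSequence, Integer> bucket = Transforms.bucket(Types.StringType.get(), numBuckets);
    // Sorting records by this value clusters rows of the same bucket together,
    // so a ClusteredWriter only needs one open file per bucket at a time.
    Integer bucketValue = bucket.apply("A bucket full of ice!");
    System.out.println(bucketValue);
  }
}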



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734757=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734757
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 16:40
Start Date: 01/Mar/22 16:40
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816946872



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/GenericUDFIcebergBucket.java
##
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.nio.ByteBuffer;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
+import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantIntObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.DoubleWritable;
+import org.apache.hadoop.io.FloatWritable;
+import org.apache.hadoop.io.IntWritable;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.iceberg.transforms.Transform;
+import org.apache.iceberg.transforms.Transforms;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+/**
+ * GenericUDFIcebergBucket - UDF that wraps around Iceberg's bucket transform 
function
+ */
+@Description(name = "iceberg_bucket",
+value = "_FUNC_(value, bucketCount) - " +
+"Returns the bucket value calculated by Iceberg bucket transform 
function ",
+extended = "Example:\n  > SELECT _FUNC_('A bucket full of ice!', 5);\n  4")
+//@VectorizedExpressions({StringLength.class})

Review comment:
   just a leftover - removing it




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734757)
Time Spent: 3h 50m  (was: 3h 40m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734756=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734756
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 16:39
Start Date: 01/Mar/22 16:39
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816946090



##
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicPartitionCtx.java
##
@@ -50,6 +52,14 @@
   private String defaultPartName; // default partition name in case of null or 
empty value
   private int maxPartsPerNode;// maximum dynamic partitions created per 
mapper/reducer
   private Pattern whiteListPattern;
+  /**
+   * Expressions describing a custom way of sorting the table before write. 
Expressions can reference simple
+   * column descriptions or a tree of expressions containing more columns and 
UDFs.
+   * Can be useful for custom bucket/hash sorting.
+   * A custom expression should be a lambda that is given the original column 
description expressions as per read
+   * schema and returns a single expression. Example for simply just 
referencing column 3: cols -> cols.get(3).clone()
+   */
+  private transient List<Function<List<ExprNodeDesc>, ExprNodeDesc>> 
customSortExpressions;

Review comment:
   The job has the dpCtx serialized, and for some reason kryo is throwing 
exceptions for lambdas on the executor side.
   The expressions here are only needed in HS2 during query planning so I just 
prevent them from going to executors in the first place.
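
For context, a small hypothetical sketch (illustrative names, not the Hive classes) of why `transient` keeps the lambda-valued list out of the shipped plan: Java serialization always skips transient fields, and Kryo's FieldSerializer does so by default as well, so the expressions stay in HS2.

import java.io.Serializable;
import java.util.List;
import java.util.function.Function;

class SortPlanSketch implements Serializable {
  // Needed only while planning in HS2; being transient, the field is dropped when the
  // plan is serialized, so no lambda ever reaches an executor.
  private transient List<Function<List<String>, String>> customSortExpressions;

  void setCustomSortExpressions(List<Function<List<String>, String>> exprs) {
    this.customSortExpressions = exprs;
  }

  List<Function<List<String>, String>> getCustomSortExpressions() {
    return customSortExpressions; // null after deserialization on the executor side
  }
}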




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734756)
Time Spent: 3h 40m  (was: 3.5h)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734654=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734654
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 14:59
Start Date: 01/Mar/22 14:59
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816848964



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -8128,13 +8128,9 @@ private FileSinkDesc createFileSinkDesc(String dest, 
TableDesc table_desc,
   // Some non-native tables might be partitioned without partition spec 
information being present in the Table object
   HiveStorageHandler storageHandler = dest_tab.getStorageHandler();
   if (storageHandler != null && storageHandler.alwaysUnpartitioned()) {
-List nonNativePartSpecs = 
storageHandler.getPartitionTransformSpec(dest_tab);
-if (dpCtx == null && nonNativePartSpecs != null && 
!nonNativePartSpecs.isEmpty()) {
-  verifyDynamicPartitionEnabled(conf, qb, dest);
-  Map partSpec = new LinkedHashMap<>();
-  nonNativePartSpecs.forEach(ps -> partSpec.put(ps.getColumnName(), 
null));
-  dpCtx = new DynamicPartitionCtx(partSpec, 
conf.getVar(HiveConf.ConfVars.DEFAULTPARTITIONNAME),
-  
conf.getIntVar(HiveConf.ConfVars.DYNAMICPARTITIONMAXPARTSPERNODE));
+DynamicPartitionCtx nonNativeDpCtx = 
storageHandler.createDPContext(conf, dest_tab);
+if (dpCtx == null && nonNativeDpCtx != null) {

Review comment:
   This means `dpCtx` does not get initialized at this point for Iceberg 
queries?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734654)
Time Spent: 3.5h  (was: 3h 20m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734652=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734652
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 14:58
Start Date: 01/Mar/22 14:58
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816847425



##
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicPartitionCtx.java
##
@@ -50,6 +52,14 @@
   private String defaultPartName; // default partition name in case of null or 
empty value
   private int maxPartsPerNode;// maximum dynamic partitions created per 
mapper/reducer
   private Pattern whiteListPattern;
+  /**
+   * Expressions describing a custom way of sorting the table before write. 
Expressions can reference simple
+   * column descriptions or a tree of expressions containing more columns and 
UDFs.
+   * Can be useful for custom bucket/hash sorting.
+   * A custom expression should be a lambda that is given the original column 
description expressions as per read
+   * schema and returns a single expression. Example for simply just 
referencing column 3: cols -> cols.get(3).clone()
+   */
+  private transient List<Function<List<ExprNodeDesc>, ExprNodeDesc>> 
customSortExpressions;

Review comment:
   Out of curiosity, why transient?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734652)
Time Spent: 3h 20m  (was: 3h 10m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734630=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734630
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 14:38
Start Date: 01/Mar/22 14:38
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816827544



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -307,6 +328,41 @@ public boolean supportsPartitionTransform() {
 }).collect(Collectors.toList());
   }
 
+  @Override
+  public DynamicPartitionCtx createDPContext(HiveConf hiveConf, 
org.apache.hadoop.hive.ql.metadata.Table hmsTable)
+  throws SemanticException {
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = IcebergTableUtil.getTable(conf, tableDesc.getProperties());
+if (table.spec().isUnpartitioned()) {
+  return null;
+}
+// Iceberg currently doesn't have publicly accessible partition transform 
information, hence use above string parse
+List partitionTransformSpecs = 
getPartitionTransformSpec(hmsTable);
+
+DynamicPartitionCtx dpCtx = new DynamicPartitionCtx(new LinkedHashMap<>(),

Review comment:
   nit: Maps.newLinkedHashMap()




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734630)
Time Spent: 3h 10m  (was: 3h)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25993) Query-based compaction doesn't work when partition column type is boolean

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25993?focusedWorklogId=734625=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734625
 ]

ASF GitHub Bot logged work on HIVE-25993:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 14:32
Start Date: 01/Mar/22 14:32
Worklog Time Spent: 10m 
  Work Description: veghlaci05 opened a new pull request #3065:
URL: https://github.com/apache/hive/pull/3065


   ### What changes were proposed in this pull request?
   This PR fixes the issue where compaction fails on tables with a boolean 
partition column.
   
   ### Why are the changes needed?
   Compaction must work properly on tables with boolean partition column as 
well.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Using automated tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734625)
Remaining Estimate: 0h
Time Spent: 10m

> Query-based compaction doesn't work when partition column type is boolean
> -
>
> Key: HIVE-25993
> URL: https://issues.apache.org/jira/browse/HIVE-25993
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Query based compaction fails on tables with boolean partition column.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25993) Query-based compaction doesn't work when partition column type is boolean

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25993:
--
Labels: pull-request-available  (was: )

> Query-based compaction doesn't work when partition column type is boolean
> -
>
> Key: HIVE-25993
> URL: https://issues.apache.org/jira/browse/HIVE-25993
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Query based compaction fails on tables with boolean partition column.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734621=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734621
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 14:30
Start Date: 01/Mar/22 14:30
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816819174



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/GenericUDFIcebergBucket.java
##
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.nio.ByteBuffer;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
+import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantIntObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.DoubleWritable;
+import org.apache.hadoop.io.FloatWritable;
+import org.apache.hadoop.io.IntWritable;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.iceberg.transforms.Transform;
+import org.apache.iceberg.transforms.Transforms;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+/**
+ * GenericUDFIcebergBucket - UDF that wraps around Iceberg's bucket transform 
function
+ */
+@Description(name = "iceberg_bucket",
+value = "_FUNC_(value, bucketCount) - " +
+"Returns the bucket value calculated by Iceberg bucket transform 
function ",
+extended = "Example:\n  > SELECT _FUNC_('A bucket full of ice!', 5);\n  4")
+//@VectorizedExpressions({StringLength.class})
+public class GenericUDFIcebergBucket extends GenericUDF {
+  private final IntWritable result = new IntWritable();
+  private transient PrimitiveObjectInspector argumentOI;
+  private transient PrimitiveObjectInspectorConverter.StringConverter 
stringConverter;
+  private transient PrimitiveObjectInspectorConverter.BinaryConverter 
binaryConverter;
+  private transient PrimitiveObjectInspectorConverter.IntConverter 
intConverter;
+  private transient PrimitiveObjectInspectorConverter.LongConverter 
longConverter;
+  private transient PrimitiveObjectInspectorConverter.HiveDecimalConverter 
decimalConverter;
+  private transient PrimitiveObjectInspectorConverter.FloatConverter 
floatConverter;
+  private transient PrimitiveObjectInspectorConverter.DoubleConverter 
doubleConverter;
+  private transient Type.PrimitiveType icebergType;
+  private int numBuckets = -1;
+
+  @Override
+  public ObjectInspector initialize(ObjectInspector[] arguments) throws 
UDFArgumentException {
+if (arguments.length != 2) {
+  throw new UDFArgumentLengthException(
+  "ICEBERG_BUCKET requires 2 argument, got " + arguments.length);
+}
+
+if (arguments[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
+  throw new UDFArgumentException(
+  "ICEBERG_BUCKET first argument takes primitive types, got " + 
argumentOI.getTypeName());
+}
+argumentOI = (PrimitiveObjectInspector) arguments[0];
+
+PrimitiveObjectInspector.PrimitiveCategory inputType = 
argumentOI.getPrimitiveCategory();
+ObjectInspector outputOI = null;
+switch (inputType) {
+  case 

[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=734620=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734620
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 14:30
Start Date: 01/Mar/22 14:30
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r816819306



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -311,48 +315,96 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 
 Set<Path> partPaths = new HashSet<>();
 
-// check that the partition folders exist on disk
-for (Partition partition : parts) {
-  if (partition == null) {
-// most likely the user specified an invalid partition
-continue;
-  }
-  Path partPath = getDataLocation(table, partition);
-  if (partPath == null) {
-continue;
-  }
-  fs = partPath.getFileSystem(conf);
+int threadCount = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.METASTORE_MSCK_FS_HANDLER_THREADS_COUNT);
 
-  CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
-  prFromMetastore.setPartitionName(getPartitionName(table, partition));
-  prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
-  } else {
-result.getCorrectPartitions().add(prFromMetastore);
-  }
+Preconditions.checkArgument(!(threadCount < 
1),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be less than 1");
+Preconditions.checkArgument(!(threadCount > 
30),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be more than 30");
 
-  if (partitionExpirySeconds > 0) {
-long currentEpochSecs = Instant.now().getEpochSecond();
-long createdTime = partition.getCreateTime();
-long partitionAgeSeconds = currentEpochSecs - createdTime;
-if (partitionAgeSeconds > partitionExpirySeconds) {
-  CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
-  pr.setPartitionName(getPartitionName(table, partition));
-  pr.setTableName(partition.getTableName());
-  result.getExpiredPartitions().add(pr);
-  if (LOG.isDebugEnabled()) {
-LOG.debug("{}.{}.{}.{} expired. createdAt: {} current: {} age: {}s 
expiry: {}s", partition.getCatName(),
-partition.getDbName(), partition.getTableName(), 
pr.getPartitionName(), createdTime, currentEpochSecs,
-partitionAgeSeconds, partitionExpirySeconds);
-  }
+LOG.debug("Running with threads "+threadCount);
+
+// For Multi Threaded run, we do not want to wait for All partitions in 
queue to be processed,
+// instead we run in batch to avoid OOM, we set Min and Max Pool Size = 
METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+// and Waiting Queue size = METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+
+final ExecutorService pool = new ThreadPoolExecutor(threadCount,
+threadCount,
+0L,
+TimeUnit.MILLISECONDS,
+new ArrayBlockingQueue<>(threadCount),
+new ThreadPoolExecutor.CallerRunsPolicy());
+
+ArrayList processedPartitions = new ArrayList();;
+
+try {
+  Queue> futures = new LinkedList<>();
+  // check that the partition folders exist on disk
+  for (Partition partition : parts) {
+if (partition == null) {
+  // most likely the user specified an invalid partition
+  continue;
 }
-  }
+Path partPath = getDataLocation(table, partition);
+if (partPath == null) {
+  continue;
+}
+futures.add(pool.submit(new Callable() {
+  @Override
+  public Object call() throws Exception {
+Path tempPartPath = partPath;
+FileSystem tempFs = tempPartPath.getFileSystem(conf);
+CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
+String partName = getPartitionName(table, partition);
+prFromMetastore.setPartitionName(partName);
+prFromMetastore.setTableName(partition.getTableName());
+
+synchronized (this) {
+  if (!tempFs.exists(tempPartPath)) {
+result.getPartitionsNotOnFs().add(prFromMetastore);
+  } else {
+result.getCorrectPartitions().add(prFromMetastore);
+  }
+}
 
-  for (int i = 0; i < getPartitionSpec(table, partition).size(); i++) {
-Path qualifiedPath = partPath.makeQualified(fs);
-partPaths.add(qualifiedPath);
-partPath = partPath.getParent();
+ 
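
As a rough, standalone illustration of the batching pattern described in the comment above (fixed-size pool, bounded queue, CallerRunsPolicy so the submitting thread throttles itself instead of queueing every partition), with purely illustrative names:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolSketch {
  public static void main(String[] args) throws Exception {
    int threadCount = 4;
    ExecutorService pool = new ThreadPoolExecutor(
        threadCount, threadCount, 0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(threadCount),       // waiting queue bounded to threadCount
        new ThreadPoolExecutor.CallerRunsPolicy());  // overflow tasks run in the caller, limiting memory use
    List<Future<String>> futures = new ArrayList<>();
    for (int i = 0; i < 20; i++) {
      final int partition = i;
      futures.add(pool.submit(() -> "checked partition " + partition));
    }
    for (Future<String> f : futures) {
      System.out.println(f.get());
    }
    pool.shutdown();
  }
}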

[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=734618=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734618
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 14:29
Start Date: 01/Mar/22 14:29
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r816818516



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -311,48 +315,96 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 
 Set<Path> partPaths = new HashSet<>();
 
-// check that the partition folders exist on disk
-for (Partition partition : parts) {
-  if (partition == null) {
-// most likely the user specified an invalid partition
-continue;
-  }
-  Path partPath = getDataLocation(table, partition);
-  if (partPath == null) {
-continue;
-  }
-  fs = partPath.getFileSystem(conf);
+int threadCount = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.METASTORE_MSCK_FS_HANDLER_THREADS_COUNT);
 
-  CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
-  prFromMetastore.setPartitionName(getPartitionName(table, partition));
-  prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
-  } else {
-result.getCorrectPartitions().add(prFromMetastore);
-  }
+Preconditions.checkArgument(!(threadCount < 
1),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be less than 1");
+Preconditions.checkArgument(!(threadCount > 
30),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be more than 30");
 
-  if (partitionExpirySeconds > 0) {
-long currentEpochSecs = Instant.now().getEpochSecond();
-long createdTime = partition.getCreateTime();
-long partitionAgeSeconds = currentEpochSecs - createdTime;
-if (partitionAgeSeconds > partitionExpirySeconds) {
-  CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
-  pr.setPartitionName(getPartitionName(table, partition));
-  pr.setTableName(partition.getTableName());
-  result.getExpiredPartitions().add(pr);
-  if (LOG.isDebugEnabled()) {
-LOG.debug("{}.{}.{}.{} expired. createdAt: {} current: {} age: {}s 
expiry: {}s", partition.getCatName(),
-partition.getDbName(), partition.getTableName(), 
pr.getPartitionName(), createdTime, currentEpochSecs,
-partitionAgeSeconds, partitionExpirySeconds);
-  }
+LOG.debug("Running with threads "+threadCount);
+
+// For Multi Threaded run, we do not want to wait for All partitions in 
queue to be processed,
+// instead we run in batch to avoid OOM, we set Min and Max Pool Size = 
METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+// and Waiting Queue size = METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+
+final ExecutorService pool = new ThreadPoolExecutor(threadCount,
+threadCount,
+0L,
+TimeUnit.MILLISECONDS,
+new ArrayBlockingQueue<>(threadCount),
+new ThreadPoolExecutor.CallerRunsPolicy());
+
+ArrayList processedPartitions = new ArrayList();;
+
+try {
+  Queue> futures = new LinkedList<>();
+  // check that the partition folders exist on disk
+  for (Partition partition : parts) {
+if (partition == null) {
+  // most likely the user specified an invalid partition
+  continue;
 }
-  }
+Path partPath = getDataLocation(table, partition);
+if (partPath == null) {
+  continue;
+}
+futures.add(pool.submit(new Callable() {
+  @Override
+  public Object call() throws Exception {
+Path tempPartPath = partPath;
+FileSystem tempFs = tempPartPath.getFileSystem(conf);
+CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
+String partName = getPartitionName(table, partition);
+prFromMetastore.setPartitionName(partName);
+prFromMetastore.setTableName(partition.getTableName());
+
+synchronized (this) {
+  if (!tempFs.exists(tempPartPath)) {
+result.getPartitionsNotOnFs().add(prFromMetastore);
+  } else {
+result.getCorrectPartitions().add(prFromMetastore);
+  }
+}
 
-  for (int i = 0; i < getPartitionSpec(table, partition).size(); i++) {
-Path qualifiedPath = partPath.makeQualified(fs);
-partPaths.add(qualifiedPath);
-partPath = partPath.getParent();
+ 

[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=734615=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734615
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 14:28
Start Date: 01/Mar/22 14:28
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r816817711



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -311,48 +315,96 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 
 Set<Path> partPaths = new HashSet<>();
 
-// check that the partition folders exist on disk
-for (Partition partition : parts) {
-  if (partition == null) {
-// most likely the user specified an invalid partition
-continue;
-  }
-  Path partPath = getDataLocation(table, partition);
-  if (partPath == null) {
-continue;
-  }
-  fs = partPath.getFileSystem(conf);
+int threadCount = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.METASTORE_MSCK_FS_HANDLER_THREADS_COUNT);
 
-  CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
-  prFromMetastore.setPartitionName(getPartitionName(table, partition));
-  prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
-  } else {
-result.getCorrectPartitions().add(prFromMetastore);
-  }
+Preconditions.checkArgument(!(threadCount < 
1),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be less than 1");
+Preconditions.checkArgument(!(threadCount > 
30),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be more than 30");
 
-  if (partitionExpirySeconds > 0) {
-long currentEpochSecs = Instant.now().getEpochSecond();
-long createdTime = partition.getCreateTime();
-long partitionAgeSeconds = currentEpochSecs - createdTime;
-if (partitionAgeSeconds > partitionExpirySeconds) {
-  CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
-  pr.setPartitionName(getPartitionName(table, partition));
-  pr.setTableName(partition.getTableName());
-  result.getExpiredPartitions().add(pr);
-  if (LOG.isDebugEnabled()) {
-LOG.debug("{}.{}.{}.{} expired. createdAt: {} current: {} age: {}s 
expiry: {}s", partition.getCatName(),
-partition.getDbName(), partition.getTableName(), 
pr.getPartitionName(), createdTime, currentEpochSecs,
-partitionAgeSeconds, partitionExpirySeconds);
-  }
+LOG.debug("Running with threads "+threadCount);

Review comment:
   Please use parameterized logging here, e.g. `LOG.debug("Running with threads {}", threadCount);`

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -311,48 +315,96 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 
 Set<Path> partPaths = new HashSet<>();
 
-// check that the partition folders exist on disk
-for (Partition partition : parts) {
-  if (partition == null) {
-// most likely the user specified an invalid partition
-continue;
-  }
-  Path partPath = getDataLocation(table, partition);
-  if (partPath == null) {
-continue;
-  }
-  fs = partPath.getFileSystem(conf);
+int threadCount = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.METASTORE_MSCK_FS_HANDLER_THREADS_COUNT);
 
-  CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
-  prFromMetastore.setPartitionName(getPartitionName(table, partition));
-  prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
-  } else {
-result.getCorrectPartitions().add(prFromMetastore);
-  }
+Preconditions.checkArgument(!(threadCount < 
1),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be less than 1");
+Preconditions.checkArgument(!(threadCount > 
30),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be more than 30");
 
-  if (partitionExpirySeconds > 0) {
-long currentEpochSecs = Instant.now().getEpochSecond();
-long createdTime = partition.getCreateTime();
-long partitionAgeSeconds = currentEpochSecs - createdTime;
-if (partitionAgeSeconds > partitionExpirySeconds) {
-  CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
-  pr.setPartitionName(getPartitionName(table, partition));
-  pr.setTableName(partition.getTableName());
-  

[jira] [Work logged] (HIVE-25990) Optimise multiple copies in case of CTAS in external tables for Object stores

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25990?focusedWorklogId=734581=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734581
 ]

ASF GitHub Bot logged work on HIVE-25990:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 13:44
Start Date: 01/Mar/22 13:44
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on pull request #3058:
URL: https://github.com/apache/hive/pull/3058#issuecomment-1055456931


   the test failure isn't related.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734581)
Remaining Estimate: 0h
Time Spent: 10m

> Optimise multiple copies in case of CTAS in external tables for Object stores
> -
>
> Key: HIVE-25990
> URL: https://issues.apache.org/jira/browse/HIVE-25990
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Presently, for CTAS with external tables, there are two rename operations: 
> one from tmp to _ext and then from _ext to the actual target.
> On object stores these renames turn into actual copies. Avoid the rename from 
> tmp to _ext by instead creating a list of the files written under tmp, which 
> the move task can consume to copy them directly from tmp to the actual target.
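
A hedged sketch of the idea using standard Hadoop FileSystem APIs (illustrative only, not the actual Hive patch): collect the files written under tmp and let the move step copy each one straight to the final target, skipping the tmp to _ext rename.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class DirectCopySketch {
  // Gather everything written under tmp; this list stands in for what the move task would consume.
  static List<Path> filesToMove(FileSystem fs, Path tmpDir) throws IOException {
    List<Path> files = new ArrayList<>();
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(tmpDir, true);
    while (it.hasNext()) {
      files.add(it.next().getPath());
    }
    return files;
  }

  // Copy each file once, directly from tmp to the final target, preserving the relative layout.
  static void copyDirectly(FileSystem fs, List<Path> files, Path tmpDir, Path target) throws IOException {
    for (Path src : files) {
      String relative = src.toString().substring(tmpDir.toString().length() + 1);
      FileUtil.copy(fs, src, fs, new Path(target, relative), false, fs.getConf());
    }
  }
}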



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25990) Optimise multiple copies in case of CTAS in external tables for Object stores

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25990:
--
Labels: pull-request-available  (was: )

> Optimise multiple copies in case of CTAS in external tables for Object stores
> -
>
> Key: HIVE-25990
> URL: https://issues.apache.org/jira/browse/HIVE-25990
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Presently, for CTAS with external tables, there are two rename operations: 
> one from tmp to _ext and then from _ext to the actual target.
> On object stores these renames turn into actual copies. Avoid the rename from 
> tmp to _ext by instead creating a list of the files written under tmp, which 
> the move task can consume to copy them directly from tmp to the actual target.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-13384) Failed to create HiveMetaStoreClient object with proxy user when Kerberos enabled

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13384?focusedWorklogId=734561=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734561
 ]

ASF GitHub Bot logged work on HIVE-13384:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 13:21
Start Date: 01/Mar/22 13:21
Worklog Time Spent: 10m 
  Work Description: cuibo01 edited a comment on pull request #3064:
URL: https://github.com/apache/hive/pull/3064#issuecomment-1055436970


   @thejasmn @nandakumar131  @zabetak @sankarh pls review, thx


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734561)
Time Spent: 0.5h  (was: 20m)

> Failed to create HiveMetaStoreClient object with proxy user when Kerberos 
> enabled
> -
>
> Key: HIVE-13384
> URL: https://issues.apache.org/jira/browse/HIVE-13384
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0)
> But I found that it cannot successfully create a new HiveMetaStoreClient object via a 
> proxy user in a Kerberos environment.
> ===
> 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
> ==
> While debugging Hive, I found that the error came from the open() method in 
> the HiveMetaStoreClient class.
> Around line 406,
>  transport = UserGroupInformation.getCurrentUser().doAs(new 
> PrivilegedExceptionAction() {  //FAILED, because the current user 
> doesn't have the credential
> But it works if I change the above line to
>  transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new 
> PrivilegedExceptionAction() {  //PASS
> I found that DRILL-3413 fixes this error on the Drill side as a workaround. But if I 
> submit a mapreduce job via Pig/HCatalog, it runs into the same issue again 
> when initializing the object via HCatalog.
> It would be better to fix this issue on the Hive side.
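
A minimal, hypothetical sketch of the workaround described above (names are illustrative and the actual Hive fix may differ): when the current UGI is a proxy user, run the privileged action as the real, Kerberos-authenticated user instead.

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserDoAsSketch {
  static <T> T runAsRealUserIfProxy(PrivilegedExceptionAction<T> action) throws Exception {
    UserGroupInformation current = UserGroupInformation.getCurrentUser();
    // A proxy user carries no Kerberos TGT of its own; the real user does, so the
    // SASL/GSSAPI handshake has to run under the real user's credentials.
    UserGroupInformation ugi = current.getRealUser() != null ? current.getRealUser() : current;
    return ugi.doAs(action);
  }
}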



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-13384) Failed to create HiveMetaStoreClient object with proxy user when Kerberos enabled

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13384?focusedWorklogId=734559=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734559
 ]

ASF GitHub Bot logged work on HIVE-13384:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 13:20
Start Date: 01/Mar/22 13:20
Worklog Time Spent: 10m 
  Work Description: cuibo01 commented on pull request #3064:
URL: https://github.com/apache/hive/pull/3064#issuecomment-1055436970


   @thejasmn @nandakumar131  @zabetak @sankarh pls review 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734559)
Time Spent: 20m  (was: 10m)

> Failed to create HiveMetaStoreClient object with proxy user when Kerberos 
> enabled
> -
>
> Key: HIVE-13384
> URL: https://issues.apache.org/jira/browse/HIVE-13384
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0)
> But I found that it cannot successfully create a new HiveMetaStoreClient object via a 
> proxy user in a Kerberos environment.
> ===
> 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
> ==
> While debugging Hive, I found that the error came from the open() method in 
> the HiveMetaStoreClient class.
> Around line 406,
>  transport = UserGroupInformation.getCurrentUser().doAs(new 
> PrivilegedExceptionAction() {  //FAILED, because the current user 
> doesn't have the credential
> But it works if I change the above line to
>  transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new 
> PrivilegedExceptionAction() {  //PASS
> I found that DRILL-3413 fixes this error on the Drill side as a workaround. But if I 
> submit a mapreduce job via Pig/HCatalog, it runs into the same issue again 
> when initializing the object via HCatalog.
> It would be better to fix this issue on the Hive side.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-13384) Failed to create HiveMetaStoreClient object with proxy user when Kerberos enabled

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-13384:
--
Labels: pull-request-available  (was: )

> Failed to create HiveMetaStoreClient object with proxy user when Kerberos 
> enabled
> -
>
> Key: HIVE-13384
> URL: https://issues.apache.org/jira/browse/HIVE-13384
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0)
> But I found that it cannot successfully create a new HiveMetaStoreClient object via a 
> proxy user in a Kerberos environment.
> ===
> 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
> ==
> While debugging Hive, I found that the error came from the open() method in 
> the HiveMetaStoreClient class.
> Around line 406,
>  transport = UserGroupInformation.getCurrentUser().doAs(new 
> PrivilegedExceptionAction() {  //FAILED, because the current user 
> doesn't have the credential
> But it works if I change the above line to
>  transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new 
> PrivilegedExceptionAction() {  //PASS
> I found that DRILL-3413 fixes this error on the Drill side as a workaround. But if I 
> submit a mapreduce job via Pig/HCatalog, it runs into the same issue again 
> when initializing the object via HCatalog.
> It would be better to fix this issue on the Hive side.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-13384) Failed to create HiveMetaStoreClient object with proxy user when Kerberos enabled

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13384?focusedWorklogId=734555=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734555
 ]

ASF GitHub Bot logged work on HIVE-13384:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 13:16
Start Date: 01/Mar/22 13:16
Worklog Time Spent: 10m 
  Work Description: cuibo01 opened a new pull request #3064:
URL: https://github.com/apache/hive/pull/3064


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734555)
Remaining Estimate: 0h
Time Spent: 10m

> Failed to create HiveMetaStoreClient object with proxy user when Kerberos 
> enabled
> -
>
> Key: HIVE-13384
> URL: https://issues.apache.org/jira/browse/HIVE-13384
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0)
> But I found that it cannot successfully create a new HiveMetaStoreClient object via a 
> proxy user in a Kerberos environment.
> ===
> 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
> ==
> While debugging Hive, I found that the error came from the open() method in 
> the HiveMetaStoreClient class.
> Around line 406,
>  transport = UserGroupInformation.getCurrentUser().doAs(new 
> PrivilegedExceptionAction() {  //FAILED, because the current user 
> doesn't have the credential
> But it works if I change the above line to
>  transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new 
> PrivilegedExceptionAction() {  //PASS
> I found that DRILL-3413 fixes this error on the Drill side as a workaround. But if I 
> submit a mapreduce job via Pig/HCatalog, it runs into the same issue again 
> when initializing the object via HCatalog.
> It would be better to fix this issue on the Hive side.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25935) Cleanup IMetaStoreClient#getPartitionsByNames APIs

2022-03-01 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-25935:
---
Fix Version/s: 4.0.0-alpha-1

> Cleanup IMetaStoreClient#getPartitionsByNames APIs
> --
>
> Key: HIVE-25935
> URL: https://issues.apache.org/jira/browse/HIVE-25935
> Project: Hive
>  Issue Type: Task
>  Components: Metastore
>Reporter: Stamatis Zampetakis
>Priority: Major
> Fix For: 4.0.0-alpha-1
>
>
> Currently the 
> [IMetastoreClient|https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java]
>  interface has 8 variants of the {{getPartitionsByNames}} method. Going 
> quickly over the concrete implementation it appears that not all of them are 
> useful/necessary so a bit of cleanup is needed.
> Below a few potential problems I observed:
> * Some of the APIs are not used anywhere in the project (neither by 
> production nor by test code).
> * Some of the APIs are deprecated in some concrete implementations but not 
> globally at the interface level without an explanation why.
> * Some of the implementations simply throw without doing anything.
> * Many of the APIs are partially tested or not tested at all.
> HIVE-24743, HIVE-25281 are related since they introduce/deprecate some of the 
> aforementioned APIs.
> It would be good to review the aforementioned APIs and decide what needs to 
> stay and what needs to go, as well as complete whatever is missing where relevant.
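
As an illustration of the "deprecate globally at the interface level with an explanation" point, a hedged sketch (the interface name and the suggested replacement are assumptions, not the actual IMetaStoreClient code):

{code:java}
import java.util.List;

import org.apache.hadoop.hive.metastore.api.Partition;
import org.apache.thrift.TException;

public interface PartitionsByNamesSketch {
  /**
   * Illustrative only: the deprecation and its reason live on the interface,
   * so every implementation inherits the same guidance.
   *
   * @deprecated prefer a single request-object based variant so new options do
   *             not require yet another overload of this method.
   */
  @Deprecated
  List<Partition> getPartitionsByNames(String dbName, String tableName, List<String> partNames)
      throws TException;
}
{code}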



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25935) Cleanup IMetaStoreClient#getPartitionsByNames APIs

2022-03-01 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-25935:
---
Target Version/s: 4.0.0, 4.0.0-alpha-1  (was: 4.0.0)

> Cleanup IMetaStoreClient#getPartitionsByNames APIs
> --
>
> Key: HIVE-25935
> URL: https://issues.apache.org/jira/browse/HIVE-25935
> Project: Hive
>  Issue Type: Task
>  Components: Metastore
>Reporter: Stamatis Zampetakis
>Priority: Major
>
> Currently the 
> [IMetastoreClient|https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java]
>  interface has 8 variants of the {{getPartitionsByNames}} method. Going 
> quickly over the concrete implementation it appears that not all of them are 
> useful/necessary so a bit of cleanup is needed.
> Below a few potential problems I observed:
> * Some of the APIs are not used anywhere in the project (neither by 
> production nor by test code).
> * Some of the APIs are deprecated in some concrete implementations but not 
> globally at the interface level without an explanation why.
> * Some of the implementations simply throw without doing anything.
> * Many of the APIs are partially tested or not tested at all.
> HIVE-24743, HIVE-25281 are related since they introduce/deprecate some of the 
> aforementioned APIs.
> It would be good to review the aforementioned APIs and decide what needs to 
> stay and what needs to go, as well as complete whatever is missing where relevant.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25665) Checkstyle LGPL files must not be in the release sources/binaries

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25665?focusedWorklogId=734550=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734550
 ]

ASF GitHub Bot logged work on HIVE-25665:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 12:59
Start Date: 01/Mar/22 12:59
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #3063:
URL: https://github.com/apache/hive/pull/3063#issuecomment-1055417080


   can't we simply remove these `xsl` files from the repo and fix this issue? 
   if these files are LGPL, I think we should remove them if possible


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734550)
Time Spent: 20m  (was: 10m)

> Checkstyle LGPL files must not be in the release sources/binaries
> -
>
> Key: HIVE-25665
> URL: https://issues.apache.org/jira/browse/HIVE-25665
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Stamatis Zampetakis
>Assignee: Peter Vary
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As discussed in the [dev 
> list|https://lists.apache.org/thread/r13e3236aa72a070b3267ed95f7cb3b45d3c4783fd4ca35f5376b1a35@%3cdev.hive.apache.org%3e]
>  LGPL files must not be present in the Apache released sources/binaries.
> The following files must not be present in the release:
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/checkstyle/checkstyle-noframes-sorted.xsl
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/storage-api/checkstyle/checkstyle-noframes-sorted.xsl
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/standalone-metastore/checkstyle/checkstyle-noframes-sorted.xsl
> There may be other checkstyle LGPL files in the repo. All these should either 
> be removed entirely from the repository or selectively excluded from the 
> release.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-25989) CTLT HBaseStorageHandler is dropping underlying HBase table when failed

2022-03-01 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25989:
-

Assignee: Marton Bod

> CTLT HBaseStorageHandler is dropping underlying HBase table when failed
> ---
>
> Key: HIVE-25989
> URL: https://issues.apache.org/jira/browse/HIVE-25989
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Marton Bod
>Priority: Major
>
> With hive.strict.managed.tables & hive.create.as.acid enabled, the 
> Hive-HBase rollback code assumes the failure came from a plain createTable rather 
> than a CTLT, and removes the underlying HBase table while rolling back here:
> [https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseMetaHook.java#L187-L195]
>  
> Repro
>  
> {code:java}
> hbase
> =
> hbase shell
> create 'hbase_hive_table', 'cf'
> beeline
> ===
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.strict.managed.tables=true;
> set hive.create.as.acid=true;
> set hive.create.as.insert.only=true;
> set hive.default.fileformat.managed=ORC;
> > CREATE EXTERNAL TABLE `hbase_hive_table`(                       
>    `key` int COMMENT '',                            
>    `value` string COMMENT '')                       
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.hbase.HBaseSerDe'        
>  STORED BY                                          
>    'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  
>  WITH SERDEPROPERTIES (                             
>    'hbase.columns.mapping'=':key,cf:cf')                      
>  TBLPROPERTIES ('hbase.table.name'='hbase_hive_table');
> > select * from hbase_hive_table;
> +---+-+
> | hbase_hive_table.key  | hbase_hive_table.value  |
> +---+-+
> +---+-+
> > create table new_hbase_hive_table like hbase_hive_table;
> Caused by: org.apache.hadoop.hive.metastore.api.MetaException: The table must 
> be stored using an ACID compliant format (such as ORC): 
> default.new_hbase_hive_table
> > select * from hbase_hive_table;
> Error: java.io.IOException: org.apache.hadoop.hbase.TableNotFoundException: 
> hbase_hive_table
> {code}
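
A hedged sketch of the kind of guard the rollback could use (this is not the actual HBaseMetaHook code; the class and method names are illustrative): only drop the HBase table if this operation actually created it, so a CTLT over a pre-existing table leaves it untouched on failure.

{code:java}
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;

public class HBaseRollbackSketch {
  // Remember whether we created the underlying HBase table during preCreateTable.
  private boolean createdHBaseTable = false;

  void preCreateTable(Admin admin, TableName name) throws Exception {
    if (!admin.tableExists(name)) {
      // ... create the HBase table for a regular CREATE TABLE ...
      createdHBaseTable = true;
    }
  }

  void rollbackCreateTable(Admin admin, TableName name) throws Exception {
    // For CTLT over an existing HBase table, createdHBaseTable stays false,
    // so the rollback does not drop the user's table.
    if (createdHBaseTable && admin.tableExists(name)) {
      admin.disableTable(name);
      admin.deleteTable(name);
    }
  }
}
{code}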



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734545=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734545
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 12:31
Start Date: 01/Mar/22 12:31
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816726890



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/GenericUDFIcebergBucket.java
##
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.nio.ByteBuffer;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
+import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantIntObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.DoubleWritable;
+import org.apache.hadoop.io.FloatWritable;
+import org.apache.hadoop.io.IntWritable;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.iceberg.transforms.Transform;
+import org.apache.iceberg.transforms.Transforms;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+/**
+ * GenericUDFIcebergBucket - UDF that wraps around Iceberg's bucket transform 
function
+ */
+@Description(name = "iceberg_bucket",
+value = "_FUNC_(value, bucketCount) - " +
+"Returns the bucket value calculated by Iceberg bucket transform 
function ",
+extended = "Example:\n  > SELECT _FUNC_('A bucket full of ice!', 5);\n  4")
+//@VectorizedExpressions({StringLength.class})
+public class GenericUDFIcebergBucket extends GenericUDF {
+  private final IntWritable result = new IntWritable();
+  private transient PrimitiveObjectInspector argumentOI;
+  private transient PrimitiveObjectInspectorConverter.StringConverter 
stringConverter;
+  private transient PrimitiveObjectInspectorConverter.BinaryConverter 
binaryConverter;
+  private transient PrimitiveObjectInspectorConverter.IntConverter 
intConverter;
+  private transient PrimitiveObjectInspectorConverter.LongConverter 
longConverter;
+  private transient PrimitiveObjectInspectorConverter.HiveDecimalConverter 
decimalConverter;
+  private transient PrimitiveObjectInspectorConverter.FloatConverter 
floatConverter;
+  private transient PrimitiveObjectInspectorConverter.DoubleConverter 
doubleConverter;
+  private transient Type.PrimitiveType icebergType;
+  private int numBuckets = -1;
+
+  @Override
+  public ObjectInspector initialize(ObjectInspector[] arguments) throws 
UDFArgumentException {
+if (arguments.length != 2) {
+  throw new UDFArgumentLengthException(
+  "ICEBERG_BUCKET requires 2 argument, got " + arguments.length);
+}
+
+if (arguments[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
+  throw new UDFArgumentException(
+  "ICEBERG_BUCKET first argument takes primitive types, got " + 
argumentOI.getTypeName());
+}
+argumentOI = (PrimitiveObjectInspector) arguments[0];
+
+PrimitiveObjectInspector.PrimitiveCategory inputType = 
argumentOI.getPrimitiveCategory();
+ObjectInspector outputOI = null;
+switch (inputType) {
+  case 

[jira] [Work logged] (HIVE-25665) Checkstyle LGPL files must not be in the release sources/binaries

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25665?focusedWorklogId=734544=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734544
 ]

ASF GitHub Bot logged work on HIVE-25665:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 12:31
Start Date: 01/Mar/22 12:31
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request #3063:
URL: https://github.com/apache/hive/pull/3063


   ### What changes were proposed in this pull request?
   remove the checkstyle directories from the release binaries
   
   ### Why are the changes needed?
   The checkstyle files were licensed under LGPL
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Checked the binaries manually
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734544)
Remaining Estimate: 0h
Time Spent: 10m

> Checkstyle LGPL files must not be in the release sources/binaries
> -
>
> Key: HIVE-25665
> URL: https://issues.apache.org/jira/browse/HIVE-25665
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Stamatis Zampetakis
>Assignee: Peter Vary
>Priority: Blocker
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed in the [dev 
> list|https://lists.apache.org/thread/r13e3236aa72a070b3267ed95f7cb3b45d3c4783fd4ca35f5376b1a35@%3cdev.hive.apache.org%3e]
>  LGPL files must not be present in the Apache released sources/binaries.
> The following files must not be present in the release:
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/checkstyle/checkstyle-noframes-sorted.xsl
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/storage-api/checkstyle/checkstyle-noframes-sorted.xsl
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/standalone-metastore/checkstyle/checkstyle-noframes-sorted.xsl
> There may be other checkstyle LGPL files in the repo. All these should either 
> be removed entirely from the repository or selectively excluded from the 
> release.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25665) Checkstyle LGPL files must not be in the release sources/binaries

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25665:
--
Labels: pull-request-available  (was: )

> Checkstyle LGPL files must not be in the release sources/binaries
> -
>
> Key: HIVE-25665
> URL: https://issues.apache.org/jira/browse/HIVE-25665
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Stamatis Zampetakis
>Assignee: Peter Vary
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed in the [dev 
> list|https://lists.apache.org/thread/r13e3236aa72a070b3267ed95f7cb3b45d3c4783fd4ca35f5376b1a35@%3cdev.hive.apache.org%3e]
>  LGPL files must not be present in the Apache released sources/binaries.
> The following files must not be present in the release:
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/checkstyle/checkstyle-noframes-sorted.xsl
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/storage-api/checkstyle/checkstyle-noframes-sorted.xsl
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/standalone-metastore/checkstyle/checkstyle-noframes-sorted.xsl
> There may be other checkstyle LGPL files in the repo. All these should either 
> be removed entirely from the repository or selectively excluded from the 
> release.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734543=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734543
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 12:27
Start Date: 01/Mar/22 12:27
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816723868



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/GenericUDFIcebergBucket.java
##
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.nio.ByteBuffer;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
+import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantIntObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.DoubleWritable;
+import org.apache.hadoop.io.FloatWritable;
+import org.apache.hadoop.io.IntWritable;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.iceberg.transforms.Transform;
+import org.apache.iceberg.transforms.Transforms;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+/**
+ * GenericUDFIcebergBucket - UDF that wraps around Iceberg's bucket transform 
function
+ */
+@Description(name = "iceberg_bucket",
+value = "_FUNC_(value, bucketCount) - " +
+"Returns the bucket value calculated by Iceberg bucket transform 
function ",
+extended = "Example:\n  > SELECT _FUNC_('A bucket full of ice!', 5);\n  4")
+//@VectorizedExpressions({StringLength.class})
+public class GenericUDFIcebergBucket extends GenericUDF {
+  private final IntWritable result = new IntWritable();
+  private transient PrimitiveObjectInspector argumentOI;
+  private transient PrimitiveObjectInspectorConverter.StringConverter 
stringConverter;
+  private transient PrimitiveObjectInspectorConverter.BinaryConverter 
binaryConverter;
+  private transient PrimitiveObjectInspectorConverter.IntConverter 
intConverter;
+  private transient PrimitiveObjectInspectorConverter.LongConverter 
longConverter;
+  private transient PrimitiveObjectInspectorConverter.HiveDecimalConverter 
decimalConverter;
+  private transient PrimitiveObjectInspectorConverter.FloatConverter 
floatConverter;
+  private transient PrimitiveObjectInspectorConverter.DoubleConverter 
doubleConverter;
+  private transient Type.PrimitiveType icebergType;
+  private int numBuckets = -1;
+
+  @Override
+  public ObjectInspector initialize(ObjectInspector[] arguments) throws 
UDFArgumentException {
+if (arguments.length != 2) {
+  throw new UDFArgumentLengthException(
+  "ICEBERG_BUCKET requires 2 argument, got " + arguments.length);

Review comment:
   nit: arguments
   also, maybe we should tell the user what those arguments are, i.e. (value, 
bucketCount)?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time 

[jira] [Assigned] (HIVE-25665) Checkstyle LGPL files must not be in the release sources/binaries

2022-03-01 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-25665:
-

Assignee: Peter Vary

> Checkstyle LGPL files must not be in the release sources/binaries
> -
>
> Key: HIVE-25665
> URL: https://issues.apache.org/jira/browse/HIVE-25665
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Stamatis Zampetakis
>Assignee: Peter Vary
>Priority: Blocker
> Fix For: 4.0.0-alpha-1
>
>
> As discussed in the [dev 
> list|https://lists.apache.org/thread/r13e3236aa72a070b3267ed95f7cb3b45d3c4783fd4ca35f5376b1a35@%3cdev.hive.apache.org%3e]
>  LGPL files must not be present in the Apache released sources/binaries.
> The following files must not be present in the release:
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/checkstyle/checkstyle-noframes-sorted.xsl
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/storage-api/checkstyle/checkstyle-noframes-sorted.xsl
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/standalone-metastore/checkstyle/checkstyle-noframes-sorted.xsl
> There may be other checkstyle LGPL files in the repo. All these should either 
> be removed entirely from the repository or selectively excluded from the 
> release.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2022-03-01 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-23556:

Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

merged into master. Thank you [~ibenny]!

> Support hive.metastore.limit.partition.request for get_partitions_ps
> 
>
> Key: HIVE-23556
> URL: https://issues.apache.org/jira/browse/HIVE-23556
> Project: Hive
>  Issue Type: Improvement
>Reporter: Toshihiko Uchida
>Assignee: iBenny
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23556.2.patch, HIVE-23556.3.patch, 
> HIVE-23556.4.patch, HIVE-23556.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-13884 added the configuration hive.metastore.limit.partition.request to 
> limit the number of partitions that can be requested.
> Currently, it takes effect for the following MetaStore APIs
> * get_partitions,
> * get_partitions_with_auth,
> * get_partitions_by_filter,
> * get_partitions_spec_by_filter,
> * get_partitions_by_expr,
> but not for
> * get_partitions_ps,
> * get_partitions_ps_with_auth.
> This issue proposes to apply the configuration also to get_partitions_ps and 
> get_partitions_ps_with_auth.
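
A hedged sketch of the kind of check the ps-style APIs could apply, mirroring what hive.metastore.limit.partition.request already does for the other calls (the helper name and wrapper class are illustrative, not the actual HiveMetaStore code):

{code:java}
import java.util.List;

import org.apache.hadoop.hive.metastore.api.MetaException;

public class PartitionLimitSketch {
  // Throw once the number of partitions about to be returned exceeds the configured limit.
  static void checkPartitionLimit(List<?> partitions, int maxPartitions) throws MetaException {
    if (maxPartitions >= 0 && partitions.size() > maxPartitions) {
      throw new MetaException("Number of partitions scanned (=" + partitions.size()
          + ") exceeds the limit (=" + maxPartitions + ")");
    }
  }
}
{code}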



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23556?focusedWorklogId=734537=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734537
 ]

ASF GitHub Bot logged work on HIVE-23556:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 12:00
Start Date: 01/Mar/22 12:00
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #3021:
URL: https://github.com/apache/hive/pull/3021


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734537)
Time Spent: 20m  (was: 10m)

> Support hive.metastore.limit.partition.request for get_partitions_ps
> 
>
> Key: HIVE-23556
> URL: https://issues.apache.org/jira/browse/HIVE-23556
> Project: Hive
>  Issue Type: Improvement
>Reporter: Toshihiko Uchida
>Assignee: iBenny
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-23556.2.patch, HIVE-23556.3.patch, 
> HIVE-23556.4.patch, HIVE-23556.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-13884 added the configuration hive.metastore.limit.partition.request to 
> limit the number of partitions that can be requested.
> Currently, it takes effect for the following MetaStore APIs
> * get_partitions,
> * get_partitions_with_auth,
> * get_partitions_by_filter,
> * get_partitions_spec_by_filter,
> * get_partitions_by_expr,
> but not for
> * get_partitions_ps,
> * get_partitions_ps_with_auth.
> This issue proposes to apply the configuration also to get_partitions_ps and 
> get_partitions_ps_with_auth.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25983) The error thrown by the CONCAT_WS function when the passed-in array is not a string array is misleading

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25983?focusedWorklogId=734511=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734511
 ]

ASF GitHub Bot logged work on HIVE-25983:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 11:12
Start Date: 01/Mar/22 11:12
Worklog Time Spent: 10m 
  Work Description: dh20 commented on pull request #3054:
URL: https://github.com/apache/hive/pull/3054#issuecomment-1055316709


   @kasakrisz Oh, yes. I saw that this part of the code is the same as in the 
lower version and there is no record of a modification, so I assumed the 
master branch has the same problem.
   Now it seems that this problem does not exist in the master branch. Thank 
you for reviewing the PR for me!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734511)
Time Spent: 1h 10m  (was: 1h)

> The error thrown by the CONCAT_WS function when the passed-in array is not a 
> string array is misleading
> ---
>
> Key: HIVE-25983
> URL: https://issues.apache.org/jira/browse/HIVE-25983
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: hao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The error thrown by the CONCAT_WS function when the passed-in array is not a 
> string array is misleading.
> The error still prompts the user to pass in an array, but the real problem is 
> not the absence of an array; it should prompt the user to pass in an array of 
> string type.
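
A hedged sketch of the kind of argument check and message the report asks for (the class and method names are illustrative and only follow GenericUDF conventions; this is not the actual GenericUDFConcatWS code):

{code:java}
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;

public class ConcatWsArgCheckSketch {
  // If the argument is an array, verify that its elements are strings and say so
  // in the error, instead of asking the user for an array they already passed in.
  static void checkArrayArg(ObjectInspector oi, int argIdx) throws UDFArgumentTypeException {
    if (oi instanceof ListObjectInspector) {
      ObjectInspector elem = ((ListObjectInspector) oi).getListElementObjectInspector();
      boolean stringElements = elem instanceof PrimitiveObjectInspector
          && ((PrimitiveObjectInspector) elem).getPrimitiveCategory()
              == PrimitiveObjectInspector.PrimitiveCategory.STRING;
      if (!stringElements) {
        throw new UDFArgumentTypeException(argIdx, "Argument " + (argIdx + 1)
            + " of CONCAT_WS must be an array of strings, but an array of "
            + elem.getTypeName() + " was given");
      }
    }
  }
}
{code}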



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25994) Analyze table runs into ClassNotFoundException-s in case binary distribution is used

2022-03-01 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-25994:

Fix Version/s: 4.0.0-alpha-1

> Analyze table runs into ClassNotFoundException-s in case binary distribution 
> is used
> 
>
> Key: HIVE-25994
> URL: https://issues.apache.org/jira/browse/HIVE-25994
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
> Fix For: 4.0.0-alpha-1
>
>
> any nightly release can be used to reproduce this:
> {code}
> create table t (a integer); insert into t values (1) ; analyze table t 
> compute statistics for columns;
> {code}
> results in
> {code}
> Caused by: java.lang.NoClassDefFoundError: org/antlr/runtime/tree/CommonTree
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
> at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
> at java.lang.Class.getDeclaredConstructors0(Native Method)
> at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
> at java.lang.Class.getConstructor0(Class.java:3075)
> at java.lang.Class.getDeclaredConstructor(Class.java:2178)
> at 
> org.apache.hive.com.esotericsoftware.reflectasm.ConstructorAccess.get(ConstructorAccess.java:65)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultInstantiatorStrategy.newInstantiatorOf(DefaultInstantiatorStrategy.java:60)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1119)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1128)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:153)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:118)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:729)
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ReflectField.read(ReflectField.java:125)
> ... 38 more
> Caused by: java.lang.ClassNotFoundException: org.antlr.runtime.tree.CommonTree
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25665) Checkstyle LGPL files must not be in the release sources/binaries

2022-03-01 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-25665:

Fix Version/s: 4.0.0-alpha-1

> Checkstyle LGPL files must not be in the release sources/binaries
> -
>
> Key: HIVE-25665
> URL: https://issues.apache.org/jira/browse/HIVE-25665
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Stamatis Zampetakis
>Priority: Blocker
> Fix For: 4.0.0-alpha-1
>
>
> As discussed in the [dev 
> list|https://lists.apache.org/thread/r13e3236aa72a070b3267ed95f7cb3b45d3c4783fd4ca35f5376b1a35@%3cdev.hive.apache.org%3e]
>  LGPL files must not be present in the Apache released sources/binaries.
> The following files must not be present in the release:
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/checkstyle/checkstyle-noframes-sorted.xsl
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/storage-api/checkstyle/checkstyle-noframes-sorted.xsl
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/standalone-metastore/checkstyle/checkstyle-noframes-sorted.xsl
> There may be other checkstyle LGPL files in the repo. All these should either 
> be removed entirely from the repository or selectively excluded from the 
> release.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734500=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734500
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 10:50
Start Date: 01/Mar/22 10:50
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816657904



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
##
@@ -521,34 +570,45 @@ private void inferSortPositions(Operator fsParent,
   }
 }
 
-public ReduceSinkOperator getReduceSinkOp(List partitionPositions,
-List sortPositions, List sortOrder, List 
sortNullOrder,
-ArrayList allCols, ArrayList 
bucketColumns, int numBuckets,
-Operator parent, AcidUtils.Operation 
writeType) throws SemanticException {
+public ReduceSinkOperator getReduceSinkOp(List 
partitionPositions, List sortPositions,
+List, ExprNodeDesc>> customSortExprs, 
List sortOrder,
+List sortNullOrder, ArrayList allCols, 
ArrayList bucketColumns,
+int numBuckets, Operator parent, 
AcidUtils.Operation writeType) {
+
+  // Order of KEY columns, if custom sort is present partition and bucket 
columns are disregarded:
+  // 0) Custom sort expressions
+  //  1) Partition columns
+  //  2) Bucket number column
+  // 3) Sort columns
+
+  boolean customSortExprPresent = customSortExprs != null && 
!customSortExprs.isEmpty();
 
-  // Order of KEY columns
-  // 1) Partition columns
-  // 2) Bucket number column
-  // 3) Sort columns
   Set keyColsPosInVal = Sets.newLinkedHashSet();
   ArrayList keyCols = Lists.newArrayList();
   List newSortOrder = Lists.newArrayList();
   List newSortNullOrder = Lists.newArrayList();
 
+  if (customSortExprPresent) {
+partitionPositions = new ArrayList<>();
+bucketColumns = new ArrayList<>();
+numBuckets = -1;
+  }
+
   keyColsPosInVal.addAll(partitionPositions);
   if (bucketColumns != null && !bucketColumns.isEmpty()) {
 keyColsPosInVal.add(-1);
   }
   keyColsPosInVal.addAll(sortPositions);
 
+

Review comment:
   nit: not needed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734500)
Time Spent: 2.5h  (was: 2h 20m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.
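
For illustration, a hedged sketch of how a bucket value can be computed per record with Iceberg's public transform API, which is the idea the UDF in this change wraps (the wrapper class is an assumption; only the Transforms/Transform calls come from Iceberg):

{code:java}
import org.apache.iceberg.transforms.Transform;
import org.apache.iceberg.transforms.Transforms;
import org.apache.iceberg.types.Types;

public class IcebergBucketSketch {
  public static void main(String[] args) {
    int numBuckets = 16;
    // Build the bucket transform for a long column and apply it to a value.
    Transform<Long, Integer> bucket = Transforms.bucket(Types.LongType.get(), numBuckets);
    Integer bucketValue = bucket.apply(42L);  // deterministic bucket id in [0, numBuckets)
    System.out.println("bucket(42) = " + bucketValue);
  }
}
{code}

Ordering records by this value before they reach the writer is what lets each task keep a single open file per bucket instead of closing and reopening files for out-of-order records.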



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734499=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734499
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 10:50
Start Date: 01/Mar/22 10:50
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816657703



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
##
@@ -521,34 +570,45 @@ private void inferSortPositions(Operator fsParent,
   }
 }
 
-public ReduceSinkOperator getReduceSinkOp(List partitionPositions,
-List sortPositions, List sortOrder, List 
sortNullOrder,
-ArrayList allCols, ArrayList 
bucketColumns, int numBuckets,
-Operator parent, AcidUtils.Operation 
writeType) throws SemanticException {
+public ReduceSinkOperator getReduceSinkOp(List 
partitionPositions, List sortPositions,
+List, ExprNodeDesc>> customSortExprs, 
List sortOrder,
+List sortNullOrder, ArrayList allCols, 
ArrayList bucketColumns,
+int numBuckets, Operator parent, 
AcidUtils.Operation writeType) {
+
+  // Order of KEY columns, if custom sort is present partition and bucket 
columns are disregarded:
+  // 0) Custom sort expressions
+  //  1) Partition columns
+  //  2) Bucket number column

Review comment:
   nit: formatting




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734499)
Time Spent: 2h 20m  (was: 2h 10m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734498=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734498
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 10:47
Start Date: 01/Mar/22 10:47
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816656105



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
##
@@ -169,17 +182,27 @@ public Object process(Node nd, Stack stack, 
NodeProcessorCtx procCtx,
 
   // unlink connection between FS and its parent
   Operator fsParent = 
fsOp.getParentOperators().get(0);
-  // if all dp columns got constant folded then disable this optimization
-  if (allStaticPartitions(fsParent, fsOp.getConf().getDynPartCtx())) {
+  DynamicPartitionCtx dpCtx = fsOp.getConf().getDynPartCtx();
+
+  ArrayList parentCols = 
Lists.newArrayList(fsParent.getSchema().getSignature());
+  ArrayList allRSCols = Lists.newArrayList();
+  for (ColumnInfo ci : parentCols) {
+allRSCols.add(new ExprNodeColumnDesc(ci));
+  }
+
+  // if all dp columns / custom sort expressions got constant folded then 
disable this optimization
+  if (allStaticPartitions(fsParent, allRSCols, dpCtx)) {
 LOG.debug("Bailing out of sorted dynamic partition optimizer as all 
dynamic partition" +
 " columns got constant folded (static partitioning)");
 return null;
   }
 
-  DynamicPartitionCtx dpCtx = fsOp.getConf().getDynPartCtx();
   List partitionPositions = getPartitionPositions(dpCtx, 
fsParent.getSchema());
+  LinkedList, ExprNodeDesc>> customSortExprs =
+  new LinkedList<>(dpCtx.getCustomSortExpressions());

Review comment:
   Nit: Lists.newLinkedList?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734498)
Time Spent: 2h 10m  (was: 2h)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734494=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734494
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 10:46
Start Date: 01/Mar/22 10:46
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816655367



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
##
@@ -280,6 +283,21 @@ default boolean supportsPartitionTransform() {
 return null;
   }
 
+  /**
+   * Creates a DynnamicPartitionCtx instance that will be set up by the 
storage handler itself. Useful for non-native

Review comment:
   nit: `DynnamicPartitionCtx`-> `DynamicPartitionCtx`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734494)
Time Spent: 2h  (was: 1h 50m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734493=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734493
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 10:46
Start Date: 01/Mar/22 10:46
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816654862



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
##
@@ -585,6 +585,13 @@
 system.registerGenericUDF(GenericUDFMaskShowFirstN.UDF_NAME, 
GenericUDFMaskShowFirstN.class);
 system.registerGenericUDF(GenericUDFMaskShowLastN.UDF_NAME, 
GenericUDFMaskShowLastN.class);
 system.registerGenericUDF(GenericUDFMaskHash.UDF_NAME, 
GenericUDFMaskHash.class);
+
+try {
+  system.registerGenericUDF("iceberg_bucket",

Review comment:
   What happens if we add the iceberg jar with the `ADD JAR` command?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734493)
Time Spent: 1h 50m  (was: 1h 40m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734491=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734491
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 10:44
Start Date: 01/Mar/22 10:44
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816653572



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -307,6 +328,41 @@ public boolean supportsPartitionTransform() {
 }).collect(Collectors.toList());
   }
 
+  @Override
+  public DynamicPartitionCtx createDPContext(HiveConf hiveConf, 
org.apache.hadoop.hive.ql.metadata.Table hmsTable)
+  throws SemanticException {
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = IcebergTableUtil.getTable(conf, tableDesc.getProperties());
+if (table.spec().isUnpartitioned()) {
+  return null;
+}
+// Iceberg currently doesn't have publicly accessible partition transform 
information, hence use above string parse
+List partitionTransformSpecs = 
getPartitionTransformSpec(hmsTable);
+
+DynamicPartitionCtx dpCtx = new DynamicPartitionCtx(new LinkedHashMap<>(),
+hiveConf.getVar(HiveConf.ConfVars.DEFAULTPARTITIONNAME),
+hiveConf.getIntVar(HiveConf.ConfVars.DYNAMICPARTITIONMAXPARTSPERNODE));
+List, ExprNodeDesc>> customSortExprs = new 
LinkedList<>();
+dpCtx.setCustomSortExpressions(customSortExprs);
+
+Map fieldOrderMap = new HashMap<>();
+List fields = table.schema().columns();
+for (int i = 0; i < fields.size(); ++i) {
+  fieldOrderMap.put(fields.get(i).name(), i);
+}
+
+for (PartitionTransformSpec spec : partitionTransformSpecs) {
+  int order = fieldOrderMap.get(spec.getColumnName());

Review comment:
   Is it possible to create 2 transforms for a single column?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734491)
Time Spent: 1h 40m  (was: 1.5h)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734487=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734487
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 10:34
Start Date: 01/Mar/22 10:34
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816645988



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -307,6 +328,41 @@ public boolean supportsPartitionTransform() {
 }).collect(Collectors.toList());
   }
 
+  @Override
+  public DynamicPartitionCtx createDPContext(HiveConf hiveConf, 
org.apache.hadoop.hive.ql.metadata.Table hmsTable)
+  throws SemanticException {
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = IcebergTableUtil.getTable(conf, tableDesc.getProperties());
+if (table.spec().isUnpartitioned()) {
+  return null;
+}
+// Iceberg currently doesn't have publicly accessible partition transform 
information, hence use above string parse
+List partitionTransformSpecs = 
getPartitionTransformSpec(hmsTable);
+
+DynamicPartitionCtx dpCtx = new DynamicPartitionCtx(new LinkedHashMap<>(),
+hiveConf.getVar(HiveConf.ConfVars.DEFAULTPARTITIONNAME),
+hiveConf.getIntVar(HiveConf.ConfVars.DYNAMICPARTITIONMAXPARTSPERNODE));
+List, ExprNodeDesc>> customSortExprs = new 
LinkedList<>();
+dpCtx.setCustomSortExpressions(customSortExprs);
+
+Map fieldOrderMap = new HashMap<>();

Review comment:
   Nit: Maps.newHashMap()




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734487)
Time Spent: 1.5h  (was: 1h 20m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734486=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734486
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 10:34
Start Date: 01/Mar/22 10:34
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816645719



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -307,6 +328,41 @@ public boolean supportsPartitionTransform() {
 }).collect(Collectors.toList());
   }
 
+  @Override
+  public DynamicPartitionCtx createDPContext(HiveConf hiveConf, 
org.apache.hadoop.hive.ql.metadata.Table hmsTable)
+  throws SemanticException {
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = IcebergTableUtil.getTable(conf, tableDesc.getProperties());
+if (table.spec().isUnpartitioned()) {
+  return null;
+}
+// Iceberg currently doesn't have publicly accessible partition transform 
information, hence use above string parse
+List partitionTransformSpecs = 
getPartitionTransformSpec(hmsTable);
+
+DynamicPartitionCtx dpCtx = new DynamicPartitionCtx(new LinkedHashMap<>(),
+hiveConf.getVar(HiveConf.ConfVars.DEFAULTPARTITIONNAME),
+hiveConf.getIntVar(HiveConf.ConfVars.DYNAMICPARTITIONMAXPARTSPERNODE));
+List, ExprNodeDesc>> customSortExprs = new 
LinkedList<>();

Review comment:
   Nit: Lists.newLinkedList()




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734486)
Time Spent: 1h 20m  (was: 1h 10m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=734477=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734477
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 10:16
Start Date: 01/Mar/22 10:16
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r816484985



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -303,56 +308,103 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 if (tablePath == null) {
   return;
 }
-FileSystem fs = tablePath.getFileSystem(conf);
-if (!fs.exists(tablePath)) {
+final FileSystem[] fs = {tablePath.getFileSystem(conf)};
+if (!fs[0].exists(tablePath)) {
   result.getTablesNotOnFs().add(table.getTableName());
   return;
 }
 
 Set partPaths = new HashSet<>();
 
-// check that the partition folders exist on disk
-for (Partition partition : parts) {
-  if (partition == null) {
-// most likely the user specified an invalid partition
-continue;
-  }
-  Path partPath = getDataLocation(table, partition);
-  if (partPath == null) {
-continue;
-  }
-  fs = partPath.getFileSystem(conf);
+int threadCount = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.METASTORE_MSCK_FS_HANDLER_THREADS_COUNT);
 
-  CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
-  prFromMetastore.setPartitionName(getPartitionName(table, partition));
-  prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
-  } else {
-result.getCorrectPartitions().add(prFromMetastore);
-  }
+Preconditions.checkArgument(!(threadCount < 
1),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be less than 1");
+Preconditions.checkArgument(!(threadCount > 
30),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be more than 30");
 
-  if (partitionExpirySeconds > 0) {
-long currentEpochSecs = Instant.now().getEpochSecond();
-long createdTime = partition.getCreateTime();
-long partitionAgeSeconds = currentEpochSecs - createdTime;
-if (partitionAgeSeconds > partitionExpirySeconds) {
-  CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
-  pr.setPartitionName(getPartitionName(table, partition));
-  pr.setTableName(partition.getTableName());
-  result.getExpiredPartitions().add(pr);
-  if (LOG.isDebugEnabled()) {
-LOG.debug("{}.{}.{}.{} expired. createdAt: {} current: {} age: {}s 
expiry: {}s", partition.getCatName(),
-partition.getDbName(), partition.getTableName(), 
pr.getPartitionName(), createdTime, currentEpochSecs,
-partitionAgeSeconds, partitionExpirySeconds);
-  }
+LOG.debug("Running with threads "+threadCount);
+
+// For Multi Threaded run, we do not want to wait for All partitions in 
queue to be processed,
+// instead we run in batch to avoid OOM, we set Min and Max Pool Size = 
METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+// and Waiting Queue size = METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+
+final ExecutorService pool = new ThreadPoolExecutor(threadCount,
+threadCount,
+0L,
+TimeUnit.MILLISECONDS,
+new ArrayBlockingQueue<>(threadCount),
+new ThreadPoolExecutor.CallerRunsPolicy());
+
+try {
+  Queue> futures = new LinkedList<>();
+  // check that the partition folders exist on disk
+  for (Partition partition : parts) {

Review comment:
   Or it is run in the original thead

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -303,56 +308,103 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 if (tablePath == null) {
   return;
 }
-FileSystem fs = tablePath.getFileSystem(conf);
-if (!fs.exists(tablePath)) {
+final FileSystem[] fs = {tablePath.getFileSystem(conf)};
+if (!fs[0].exists(tablePath)) {
   result.getTablesNotOnFs().add(table.getTableName());
   return;
 }
 
 Set partPaths = new HashSet<>();
 
-// check that the partition folders exist on disk
-for (Partition partition : parts) {
-  if (partition == null) {
-// most likely the user specified an invalid partition
-continue;
-  }
-  Path partPath = getDataLocation(table, partition);
-  if (partPath == null) {
-continue;
-  }
-  fs = 
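
The patch quoted above configures a fixed-size pool with a bounded queue and CallerRunsPolicy so that MSCK partition checks are processed in batches rather than queueing everything at once. A self-contained sketch of that pattern follows; the thread count and the task body are placeholders, not the actual HiveMetaStoreChecker code.

```java
// Sketch of the batched thread-pool pattern from the patch above: a fixed-size
// pool with a bounded queue and CallerRunsPolicy, so the submitting thread
// throttles itself instead of building up an unbounded backlog of tasks.
// The thread count and the task body are placeholders.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolSketch {
  public static void main(String[] args) throws InterruptedException {
    int threadCount = 4; // stands in for METASTORE_MSCK_FS_HANDLER_THREADS_COUNT

    ExecutorService pool = new ThreadPoolExecutor(
        threadCount, threadCount,                    // fixed pool size
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(threadCount),       // bounded waiting queue
        new ThreadPoolExecutor.CallerRunsPolicy());  // caller runs the task on overflow

    try {
      for (int i = 0; i < 100; i++) {
        final int partition = i;
        // The real checker would test whether a partition path exists on the
        // filesystem here; this just simulates that unit of work.
        pool.submit(() -> System.out.println("checked partition " + partition));
      }
    } finally {
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.MINUTES);
    }
  }
}
```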

[jira] [Comment Edited] (HIVE-15579) Support HADOOP_PROXY_USER for secure impersonation in hive metastore client

2022-03-01 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17499418#comment-17499418
 ] 

Bo Cui edited comment on HIVE-15579 at 3/1/22, 9:50 AM:


Hi [~nanda] [~thejas],

Why is RealUser used here? In most cases, the LoginUser's RealUser is null.

https://github.com/apache/hive/blob/bf69b32c878c0d53f242cc38b6634c8ee4346e76/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java#L243

 

UserGroupInformation ugi = UserGroupInformation.createProxyUser(proxyUserA, 
realUserB )

I think proxyUserA is `UserGroupInformation.getCurrentUser()` and realUserB is 
`UserGroupInformation.getCurrentUser().getRealUser()`.

 


was (Author: bo cui):
Hi [~nanda] [~thejas],

Why is RealUser used here? In most cases, the LoginUser's RealUser is null.

!image-2022-03-01-17-45-03-213.png!

 

UserGroupInformation ugi = UserGroupInformation.createProxyUser(proxyUserA, 
realUserB )

I think proxyUserA is `UserGroupInformation.getCurrentUser()` and realUserB is 
`UserGroupInformation.getCurrentUser().getRealUser()`.

 

> Support HADOOP_PROXY_USER for secure impersonation in hive metastore client
> ---
>
> Key: HIVE-15579
> URL: https://issues.apache.org/jira/browse/HIVE-15579
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas Nair
>Assignee: Nandakumar
>Priority: Major
>  Labels: TODOC2.2
> Fix For: 2.3.0
>
> Attachments: HIVE-15579.000.patch, HIVE-15579.001.patch, 
> HIVE-15579.002.patch, HIVE-15579.003.patch, HIVE-15579.003.patch
>
>
> Hadoop clients support HADOOP_PROXY_USER for secure impersonation. It would 
> be useful to have similar feature for hive metastore client.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
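
As context for the question above, this is a minimal sketch of how proxy-user impersonation is typically wired up with UserGroupInformation; the user name is a placeholder and this is not the HiveMetaStoreClient code.

```java
// Minimal sketch of proxy-user impersonation with UserGroupInformation.
// "alice" is a placeholder; this is not the Hive metastore client code.
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserSketch {
  public static void main(String[] args) throws Exception {
    // The authenticated (real) user, typically a service principal.
    UserGroupInformation realUser = UserGroupInformation.getLoginUser();

    // Impersonate "alice" on top of the real user, which is what
    // HADOOP_PROXY_USER-style impersonation boils down to.
    UserGroupInformation proxyUser =
        UserGroupInformation.createProxyUser("alice", realUser);

    proxyUser.doAs((PrivilegedExceptionAction<Void>) () -> {
      // Inside doAs(), getCurrentUser() is the proxy user, and its
      // getRealUser() is the login user captured above.
      UserGroupInformation current = UserGroupInformation.getCurrentUser();
      System.out.println("current user: " + current.getShortUserName());
      System.out.println("real user:    " + current.getRealUser());
      return null;
    });
  }
}
```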


[jira] [Commented] (HIVE-15579) Support HADOOP_PROXY_USER for secure impersonation in hive metastore client

2022-03-01 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17499418#comment-17499418
 ] 

Bo Cui commented on HIVE-15579:
---

Hi [~nanda] [~thejas],

Why is RealUser used here? In most cases, the LoginUser's RealUser is null.

!image-2022-03-01-17-45-03-213.png!

 

UserGroupInformation ugi = UserGroupInformation.createProxyUser(proxyUserA, 
realUserB )

I think proxyUserA is `UserGroupInformation.getCurrentUser()` and realUserB is 
`UserGroupInformation.getCurrentUser().getRealUser()`.

 

> Support HADOOP_PROXY_USER for secure impersonation in hive metastore client
> ---
>
> Key: HIVE-15579
> URL: https://issues.apache.org/jira/browse/HIVE-15579
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas Nair
>Assignee: Nandakumar
>Priority: Major
>  Labels: TODOC2.2
> Fix For: 2.3.0
>
> Attachments: HIVE-15579.000.patch, HIVE-15579.001.patch, 
> HIVE-15579.002.patch, HIVE-15579.003.patch, HIVE-15579.003.patch
>
>
> Hadoop clients support HADOOP_PROXY_USER for secure impersonation. It would 
> be useful to have similar feature for hive metastore client.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=734462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734462
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 09:35
Start Date: 01/Mar/22 09:35
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r816597275



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -303,56 +308,103 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 if (tablePath == null) {
   return;
 }
-FileSystem fs = tablePath.getFileSystem(conf);
-if (!fs.exists(tablePath)) {
+final FileSystem[] fs = {tablePath.getFileSystem(conf)};
+if (!fs[0].exists(tablePath)) {
   result.getTablesNotOnFs().add(table.getTableName());
   return;
 }
 
 Set partPaths = new HashSet<>();
 
-// check that the partition folders exist on disk
-for (Partition partition : parts) {
-  if (partition == null) {
-// most likely the user specified an invalid partition
-continue;
-  }
-  Path partPath = getDataLocation(table, partition);
-  if (partPath == null) {
-continue;
-  }
-  fs = partPath.getFileSystem(conf);
+int threadCount = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.METASTORE_MSCK_FS_HANDLER_THREADS_COUNT);
 
-  CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
-  prFromMetastore.setPartitionName(getPartitionName(table, partition));
-  prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
-  } else {
-result.getCorrectPartitions().add(prFromMetastore);
-  }
+Preconditions.checkArgument(!(threadCount < 
1),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be less than 1");
+Preconditions.checkArgument(!(threadCount > 
30),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be more than 30");
 
-  if (partitionExpirySeconds > 0) {
-long currentEpochSecs = Instant.now().getEpochSecond();
-long createdTime = partition.getCreateTime();
-long partitionAgeSeconds = currentEpochSecs - createdTime;
-if (partitionAgeSeconds > partitionExpirySeconds) {
-  CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
-  pr.setPartitionName(getPartitionName(table, partition));
-  pr.setTableName(partition.getTableName());
-  result.getExpiredPartitions().add(pr);
-  if (LOG.isDebugEnabled()) {
-LOG.debug("{}.{}.{}.{} expired. createdAt: {} current: {} age: {}s 
expiry: {}s", partition.getCatName(),
-partition.getDbName(), partition.getTableName(), 
pr.getPartitionName(), createdTime, currentEpochSecs,
-partitionAgeSeconds, partitionExpirySeconds);
-  }
+LOG.debug("Running with threads "+threadCount);
+
+// For Multi Threaded run, we do not want to wait for All partitions in 
queue to be processed,
+// instead we run in batch to avoid OOM, we set Min and Max Pool Size = 
METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+// and Waiting Queue size = METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+
+final ExecutorService pool = new ThreadPoolExecutor(threadCount,
+threadCount,
+0L,
+TimeUnit.MILLISECONDS,
+new ArrayBlockingQueue<>(threadCount),
+new ThreadPoolExecutor.CallerRunsPolicy());
+
+try {
+  Queue> futures = new LinkedList<>();
+  // check that the partition folders exist on disk
+  for (Partition partition : parts) {
+if (partition == null) {
+  // most likely the user specified an invalid partition
+  continue;
 }
-  }
+Path[] partPath = {getDataLocation(table, partition)};
+if (partPath[0] == null) {
+  continue;
+}
+futures.add(pool.submit(new Callable() {
+  @Override
+  public Object call() throws Exception {
+fs[0] = partPath[0].getFileSystem(conf);
+CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
+prFromMetastore.setPartitionName(getPartitionName(table, 
partition));
+prFromMetastore.setTableName(partition.getTableName());
+if (!fs[0].exists(partPath[0])) {

Review comment:
   We are anyhow scanning all the partition from the filesystem in 
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java#L398
   
   We could remove the 

[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=734461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734461
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 09:34
Start Date: 01/Mar/22 09:34
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r816597275



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -303,56 +308,103 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 if (tablePath == null) {
   return;
 }
-FileSystem fs = tablePath.getFileSystem(conf);
-if (!fs.exists(tablePath)) {
+final FileSystem[] fs = {tablePath.getFileSystem(conf)};
+if (!fs[0].exists(tablePath)) {
   result.getTablesNotOnFs().add(table.getTableName());
   return;
 }
 
 Set partPaths = new HashSet<>();
 
-// check that the partition folders exist on disk
-for (Partition partition : parts) {
-  if (partition == null) {
-// most likely the user specified an invalid partition
-continue;
-  }
-  Path partPath = getDataLocation(table, partition);
-  if (partPath == null) {
-continue;
-  }
-  fs = partPath.getFileSystem(conf);
+int threadCount = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.METASTORE_MSCK_FS_HANDLER_THREADS_COUNT);
 
-  CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
-  prFromMetastore.setPartitionName(getPartitionName(table, partition));
-  prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
-  } else {
-result.getCorrectPartitions().add(prFromMetastore);
-  }
+Preconditions.checkArgument(!(threadCount < 
1),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be less than 1");
+Preconditions.checkArgument(!(threadCount > 
30),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be more than 30");
 
-  if (partitionExpirySeconds > 0) {
-long currentEpochSecs = Instant.now().getEpochSecond();
-long createdTime = partition.getCreateTime();
-long partitionAgeSeconds = currentEpochSecs - createdTime;
-if (partitionAgeSeconds > partitionExpirySeconds) {
-  CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
-  pr.setPartitionName(getPartitionName(table, partition));
-  pr.setTableName(partition.getTableName());
-  result.getExpiredPartitions().add(pr);
-  if (LOG.isDebugEnabled()) {
-LOG.debug("{}.{}.{}.{} expired. createdAt: {} current: {} age: {}s 
expiry: {}s", partition.getCatName(),
-partition.getDbName(), partition.getTableName(), 
pr.getPartitionName(), createdTime, currentEpochSecs,
-partitionAgeSeconds, partitionExpirySeconds);
-  }
+LOG.debug("Running with threads "+threadCount);
+
+// For Multi Threaded run, we do not want to wait for All partitions in 
queue to be processed,
+// instead we run in batch to avoid OOM, we set Min and Max Pool Size = 
METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+// and Waiting Queue size = METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+
+final ExecutorService pool = new ThreadPoolExecutor(threadCount,
+threadCount,
+0L,
+TimeUnit.MILLISECONDS,
+new ArrayBlockingQueue<>(threadCount),
+new ThreadPoolExecutor.CallerRunsPolicy());
+
+try {
+  Queue> futures = new LinkedList<>();
+  // check that the partition folders exist on disk
+  for (Partition partition : parts) {
+if (partition == null) {
+  // most likely the user specified an invalid partition
+  continue;
 }
-  }
+Path[] partPath = {getDataLocation(table, partition)};
+if (partPath[0] == null) {
+  continue;
+}
+futures.add(pool.submit(new Callable() {
+  @Override
+  public Object call() throws Exception {
+fs[0] = partPath[0].getFileSystem(conf);
+CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
+prFromMetastore.setPartitionName(getPartitionName(table, 
partition));
+prFromMetastore.setTableName(partition.getTableName());
+if (!fs[0].exists(partPath[0])) {

Review comment:
   We are anyhow scanning all the partition from the filesystem in 
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java#L398
   
   We could remove the 

[jira] [Work logged] (HIVE-25983) CONCAT_ The error thrown by WS function when the array passed in is a non string array is a problem

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25983?focusedWorklogId=734448=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734448
 ]

ASF GitHub Bot logged work on HIVE-25983:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 09:14
Start Date: 01/Mar/22 09:14
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on pull request #3054:
URL: https://github.com/apache/hive/pull/3054#issuecomment-1055193248


   @dh20 
   I ran the following query using Apache master
   ```
   SELECT concat_ws(',', array(1, 2, 3));
   ```
   and got
   ```
   Argument 2 of function CONCAT_WS must be "string or array", but 
"array" was found.
   ```
   This seems to be ok since it clearly indicates that an `array of int` was 
passed but an `array of string` is expected.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734448)
Time Spent: 1h  (was: 50m)

> CONCAT_ The error thrown by WS function when the array passed in is a non 
> string array is a problem
> ---
>
> Key: HIVE-25983
> URL: https://issues.apache.org/jira/browse/HIVE-25983
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: hao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The error thrown by the CONCAT_WS function when the passed-in array is a 
> non-string array is misleading.
> The message still tells the user to pass in an array, but the real problem is 
> not a missing array; it should tell the user to pass in an array of string 
> type.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734443=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734443
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 08:59
Start Date: 01/Mar/22 08:59
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816568606



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
##
@@ -280,6 +283,21 @@ default boolean supportsPartitionTransform() {
 return null;
   }
 
+  /**
+   * Creates a DynnamicPartitionCtx instance that will be set up by the 
storage handler itself. Useful for non-native
+   * tables where partitions are not handled by Hive, and sorting is required 
in a custom way before writing the table.

Review comment:
   What happens if the return value is `null`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734443)
Time Spent: 1h 10m  (was: 1h)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734442=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734442
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 08:56
Start Date: 01/Mar/22 08:56
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816565844



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -307,6 +328,41 @@ public boolean supportsPartitionTransform() {
 }).collect(Collectors.toList());
   }
 
+  @Override
+  public DynamicPartitionCtx createDPContext(HiveConf hiveConf, 
org.apache.hadoop.hive.ql.metadata.Table hmsTable)
+  throws SemanticException {
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = IcebergTableUtil.getTable(conf, tableDesc.getProperties());
+if (table.spec().isUnpartitioned()) {
+  return null;
+}
+// Iceberg currently doesn't have publicly accessible partition transform 
information, hence use above string parse

Review comment:
   nit: newline




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734442)
Time Spent: 1h  (was: 50m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734441
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 08:55
Start Date: 01/Mar/22 08:55
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816565009



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -96,6 +104,20 @@
   private static final String ICEBERG_URI_PREFIX = "iceberg://";
   private static final Splitter TABLE_NAME_SPLITTER = Splitter.on("..");
   private static final String TABLE_NAME_SEPARATOR = "..";
+  private static final transient BiFunction, ExprNodeDesc>>

Review comment:
   nit: Comment please




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734441)
Time Spent: 50m  (was: 40m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734440=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734440
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 08:53
Start Date: 01/Mar/22 08:53
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816564026



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/GenericUDFIcebergBucket.java
##
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.nio.ByteBuffer;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
+import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantIntObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.DoubleWritable;
+import org.apache.hadoop.io.FloatWritable;
+import org.apache.hadoop.io.IntWritable;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.iceberg.transforms.Transform;
+import org.apache.iceberg.transforms.Transforms;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+/**
+ * GenericUDFIcebergBucket - UDF that wraps around Iceberg's bucket transform 
function
+ */
+@Description(name = "iceberg_bucket",
+value = "_FUNC_(value, bucketCount) - " +
+"Returns the bucket value calculated by Iceberg bucket transform 
function ",
+extended = "Example:\n  > SELECT _FUNC_('A bucket full of ice!', 5);\n  4")
+//@VectorizedExpressions({StringLength.class})
+public class GenericUDFIcebergBucket extends GenericUDF {
+  private final IntWritable result = new IntWritable();
+  private transient PrimitiveObjectInspector argumentOI;
+  private transient PrimitiveObjectInspectorConverter.StringConverter 
stringConverter;
+  private transient PrimitiveObjectInspectorConverter.BinaryConverter 
binaryConverter;
+  private transient PrimitiveObjectInspectorConverter.IntConverter 
intConverter;
+  private transient PrimitiveObjectInspectorConverter.LongConverter 
longConverter;
+  private transient PrimitiveObjectInspectorConverter.HiveDecimalConverter 
decimalConverter;
+  private transient PrimitiveObjectInspectorConverter.FloatConverter 
floatConverter;
+  private transient PrimitiveObjectInspectorConverter.DoubleConverter 
doubleConverter;
+  private transient Type.PrimitiveType icebergType;
+  private int numBuckets = -1;
+
+  @Override
+  public ObjectInspector initialize(ObjectInspector[] arguments) throws 
UDFArgumentException {
+if (arguments.length != 2) {
+  throw new UDFArgumentLengthException(
+  "ICEBERG_BUCKET requires 2 argument, got " + arguments.length);
+}
+
+if (arguments[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
+  throw new UDFArgumentException(
+  "ICEBERG_BUCKET first argument takes primitive types, got " + 
argumentOI.getTypeName());
+}
+argumentOI = (PrimitiveObjectInspector) arguments[0];
+
+PrimitiveObjectInspector.PrimitiveCategory inputType = 
argumentOI.getPrimitiveCategory();
+ObjectInspector outputOI = null;
+switch (inputType) {
+  case CHAR:
+ 

[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734438=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734438
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 08:50
Start Date: 01/Mar/22 08:50
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816561376



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/GenericUDFIcebergBucket.java
##
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.nio.ByteBuffer;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
+import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantIntObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.DoubleWritable;
+import org.apache.hadoop.io.FloatWritable;
+import org.apache.hadoop.io.IntWritable;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.iceberg.transforms.Transform;
+import org.apache.iceberg.transforms.Transforms;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+/**
+ * GenericUDFIcebergBucket - UDF that wraps around Iceberg's bucket transform 
function
+ */
+@Description(name = "iceberg_bucket",
+value = "_FUNC_(value, bucketCount) - " +
+"Returns the bucket value calculated by Iceberg bucket transform 
function ",
+extended = "Example:\n  > SELECT _FUNC_('A bucket full of ice!', 5);\n  4")
+//@VectorizedExpressions({StringLength.class})
+public class GenericUDFIcebergBucket extends GenericUDF {
+  private final IntWritable result = new IntWritable();
+  private transient PrimitiveObjectInspector argumentOI;
+  private transient PrimitiveObjectInspectorConverter.StringConverter 
stringConverter;
+  private transient PrimitiveObjectInspectorConverter.BinaryConverter 
binaryConverter;
+  private transient PrimitiveObjectInspectorConverter.IntConverter 
intConverter;
+  private transient PrimitiveObjectInspectorConverter.LongConverter 
longConverter;
+  private transient PrimitiveObjectInspectorConverter.HiveDecimalConverter 
decimalConverter;
+  private transient PrimitiveObjectInspectorConverter.FloatConverter 
floatConverter;
+  private transient PrimitiveObjectInspectorConverter.DoubleConverter 
doubleConverter;
+  private transient Type.PrimitiveType icebergType;
+  private int numBuckets = -1;
+
+  @Override
+  public ObjectInspector initialize(ObjectInspector[] arguments) throws 
UDFArgumentException {
+if (arguments.length != 2) {
+  throw new UDFArgumentLengthException(
+  "ICEBERG_BUCKET requires 2 argument, got " + arguments.length);
+}
+
+if (arguments[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
+  throw new UDFArgumentException(
+  "ICEBERG_BUCKET first argument takes primitive types, got " + 
argumentOI.getTypeName());
+}
+argumentOI = (PrimitiveObjectInspector) arguments[0];
+
+PrimitiveObjectInspector.PrimitiveCategory inputType = 
argumentOI.getPrimitiveCategory();
+ObjectInspector outputOI = null;
+switch (inputType) {
+  case CHAR:
+ 

[jira] [Assigned] (HIVE-25993) Query-based compaction doesn't work when partition column type is boolean

2022-03-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Végh reassigned HIVE-25993:
--


> Query-based compaction doesn't work when partition column type is boolean
> -
>
> Key: HIVE-25993
> URL: https://issues.apache.org/jira/browse/HIVE-25993
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>
> Query based compaction fails on tables with boolean partition column.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25992) Can not init HiveMetaStoreClient when using proxy user

2022-03-01 Thread chenruotao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenruotao updated HIVE-25992:
--
Description: 
When I use this code to access HMS:
{code:java}
UserGroupInformation proxyUser = 
UserGroupInformation.createProxyUser(proxyUserName, userGroupInformation);
proxyUser.doAs((PrivilegedExceptionAction) () -> {
    XX
    return true;
});
 {code}
an exception is thrown:

!image-2022-03-01-16-33-49-012.png|width=868,height=346! So I debugged this 
and found something that may be wrong in the class named HiveMetaStoreClient:

!image-2022-03-01-16-36-52-539.png!

Because the value of UserGroupInformation.getLoginUser().getRealUser() inside 
proxyUser.doAs is null, I changed it to UserGroupInformation.getLoginUser() and 
it works:

!image-2022-03-01-16-39-57-935.png!

Is what I did correct?

  was:
When I use this code to access HMS:
{code:java}
UserGroupInformation proxyUser = 
UserGroupInformation.createProxyUser(proxyUserName, userGroupInformation);
proxyUser.doAs((PrivilegedExceptionAction) () -> {
    XX
    return true;
});
 {code}
an exception is thrown:

!image-2022-03-01-16-33-49-012.png! So I debugged this and found something 
that may be wrong in the class named HiveMetaStoreClient:

!image-2022-03-01-16-36-52-539.png!

Because the value of UserGroupInformation.getLoginUser().getRealUser() inside 
proxyUser.doAs is null, I changed it to UserGroupInformation.getLoginUser() and 
it works:

!image-2022-03-01-16-39-57-935.png!

Is what I did correct?


> Can not init HiveMetaStoreClient when using proxy user
> --
>
> Key: HIVE-25992
> URL: https://issues.apache.org/jira/browse/HIVE-25992
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.9
>Reporter: chenruotao
>Priority: Major
> Attachments: image-2022-03-01-16-33-49-012.png, 
> image-2022-03-01-16-36-52-539.png, image-2022-03-01-16-39-57-935.png
>
>
> When I use this code to access HMS:
> {code:java}
> UserGroupInformation proxyUser = 
> UserGroupInformation.createProxyUser(proxyUserName, userGroupInformation);
> proxyUser.doAs((PrivilegedExceptionAction) () -> {
>     XX
>     return true;
> });
>  {code}
> an exception is thrown:
> !image-2022-03-01-16-33-49-012.png|width=868,height=346! So I debugged this 
> and found something that may be wrong in the class named HiveMetaStoreClient:
> !image-2022-03-01-16-36-52-539.png!
> Because the value of UserGroupInformation.getLoginUser().getRealUser() inside 
> proxyUser.doAs is null, I changed it to UserGroupInformation.getLoginUser() 
> and it works:
> !image-2022-03-01-16-39-57-935.png!
> Is what I did correct?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25975) Optimize ClusteredWriter for bucketed Iceberg tables

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25975?focusedWorklogId=734433=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734433
 ]

ASF GitHub Bot logged work on HIVE-25975:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 08:43
Start Date: 01/Mar/22 08:43
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3060:
URL: https://github.com/apache/hive/pull/3060#discussion_r816556254



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/GenericUDFIcebergBucket.java
##
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.nio.ByteBuffer;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
+import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantIntObjectInspector;
+import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.DoubleWritable;
+import org.apache.hadoop.io.FloatWritable;
+import org.apache.hadoop.io.IntWritable;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.iceberg.transforms.Transform;
+import org.apache.iceberg.transforms.Transforms;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+/**
+ * GenericUDFIcebergBucket - UDF that wraps around Iceberg's bucket transform 
function
+ */
+@Description(name = "iceberg_bucket",
+value = "_FUNC_(value, bucketCount) - " +
+"Returns the bucket value calculated by Iceberg bucket transform 
function ",
+extended = "Example:\n  > SELECT _FUNC_('A bucket full of ice!', 5);\n  4")
+//@VectorizedExpressions({StringLength.class})

Review comment:
   nit: Why is this commented out? Shall we just remove it?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734433)
Time Spent: 20m  (was: 10m)

> Optimize ClusteredWriter for bucketed Iceberg tables
> 
>
> Key: HIVE-25975
> URL: https://issues.apache.org/jira/browse/HIVE-25975
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The first version of the ClusteredWriter in Hive-Iceberg will be lenient for 
> bucketed tables: i.e. the records do not need to be ordered by the bucket 
> values, the writer will just close its current file and open a new one for 
> out-of-order records. 
> This is suboptimal for the long-term due to creating many small files. Spark 
> uses a UDF to compute the bucket value for each record and therefore it is 
> able to order the records by bucket values, achieving optimal clustering.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
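
The patch fragment above shows only part of the UDF's initialize() method. For orientation, a stripped-down GenericUDF with the same overall shape (argument validation in initialize(), per-row work in evaluate()) could look like the sketch below; it is a toy example, not the Iceberg bucket UDF.

```java
// Toy GenericUDF (not the Iceberg bucket UDF): returns the length of its
// single string argument. It shows the usual shape: validate arguments and
// pick object inspectors in initialize(), do per-row work in evaluate().
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.IntWritable;

@Description(name = "toy_strlen", value = "_FUNC_(str) - returns the length of str")
public class GenericUDFToyStrLen extends GenericUDF {
  private final IntWritable result = new IntWritable();
  private transient PrimitiveObjectInspector argOI;

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 1) {
      throw new UDFArgumentLengthException("toy_strlen takes exactly 1 argument");
    }
    if (arguments[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
      throw new UDFArgumentException("toy_strlen expects a primitive argument");
    }
    argOI = (PrimitiveObjectInspector) arguments[0];
    // The UDF returns an int, so hand back a writable int object inspector.
    return PrimitiveObjectInspectorFactory.writableIntObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    Object value = arguments[0].get();
    if (value == null) {
      return null;
    }
    result.set(argOI.getPrimitiveJavaObject(value).toString().length());
    return result;
  }

  @Override
  public String getDisplayString(String[] children) {
    return "toy_strlen(" + children[0] + ")";
  }
}
```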


[jira] [Updated] (HIVE-25992) Can not init HiveMetaStoreClient when using proxy user

2022-03-01 Thread chenruotao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenruotao updated HIVE-25992:
--
Description: 
When I use this code to access HMS:
{code:java}
UserGroupInformation proxyUser = 
UserGroupInformation.createProxyUser(proxyUserName, userGroupInformation);
proxyUser.doAs((PrivilegedExceptionAction) () -> {
    XX
    return true;
});
 {code}
an exception is thrown:

!image-2022-03-01-16-33-49-012.png! So I debugged this and found something 
that may be wrong in the class named HiveMetaStoreClient:

!image-2022-03-01-16-36-52-539.png!

Because the value of UserGroupInformation.getLoginUser().getRealUser() inside 
proxyUser.doAs is null, I changed it to UserGroupInformation.getLoginUser() and 
it works:

!image-2022-03-01-16-39-57-935.png!

Is what I did correct?

  was:
When I use this code to access HMS:

 
{code:java}
UserGroupInformation proxyUser = 
UserGroupInformation.createProxyUser(proxyUserName, userGroupInformation);
proxyUser.doAs((PrivilegedExceptionAction) () -> {
    XX
    return true;
});
 {code}
 

an exception is thrown:

!image-2022-03-01-16-33-49-012.png!

  So I debugged this and found something that may be wrong in the class named 
HiveMetaStoreClient:

!image-2022-03-01-16-36-52-539.png!

Because the value of UserGroupInformation.getLoginUser().getRealUser() inside 
proxyUser.doAs is null, I changed it to UserGroupInformation.getLoginUser() and 
it works:

!image-2022-03-01-16-39-57-935.png!

Is what I did correct?


> Can not init HiveMetaStoreClient when using proxy user
> --
>
> Key: HIVE-25992
> URL: https://issues.apache.org/jira/browse/HIVE-25992
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.9
>Reporter: chenruotao
>Priority: Major
> Attachments: image-2022-03-01-16-33-49-012.png, 
> image-2022-03-01-16-36-52-539.png, image-2022-03-01-16-39-57-935.png
>
>
> When I use this code to access HMS:
> {code:java}
> UserGroupInformation proxyUser = 
> UserGroupInformation.createProxyUser(proxyUserName, userGroupInformation);
> proxyUser.doAs((PrivilegedExceptionAction) () -> {
>     XX
>     return true;
> });
>  {code}
> an exception is thrown:
> !image-2022-03-01-16-33-49-012.png! So I debugged this and found something 
> that may be wrong in the class named HiveMetaStoreClient:
> !image-2022-03-01-16-36-52-539.png!
> Because the value of UserGroupInformation.getLoginUser().getRealUser() inside 
> proxyUser.doAs is null, I changed it to UserGroupInformation.getLoginUser() 
> and it works:
> !image-2022-03-01-16-39-57-935.png!
> Is what I did correct?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
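
A minimal sketch of the null handling described in this report follows; falling back to the login user is only one possible behaviour and is not necessarily the fix that HiveMetaStoreClient will adopt.

```java
// Sketch only: tolerate a null getRealUser(), which is the situation the
// report above describes. The fallback to the login user is an assumption,
// not necessarily the fix HiveMetaStoreClient will adopt.
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

public class RealUserFallbackSketch {
  static UserGroupInformation resolveRealUser() throws IOException {
    UserGroupInformation login = UserGroupInformation.getLoginUser();
    UserGroupInformation real = login.getRealUser();
    // getRealUser() is only non-null when the UGI was created via
    // createProxyUser(); otherwise fall back to the login user itself.
    return real != null ? real : login;
  }
}
```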


[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=734416=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734416
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 08:20
Start Date: 01/Mar/22 08:20
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r816540070



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -303,56 +308,103 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 if (tablePath == null) {
   return;
 }
-FileSystem fs = tablePath.getFileSystem(conf);
-if (!fs.exists(tablePath)) {
+final FileSystem[] fs = {tablePath.getFileSystem(conf)};
+if (!fs[0].exists(tablePath)) {
   result.getTablesNotOnFs().add(table.getTableName());
   return;
 }
 
 Set partPaths = new HashSet<>();
 
-// check that the partition folders exist on disk
-for (Partition partition : parts) {
-  if (partition == null) {
-// most likely the user specified an invalid partition
-continue;
-  }
-  Path partPath = getDataLocation(table, partition);
-  if (partPath == null) {
-continue;
-  }
-  fs = partPath.getFileSystem(conf);
+int threadCount = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.METASTORE_MSCK_FS_HANDLER_THREADS_COUNT);
 
-  CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
-  prFromMetastore.setPartitionName(getPartitionName(table, partition));
-  prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
-  } else {
-result.getCorrectPartitions().add(prFromMetastore);
-  }
+Preconditions.checkArgument(!(threadCount < 
1),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be less than 1");
+Preconditions.checkArgument(!(threadCount > 
30),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be more than 30");
 
-  if (partitionExpirySeconds > 0) {
-long currentEpochSecs = Instant.now().getEpochSecond();
-long createdTime = partition.getCreateTime();
-long partitionAgeSeconds = currentEpochSecs - createdTime;
-if (partitionAgeSeconds > partitionExpirySeconds) {
-  CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
-  pr.setPartitionName(getPartitionName(table, partition));
-  pr.setTableName(partition.getTableName());
-  result.getExpiredPartitions().add(pr);
-  if (LOG.isDebugEnabled()) {
-LOG.debug("{}.{}.{}.{} expired. createdAt: {} current: {} age: {}s 
expiry: {}s", partition.getCatName(),
-partition.getDbName(), partition.getTableName(), 
pr.getPartitionName(), createdTime, currentEpochSecs,
-partitionAgeSeconds, partitionExpirySeconds);
-  }
+LOG.debug("Running with threads "+threadCount);
+
+// For Multi Threaded run, we do not want to wait for All partitions in 
queue to be processed,
+// instead we run in batch to avoid OOM, we set Min and Max Pool Size = 
METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+// and Waiting Queue size = METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+
+final ExecutorService pool = new ThreadPoolExecutor(threadCount,
+threadCount,
+0L,
+TimeUnit.MILLISECONDS,
+new ArrayBlockingQueue<>(threadCount),
+new ThreadPoolExecutor.CallerRunsPolicy());
+
+try {
+  Queue> futures = new LinkedList<>();
+  // check that the partition folders exist on disk
+  for (Partition partition : parts) {
+if (partition == null) {
+  // most likely the user specified an invalid partition
+  continue;
 }
-  }
+Path[] partPath = {getDataLocation(table, partition)};
+if (partPath[0] == null) {
+  continue;
+}
+futures.add(pool.submit(new Callable() {
+  @Override
+  public Object call() throws Exception {
+fs[0] = partPath[0].getFileSystem(conf);
+CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
+prFromMetastore.setPartitionName(getPartitionName(table, 
partition));
+prFromMetastore.setTableName(partition.getTableName());
+if (!fs[0].exists(partPath[0])) {
+  synchronized (result) {
+result.getPartitionsNotOnFs().add(prFromMetastore);
+  }
+} else {
+  synchronized (result) {
+result.getCorrectPartitions().add(prFromMetastore);
+  }
+

[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=734415=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734415
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 08:19
Start Date: 01/Mar/22 08:19
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r816539245



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -303,56 +308,103 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 if (tablePath == null) {
   return;
 }
-FileSystem fs = tablePath.getFileSystem(conf);
-if (!fs.exists(tablePath)) {
+final FileSystem[] fs = {tablePath.getFileSystem(conf)};
+if (!fs[0].exists(tablePath)) {
   result.getTablesNotOnFs().add(table.getTableName());
   return;
 }
 
 Set partPaths = new HashSet<>();
 
-// check that the partition folders exist on disk
-for (Partition partition : parts) {
-  if (partition == null) {
-// most likely the user specified an invalid partition
-continue;
-  }
-  Path partPath = getDataLocation(table, partition);
-  if (partPath == null) {
-continue;
-  }
-  fs = partPath.getFileSystem(conf);
+int threadCount = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.METASTORE_MSCK_FS_HANDLER_THREADS_COUNT);
 
-  CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
-  prFromMetastore.setPartitionName(getPartitionName(table, partition));
-  prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
-  } else {
-result.getCorrectPartitions().add(prFromMetastore);
-  }
+Preconditions.checkArgument(!(threadCount < 
1),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be less than 1");
+Preconditions.checkArgument(!(threadCount > 
30),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be more than 30");
 
-  if (partitionExpirySeconds > 0) {
-long currentEpochSecs = Instant.now().getEpochSecond();
-long createdTime = partition.getCreateTime();
-long partitionAgeSeconds = currentEpochSecs - createdTime;
-if (partitionAgeSeconds > partitionExpirySeconds) {
-  CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
-  pr.setPartitionName(getPartitionName(table, partition));
-  pr.setTableName(partition.getTableName());
-  result.getExpiredPartitions().add(pr);
-  if (LOG.isDebugEnabled()) {
-LOG.debug("{}.{}.{}.{} expired. createdAt: {} current: {} age: {}s 
expiry: {}s", partition.getCatName(),
-partition.getDbName(), partition.getTableName(), 
pr.getPartitionName(), createdTime, currentEpochSecs,
-partitionAgeSeconds, partitionExpirySeconds);
-  }
+LOG.debug("Running with threads "+threadCount);
+
+// For Multi Threaded run, we do not want to wait for All partitions in 
queue to be processed,
+// instead we run in batch to avoid OOM, we set Min and Max Pool Size = 
METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+// and Waiting Queue size = METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+
+final ExecutorService pool = new ThreadPoolExecutor(threadCount,
+threadCount,
+0L,
+TimeUnit.MILLISECONDS,
+new ArrayBlockingQueue<>(threadCount),
+new ThreadPoolExecutor.CallerRunsPolicy());
+
+try {
+  Queue> futures = new LinkedList<>();
+  // check that the partition folders exist on disk
+  for (Partition partition : parts) {
+if (partition == null) {
+  // most likely the user specified an invalid partition
+  continue;
 }
-  }
+Path[] partPath = {getDataLocation(table, partition)};
+if (partPath[0] == null) {
+  continue;
+}
+futures.add(pool.submit(new Callable() {
+  @Override
+  public Object call() throws Exception {
+fs[0] = partPath[0].getFileSystem(conf);
+CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
+prFromMetastore.setPartitionName(getPartitionName(table, 
partition));
+prFromMetastore.setTableName(partition.getTableName());
+if (!fs[0].exists(partPath[0])) {
+  synchronized (result) {
+result.getPartitionsNotOnFs().add(prFromMetastore);
+  }
+} else {
+  synchronized (result) {
+result.getCorrectPartitions().add(prFromMetastore);
+  }
+

[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=734410=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734410
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 08:16
Start Date: 01/Mar/22 08:16
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r816537279



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -303,56 +308,103 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 if (tablePath == null) {
   return;
 }
-FileSystem fs = tablePath.getFileSystem(conf);
-if (!fs.exists(tablePath)) {
+final FileSystem[] fs = {tablePath.getFileSystem(conf)};
+if (!fs[0].exists(tablePath)) {
   result.getTablesNotOnFs().add(table.getTableName());
   return;
 }
 
 Set<Path> partPaths = new HashSet<>();
 
-// check that the partition folders exist on disk
-for (Partition partition : parts) {
-  if (partition == null) {
-// most likely the user specified an invalid partition
-continue;
-  }
-  Path partPath = getDataLocation(table, partition);
-  if (partPath == null) {
-continue;
-  }
-  fs = partPath.getFileSystem(conf);
+int threadCount = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.METASTORE_MSCK_FS_HANDLER_THREADS_COUNT);
 
-  CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
-  prFromMetastore.setPartitionName(getPartitionName(table, partition));
-  prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
-  } else {
-result.getCorrectPartitions().add(prFromMetastore);
-  }
+Preconditions.checkArgument(!(threadCount < 
1),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be less than 1");
+Preconditions.checkArgument(!(threadCount > 
30),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be more than 30");
 
-  if (partitionExpirySeconds > 0) {
-long currentEpochSecs = Instant.now().getEpochSecond();
-long createdTime = partition.getCreateTime();
-long partitionAgeSeconds = currentEpochSecs - createdTime;
-if (partitionAgeSeconds > partitionExpirySeconds) {
-  CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
-  pr.setPartitionName(getPartitionName(table, partition));
-  pr.setTableName(partition.getTableName());
-  result.getExpiredPartitions().add(pr);
-  if (LOG.isDebugEnabled()) {
-LOG.debug("{}.{}.{}.{} expired. createdAt: {} current: {} age: {}s 
expiry: {}s", partition.getCatName(),
-partition.getDbName(), partition.getTableName(), 
pr.getPartitionName(), createdTime, currentEpochSecs,
-partitionAgeSeconds, partitionExpirySeconds);
-  }
+LOG.debug("Running with threads "+threadCount);
+
+// For Multi Threaded run, we do not want to wait for All partitions in 
queue to be processed,
+// instead we run in batch to avoid OOM, we set Min and Max Pool Size = 
METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+// and Waiting Queue size = METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+
+final ExecutorService pool = new ThreadPoolExecutor(threadCount,
+threadCount,
+0L,
+TimeUnit.MILLISECONDS,
+new ArrayBlockingQueue<>(threadCount),
+new ThreadPoolExecutor.CallerRunsPolicy());
+
+try {
+  Queue<Future<Object>> futures = new LinkedList<>();
+  // check that the partition folders exist on disk
+  for (Partition partition : parts) {
+if (partition == null) {
+  // most likely the user specified an invalid partition
+  continue;
 }
-  }
+Path[] partPath = {getDataLocation(table, partition)};

Review comment:
   Also, this might be problematic with concurrent execution, precisely 
because this is not final.
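
A small, self-contained sketch of the alternative hinted at here: declaring a fresh (effectively final) local per loop iteration lets each anonymous Callable capture its own value, with no shared single-element array to overwrite. The types and names below are simplified stand-ins, not the patch's own classes:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CaptureSketch {
  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    List<Future<String>> futures = new ArrayList<>();

    // Each iteration declares its own final local, so every Callable captures
    // a distinct value; there is no shared mutable slot that concurrently
    // running tasks could overwrite.
    for (String name : new String[] {"p=1", "p=2", "p=3"}) {
      final String partitionName = name;
      futures.add(pool.submit(new Callable<String>() {
        @Override
        public String call() {
          return "checked " + partitionName;
        }
      }));
    }
    for (Future<String> f : futures) {
      System.out.println(f.get());
    }
    pool.shutdown();
  }
}
{code}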




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734410)
Time Spent: 2h 50m  (was: 2h 40m)

> Support HiveMetaStoreChecker.checkTable operation with multi-threaded
> -
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> 

[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=734409=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734409
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 08:14
Start Date: 01/Mar/22 08:14
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r816536599



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -303,56 +308,103 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 if (tablePath == null) {
   return;
 }
-FileSystem fs = tablePath.getFileSystem(conf);
-if (!fs.exists(tablePath)) {
+final FileSystem[] fs = {tablePath.getFileSystem(conf)};
+if (!fs[0].exists(tablePath)) {
   result.getTablesNotOnFs().add(table.getTableName());
   return;
 }
 
 Set<Path> partPaths = new HashSet<>();
 
-// check that the partition folders exist on disk
-for (Partition partition : parts) {
-  if (partition == null) {
-// most likely the user specified an invalid partition
-continue;
-  }
-  Path partPath = getDataLocation(table, partition);
-  if (partPath == null) {
-continue;
-  }
-  fs = partPath.getFileSystem(conf);
+int threadCount = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.METASTORE_MSCK_FS_HANDLER_THREADS_COUNT);
 
-  CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
-  prFromMetastore.setPartitionName(getPartitionName(table, partition));
-  prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
-  } else {
-result.getCorrectPartitions().add(prFromMetastore);
-  }
+Preconditions.checkArgument(!(threadCount < 
1),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be less than 1");
+Preconditions.checkArgument(!(threadCount > 
30),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be more than 30");
 
-  if (partitionExpirySeconds > 0) {
-long currentEpochSecs = Instant.now().getEpochSecond();
-long createdTime = partition.getCreateTime();
-long partitionAgeSeconds = currentEpochSecs - createdTime;
-if (partitionAgeSeconds > partitionExpirySeconds) {
-  CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
-  pr.setPartitionName(getPartitionName(table, partition));
-  pr.setTableName(partition.getTableName());
-  result.getExpiredPartitions().add(pr);
-  if (LOG.isDebugEnabled()) {
-LOG.debug("{}.{}.{}.{} expired. createdAt: {} current: {} age: {}s 
expiry: {}s", partition.getCatName(),
-partition.getDbName(), partition.getTableName(), 
pr.getPartitionName(), createdTime, currentEpochSecs,
-partitionAgeSeconds, partitionExpirySeconds);
-  }
+LOG.debug("Running with threads "+threadCount);
+
+// For Multi Threaded run, we do not want to wait for All partitions in 
queue to be processed,
+// instead we run in batch to avoid OOM, we set Min and Max Pool Size = 
METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+// and Waiting Queue size = METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+
+final ExecutorService pool = new ThreadPoolExecutor(threadCount,
+threadCount,
+0L,
+TimeUnit.MILLISECONDS,
+new ArrayBlockingQueue<>(threadCount),
+new ThreadPoolExecutor.CallerRunsPolicy());
+
+try {
+  Queue<Future<Object>> futures = new LinkedList<>();
+  // check that the partition folders exist on disk
+  for (Partition partition : parts) {
+if (partition == null) {
+  // most likely the user specified an invalid partition
+  continue;
 }
-  }
+Path[] partPath = {getDataLocation(table, partition)};
+if (partPath[0] == null) {
+  continue;
+}
+futures.add(pool.submit(new Callable<Object>() {
+  @Override
+  public Object call() throws Exception {
+fs[0] = partPath[0].getFileSystem(conf);

Review comment:
   Why are we reusing the `fs[0]` object?
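
One way to read this question: Path.getFileSystem(conf) goes through FileSystem.get(), which (unless caching is disabled for the scheme) returns a cached instance per scheme, authority and user, so each task can simply look the FileSystem up in a local variable instead of writing into a shared fs[0] slot. A tiny sketch, assuming hadoop-common on the classpath; the local path is illustrative:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsLookupSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path partPath = new Path("file:///tmp/warehouse/t/p=1");

    // FileSystem instances are cached per (scheme, authority, user), so this
    // lookup is cheap; a task-local variable avoids any shared mutable slot.
    FileSystem fs = partPath.getFileSystem(conf);
    System.out.println(partPath + " exists? " + fs.exists(partPath));
  }
}
{code}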




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734409)
Time Spent: 2h 40m  (was: 2.5h)

> Support HiveMetaStoreChecker.checkTable operation with multi-threaded
> 

[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=734403=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734403
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 08:13
Start Date: 01/Mar/22 08:13
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r816535345



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -303,56 +308,103 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 if (tablePath == null) {
   return;
 }
-FileSystem fs = tablePath.getFileSystem(conf);
-if (!fs.exists(tablePath)) {
+final FileSystem[] fs = {tablePath.getFileSystem(conf)};
+if (!fs[0].exists(tablePath)) {
   result.getTablesNotOnFs().add(table.getTableName());
   return;
 }
 
 Set<Path> partPaths = new HashSet<>();
 
-// check that the partition folders exist on disk
-for (Partition partition : parts) {
-  if (partition == null) {
-// most likely the user specified an invalid partition
-continue;
-  }
-  Path partPath = getDataLocation(table, partition);
-  if (partPath == null) {
-continue;
-  }
-  fs = partPath.getFileSystem(conf);
+int threadCount = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.METASTORE_MSCK_FS_HANDLER_THREADS_COUNT);
 
-  CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
-  prFromMetastore.setPartitionName(getPartitionName(table, partition));
-  prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
-  } else {
-result.getCorrectPartitions().add(prFromMetastore);
-  }
+Preconditions.checkArgument(!(threadCount < 
1),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be less than 1");
+Preconditions.checkArgument(!(threadCount > 
30),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be more than 30");
 
-  if (partitionExpirySeconds > 0) {
-long currentEpochSecs = Instant.now().getEpochSecond();
-long createdTime = partition.getCreateTime();
-long partitionAgeSeconds = currentEpochSecs - createdTime;
-if (partitionAgeSeconds > partitionExpirySeconds) {
-  CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
-  pr.setPartitionName(getPartitionName(table, partition));
-  pr.setTableName(partition.getTableName());
-  result.getExpiredPartitions().add(pr);
-  if (LOG.isDebugEnabled()) {
-LOG.debug("{}.{}.{}.{} expired. createdAt: {} current: {} age: {}s 
expiry: {}s", partition.getCatName(),
-partition.getDbName(), partition.getTableName(), 
pr.getPartitionName(), createdTime, currentEpochSecs,
-partitionAgeSeconds, partitionExpirySeconds);
-  }
+LOG.debug("Running with threads "+threadCount);
+
+// For Multi Threaded run, we do not want to wait for All partitions in 
queue to be processed,
+// instead we run in batch to avoid OOM, we set Min and Max Pool Size = 
METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+// and Waiting Queue size = METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+
+final ExecutorService pool = new ThreadPoolExecutor(threadCount,
+threadCount,
+0L,
+TimeUnit.MILLISECONDS,
+new ArrayBlockingQueue<>(threadCount),
+new ThreadPoolExecutor.CallerRunsPolicy());
+
+try {
+  Queue<Future<Object>> futures = new LinkedList<>();
+  // check that the partition folders exist on disk
+  for (Partition partition : parts) {
+if (partition == null) {
+  // most likely the user specified an invalid partition
+  continue;
 }
-  }
+Path[] partPath = {getDataLocation(table, partition)};
+if (partPath[0] == null) {
+  continue;
+}
+futures.add(pool.submit(new Callable<Object>() {
+  @Override
+  public Object call() throws Exception {
+fs[0] = partPath[0].getFileSystem(conf);
+CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
+prFromMetastore.setPartitionName(getPartitionName(table, 
partition));
+prFromMetastore.setTableName(partition.getTableName());
+if (!fs[0].exists(partPath[0])) {
+  synchronized (result) {
+result.getPartitionsNotOnFs().add(prFromMetastore);
+  }
+} else {
+  synchronized (result) {
+result.getCorrectPartitions().add(prFromMetastore);
+  }
+

[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=734399=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734399
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 08:11
Start Date: 01/Mar/22 08:11
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r816534259



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -303,56 +308,103 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 if (tablePath == null) {
   return;
 }
-FileSystem fs = tablePath.getFileSystem(conf);
-if (!fs.exists(tablePath)) {
+final FileSystem[] fs = {tablePath.getFileSystem(conf)};
+if (!fs[0].exists(tablePath)) {
   result.getTablesNotOnFs().add(table.getTableName());
   return;
 }
 
 Set<Path> partPaths = new HashSet<>();
 
-// check that the partition folders exist on disk
-for (Partition partition : parts) {
-  if (partition == null) {
-// most likely the user specified an invalid partition
-continue;
-  }
-  Path partPath = getDataLocation(table, partition);
-  if (partPath == null) {
-continue;
-  }
-  fs = partPath.getFileSystem(conf);
+int threadCount = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.METASTORE_MSCK_FS_HANDLER_THREADS_COUNT);
 
-  CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
-  prFromMetastore.setPartitionName(getPartitionName(table, partition));
-  prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
-  } else {
-result.getCorrectPartitions().add(prFromMetastore);
-  }
+Preconditions.checkArgument(!(threadCount < 
1),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be less than 1");
+Preconditions.checkArgument(!(threadCount > 
30),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be more than 30");
 
-  if (partitionExpirySeconds > 0) {
-long currentEpochSecs = Instant.now().getEpochSecond();
-long createdTime = partition.getCreateTime();
-long partitionAgeSeconds = currentEpochSecs - createdTime;
-if (partitionAgeSeconds > partitionExpirySeconds) {
-  CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
-  pr.setPartitionName(getPartitionName(table, partition));
-  pr.setTableName(partition.getTableName());
-  result.getExpiredPartitions().add(pr);
-  if (LOG.isDebugEnabled()) {
-LOG.debug("{}.{}.{}.{} expired. createdAt: {} current: {} age: {}s 
expiry: {}s", partition.getCatName(),
-partition.getDbName(), partition.getTableName(), 
pr.getPartitionName(), createdTime, currentEpochSecs,
-partitionAgeSeconds, partitionExpirySeconds);
-  }
+LOG.debug("Running with threads "+threadCount);
+
+// For Multi Threaded run, we do not want to wait for All partitions in 
queue to be processed,
+// instead we run in batch to avoid OOM, we set Min and Max Pool Size = 
METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+// and Waiting Queue size = METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+
+final ExecutorService pool = new ThreadPoolExecutor(threadCount,
+threadCount,
+0L,
+TimeUnit.MILLISECONDS,
+new ArrayBlockingQueue<>(threadCount),
+new ThreadPoolExecutor.CallerRunsPolicy());
+
+try {
+  Queue<Future<Object>> futures = new LinkedList<>();
+  // check that the partition folders exist on disk
+  for (Partition partition : parts) {
+if (partition == null) {
+  // most likely the user specified an invalid partition
+  continue;
 }
-  }
+Path[] partPath = {getDataLocation(table, partition)};
+if (partPath[0] == null) {
+  continue;
+}
+futures.add(pool.submit(new Callable<Object>() {
+  @Override
+  public Object call() throws Exception {
+fs[0] = partPath[0].getFileSystem(conf);
+CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
+prFromMetastore.setPartitionName(getPartitionName(table, 
partition));
+prFromMetastore.setTableName(partition.getTableName());
+if (!fs[0].exists(partPath[0])) {
+  synchronized (result) {
+result.getPartitionsNotOnFs().add(prFromMetastore);
+  }
+} else {
+  synchronized (result) {
+result.getCorrectPartitions().add(prFromMetastore);
+  }
+

[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=734396=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734396
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 08:07
Start Date: 01/Mar/22 08:07
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r816531781



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -303,56 +308,103 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 if (tablePath == null) {
   return;
 }
-FileSystem fs = tablePath.getFileSystem(conf);
-if (!fs.exists(tablePath)) {
+final FileSystem[] fs = {tablePath.getFileSystem(conf)};

Review comment:
   Is this just a trick to get `final` to accept the changes?
   Wouldn't it be better for readability to use different variables instead?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734396)
Time Spent: 2h 10m  (was: 2h)

> Support HiveMetaStoreChecker.checkTable operation with multi-threaded
> -
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> MSCK Repair table for a table with a high number of partitions can perform slowly on 
> Cloud Storage such as S3; one of the cases where we found this slowness was in 
> HiveMetaStoreChecker.checkTable.
> {code:java}
> "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 
> tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at 
> sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
>   at 
> sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
>   at 
> sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
>   at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
>   at 
> sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>   at 
> com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>   at 
> 

[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=734395=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734395
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 08:06
Start Date: 01/Mar/22 08:06
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r816483649



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -303,56 +308,103 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 if (tablePath == null) {
   return;
 }
-FileSystem fs = tablePath.getFileSystem(conf);
-if (!fs.exists(tablePath)) {
+final FileSystem[] fs = {tablePath.getFileSystem(conf)};
+if (!fs[0].exists(tablePath)) {

Review comment:
   Why is this change needed?

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -303,56 +308,103 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 if (tablePath == null) {
   return;
 }
-FileSystem fs = tablePath.getFileSystem(conf);
-if (!fs.exists(tablePath)) {
+final FileSystem[] fs = {tablePath.getFileSystem(conf)};

Review comment:
   Nit: formatting




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734395)
Time Spent: 2h  (was: 1h 50m)

> Support HiveMetaStoreChecker.checkTable operation with multi-threaded
> -
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> MSCK Repair table for a table with a high number of partitions can perform slowly on 
> Cloud Storage such as S3; one of the cases where we found this slowness was in 
> HiveMetaStoreChecker.checkTable.
> {code:java}
> "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 
> tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at 
> sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
>   at 
> sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
>   at 
> sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
>   at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
>   at 
> sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>   at 
> com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
>   at 
> 

[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=734394=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734394
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 08:06
Start Date: 01/Mar/22 08:06
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r816530943



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -303,56 +308,103 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 if (tablePath == null) {
   return;
 }
-FileSystem fs = tablePath.getFileSystem(conf);
-if (!fs.exists(tablePath)) {
+final FileSystem[] fs = {tablePath.getFileSystem(conf)};
+if (!fs[0].exists(tablePath)) {
   result.getTablesNotOnFs().add(table.getTableName());
   return;
 }
 
 Set<Path> partPaths = new HashSet<>();
 
-// check that the partition folders exist on disk
-for (Partition partition : parts) {
-  if (partition == null) {
-// most likely the user specified an invalid partition
-continue;
-  }
-  Path partPath = getDataLocation(table, partition);
-  if (partPath == null) {
-continue;
-  }
-  fs = partPath.getFileSystem(conf);
+int threadCount = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.METASTORE_MSCK_FS_HANDLER_THREADS_COUNT);
 
-  CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
-  prFromMetastore.setPartitionName(getPartitionName(table, partition));
-  prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
-  } else {
-result.getCorrectPartitions().add(prFromMetastore);
-  }
+Preconditions.checkArgument(!(threadCount < 
1),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be less than 1");
+Preconditions.checkArgument(!(threadCount > 
30),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be more than 30");
 
-  if (partitionExpirySeconds > 0) {
-long currentEpochSecs = Instant.now().getEpochSecond();
-long createdTime = partition.getCreateTime();
-long partitionAgeSeconds = currentEpochSecs - createdTime;
-if (partitionAgeSeconds > partitionExpirySeconds) {
-  CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
-  pr.setPartitionName(getPartitionName(table, partition));
-  pr.setTableName(partition.getTableName());
-  result.getExpiredPartitions().add(pr);
-  if (LOG.isDebugEnabled()) {
-LOG.debug("{}.{}.{}.{} expired. createdAt: {} current: {} age: {}s 
expiry: {}s", partition.getCatName(),
-partition.getDbName(), partition.getTableName(), 
pr.getPartitionName(), createdTime, currentEpochSecs,
-partitionAgeSeconds, partitionExpirySeconds);
-  }
+LOG.debug("Running with threads "+threadCount);
+
+// For Multi Threaded run, we do not want to wait for All partitions in 
queue to be processed,
+// instead we run in batch to avoid OOM, we set Min and Max Pool Size = 
METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+// and Waiting Queue size = METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+
+final ExecutorService pool = new ThreadPoolExecutor(threadCount,
+threadCount,
+0L,
+TimeUnit.MILLISECONDS,
+new ArrayBlockingQueue<>(threadCount),
+new ThreadPoolExecutor.CallerRunsPolicy());
+
+try {
+  Queue<Future<Object>> futures = new LinkedList<>();
+  // check that the partition folders exist on disk
+  for (Partition partition : parts) {
+if (partition == null) {
+  // most likely the user specified an invalid partition
+  continue;
 }
-  }
+Path[] partPath = {getDataLocation(table, partition)};

Review comment:
   Using arrays to work around the `final` requirement seems really strange to me.
   While it works, we might want to be more explicit in the code; that would help 
with future readability.
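
A sketch of one way to be more explicit, along the lines suggested here: pass whatever the task needs as (effectively) final parameters of a small factory method, so no single-element array is required. The String parameters below are simplified stand-ins for the patch's Partition/Path objects:

{code:java}
import java.util.concurrent.Callable;

public class ExplicitTaskSketch {

  // The factory's final parameters are captured directly by the anonymous
  // Callable; nothing mutable is shared between submitted tasks.
  static Callable<String> partitionCheckTask(final String partitionName,
                                             final String partitionLocation) {
    return new Callable<String>() {
      @Override
      public String call() {
        // the real task would look up the FileSystem and check existence here
        return partitionName + " -> " + partitionLocation;
      }
    };
  }

  public static void main(String[] args) throws Exception {
    System.out.println(partitionCheckTask("p=1", "/warehouse/t/p=1").call());
  }
}
{code}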

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -303,56 +308,103 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 if (tablePath == null) {
   return;
 }
-FileSystem fs = tablePath.getFileSystem(conf);
-if (!fs.exists(tablePath)) {
+final FileSystem[] fs = {tablePath.getFileSystem(conf)};
+if (!fs[0].exists(tablePath)) {
   result.getTablesNotOnFs().add(table.getTableName());
   return;
 }
 
 Set<Path> partPaths = new HashSet<>();
 
-// 

[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded

2022-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=734393=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734393
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 01/Mar/22 08:03
Start Date: 01/Mar/22 08:03
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r816484342



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -303,56 +308,103 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 if (tablePath == null) {
   return;
 }
-FileSystem fs = tablePath.getFileSystem(conf);
-if (!fs.exists(tablePath)) {
+final FileSystem[] fs = {tablePath.getFileSystem(conf)};
+if (!fs[0].exists(tablePath)) {
   result.getTablesNotOnFs().add(table.getTableName());
   return;
 }
 
 Set<Path> partPaths = new HashSet<>();
 
-// check that the partition folders exist on disk
-for (Partition partition : parts) {
-  if (partition == null) {
-// most likely the user specified an invalid partition
-continue;
-  }
-  Path partPath = getDataLocation(table, partition);
-  if (partPath == null) {
-continue;
-  }
-  fs = partPath.getFileSystem(conf);
+int threadCount = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.METASTORE_MSCK_FS_HANDLER_THREADS_COUNT);
 
-  CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
-  prFromMetastore.setPartitionName(getPartitionName(table, partition));
-  prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
-  } else {
-result.getCorrectPartitions().add(prFromMetastore);
-  }
+Preconditions.checkArgument(!(threadCount < 
1),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be less than 1");
+Preconditions.checkArgument(!(threadCount > 
30),"METASTORE_MSCK_FS_HANDLER_THREADS_COUNT cannot be more than 30");
 
-  if (partitionExpirySeconds > 0) {
-long currentEpochSecs = Instant.now().getEpochSecond();
-long createdTime = partition.getCreateTime();
-long partitionAgeSeconds = currentEpochSecs - createdTime;
-if (partitionAgeSeconds > partitionExpirySeconds) {
-  CheckResult.PartitionResult pr = new CheckResult.PartitionResult();
-  pr.setPartitionName(getPartitionName(table, partition));
-  pr.setTableName(partition.getTableName());
-  result.getExpiredPartitions().add(pr);
-  if (LOG.isDebugEnabled()) {
-LOG.debug("{}.{}.{}.{} expired. createdAt: {} current: {} age: {}s 
expiry: {}s", partition.getCatName(),
-partition.getDbName(), partition.getTableName(), 
pr.getPartitionName(), createdTime, currentEpochSecs,
-partitionAgeSeconds, partitionExpirySeconds);
-  }
+LOG.debug("Running with threads "+threadCount);
+
+// For Multi Threaded run, we do not want to wait for All partitions in 
queue to be processed,
+// instead we run in batch to avoid OOM, we set Min and Max Pool Size = 
METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+// and Waiting Queue size = METASTORE_MSCK_FS_HANDLER_THREADS_COUNT
+
+final ExecutorService pool = new ThreadPoolExecutor(threadCount,
+threadCount,
+0L,
+TimeUnit.MILLISECONDS,
+new ArrayBlockingQueue<>(threadCount),
+new ThreadPoolExecutor.CallerRunsPolicy());
+
+try {
+  Queue<Future<Object>> futures = new LinkedList<>();
+  // check that the partition folders exist on disk
+  for (Partition partition : parts) {
+if (partition == null) {
+  // most likely the user specified an invalid partition
+  continue;
 }
-  }
+Path[] partPath = {getDataLocation(table, partition)};

Review comment:
   Nit: formatting 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 734393)
Time Spent: 1h 40m  (was: 1.5h)

> Support HiveMetaStoreChecker.checkTable operation with multi-threaded
> -
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone