[jira] [Work logged] (HIVE-25680) Authorize #get_table_meta HiveMetastore Server API to use any of the HiveMetastore Authorization model

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25680?focusedWorklogId=685066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-685066
 ]

ASF GitHub Bot logged work on HIVE-25680:
-

Author: ASF GitHub Bot
Created on: 23/Nov/21 05:48
Start Date: 23/Nov/21 05:48
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on pull request #2770:
URL: https://github.com/apache/hive/pull/2770#issuecomment-976183831


   @kgyrtkirk - Thanks for the review. Are we good to merge this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 685066)
Time Spent: 3h 10m  (was: 3h)

> Authorize #get_table_meta HiveMetastore Server API to use any of the 
> HiveMetastore Authorization model
> --
>
> Key: HIVE-25680
> URL: https://issues.apache.org/jira/browse/HIVE-25680
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screenshot 2021-11-08 at 2.39.30 PM.png
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Apache Hue, or any other application which uses the #get_table_meta API, is 
> not gated by any of the authorization models which HiveMetastore provides.
> For more information on Storage based Authorization Model : 
> https://cwiki.apache.org/confluence/display/Hive/HCatalog+Authorization
> You can easily reproduce this with Apache Hive + Apache Hue
> {code:java}
> <property>
>   <name>hive.security.metastore.authorization.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
> </property>
> <property>
>   <name>hive.security.metastore.authenticator.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
> </property>
> <property>
>   <name>hive.metastore.pre.event.listeners</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
> </property>
> {code}
> {code:java}
> #!/bin/bash
> set -x
> hdfs dfs -mkdir /datasets
> hdfs dfs -mkdir /datasets/database1
> hdfs dfs -mkdir /datasets/database1/table1
> echo "stefano,1992" | hdfs dfs -put - /datasets/database1/table1/file1.csv
> hdfs dfs -chmod -R 700 /datasets/database1
> sudo tee -a setup.hql > /dev/null <<EOT
> CREATE DATABASE IF NOT EXISTS database1 LOCATION "/datasets/database1";
> CREATE EXTERNAL TABLE IF NOT EXISTS database1.table1 (
>   name string, 
>   year int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> LOCATION
>   '/datasets/database1/table1';
> EOT
> hive -f setup.hql
> {code}
> 1. Log in to Hue => create the first user called "admin" and provide a 
> password. Access the Hive Editor.
> 2. On the SQL section on the left under Databases you should see default and 
> database1 listed. Click on database1
> 3. As you can see a table called table1 is listed => this should not be 
> possible as our admin user has no HDFS grants on /datasets/database1
> 4. Run the following query from the Hive editor: SHOW TABLES; The output shows 
> a Permission denied error => this is the expected behavior.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-25731) Differentiate between failover revert and complete

2021-11-22 Thread Haymant Mangla (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haymant Mangla resolved HIVE-25731.
---
Resolution: Won't Do

> Differentiate between failover revert and complete
> --
>
> Key: HIVE-25731
> URL: https://issues.apache.org/jira/browse/HIVE-25731
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25721) Outer join result is wrong

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25721?focusedWorklogId=685033&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-685033
 ]

ASF GitHub Bot logged work on HIVE-25721:
-

Author: ASF GitHub Bot
Created on: 23/Nov/21 02:14
Start Date: 23/Nov/21 02:14
Worklog Time Spent: 10m 
  Work Description: SparksFyz edited a comment on pull request #2798:
URL: https://github.com/apache/hive/pull/2798#issuecomment-976103753


   @zabetak 
   
   Thanks for reminding. Misplaced file position for q.out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 685033)
Time Spent: 1h  (was: 50m)

> Outer join result is wrong
> --
>
> Key: HIVE-25721
> URL: https://issues.apache.org/jira/browse/HIVE-25721
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: All Versions
>Reporter: Yizhen Fan
>Assignee: Yizhen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-25721.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The outer join result is wrong; here is a left join case.
> select b.fields from a left join b on a.key=b.key and a.filter=xxx
> There are some necessary conditions to produce this problem:
>  # the `select` clause only contains right-table fields
>  # the `on` clause contains a left-table condition, and this condition can filter 
> records 
> h3. Cause:
> candidateStorage[tag].addRow(value); // CommonMergeJoinOperator.process
> Rows of the left table cannot be added into the row container because the tblDesc 
> of the left table is null, while the left table's data cannot be ignored in this case.
> h3. Reproducible steps are mentioned below.
> 
> set hive.auto.convert.join=false;
> create table t_smj_left (key string, value int);
> insert into t_smj_left values
> ('key1', 1),
> ('key1', 2);
> create table t_smj_right (key string, value int);
> insert into t_smj_right values
> ('key1', 1);
> select
> t2.value
> from t_smj_left t1
> left join t_smj_right t2 on t1.key=t2.key and t1.value=2;
>  
> Result:
> NULL
> NULL
> Expected Output:
> 1
> NULL
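For reference, the expected LEFT JOIN semantics can be sketched as a small in-memory model in plain Java (illustrative only, not Hive code; the class and method names are made up): an ON-clause predicate on the left table decides which rows *match*, but must never drop left-side rows.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative in-memory model of LEFT JOIN semantics (not Hive code).
// ON-clause predicates decide matching; unmatched left rows still emit NULL.
public class LeftJoinSemantics {
    public record Row(String key, int value) {}

    // Returns t2.value for: t1 left join t2 on t1.key=t2.key and t1.value=2
    public static List<Integer> leftJoin(List<Row> left, List<Row> right) {
        List<Integer> out = new ArrayList<>();
        for (Row l : left) {
            boolean matched = false;
            for (Row r : right) {
                if (l.key().equals(r.key()) && l.value() == 2) {
                    out.add(r.value()); // matching right-side value
                    matched = true;
                }
            }
            if (!matched) {
                out.add(null); // left row is kept, right side is NULL
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> left = List.of(new Row("key1", 1), new Row("key1", 2));
        List<Row> right = List.of(new Row("key1", 1));
        // One row matches (value 1), one does not (NULL); row order is not significant.
        System.out.println(leftJoin(left, right));
    }
}
```

The bug described above effectively loses the matching row's contribution, so both output rows come back as NULL.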



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25721) Outer join result is wrong

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25721?focusedWorklogId=685032&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-685032
 ]

ASF GitHub Bot logged work on HIVE-25721:
-

Author: ASF GitHub Bot
Created on: 23/Nov/21 02:14
Start Date: 23/Nov/21 02:14
Worklog Time Spent: 10m 
  Work Description: SparksFyz commented on pull request #2798:
URL: https://github.com/apache/hive/pull/2798#issuecomment-976103753


   > @SparksFyz There seems to be test failures in CI relevant to the changes 
in this PR. Can you please have a look?
   
   Thanks for reminding. Misplaced file position for q.out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 685032)
Time Spent: 50m  (was: 40m)

> Outer join result is wrong
> --
>
> Key: HIVE-25721
> URL: https://issues.apache.org/jira/browse/HIVE-25721
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: All Versions
>Reporter: Yizhen Fan
>Assignee: Yizhen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-25721.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The outer join result is wrong; here is a left join case.
> select b.fields from a left join b on a.key=b.key and a.filter=xxx
> There are some necessary conditions to produce this problem:
>  # the `select` clause only contains right-table fields
>  # the `on` clause contains a left-table condition, and this condition can filter 
> records 
> h3. Cause:
> candidateStorage[tag].addRow(value); // CommonMergeJoinOperator.process
> Rows of the left table cannot be added into the row container because the tblDesc 
> of the left table is null, while the left table's data cannot be ignored in this case.
> h3. Reproducible steps are mentioned below.
> 
> set hive.auto.convert.join=false;
> create table t_smj_left (key string, value int);
> insert into t_smj_left values
> ('key1', 1),
> ('key1', 2);
> create table t_smj_right (key string, value int);
> insert into t_smj_right values
> ('key1', 1);
> select
> t2.value
> from t_smj_left t1
> left join t_smj_right t2 on t1.key=t2.key and t1.value=2;
>  
> Result:
> NULL
> NULL
> Expected Output:
> 1
> NULL



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-25731) Differentiate between failover revert and complete

2021-11-22 Thread Haymant Mangla (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haymant Mangla reassigned HIVE-25731:
-


> Differentiate between failover revert and complete
> --
>
> Key: HIVE-25731
> URL: https://issues.apache.org/jira/browse/HIVE-25731
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-24969) Predicates may be removed when decorrelating subqueries with lateral

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24969?focusedWorklogId=685009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-685009
 ]

ASF GitHub Bot logged work on HIVE-24969:
-

Author: ASF GitHub Bot
Created on: 23/Nov/21 00:11
Start Date: 23/Nov/21 00:11
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #2145:
URL: https://github.com/apache/hive/pull/2145#issuecomment-976027177


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 685009)
Time Spent: 2.5h  (was: 2h 20m)

> Predicates may be removed when decorrelating subqueries with lateral
> 
>
> Key: HIVE-24969
> URL: https://issues.apache.org/jira/browse/HIVE-24969
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Step to reproduce:
> {code:java}
> select count(distinct logItem.triggerId)
> from service_stat_log LATERAL VIEW explode(logItems) LogItemTable AS logItem
> where logItem.dsp in ('delivery', 'ocpa')
> and logItem.iswin = true
> and logItem.adid in (
>  select distinct adId
>  from ad_info
>  where subAccountId in (16010, 14863));  {code}
> The predicates _logItem.dsp in ('delivery', 'ocpa')_ and _logItem.iswin = 
> true_ are removed when doing PPD: JOIN -> RS -> LVJ. The JOIN has 
> candidates: logitem -> [logItem.dsp in ('delivery', 'ocpa'), logItem.iswin = 
> true]. When pushing them to the RS followed by the LVJ, none of them are pushed, and 
> the candidates of logitem are finally removed by default, which causes the 
> wrong result.
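To see why dropping those candidate predicates changes the result, here is a toy model of the explode-then-filter pipeline in Java streams (illustrative only; LogItem and countDistinctTriggers are made-up names, not Hive APIs):

```java
import java.util.List;
import java.util.Set;

// Toy model of LATERAL VIEW explode(logItems) followed by the per-item filters.
// Dropping the predicates (as the PPD bug does) inflates the distinct count.
public class LateralViewFilter {
    public record LogItem(String dsp, boolean isWin, long triggerId) {}

    public static long countDistinctTriggers(List<List<LogItem>> logs, boolean keepPredicates) {
        return logs.stream()
                .flatMap(List::stream) // explode(logItems)
                .filter(i -> !keepPredicates
                        || (Set.of("delivery", "ocpa").contains(i.dsp()) && i.isWin()))
                .map(LogItem::triggerId)
                .distinct()
                .count();
    }

    public static void main(String[] args) {
        List<List<LogItem>> logs = List.of(
                List.of(new LogItem("delivery", true, 1L), new LogItem("brand", true, 2L)),
                List.of(new LogItem("ocpa", false, 3L)));
        System.out.println(countDistinctTriggers(logs, true));  // predicates kept: 1
        System.out.println(countDistinctTriggers(logs, false)); // predicates dropped: 3
    }
}
```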



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-25698) Hive column update performance too low when table partition over 700

2021-11-22 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447400#comment-17447400
 ] 

Zoltan Haindrich commented on HIVE-25698:
-

Most likely column stats are being updated - I think you are changing the 
columns of the table, but to be on the same page you should give an example as a 
set of SQL statements;
if you are indeed changing the columns, some related tickets could be: HIVE-23806 
HIVE-23959

I think you should ask for help from your vendor.
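For illustration only, a hedged guess at the kind of statement being asked about (the table and column names are hypothetical): a column change with CASCADE propagates the new column metadata to every partition, which is exactly where a table with hundreds of partitions gets slow.

{code:java}
-- hypothetical: change a column's type and propagate it to all partitions
ALTER TABLE sales CHANGE COLUMN amount amount BIGINT CASCADE;
{code}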

> Hive column update performance too low when table partition over 700
> 
>
> Key: HIVE-25698
> URL: https://issues.apache.org/jira/browse/HIVE-25698
> Project: Hive
>  Issue Type: Bug
>  Components: Clients, Server Infrastructure
>Affects Versions: 3.1.1
> Environment: CentOS 7.8 
> Hadoop 3.1.1
> Impala 3.4.0
>Reporter: JungHyun An
>Priority: Minor
> Fix For: All Versions
>
>
> We are now using Hive 3.1.1.
>  
> Currently in our hive we have tables with hundreds of partitions and hundreds 
> of gigabytes of data.
>  
> When updating the column information of the corresponding table, it was 
> confirmed that the performance was several tens of times slower than the Hive 
> 1.1 version of the existing CDH.
>  
> I would like to ask if there is any architectural change that makes column 
> updates slower than Hive 1 in Hive 3 and later versions.
>  
> Thank you.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-25714) Some tests are flaky because docker is not able to start in 5 seconds

2021-11-22 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447361#comment-17447361
 ] 

Zoltan Haindrich commented on HIVE-25714:
-

Forgot to link the green flaky check result, but it's here: 
http://ci.hive.apache.org/job/hive-flaky-check/471/ :D

> Some tests are flaky because docker is not able to start in 5 seconds
> -
>
> Key: HIVE-25714
> URL: https://issues.apache.org/jira/browse/HIVE-25714
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There are some test runs failing with the error below; on the test site multiple pods are 
> running in parallel - it's not an ideal environment for tight deadlines
> {code}
> Unexpected exception java.lang.RuntimeException: Process docker failed to run 
> in 5 seconds
>  at 
> org.apache.hadoop.hive.ql.externalDB.AbstractExternalDB.runCmd(AbstractExternalDB.java:92)
>  at 
> org.apache.hadoop.hive.ql.externalDB.AbstractExternalDB.launchDockerContainer(AbstractExternalDB.java:123)
>  at 
> org.apache.hadoop.hive.ql.qoption.QTestDatabaseHandler.beforeTest(QTestDatabaseHandler.java:111)
>  at 
> org.apache.hadoop.hive.ql.qoption.QTestOptionDispatcher.beforeTest(QTestOptionDispatcher.java:79)
> {code}
> http://ci.hive.apache.org/job/hive-precommit/job/PR-1674/4/testReport/junit/org.apache.hadoop.hive.cli.split19/TestMiniLlapLocalCliDriver/Testing___split_14___PostProcess___testCliDriver_qt_database_all_/
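A minimal sketch of the likely direction of a fix - waiting on the launched process with a larger, configurable deadline instead of a hard-coded 5 seconds (the helper name, command, and deadline below are illustrative, not the actual AbstractExternalDB code):

```java
import java.util.concurrent.TimeUnit;

// Illustrative only: run an external command with a configurable deadline.
public class ProcessDeadline {
    public static boolean runWithDeadline(long deadlineSeconds, String... cmd) {
        try {
            Process p = new ProcessBuilder(cmd).start();
            // waitFor with a timeout instead of a fixed short deadline
            boolean finished = p.waitFor(deadlineSeconds, TimeUnit.SECONDS);
            if (!finished) {
                p.destroyForcibly(); // give up, but clean up the child process
            }
            return finished;
        } catch (Exception e) {
            throw new RuntimeException("Process " + cmd[0] + " failed to run", e);
        }
    }

    public static void main(String[] args) {
        // "sleep 1" stands in for the slow docker startup; 60s is a roomy deadline
        System.out.println(runWithDeadline(60, "sleep", "1"));
    }
}
```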



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=684646&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684646
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 12:00
Start Date: 22/Nov/21 12:00
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-975448819


   > @dengzhhu653 do you happen to have a testcase for this?
   
   Not yet. I have tested it on our environment on the skew table; it shows that it 
can get a pretty good performance gain (MR).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 684646)
Time Spent: 1h 10m  (was: 1h)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled,  we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.
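What "spray by the grouping key and distinct key" means can be roughly illustrated in plain Java (not Hive's ReduceSink internals; the names are made up): the first-stage partition function hashes the compound key, so one skewed group fans out across reducers instead of all landing on a single one.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Rough illustration of first-stage spraying under hive.groupby.skewindata:
// partition rows on (group key, distinct key) rather than the group key alone.
public class SkewSpray {
    public static int reducerFor(String groupKey, String distinctKey, int numReducers) {
        return Math.floorMod(Objects.hash(groupKey, distinctKey), numReducers);
    }

    public static void main(String[] args) {
        int reducers = 8;
        Set<Integer> targets = new HashSet<>();
        for (String distinctValue : List.of("a", "b", "c", "d", "e", "f")) {
            targets.add(reducerFor("hot-group", distinctValue, reducers));
        }
        // Rows of a single skewed group spread over several reducers; a second
        // stage then merges the partial aggregates per group.
        System.out.println(targets.size());
    }
}
```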



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-25679) Use serdeContants collection delim in MultiDelimSerDe

2021-11-22 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-25679.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you [~Stelyus] for your contribution!

> Use serdeContants collection delim in MultiDelimSerDe
> -
>
> Key: HIVE-25679
> URL: https://issues.apache.org/jira/browse/HIVE-25679
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Franck Thang
>Assignee: Franck Thang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Since the collection.delim typo was fixed in HIVE-16922, we can now use the 
> collection delimiter from serdeConstants in MultiDelimSerDe.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25679) Use serdeContants collection delim in MultiDelimSerDe

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25679?focusedWorklogId=684644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684644
 ]

ASF GitHub Bot logged work on HIVE-25679:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 11:57
Start Date: 22/Nov/21 11:57
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #2768:
URL: https://github.com/apache/hive/pull/2768


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 684644)
Time Spent: 0.5h  (was: 20m)

> Use serdeContants collection delim in MultiDelimSerDe
> -
>
> Key: HIVE-25679
> URL: https://issues.apache.org/jira/browse/HIVE-25679
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Franck Thang
>Assignee: Franck Thang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Since the collection.delim typo was fixed in HIVE-16922, we can now use the 
> collection delimiter from serdeConstants in MultiDelimSerDe.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25448) Invalid partition columns when skew with distinct

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25448?focusedWorklogId=684642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684642
 ]

ASF GitHub Bot logged work on HIVE-25448:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 11:45
Start Date: 22/Nov/21 11:45
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #2585:
URL: https://github.com/apache/hive/pull/2585#issuecomment-975438145


   @dengzhhu653 do you happen to have a testcase for this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 684642)
Time Spent: 1h  (was: 50m)

> Invalid partition columns when skew with distinct
> -
>
> Key: HIVE-25448
> URL: https://issues.apache.org/jira/browse/HIVE-25448
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When hive.groupby.skewindata is enabled,  we spray by the grouping key and 
> distinct key if distinct is present in the first reduce sink operator.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25048) Refine the start/end functions in HMSHandler

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25048?focusedWorklogId=684641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684641
 ]

ASF GitHub Bot logged work on HIVE-25048:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 11:37
Start Date: 22/Nov/21 11:37
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #2441:
URL: https://github.com/apache/hive/pull/2441#discussion_r754179495



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -913,74 +900,6 @@ private static void logAndAudit(final String m) {
 logAuditEvent(m);
   }
 
-  private String startFunction(String function, String extraLogInfo) {
-incrementCounter(function);
-logAndAudit((getThreadLocalIpAddress() == null ? "" : "source:" + 
getThreadLocalIpAddress() + " ") +
-function + extraLogInfo);
-com.codahale.metrics.Timer timer =
-Metrics.getOrCreateTimer(MetricsConstants.API_PREFIX + function);

Review comment:
   Yes, we have double-counted the metrics of API calls via the 
startFunction and the PerfLogger, e.g. the total call count and the active 
count of the API. 

##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestMetaStoreEndFunctionListener.java
##
@@ -83,7 +86,7 @@ public void testEndFunctionListener() throws Exception {
 listSize = DummyEndFunctionListener.funcNameList.size();
 String func_name = DummyEndFunctionListener.funcNameList.get(listSize-1);
 MetaStoreEndFunctionContext context = 
DummyEndFunctionListener.contextList.get(listSize-1);
-assertEquals(func_name,"get_database");
+assertEquals(func_name,"get_database_req");

Review comment:
   Thanks a lot for the feedback! You are right, this is where we may break 
compatibility, including the MetaStoreEndFunctionContext (we cannot retrieve the 
inputTableName by the method `getInputTableName()` any more). I'm not so sure we 
can make such changes, though this could help clean up some code...

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -4059,7 +3806,7 @@ public Partition 
append_partition_with_environment_context(final String dbName,
 String location;
 
 PartValEqWrapperLite(Partition partition) {
-  this.values = partition.isSetValues()? partition.getValues() : null;
+  this.values = partition.isSetValues()? new 
ArrayList<>(partition.getValues()) : null;

Review comment:
   Not a bug really,  just to clean up the codes.  The following pieces of 
codes can be simply treated as `lhsValues.equals(rhsValues)` by `equals` of an 
ArrayList.
   
if (lhsValues.size() != rhsValues.size()) {
   return false;
  }
   
 for (int i=0; i<lhsValues.size(); i++) {
   ...
 }
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore;
+
+import java.util.List;
+import java.util.Map;
+
+import org.apache.hadoop.hive.common.TableName;
+
+import static org.apache.commons.lang3.StringUtils.join;
+
+/**
+ * Generate the audit log in a builder manner.
+ */
+public class MetaStoreAuditLogBuilder {
+  // the function
+  private final String functionName;
+  private final StringBuilder builder;
+
+  private MetaStoreAuditLogBuilder(String functionName) {
+this.functionName = functionName;
+this.builder = new StringBuilder();
+  }
+
+  public static MetaStoreAuditLogBuilder functionName(String functionName) {
+MetaStoreAuditLogBuilder builder = new 
MetaStoreAuditLogBuilder(functionName);
+return builder;
+  }
+
+  public MetaStoreAuditLogBuilder connectorName(String connectorName) {
+builder.append("connector=").append(connectorName).append(" ");
+return this;
+  }
+
+  public MetaStoreAuditLogBuilder catalogName(String catalogName) {
+builder.append("catName=").append(catalogName).append(" ");
+return this;
+  }
+
+  public MetaStoreAuditLogBuilder dbName(String dbName) {
+builder.append("db=").append(dbName).append(" ");
+return this;
+  }
+
+  public MetaStoreAuditLogBuilder tableName(String tableName) {
+builder.append("tbl=").append(tableName).append(" ");
+return this;
+  }
+
+  public MetaStoreAuditLogBuilder packageName(String packageName) {
+builder.append("package=").append(packageName).append(" ");
+return this;
+  }
+
+  public MetaStoreAuditLogBuilder typeName(String typeName) {
+

[jira] [Work logged] (HIVE-25679) Use serdeContants collection delim in MultiDelimSerDe

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25679?focusedWorklogId=684636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684636
 ]

ASF GitHub Bot logged work on HIVE-25679:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 11:29
Start Date: 22/Nov/21 11:29
Worklog Time Spent: 10m 
  Work Description: Stelyus commented on pull request #2768:
URL: https://github.com/apache/hive/pull/2768#issuecomment-975426851


   @kgyrtkirk could you please have a look ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 684636)
Time Spent: 20m  (was: 10m)

> Use serdeContants collection delim in MultiDelimSerDe
> -
>
> Key: HIVE-25679
> URL: https://issues.apache.org/jira/browse/HIVE-25679
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Franck Thang
>Assignee: Franck Thang
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Since the collection.delim typo was fixed in HIVE-16922, we can now use the 
> collection delimiter from serdeConstants in MultiDelimSerDe.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=684635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684635
 ]

ASF GitHub Bot logged work on HIVE-24484:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 11:27
Start Date: 22/Nov/21 11:27
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #1742:
URL: https://github.com/apache/hive/pull/1742#issuecomment-975424894


   what are the unresolved blockers of 3.3.1 upgrade at the moment?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 684635)
Time Spent: 7.05h  (was: 6h 53m)

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7.05h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-24849) Create external table socket timeout when location has large number of files

2021-11-22 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24849.
-
Fix Version/s: 4.0.0
 Assignee: Sungwoo
   Resolution: Fixed

merged into master. Thank you [~glapark]!

> Create external table socket timeout when location has large number of files
> 
>
> Key: HIVE-24849
> URL: https://issues.apache.org/jira/browse/HIVE-24849
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.4, 3.1.2, 4.0.0
> Environment: AWS EMR 5.23 with default Hive metastore and external 
> location S3
>  
>Reporter: Mithun Antony
>Assignee: Sungwoo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> # The create table API call times out during external table creation on 
> a location where the number of files in the S3 location is large (i.e. ~10K 
> objects).
> The default timeout `hive.metastore.client.socket.timeout` is `600s`; the current 
> workaround is to increase the timeout to a higher value.
> {code:java}
> 2021-03-04T01:37:42,761 ERROR [66b8024b-e52f-42b8-8629-a45383bcac0c 
> main([])]: exec.DDLTask (DDLTask.java:failed(639)) - 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out
>  at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:873)
>  at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:878)
>  at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4356)
>  at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:354)
>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
>  at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
>  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
>  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>  at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
>  at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
>  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
>  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out
>  at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>  at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>  at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
>  at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
>  at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
>  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1199)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1185)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2399)
>  at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:93)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:752)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:740)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> 

[jira] [Work logged] (HIVE-24849) Create external table socket timeout when location has large number of files

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24849?focusedWorklogId=684634&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684634
 ]

ASF GitHub Bot logged work on HIVE-24849:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 11:23
Start Date: 22/Nov/21 11:23
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #2567:
URL: https://github.com/apache/hive/pull/2567


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 684634)
Time Spent: 1h  (was: 50m)

> Create external table socket timeout when location has large number of files
> 
>
> Key: HIVE-24849
> URL: https://issues.apache.org/jira/browse/HIVE-24849
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.4, 3.1.2, 4.0.0
> Environment: AWS EMR 5.23 with default Hive metastore and external 
> location S3
>  
>Reporter: Mithun Antony
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> # The create table API call times out during external table creation on 
> a location where the number of files in the S3 location is large ( ie: ~10K 
> objects ).
> The default timeout `hive.metastore.client.socket.timeout` is `600s`; the current 
> workaround is to increase the timeout to a higher value
> {code:java}
> 2021-03-04T01:37:42,761 ERROR [66b8024b-e52f-42b8-8629-a45383bcac0c 
> main([])]: exec.DDLTask (DDLTask.java:failed(639)) - 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out
>  at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:873)
>  at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:878)
>  at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4356)
>  at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:354)
>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
>  at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
>  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
>  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>  at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
>  at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
>  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
>  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out
>  at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>  at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>  at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
>  at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
>  at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
>  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1199)
>  at 
> 
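For readers hitting the same timeout, the workaround named in the description can be applied per session before the DDL; a sketch assuming a session-level override (the `1800s` value and the S3 location are illustrative, not taken from the ticket):

```sql
-- Workaround sketch: raise the metastore client socket timeout for the
-- session issuing the slow CREATE EXTERNAL TABLE over a large S3 prefix.
SET hive.metastore.client.socket.timeout=1800s;

CREATE EXTERNAL TABLE ext_events (id BIGINT, payload STRING)
LOCATION 's3://example-bucket/events/';  -- hypothetical prefix with ~10K objects
```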

[jira] [Work logged] (HIVE-25401) Insert overwrite a table which location is on other cluster fail in kerberos cluster

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25401?focusedWorklogId=684633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684633
 ]

ASF GitHub Bot logged work on HIVE-25401:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 11:23
Start Date: 22/Nov/21 11:23
Worklog Time Spent: 10m 
  Work Description: Neilxzn commented on a change in pull request #2544:
URL: https://github.com/apache/hive/pull/2544#discussion_r754177729



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
##
@@ -4990,4 +4991,19 @@ public static boolean arePathsEqualOrWithin(Path p1, Path p2) {
 return ((p1.toString().toLowerCase().indexOf(p2.toString().toLowerCase()) > -1) ||
 (p2.toString().toLowerCase().indexOf(p1.toString().toLowerCase()) > -1)) ? true : false;
   }
+
+  /**
+   * Convenience method to obtain delegation tokens
+   * corresponding to the paths passed for mapReduce job.
+   * @param job jobconf
+   * @param ps array of paths
+   */
+  public static void setToken(JobConf job, Path[] ps) {
+try {
+  TokenCache.obtainTokensForNamenodes(job.getCredentials(),
+  ps, job);
+} catch (IOException ex) {
+  LOG.error("Error in setToken ", ex);

Review comment:
   Thank you for your review! 
   I agree with you and have removed the `try catch`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 684633)
Time Spent: 1h 20m  (was: 1h 10m)

> Insert overwrite  a table which location is on other cluster fail  in 
> kerberos cluster
> --
>
> Key: HIVE-25401
> URL: https://issues.apache.org/jira/browse/HIVE-25401
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.0, 3.1.2
> Environment: hive 2.3 
> hadoop3 cluster with kerberos 
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-25401.patch, image-2021-07-29-14-25-23-418.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We have two HDFS clusters with Kerberos security, which means that MapReduce 
> tasks need delegation tokens to authenticate to the NameNode when Hive on 
> MapReduce runs.
> Insert overwrite into a table whose location is on the other cluster fails in a 
> Kerberos cluster. For example: 
>  # yarn cluster's default fs is hdfs://cluster1
>  # tb1's location is hdfs://cluster1/tb1
>  # tb2's location is hdfs://cluster2/tb2 
>  #  sql `INSERT OVERWRITE TABLE  tb2 SELECT * from tb1` run on yarn cluster 
> will fail
>  
> reduce task error log:
> !image-2021-07-29-14-25-23-418.png!
> How to fix:
> After digging into it, we found that the MapReduce job only obtains delegation 
> tokens for the input files in FileInputFormat. But the Hive context gets an 
> external scratchDir based on the table's location, and if the table's location 
> is on the other cluster, the delegation token will not be obtained. 
> So we need to obtain delegation tokens for the Hive scratchDirs before Hive 
> submits the MapReduce job.
>  
> How to test:
> no test
>  
>  
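The direction of the fix can be sketched in plain Java: collect the token-target filesystems from both the input paths and the scratch dirs, so that a scratch dir on hdfs://cluster2 is not missed. The names below are illustrative stand-ins, not the actual Hive/Hadoop API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TokenTargetsSketch {
    // Returns the set of scheme://authority pairs (i.e. NameNodes) that need
    // delegation tokens. Taking scratchDirs into account is the part the
    // original code missed, per the description above.
    static Set<String> namenodesNeedingTokens(List<String> inputPaths,
                                              List<String> scratchDirs) {
        Set<String> fsAuthorities = new LinkedHashSet<>();
        List<String> all = new ArrayList<>(inputPaths);
        all.addAll(scratchDirs);
        for (String p : all) {
            // Crude scheme://authority extraction, sufficient for the sketch.
            int idx = p.indexOf('/', "hdfs://".length());
            fsAuthorities.add(idx > 0 ? p.substring(0, idx) : p);
        }
        return fsAuthorities;
    }

    public static void main(String[] args) {
        System.out.println(namenodesNeedingTokens(
            List.of("hdfs://cluster1/tb1"),
            List.of("hdfs://cluster2/tmp/hive-scratch")));
        // prints [hdfs://cluster1, hdfs://cluster2]
    }
}
```

In the actual patch the token acquisition itself is delegated to Hadoop's `TokenCache.obtainTokensForNamenodes`, as shown in the quoted diff.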



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-25721) Outer join result is wrong

2021-11-22 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447335#comment-17447335
 ] 

Zoltan Haindrich commented on HIVE-25721:
-

issue also affects tez/llap

> Outer join result is wrong
> --
>
> Key: HIVE-25721
> URL: https://issues.apache.org/jira/browse/HIVE-25721
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: All Versions
>Reporter: Yizhen Fan
>Assignee: Yizhen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-25721.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The outer join result is wrong; here is a left join case.
> select b.fields from a left join b on a.key=b.key and a.filter=xxx
> Some conditions are necessary to produce this problem:
>  # the `select` clause only contains right table fields
>  # the `on` clause contains a left table condition, and this condition can filter 
> out records 
> h3. cause:
> candidateStorage[tag].addRow(value); // CommonMergeJoinOperator.process
> Rows of the left table cannot be added into the row container because the tblDesc 
> of the left table is null, while the left table's data cannot be ignored in this case.
> h3. Reproducible steps are mentioned below.
> 
> set hive.auto.convert.join=false;
> create table t_smj_left (key string, value int);
> insert into t_smj_left values
> ('key1', 1),
> ('key1', 2);
> create table t_smj_right (key string, value int);
> insert into t_smj_right values
> ('key1', 1);
> select
> t2.value
> from t_smj_left t1
> left join t_smj_right t2 on t1.key=t2.key and t1.value=2;
>  
> Result:
> NULL
> NULL
> Expected Output:
> 1
> NULL



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=684617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684617
 ]

ASF GitHub Bot logged work on HIVE-24484:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 10:52
Start Date: 22/Nov/21 10:52
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1742:
URL: https://github.com/apache/hive/pull/1742#issuecomment-975396279


   this PR is not making much progress - I think this in its current form will 
not work, or will not land soon.
   I think it would make sense to consider:
   * split this thing up into some pieces which we could get in...
   * or even... upgrade to hadoop 3.1.MAX or 3.2.ANYTHING to grab some of the 
changes we have to cover in upgrading to 3.3.1
   instead of waiting for this thing to get in with JDK11 support and everything? - 
what do you guys think?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 684617)
Time Spent: 6h 53m  (was: 6h 43m)

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 53m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-24830) Revise RowSchema mutability usage

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24830?focusedWorklogId=684615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684615
 ]

ASF GitHub Bot logged work on HIVE-24830:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 10:46
Start Date: 22/Nov/21 10:46
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #2019:
URL: https://github.com/apache/hive/pull/2019#issuecomment-975391948


   .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 684615)
Time Spent: 1h  (was: 50m)

> Revise RowSchema mutability usage
> -
>
> Key: HIVE-24830
> URL: https://issues.apache.org/jira/browse/HIVE-24830
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> RowSchema is essentially a container class for a list of fields.
> * it can be constructed from a "list"
> * the list can be set
> * the list can be accessed
> none of the above methods try to protect the data inside; hence the following 
> could easily  happen:
> {code}
> s=o1.getSchema();
> col=s.getCol("favourite")
> col.setInternalName("asd"); // will modify o1 schema
> newSchema.add(col);
> o2.setSchema(newSchema);
> o2.getSchema().get("asd").setInternalName("xxx"); // will modify o1 and o2 
> schema
> [...]
> {code}
> not sure how much of this is actually cruical; exploratory testrun revealed 
> some cases
> https://github.com/apache/hive/pull/2019



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25048) Refine the start/end functions in HMSHandler

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25048?focusedWorklogId=684614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684614
 ]

ASF GitHub Bot logged work on HIVE-25048:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 10:45
Start Date: 22/Nov/21 10:45
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2441:
URL: https://github.com/apache/hive/pull/2441#discussion_r754132199



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/PerfLogger.java
##
@@ -186,24 +187,19 @@ public Long getDuration(String method) {
   private transient Timer.Context totalApiCallsTimerContext = null;
 
   private void beginMetrics(String method) {
-Timer timer = Metrics.getOrCreateTimer(MetricsConstants.API_PREFIX + 
method);
-if (timer != null) {
-  timerContexts.put(method, timer.time());
-}
-timer = Metrics.getOrCreateTimer(MetricsConstants.TOTAL_API_CALLS);
-if (timer != null) {
-  totalApiCallsTimerContext = timer.time();
-}
+Optional.ofNullable(Metrics.getOrCreateTimer(MetricsConstants.API_PREFIX + 
method))
+.ifPresent(timer -> timerContexts.put(method, timer.time()));
+
Optional.ofNullable(Metrics.getOrCreateTimer(MetricsConstants.TOTAL_API_CALLS))
+.ifPresent(timer -> {totalApiCallsTimerContext = timer.time();});
+
Optional.ofNullable(Metrics.getOrCreateCounter(MetricsConstants.ACTIVE_CALLS + 
method))
+.ifPresent(counter -> counter.inc());

Review comment:
   I'm usually not against changes like this - however this seems like a 
pretty hot method:
   
   the old implementation only created 1 new string object which will be 
garbage collected - however the new method creates 3 sets of `Nullable` objects 
and some function objects to make the call...
   
   can we add the `ACTIVE_CALLS` stuff the boring way?
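For illustration, the "boring way" suggested here would be a plain null check, mirroring the pre-patch style. The sketch below is self-contained; `Counter` and `getOrCreateCounter` are stand-ins for the metastore's Dropwizard-style metrics API, not the real classes.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ActiveCallsSketch {
    // Stand-in for the metrics counter used by the metastore.
    static class Counter {
        private final AtomicLong n = new AtomicLong();
        void inc() { n.incrementAndGet(); }
        long getCount() { return n.get(); }
    }

    // Stand-in for Metrics.getOrCreateCounter, which may return null
    // when metrics are disabled.
    static Counter getOrCreateCounter(String name) {
        return name == null ? null : new Counter();
    }

    // The "boring" version: one string concatenation and a null check,
    // no Optional wrappers or lambda allocations on this hot path.
    static Counter beginActiveCalls(String method) {
        Counter counter = getOrCreateCounter("active_calls_" + method);
        if (counter != null) {
            counter.inc();
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(beginActiveCalls("get_table").getCount()); // prints 1
    }
}
```

The trade-off is readability versus per-call allocation, which matters here because this runs once per metastore API call.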

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreAuditLogBuilder.java
##
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore;
+
+import java.util.List;
+import java.util.Map;
+
+import org.apache.hadoop.hive.common.TableName;
+
+import static org.apache.commons.lang3.StringUtils.join;
+
+/**
+ * Generate the audit log in a builder manner.
+ */
+public class MetaStoreAuditLogBuilder {
+  // the function
+  private final String functionName;
+  private final StringBuilder builder;
+
+  private MetaStoreAuditLogBuilder(String functionName) {
+this.functionName = functionName;
+this.builder = new StringBuilder();
+  }
+
+  public static MetaStoreAuditLogBuilder functionName(String functionName) {
+MetaStoreAuditLogBuilder builder = new 
MetaStoreAuditLogBuilder(functionName);
+return builder;
+  }
+
+  public MetaStoreAuditLogBuilder connectorName(String connectorName) {
+builder.append("connector=").append(connectorName).append(" ");
+return this;
+  }
+
+  public MetaStoreAuditLogBuilder catalogName(String catalogName) {
+builder.append("catName=").append(catalogName).append(" ");
+return this;
+  }
+
+  public MetaStoreAuditLogBuilder dbName(String dbName) {
+builder.append("db=").append(dbName).append(" ");
+return this;
+  }
+
+  public MetaStoreAuditLogBuilder tableName(String tableName) {
+builder.append("tbl=").append(tableName).append(" ");
+return this;
+  }
+
+  public MetaStoreAuditLogBuilder packageName(String packageName) {
+builder.append("package=").append(packageName).append(" ");
+return this;
+  }
+
+  public MetaStoreAuditLogBuilder typeName(String typeName) {
+builder.append("type=").append(typeName).append(" ");
+return this;
+  }
+
+  public MetaStoreAuditLogBuilder pattern(String pattern) {
+builder.append("pat=").append(pattern).append(" ");
+return this;
+  }
+
+  public MetaStoreAuditLogBuilder extraInfo(String extraInfo) {

Review comment:
   all the callsites of this method look like:
   ```
   " some=" + value
   

[jira] [Resolved] (HIVE-25727) Iceberg hive catalog should create table object with initialised SerdeParams

2021-11-22 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25727.
---
Resolution: Fixed

> Iceberg hive catalog should create table object with initialised SerdeParams
> 
>
> Key: HIVE-25727
> URL: https://issues.apache.org/jira/browse/HIVE-25727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently we leave the serdeInfo.parameters as null when we create the table 
> object to be persisted during commit time in Iceberg hive catalog. We should 
> init the params with an empty map to avoid any NPE possibilities.
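A minimal sketch of the fix described above; `SerDeInfo` here is a bare stand-in field holder, not the real `org.apache.hadoop.hive.metastore.api.SerDeInfo` class.

```java
import java.util.HashMap;
import java.util.Map;

public class SerdeParamsSketch {
    // Stand-in for the serde descriptor whose parameters map was left null.
    static class SerDeInfo {
        Map<String, String> parameters; // null unless explicitly initialised
    }

    public static void main(String[] args) {
        SerDeInfo serdeInfo = new SerDeInfo();
        // The fix: initialise with an empty map so later lookups such as
        // serdeInfo.parameters.get("field.delim") cannot throw an NPE.
        serdeInfo.parameters = new HashMap<>();
        System.out.println(serdeInfo.parameters.isEmpty()); // prints true
    }
}
```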



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-25727) Iceberg hive catalog should create table object with initialised SerdeParams

2021-11-22 Thread Marton Bod (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447322#comment-17447322
 ] 

Marton Bod commented on HIVE-25727:
---

Pushed to master. Thanks [~pvary] for reviewing it!

> Iceberg hive catalog should create table object with initialised SerdeParams
> 
>
> Key: HIVE-25727
> URL: https://issues.apache.org/jira/browse/HIVE-25727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently we leave the serdeInfo.parameters as null when we create the table 
> object to be persisted during commit time in Iceberg hive catalog. We should 
> init the params with an empty map to avoid any NPE possibilities.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25352) Optimise DBTokenStore for RDBMS

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25352?focusedWorklogId=684606&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684606
 ]

ASF GitHub Bot logged work on HIVE-25352:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 10:11
Start Date: 22/Nov/21 10:11
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #2499:
URL: https://github.com/apache/hive/pull/2499#issuecomment-975360915


   @sahana-bhat is this change ready for review?
   it seems like quite a few days have passed since it was opened...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 684606)
Remaining Estimate: 0h
Time Spent: 10m

> Optimise DBTokenStore for RDBMS
> ---
>
> Key: HIVE-25352
> URL: https://issues.apache.org/jira/browse/HIVE-25352
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sahana Bhat
>Assignee: Sahana Bhat
>Priority: Major
>  Labels: pull-request-available, pull_request_available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The existing DBTokenStore implementation is very under-optimised when an 
> RDBMS is used.
>  * All available tokens are fetched from the DB. The validity of each token 
> is determined based on its max date and renew date and deleted if required. 
> For a relational database like MySQL, a *query to fetch all rows with no 
> filters or pagination* can be costly and impact the performance of the 
> database and the server. 
>  * From the token identifiers fetched, if the token hasn’t breached its max 
> date, the token information is again fetched from the database to validate 
> its renew date.  
>  * The token expiration daemon is part of the Hive system. In a cluster of 
> tens or hundreds of Hive servers, the daemon runs on each of the servers. 
> This means that the flow of fetching of tokens, validation for expiration and 
> deleting them is executed in duplication in each of the servers. The 
> *duplication of the functionality in every server* along with the problems 
> discussed in Point 1 & 2, can severely degrade the performance of the 
> database.
> This issue will address the issues mentioned in 1 & 2.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-25730) Hive column update performance too low when table partition over 700

2021-11-22 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-25730.

Resolution: Duplicate

> Hive column update performance too low when table partition over 700
> 
>
> Key: HIVE-25730
> URL: https://issues.apache.org/jira/browse/HIVE-25730
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.1
>Reporter: JungHyun An
>Priority: Critical
> Fix For: 3.1.1
>
>
> Hi, we are using Hive 3.1.1.
> Currently our Hive has tables with hundreds of partitions and hundreds of 
> gigabytes of data.
> When updating the column information of such a table, we confirmed that the 
> performance was several tens of times slower than on the older CDH's Hive 1.1.
> I would like to ask if there is any architectural change that makes column 
> updates slower in Hive 3 and later versions than in Hive 1.
> Thank you.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25352) Optimise DBTokenStore for RDBMS

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25352:
--
Labels: pull-request-available pull_request_available  (was: 
pull_request_available)

> Optimise DBTokenStore for RDBMS
> ---
>
> Key: HIVE-25352
> URL: https://issues.apache.org/jira/browse/HIVE-25352
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sahana Bhat
>Assignee: Sahana Bhat
>Priority: Major
>  Labels: pull-request-available, pull_request_available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The existing DBTokenStore implementation is very under-optimised when an 
> RDBMS is used.
>  * All available tokens are fetched from the DB. The validity of each token 
> is determined based on its max date and renew date and deleted if required. 
> For a relational database like MySQL, a *query to fetch all rows with no 
> filters or pagination* can be costly and impact the performance of the 
> database and the server. 
>  * From the token identifiers fetched, if the token hasn’t breached its max 
> date, the token information is again fetched from the database to validate 
> its renew date.  
>  * The token expiration daemon is part of the Hive system. In a cluster of 
> tens or hundreds of Hive servers, the daemon runs on each of the servers. 
> This means that the flow of fetching of tokens, validation for expiration and 
> deleting them is executed in duplication in each of the servers. The 
> *duplication of the functionality in every server* along with the problems 
> discussed in Point 1 & 2, can severely degrade the performance of the 
> database.
> This issue will address the issues mentioned in 1 & 2.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25346) cleanTxnToWriteIdTable breaks SNAPSHOT isolation

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25346?focusedWorklogId=684603&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684603
 ]

ASF GitHub Bot logged work on HIVE-25346:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 10:06
Start Date: 22/Nov/21 10:06
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #2547:
URL: https://github.com/apache/hive/pull/2547#issuecomment-975356703


   merged as #2716


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 684603)
Time Spent: 14h 50m  (was: 14h 40m)

> cleanTxnToWriteIdTable breaks SNAPSHOT isolation
> 
>
> Key: HIVE-25346
> URL: https://issues.apache.org/jira/browse/HIVE-25346
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Chovan
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 14h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25346) cleanTxnToWriteIdTable breaks SNAPSHOT isolation

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25346?focusedWorklogId=684604&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684604
 ]

ASF GitHub Bot logged work on HIVE-25346:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 10:06
Start Date: 22/Nov/21 10:06
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk closed pull request #2547:
URL: https://github.com/apache/hive/pull/2547


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 684604)
Time Spent: 15h  (was: 14h 50m)

> cleanTxnToWriteIdTable breaks SNAPSHOT isolation
> 
>
> Key: HIVE-25346
> URL: https://issues.apache.org/jira/browse/HIVE-25346
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Chovan
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 15h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25721) Outer join result is wrong

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25721?focusedWorklogId=684599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684599
 ]

ASF GitHub Bot logged work on HIVE-25721:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 10:02
Start Date: 22/Nov/21 10:02
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #2798:
URL: https://github.com/apache/hive/pull/2798#issuecomment-975353044


   @SparksFyz There seems to be test failures in CI relevant to the changes in 
this PR. Can you please have a look?




Issue Time Tracking
---

Worklog Id: (was: 684599)
Time Spent: 40m  (was: 0.5h)

> Outer join result is wrong
> --
>
> Key: HIVE-25721
> URL: https://issues.apache.org/jira/browse/HIVE-25721
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: All Versions
>Reporter: Yizhen Fan
>Assignee: Yizhen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-25721.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Outer join result is wrong; here is a left join case:
> select b.fields from a left join b on a.key=b.key and a.filter=xxx
> There are some necessary conditions to reproduce this problem:
>  # the `select` clause only contains right-table fields
>  # the `on` clause contains a left-table condition, and this condition can filter
> records
> h3. cause:
> candidateStorage[tag].addRow(value); // CommonMergeJoinOperator.process
> A row of the left table cannot be added into the row container because the tblDesc
> of the left table is null, while the left-table data cannot be ignored in this case.
> h3. Reproducible steps are mentioned below.
> 
> set hive.auto.convert.join=false;
> create table t_smj_left (key string, value int);
> insert into t_smj_left values
> ('key1', 1),
> ('key1', 2);
> create table t_smj_right (key string, value int);
> insert into t_smj_right values
> ('key1', 1);
> select
> t2.value
> from t_smj_left t1
> left join t_smj_right t2 on t1.key=t2.key and t1.value=2;
>  
> Result:
> +-------+
> | NULL  |
> | NULL  |
> +-------+
> Expected Output:
> +-------+
> | 1     |
> | NULL  |
> +-------+
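The semantics the reporter expects can be sketched in a few lines (a language-neutral illustration in Python, not Hive's CommonMergeJoinOperator code): the ON-clause predicate only decides which pairs match; an unmatched left row must still be emitted, padded with NULL, rather than dropped.

```python
# Sketch of LEFT OUTER JOIN semantics (illustration only, not Hive code).
def left_outer_join(left, right, on):
    out = []
    for l in left:
        matched = False
        for r in right:
            if on(l, r):            # ON-clause predicate decides matching pairs
                out.append((l, r))
                matched = True
        if not matched:
            out.append((l, None))   # unmatched left rows are padded, not dropped
    return out

# t_smj_left / t_smj_right from the reproduction above
left = [("key1", 1), ("key1", 2)]
right = [("key1", 1)]
rows = left_outer_join(left, right,
                       on=lambda l, r: l[0] == r[0] and l[1] == 2)
# project t2.value; None plays the role of NULL
values = [r[1] if r is not None else None for (_, r) in rows]
print(values)  # [None, 1]
```

The row ("key1", 1) fails the `t1.value=2` condition, so it matches nothing, yet it still produces an output row with a NULL right side, which is exactly the row the buggy merge join loses.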



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-25728) ParseException while gathering Column Stats

2021-11-22 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447286#comment-17447286
 ] 

Stamatis Zampetakis commented on HIVE-25728:


[~soumyakanti.das] If possible please include a scenario reproducing the 
problem in the description along with the full stack trace inside \{noformat\} 
tags. For someone hitting a {{ParseException}} in the future it will be easier 
to identify if this is the bug which corresponds to their use-case.

> ParseException while gathering Column Stats
> ---
>
> Key: HIVE-25728
> URL: https://issues.apache.org/jira/browse/HIVE-25728
> Project: Hive
>  Issue Type: Bug
>Reporter: Soumyakanti Das
>Priority: Major
>
> The {{columnName}} is escaped twice in {{ColumnStatsSemanticAnalyzer}} at 
> [line 
> 262|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L262],
>  which can cause a ParseException. A potential solution is to simply not escape
> it a second time.
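The effect of escaping twice can be shown with a tiny sketch (a hypothetical `escape` helper, not the actual ColumnStatsSemanticAnalyzer code): backtick-quoting an already-quoted identifier yields a string the parser can no longer read as a single column name.

```python
# Hypothetical sketch of double-escaping an identifier (not Hive's actual code).
def escape(col):
    # HiveQL-style quoting: wrap in backticks, doubling any embedded backtick
    return "`" + col.replace("`", "``") + "`"

once = escape("a`b")
print(once)            # `a``b`  -> valid quoted identifier
twice = escape(once)   # escaping again mangles the already-quoted form
print(twice)           # ```a````b```
```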



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25401) Insert overwrite a table which location is on other cluster fail in kerberos cluster

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25401?focusedWorklogId=684589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684589
 ]

ASF GitHub Bot logged work on HIVE-25401:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 09:45
Start Date: 22/Nov/21 09:45
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2544:
URL: https://github.com/apache/hive/pull/2544#discussion_r754101348



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
##
@@ -4990,4 +4991,19 @@ public static boolean arePathsEqualOrWithin(Path p1, Path p2) {
 return ((p1.toString().toLowerCase().indexOf(p2.toString().toLowerCase()) > -1) ||
 (p2.toString().toLowerCase().indexOf(p1.toString().toLowerCase()) > -1)) ? true : false;
   }
+
+  /**
+   * Convenience method to obtain delegation tokens
+   * corresponding to the paths passed for mapReduce job.
+   * @param job jobconf
+   * @param ps array of paths
+   */
+  public static void setToken(JobConf job, Path[] ps) {
+try {
+  TokenCache.obtainTokensForNamenodes(job.getCredentials(),
+  ps, job);
+} catch (IOException ex) {
+  LOG.error("Error in setToken ", ex);

Review comment:
   I don't think any errors should be ignored here - if it will not be able 
to obtain the token; it will not work; or that's not the case?






Issue Time Tracking
---

Worklog Id: (was: 684589)
Time Spent: 1h 10m  (was: 1h)

> Insert overwrite  a table which location is on other cluster fail  in 
> kerberos cluster
> --
>
> Key: HIVE-25401
> URL: https://issues.apache.org/jira/browse/HIVE-25401
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.0, 3.1.2
> Environment: hive 2.3 
> hadoop3 cluster with kerberos 
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-25401.patch, image-2021-07-29-14-25-23-418.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We have two hdfs clusters with kerberos security, which means that mapreduce
> tasks need delegation tokens to authenticate to the namenode when hive runs on
> mapreduce.
> Insert overwrite into a table whose location is on the other cluster fails in a
> kerberos cluster. For example:
>  # the yarn cluster's default fs is hdfs://cluster1
>  # tb1's location is hdfs://cluster1/tb1
>  # tb2's location is hdfs://cluster2/tb2
>  # the sql `INSERT OVERWRITE TABLE tb2 SELECT * from tb1` run on the yarn cluster
> will fail
>  
> reduce task error log:
> !image-2021-07-29-14-25-23-418.png!
> How to fix:
> After digging into it, we found that the mapreduce job only obtains delegation
> tokens for the input files in FileInputFormat. But the Hive context gets an
> external scratchDir based on the table's location, and if the table's location
> is on the other cluster, the delegation token will not be obtained.
> So we need to obtain delegation tokens for the hive scratchDirs before hive
> submits the mapreduce job.
>  
> How to test:
> no test
>  
>  
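The underlying idea can be sketched in a few lines (a hypothetical helper, not the patch itself): a delegation token is needed once per distinct namenode reached by any job path, so the set of filesystems must be derived from all paths, scratch directories included, not just the FileInputFormat inputs.

```python
from urllib.parse import urlparse

def namenodes(paths):
    """Distinct filesystems (scheme + authority) that a set of job paths touches."""
    return sorted({f"{u.scheme}://{u.netloc}" for u in map(urlparse, paths)})

# Considering input paths alone misses cluster2; adding the scratch/output
# path exposes the second namenode that also needs a delegation token.
inputs = ["hdfs://cluster1/tb1"]
scratch = ["hdfs://cluster2/tmp/hive-scratch"]  # hypothetical scratchDir path
print(namenodes(inputs))            # ['hdfs://cluster1']
print(namenodes(inputs + scratch))  # ['hdfs://cluster1', 'hdfs://cluster2']
```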



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-24545) jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24545?focusedWorklogId=684578&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684578
 ]

ASF GitHub Bot logged work on HIVE-24545:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 09:07
Start Date: 22/Nov/21 09:07
Worklog Time Spent: 10m 
  Work Description: abstractdog opened a new pull request #1789:
URL: https://github.com/apache/hive/pull/1789


   ### What changes were proposed in this pull request?
   We should use java.sql.getLargeUpdateCount() where it's possible. 
User-facing case is beeline output.
   
   ### Why are the changes needed?
   Because this can be confusing for the user on beeline output:
   ```
   20/12/16 01:37:36 [main]: WARN jdbc.HiveStatement: Number of rows is greater 
than Integer.MAX_VALUE
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, beeline is supposed to return row numbers > Integer.MAX_VALUE properly.
   
   ### How was this patch tested?
   Not yet tested.
   




Issue Time Tracking
---

Worklog Id: (was: 684578)
Time Spent: 40m  (was: 0.5h)

> jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
> 
>
> Key: HIVE-24545
> URL: https://issues.apache.org/jira/browse/HIVE-24545
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I found this while IOW on TPCDS 10TB:
> {code}
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED
> --
> Map 1 ..  llap SUCCEEDED   4210   421000  
>  0 362
> Reducer 2 ..  llap SUCCEEDED10110100  
>  0   2
> Reducer 3 ..  llap SUCCEEDED   1009   100900  
>  0   1
> --
> VERTICES: 03/03  [==>>] 100%  ELAPSED TIME: 12613.62 s
> --
> 20/12/16 01:37:36 [main]: WARN jdbc.HiveStatement: Number of rows is greater 
> than Integer.MAX_VALUE
> {code}
> my scenario was:
> {code}
> set hive.exec.max.dynamic.partitions=2000;
> drop table if exists test_sales_2;
> create table test_sales_2 like 
> tpcds_bin_partitioned_acid_orc_1.store_sales;
> insert overwrite table test_sales_2 select * from 
> tpcds_bin_partitioned_acid_orc_1.store_sales where ss_sold_date_sk > 
> 2451868;
> {code}
> regarding affected row numbers:
> {code}
> select count(*) from tpcds_bin_partitioned_acid_orc_1.store_sales where 
> ss_sold_date_sk > 2451868;
> +--------------+
> |     _c0      |
> +--------------+
> | 12287871907  |
> +--------------+
> {code}
> I guess we should switch to long
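The magnitude problem is easy to verify with plain arithmetic, independent of Hive: the affected row count exceeds the signed 32-bit range that an int-based update count is limited to, while a long has ample headroom.

```python
# The affected row count from the query above vs. Java's integer ranges.
INT_MAX = 2**31 - 1       # Java Integer.MAX_VALUE = 2147483647
LONG_MAX = 2**63 - 1      # Java Long.MAX_VALUE
rows = 12_287_871_907     # count(*) reported in the description

print(rows > INT_MAX)     # True  -> an int-based update count overflows
print(rows <= LONG_MAX)   # True  -> a long-based count is sufficient
```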



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25680) Authorize #get_table_meta HiveMetastore Server API to use any of the HiveMetastore Authorization model

2021-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25680?focusedWorklogId=684571&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-684571
 ]

ASF GitHub Bot logged work on HIVE-25680:
-

Author: ASF GitHub Bot
Created on: 22/Nov/21 08:50
Start Date: 22/Nov/21 08:50
Worklog Time Spent: 10m 
  Work Description: shameersss1 removed a comment on pull request #2770:
URL: https://github.com/apache/hive/pull/2770#issuecomment-971208942


   @kgyrtkirk - Could you please re-review?




Issue Time Tracking
---

Worklog Id: (was: 684571)
Time Spent: 3h  (was: 2h 50m)

> Authorize #get_table_meta HiveMetastore Server API to use any of the 
> HiveMetastore Authorization model
> --
>
> Key: HIVE-25680
> URL: https://issues.apache.org/jira/browse/HIVE-25680
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screenshot 2021-11-08 at 2.39.30 PM.png
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Apache Hue, or any other application that uses the #get_table_meta API, is
> not gated by any of the authorization models which HiveMetastore provides.
> For more information on the Storage Based Authorization model, see:
> https://cwiki.apache.org/confluence/display/Hive/HCatalog+Authorization
> You can easily reproduce this with Apache Hive + Apache Hue
> {code:java}
> <property>
>   <name>hive.security.metastore.authorization.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
> </property>
> <property>
>   <name>hive.security.metastore.authenticator.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
> </property>
> <property>
>   <name>hive.metastore.pre.event.listeners</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
> </property>
> {code}
> {code:java}
> #!/bin/bash
> set -x
> hdfs dfs -mkdir /datasets
> hdfs dfs -mkdir /datasets/database1
> hdfs dfs -mkdir /datasets/database1/table1
> echo "stefano,1992" | hdfs dfs -put - /datasets/database1/table1/file1.csv
> hdfs dfs -chmod -R 700 /datasets/database1
> sudo tee -a setup.hql > /dev/null <<EOT
> CREATE DATABASE IF NOT EXISTS database1 LOCATION "/datasets/database1";
> CREATE EXTERNAL TABLE IF NOT EXISTS database1.table1 (
>   name string, 
>   year int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> LOCATION
>   '/datasets/database1/table1';
> EOT
> hive -f setup.hql
> {code}
> 1. Log in to Hue => create the first user called "admin" and provide a
> password, then access the Hive Editor
> 2. On the SQL section on the left under Databases you should see default and 
> database1 listed. Click on database1
> 3. As you can see a table called table1 is listed => this should not be 
> possible as our admin user has no HDFS grants on /datasets/database1
> 4. run from the Hive editor the following query SHOW TABLES; The output shows 
> a Permission denied error => this is the expected behavior
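The gating behaviour the reporter expects can be sketched as follows (hypothetical helper and data structures; the real check is done by StorageBasedAuthorizationProvider against HDFS permissions): a table's metadata should only be returned when the caller can read the table's directory.

```python
# Sketch of storage-based gating for table listing (not the Metastore API).
def visible_tables(user, tables, readable):
    # tables:   {table_name: hdfs_path}
    # readable: {user: set of paths that user may read}
    return sorted(t for t, path in tables.items()
                  if path in readable.get(user, set()))

tables = {"table1": "/datasets/database1/table1"}
readable = {"hive": {"/datasets/database1/table1"}}  # mode 700: owner only

print(visible_tables("admin", tables, readable))  # [] -> table1 must be hidden
print(visible_tables("hive", tables, readable))   # ['table1']
```

Under this rule the "admin" Hue user, who has no HDFS grants on /datasets/database1, would see no tables, which is the behaviour #get_table_meta should match once it goes through the authorization pre-event listener.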



--
This message was sent by Atlassian Jira
(v8.20.1#820001)