[jira] [Updated] (HIVE-27985) Avoid duplicate files.

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27985:
--
Affects Version/s: 4.0.0

> Avoid duplicate files.
> --
>
> Key: HIVE-27985
> URL: https://issues.apache.org/jira/browse/HIVE-27985
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 4.0.0
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Major
> Attachments: how tez examples commit.png
>
>
> *1 Introduction*
> Hive on Tez occasionally produces duplicate files, especially when speculative 
> execution is enabled. Hive identifies and removes duplicate files through 
> removeTempOrDuplicateFiles. However, this logic often does not take effect. 
> For example, a killed task attempt may commit files while this method is 
> executing, or the files under HIVE_UNION_SUBDIR_X are not recognized during 
> union all. There are many issues that try to solve these problems, mainly focusing 
> on how to identify duplicate files. *This issue instead solves the problem by 
> avoiding the generation of duplicate files in the first place.*
> *2 How does Tez avoid duplicate files?*
> After testing, I found that the Hadoop MapReduce examples and Tez examples do not 
> have this problem. With a properly designed OutputCommitter, duplicate files can 
> be avoided. Let's analyze how Tez avoids duplicate files.
> {color:#172b4d} _Note: Compared with Tez, Hadoop MapReduce has one extra 
> commitPending step, which is not critical, so only Tez is analyzed._{color}
> !how tez examples commit.png|width=778,height=483!
>  
> Let’s analyze these steps:
>  * (1) {*}process records{*}: Process the records.
>  * (2) {*}send canCommit request{*}: After all records are processed, call 
> canCommit remotely on the AM.
>  * (3) {*}update commitAttempt{*}: After the AM receives the canCommit request, 
> it checks whether another task attempt of the current task has 
> already executed canCommit. If no other task attempt executed canCommit 
> first, it returns true; otherwise it returns false. This ensures that only 
> one task attempt commits for each task.
>  * (4) {*}return canCommit response{*}: The task receives the AM's response. If it 
> returns true, the task can commit. If it returns false, another task 
> attempt has already started the commit first, and this attempt cannot 
> commit. The task then loops back to (2), executing canCommit until it 
> is killed or the other attempt fails.
>  * (5) {*}output.commit{*}: Execute the commit; specifically, rename the generated 
> temporary file to the final file.
>  * (6) {*}notify succeeded{*}: Although the task has produced the final 
> file, the AM still needs to be told that the work is completed. Therefore, the AM 
> is notified through the heartbeat that the current task attempt has 
> completed.
> There is a problem in the above steps: if an exception occurs in the 
> task after (5) and before (6), the AM does not know that the task attempt has 
> completed, so the AM will still start a new task attempt, and the new task 
> attempt will generate a new file, which causes duplication. I added code 
> that randomly throws exceptions between (5) and (6), and found that, in fact, 
> the Tez example did not produce duplicate data. Why? Mainly because the final 
> file name is the same no matter which task attempt generates it. When a new task 
> attempt commits and finds that the final file already exists (generated 
> by a previous task attempt), it deletes it first, then renames. 
> Regardless of whether the previous task attempt committed normally, the 
> last successful task attempt overwrites any previous, possibly erroneous, results.
> To summarize, tez-examples uses two methods to avoid duplicate files:
>  * (1) Avoid repeated commits through canCommit. This is particularly 
> effective for tasks with speculative execution turned on.
>  * (2) The final file names generated by different task attempts are the 
> same. Combined with canCommit, this guarantees that only one file is 
> generated in the end, and that it can only be generated by a successful task 
> attempt.
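
The canCommit arbitration described in steps (2)-(4) can be sketched in a few lines. This is a minimal, hypothetical simulation of the idea only, not Tez's actual AM code (the real protocol runs over the task umbilical): the "AM" grants commit permission to at most one attempt per task by atomically recording the first attempt that asks.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of canCommit arbitration: for each task, the first
// attempt to ask wins; every other attempt of the same task is refused.
public class CanCommitSketch {
    private final Map<String, String> committingAttempt = new ConcurrentHashMap<>();

    // Called remotely by a task attempt before it renames its temp file.
    public boolean canCommit(String taskId, String attemptId) {
        // Atomically record this attempt if no attempt has asked yet.
        String winner = committingAttempt.putIfAbsent(taskId, attemptId);
        // null means we were first; a retry by the winner also succeeds.
        return winner == null || winner.equals(attemptId);
    }

    public static void main(String[] args) {
        CanCommitSketch am = new CanCommitSketch();
        System.out.println(am.canCommit("task_0", "attempt_0")); // true: first asker
        System.out.println(am.canCommit("task_0", "attempt_1")); // false: task already has a committer
        System.out.println(am.canCommit("task_0", "attempt_0")); // true: retry by the winner
    }
}
```

The retry semantics matter: the winner may re-ask after a transient failure, which is why the check compares the recorded attempt id rather than simply rejecting every call after the first.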
> *3 Why can't Hive on Tez avoid duplicate files?*
> Hive on Tez does not have the two mechanisms mentioned in the Tez example.
> First of all, Hive on Tez does not call canCommit. TezProcessor inherits from 
> AbstractLogicalIOProcessor, while the canCommit logic in the Tez examples lives 
> in SimpleMRProcessor.
> Secondly, under Hive on Tez the file names generated by different attempts are 
> not the same. The file generated by the first attempt of a task is 000000_0, and 
> the file generated by the second attempt is 000000_1.
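
The naming difference can be made concrete with a small hypothetical sketch (the helper names and the zero-padded bucket rendering are illustrative assumptions): because Hive suffixes the final file name with the attempt number, two attempts of the same task write distinct final paths and both can survive, whereas a fixed name makes a retry overwrite its predecessor.

```java
// Hypothetical sketch of the two naming schemes; bucket 0 is rendered
// in the usual zero-padded form "000000".
public class FileNameSketch {
    // Hive-on-Tez style: the attempt id is part of the final name.
    static String hiveName(int bucketId, int attemptId) {
        return String.format("%06d_%d", bucketId, attemptId);
    }

    // Tez-example style: every attempt of a task produces the same name.
    static String fixedName(int bucketId) {
        return String.format("%06d_0", bucketId);
    }

    public static void main(String[] args) {
        System.out.println(hiveName(0, 0)); // 000000_0
        System.out.println(hiveName(0, 1)); // 000000_1 -> both files can coexist: duplicates
        System.out.println(fixedName(0));   // 000000_0 -> a retry overwrites, never duplicates
    }
}
```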
> *4 How to improve?*
> Use canCommit to ensure that speculative task attempts do not commit at the 
> same time. (HIVE-27899)
> Let different task attempts for each task 

[jira] [Commented] (HIVE-28110) MetastoreConf - String casting of default values breaks Hive

2024-04-06 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834519#comment-17834519
 ] 

Ayush Saxena commented on HIVE-28110:
-

I don't think we can do {{toString()}} here. The correct way would be for the 
client to call {{getAsString()}} or {{getBoolVar()}}. There may be cases where 
the defaultVal object doesn't have its own {{toString()}}; in that case it would 
return the value from the Object class.

I don't think it is a bug, but rather a wrong way of using the method.
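
The failure mode under discussion can be reproduced in isolation. A minimal sketch (not MetastoreConf's actual code): an Object holding a Boolean default cannot be cast to String, while calling toString() on it is always safe.

```java
public class CastSketch {
    public static void main(String[] args) {
        Object defaultVal = Boolean.FALSE; // e.g. a boolean default like USE_SSL's

        // toString() works on any Object: prints "false".
        System.out.println(defaultVal.toString());

        // The cast that getVar effectively performs on a non-String default:
        try {
            String s = (String) defaultVal; // throws at runtime
            System.out.println(s);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: a Boolean is not a String");
        }
    }
}
```

This is also why the typed getters are the safe path: they never cast the default across types.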

> MetastoreConf - String casting of default values breaks Hive
> 
>
> Key: HIVE-28110
> URL: https://issues.apache.org/jira/browse/HIVE-28110
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: All Versions
> Environment: Ubuntu 22.04
> VSCode with Extension Pack for Java
> CommitHash: bee33d2018 on Apache Hive master branch 
> (https://github.com/apache/hive)
>Reporter: Dominik Diedrich
>Assignee: tanishqchugh
>Priority: Minor
>  Labels: easyfix, pull-request-available
>
> When using the *getVar(Configuration conf, ConfVars var)* method of the 
> MetastoreConf class, Hive breaks when, e.g., trying to retrieve the 
> environment variable "USE_SSL" and it isn't set in the system. The method 
> then tries to cast the default value, which is the boolean false for USE_SSL, 
> to a String, which cannot work.
>  
> {quote}{{return val == null ? conf.get(var.hiveName, 
> {color:#FF}*(String)var.defaultVal*{color}) : val;}}{quote}
>  
> Also in the *getStringCollection(Configuration conf, ConfVars var)* method it 
> tries to cast any default values to a String.
>  
> Strangely, e.g. in the method *get(Configuration conf, String key)* the 
> default value isn't cast; instead the .toString() method is called, which should 
> also be done for the 2 methods I mentioned above.
>  
> If nobody has time for that fix, I could open a PR for that.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28173) Issues with staging dirs with materialized views on HDFS encrypted table

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28173:
--
Affects Version/s: 4.0.0

> Issues with staging dirs with materialized views on HDFS encrypted table
> 
>
> Key: HIVE-28173
> URL: https://issues.apache.org/jira/browse/HIVE-28173
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
>
> In the materialized view registry thread, which runs in the background, there 
> are 2 issues involving staging directories on HDFS-encrypted tables:
> 1) The staging directory is created at compile time. For non-HDFS-encrypted 
> tables, the "mkdir" flag is set to false. There is no such flag for 
> HDFS-encrypted tables.
> 2) The "FileSystem.deleteOnExit()" method is not called from the 
> HiveMaterializedViewRegistry thread.





[jira] [Updated] (HIVE-24291) Compaction Cleaner prematurely cleans up deltas

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-24291:
--
Affects Version/s: 4.0.0

> Compaction Cleaner prematurely cleans up deltas
> ---
>
> Key: HIVE-24291
> URL: https://issues.apache.org/jira/browse/HIVE-24291
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Since HIVE-23107 the cleaner can clean up deltas that are still used by 
> running queries.
> Example:
>  * TxnId 1-5 writes to a partition, all commits
>  * Compactor starts with txnId=6
>  * Long running query starts with txnId=7, it sees txnId=6 as open in its 
> snapshot
>  * Compaction commits
>  * Cleaner runs
> Previously the min_history_level table would have prevented the Cleaner from 
> deleting deltas 1-5 while txnId=7 is open, but now they will be deleted and the 
> long running query may fail if it tries to access the files.
> A solution could be to not run the cleaner while any txn is open that was 
> opened before the compaction was committed (CQ_NEXT_TXN_ID)
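
The proposed guard reduces to a single predicate. The sketch below is a hypothetical simplification, not the actual Cleaner code: cleaning is postponed while any open transaction started before the compaction's commit watermark (CQ_NEXT_TXN_ID), because such transactions may still read the old deltas.

```java
public class CleanerGuardSketch {
    // Clean only when every still-open txn started at or after the
    // compaction's commit watermark (CQ_NEXT_TXN_ID); those txns already
    // see the compacted data and no longer need the old deltas.
    static boolean mayClean(long minOpenTxnId, long cqNextTxnId) {
        return minOpenTxnId >= cqNextTxnId;
    }

    public static void main(String[] args) {
        // Scenario from the description: compactor (txnId=6) commits with
        // watermark 8, while the long-running query (txnId=7) is still open.
        System.out.println(mayClean(7, 8)); // false: deltas 1-5 must survive
        // Once txn 7 finishes and the oldest open txn is, say, 9:
        System.out.println(mayClean(9, 8)); // true: safe to delete deltas 1-5
    }
}
```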





[jira] [Updated] (HIVE-28143) After HIVE-27492 fix, some HPLSQL built-in functions like trim, lower are not working when used in insert statement.

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28143:
--
Affects Version/s: 4.0.0

> After HIVE-27492 fix, some HPLSQL built-in functions like trim, lower are not 
> working when used in insert statement.
> 
>
> Key: HIVE-28143
> URL: https://issues.apache.org/jira/browse/HIVE-28143
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
>
> After HIVE-27492 fix, some HPLSQL built-in functions like trim, lower are not 
> working when used in insert statement.
> Steps to reproduce:
> {noformat}
> CREATE TABLE result (name String);
> CREATE PROCEDURE p1(s1 string)
>   BEGIN
>     INSERT INTO result VALUES(lower(s1));
>   END;
> call p1('abcd');
> SELECT * FROM result;{noformat}
> Error reported:
> {noformat}
> ERROR : Ln:3 identifier 'LOWER' must be declared.{noformat}
>  





[jira] [Updated] (HIVE-24291) Compaction Cleaner prematurely cleans up deltas

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-24291:
--
Affects Version/s: (was: 4.0.0)

> Compaction Cleaner prematurely cleans up deltas
> ---
>
> Key: HIVE-24291
> URL: https://issues.apache.org/jira/browse/HIVE-24291
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Since HIVE-23107 the cleaner can clean up deltas that are still used by 
> running queries.
> Example:
>  * TxnId 1-5 writes to a partition, all commits
>  * Compactor starts with txnId=6
>  * Long running query starts with txnId=7, it sees txnId=6 as open in its 
> snapshot
>  * Compaction commits
>  * Cleaner runs
> Previously the min_history_level table would have prevented the Cleaner from 
> deleting deltas 1-5 while txnId=7 is open, but now they will be deleted and the 
> long running query may fail if it tries to access the files.
> A solution could be to not run the cleaner while any txn is open that was 
> opened before the compaction was committed (CQ_NEXT_TXN_ID)





[jira] [Updated] (HIVE-26339) HIVE-26047 Related LIKE pattern issues

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26339:
--
Target Version/s: 4.1.0

> HIVE-26047 Related LIKE pattern issues
> --
>
> Key: HIVE-26339
> URL: https://issues.apache.org/jira/browse/HIVE-26339
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 4.0.0
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Fixes https://issues.apache.org/jira/browse/HIVE-26047 without using regular 
> expressions. It was also confirmed that the current regular-expression-based 
> code cannot support the following LIKE patterns.
> End pattern
> {code:java}
> %abc\%def {code}
> Start pattern
> {code:java}
> abc\%def% {code}
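
What the two patterns mean can be made concrete with a hypothetical checker (the helper names here are illustrative, not Hive's vectorized implementation). Assuming standard LIKE semantics where \% is a literal percent sign, %abc\%def is an ends-with match and abc\%def% a begins-with match on the literal string abc%def:

```java
public class LikePatternSketch {
    // Turn the escaped \% into a literal '%' (illustrative helper).
    static String unescape(String s) {
        return s.replace("\\%", "%");
    }

    // "%<literal>" pattern: value must end with the unescaped literal.
    static boolean matchesEndPattern(String value, String pattern) {
        return value.endsWith(unescape(pattern.substring(1)));
    }

    // "<literal>%" pattern: value must start with the unescaped literal.
    static boolean matchesStartPattern(String value, String pattern) {
        return value.startsWith(unescape(pattern.substring(0, pattern.length() - 1)));
    }

    public static void main(String[] args) {
        System.out.println(matchesEndPattern("xyzabc%def", "%abc\\%def"));   // true
        System.out.println(matchesEndPattern("xyzabc-def", "%abc\\%def"));   // false: '%' is literal here
        System.out.println(matchesStartPattern("abc%defxyz", "abc\\%def%")); // true
    }
}
```

A plain prefix/suffix comparison like this is exactly what the regex-free approach enables, since no wildcard remains after unescaping.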





[jira] [Updated] (HIVE-26339) HIVE-26047 Related LIKE pattern issues

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26339:
--
Component/s: Vectorization

> HIVE-26047 Related LIKE pattern issues
> --
>
> Key: HIVE-26339
> URL: https://issues.apache.org/jira/browse/HIVE-26339
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 4.0.0
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Fixes https://issues.apache.org/jira/browse/HIVE-26047 without using regular 
> expressions. It was also confirmed that the current regular-expression-based 
> code cannot support the following LIKE patterns.
> End pattern
> {code:java}
> %abc\%def {code}
> Start pattern
> {code:java}
> abc\%def% {code}





[jira] [Updated] (HIVE-26339) HIVE-26047 Related LIKE pattern issues

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26339:
--
Affects Version/s: 4.0.0

> HIVE-26047 Related LIKE pattern issues
> --
>
> Key: HIVE-26339
> URL: https://issues.apache.org/jira/browse/HIVE-26339
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Fixes https://issues.apache.org/jira/browse/HIVE-26047 without using regular 
> expressions. It was also confirmed that the current regular-expression-based 
> code cannot support the following LIKE patterns.
> End pattern
> {code:java}
> %abc\%def {code}
> Start pattern
> {code:java}
> abc\%def% {code}





[jira] [Updated] (HIVE-28185) Hive 4.1.0 backlog

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28185:
--
Description: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20(priority%20in%20(Critical%2C%20Blocker)%20or%20priority%20%3D%20Major%20AND%20cf%5B12310320%5D%20%3D%204.1.0%20)%20AND%20resolution%20%3D%20Unresolved%20AND%20(affectedVersion%20in%20(4.0.0-alpha-1%2C%204.0.0-alpha-2%2C%204.0.0-beta-1%2C%204.0.0%2C%204.1.0)%20%20or%20affectedVersion%20%3D%20EMPTY)%20and%20NOT%20(cf%5B12310320%5D%20in%20(3.0.0%2C%203.1.0%2C%203.2.0))%20and%20updated%20%3E%3D%20-104w%20ORDER%20BY%20created%20DESC

> Hive 4.1.0 backlog
> --
>
> Key: HIVE-28185
> URL: https://issues.apache.org/jira/browse/HIVE-28185
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20(priority%20in%20(Critical%2C%20Blocker)%20or%20priority%20%3D%20Major%20AND%20cf%5B12310320%5D%20%3D%204.1.0%20)%20AND%20resolution%20%3D%20Unresolved%20AND%20(affectedVersion%20in%20(4.0.0-alpha-1%2C%204.0.0-alpha-2%2C%204.0.0-beta-1%2C%204.0.0%2C%204.1.0)%20%20or%20affectedVersion%20%3D%20EMPTY)%20and%20NOT%20(cf%5B12310320%5D%20in%20(3.0.0%2C%203.1.0%2C%203.2.0))%20and%20updated%20%3E%3D%20-104w%20ORDER%20BY%20created%20DESC





[jira] [Created] (HIVE-28185) Hive 4.1.0 backlog

2024-04-06 Thread Denys Kuzmenko (Jira)
Denys Kuzmenko created HIVE-28185:
-

 Summary: Hive 4.1.0 backlog
 Key: HIVE-28185
 URL: https://issues.apache.org/jira/browse/HIVE-28185
 Project: Hive
  Issue Type: Task
Reporter: Denys Kuzmenko








[jira] [Updated] (HIVE-19566) Vectorization: Fix NULL / Wrong Results issues in Complex Type Functions

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-19566:
--
Issue Type: Test  (was: Bug)

> Vectorization: Fix NULL / Wrong Results issues in Complex Type Functions
> 
>
> Key: HIVE-19566
> URL: https://issues.apache.org/jira/browse/HIVE-19566
> Project: Hive
>  Issue Type: Test
>Reporter: Matt McCline
>Priority: Major
>
> Write new unit tests that use random data and intentional isRepeating batches 
> to check for NULL and wrong results for vectorized Complex Type functions:
>  * index
>  * (StructField)





[jira] [Updated] (HIVE-19566) Vectorization: Fix NULL / Wrong Results issues in Complex Type Functions

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-19566:
--
Priority: Major  (was: Critical)

> Vectorization: Fix NULL / Wrong Results issues in Complex Type Functions
> 
>
> Key: HIVE-19566
> URL: https://issues.apache.org/jira/browse/HIVE-19566
> Project: Hive
>  Issue Type: Bug
>Reporter: Matt McCline
>Priority: Major
>
> Write new unit tests that use random data and intentional isRepeating batches 
> to check for NULL and wrong results for vectorized Complex Type functions:
>  * index
>  * (StructField)





[jira] [Commented] (HIVE-20282) HiveServer2 incorrect queue name when using Tez instead of MR

2024-04-06 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-20282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834516#comment-17834516
 ] 

Denys Kuzmenko commented on HIVE-20282:
---

hi [~steveyeom2017], would you mind raising a PR for that fix?

> HiveServer2 incorrect queue name when using Tez instead of MR
> -
>
> Key: HIVE-20282
> URL: https://issues.apache.org/jira/browse/HIVE-20282
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Critical
> Attachments: HIVE-20282.01.patch
>
>
> Ambari -> Tez view has 
> "Hive Queries" and "All DAGs" view pages.
> The queue names from a query id and from its DAG id do not match in the Tez 
> engine context.
> The one from the query is not correct.





[jira] [Updated] (HIVE-20282) HiveServer2 incorrect queue name when using Tez instead of MR

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-20282:
--
Target Version/s: 4.1.0

> HiveServer2 incorrect queue name when using Tez instead of MR
> -
>
> Key: HIVE-20282
> URL: https://issues.apache.org/jira/browse/HIVE-20282
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Critical
> Attachments: HIVE-20282.01.patch
>
>
> Ambari -> Tez view has 
> "Hive Queries" and "All DAGs" view pages.
> The queue names from a query id and from its DAG id do not match in the Tez 
> engine context.
> The one from the query is not correct.





[jira] [Updated] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-23721:
--
Priority: Major  (was: Critical)

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> ---
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
> Environment: Hadoop 3.1(1700+ nodes)
> YARN 3.1 (with timelineserver enabled,https enabled)
> Hive 3.1 (15 HS2 instance)
> 6+ YARN Applications every day
>Reporter: YulongZ
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23721.01.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Since Hive 3.0, catalogs were added to the Hive metastore: many metastore schema 
> tables added a "catName" column, and the table index added the "catName" column.
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> should use "catName == ''" instead of "dbName == ''", because "catName" is the 
> first index column.
> When metastore data becomes large, for example when the 
> MPartitionColumnStatistics table has millions of rows, the 
> newQuery(MPartitionColumnStatistics.class, "dbName == ''") call against the 
> metastore executes very slowly, and the HiveServer2 query "show tables" 
> executes very slowly too.





[jira] [Updated] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-23721:
--
Target Version/s: 4.1.0

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> ---
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
> Environment: Hadoop 3.1(1700+ nodes)
> YARN 3.1 (with timelineserver enabled,https enabled)
> Hive 3.1 (15 HS2 instance)
> 6+ YARN Applications every day
>Reporter: YulongZ
>Assignee: Butao Zhang
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HIVE-23721.01.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Since Hive 3.0, catalogs were added to the Hive metastore: many metastore schema 
> tables added a "catName" column, and the table index added the "catName" column.
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> should use "catName == ''" instead of "dbName == ''", because "catName" is the 
> first index column.
> When metastore data becomes large, for example when the 
> MPartitionColumnStatistics table has millions of rows, the 
> newQuery(MPartitionColumnStatistics.class, "dbName == ''") call against the 
> metastore executes very slowly, and the HiveServer2 query "show tables" 
> executes very slowly too.





[jira] [Updated] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-23721:
--
Component/s: Standalone Metastore

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> ---
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
> Environment: Hadoop 3.1(1700+ nodes)
> YARN 3.1 (with timelineserver enabled,https enabled)
> Hive 3.1 (15 HS2 instance)
> 6+ YARN Applications every day
>Reporter: YulongZ
>Assignee: Butao Zhang
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HIVE-23721.01.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Since Hive 3.0, catalogs were added to the Hive metastore: many metastore schema 
> tables added a "catName" column, and the table index added the "catName" column.
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> should use "catName == ''" instead of "dbName == ''", because "catName" is the 
> first index column.
> When metastore data becomes large, for example when the 
> MPartitionColumnStatistics table has millions of rows, the 
> newQuery(MPartitionColumnStatistics.class, "dbName == ''") call against the 
> metastore executes very slowly, and the HiveServer2 query "show tables" 
> executes very slowly too.





[jira] [Updated] (HIVE-27274) Unsecure cluster does not need set delagation token when build hms client

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27274:
--
Priority: Major  (was: Critical)

> Unsecure cluster does not need set delagation token when build hms client
> -
>
> Key: HIVE-27274
> URL: https://issues.apache.org/jira/browse/HIVE-27274
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.0, 4.0.0-alpha-2
>Reporter: zhaolong
>Priority: Major
> Attachments: image-2023-04-20-15-02-21-917.png
>
>
> In an unsecured cluster, if HADOOP_PROXY_USER is set, fetching the delegation 
> token fails and the HMSClient fails to initialize. The delegation token is only 
> needed when SASL is enabled.
> !image-2023-04-20-15-02-21-917.png!





[jira] [Updated] (HIVE-26877) Parquet CTAS with JOIN on decimals with different precision/scale fail

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26877:
--
Labels: hive-4.1.0-must  (was: )

> Parquet CTAS with JOIN on decimals with different precision/scale fail
> --
>
> Key: HIVE-26877
> URL: https://issues.apache.org/jira/browse/HIVE-26877
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0-alpha-2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Critical
>  Labels: hive-4.1.0-must
> Attachments: ctas_parquet_join.q
>
>
> Creating a Parquet table using the CREATE TABLE AS SELECT (CTAS) syntax leads to 
> a runtime error when the SELECT statement joins columns with different 
> precision/scale.
> Steps to reproduce:
> {code:sql}
> CREATE TABLE table_a (col_dec decimal(5,0));
> CREATE TABLE table_b(col_dec decimal(38,10));
> INSERT INTO table_a VALUES (1);
> INSERT INTO table_b VALUES (1.00);
> set hive.default.fileformat=parquet;
> create table target as
> select table_a.col_dec
> from table_a
> left outer join table_b on
> table_a.col_dec = table_b.col_dec;
> {code}
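
The size mismatch in the stack trace can be explained numerically. Parquet stores a decimal as a fixed-length byte array just wide enough for its precision; a sketch of that computation (following the minimum-bytes rule in the Parquet format spec, with an illustrative helper name) shows why decimal(5,0) occupies 3 bytes while a precision-38 value needs 16:

```java
import java.math.BigInteger;

public class DecimalWidthSketch {
    // Minimum bytes needed to hold any signed decimal of the given
    // precision, as used by Parquet's FIXED_LEN_BYTE_ARRAY decimal encoding.
    static int minBytesForPrecision(int precision) {
        // Largest unscaled value is 10^precision - 1; add one bit for the sign.
        int bits = BigInteger.TEN.pow(precision).subtract(BigInteger.ONE).bitLength() + 1;
        return (bits + 7) / 8; // round up to whole bytes
    }

    public static void main(String[] args) {
        System.out.println(minBytesForPrecision(5));  // 3  -> table_a's decimal(5,0) column
        System.out.println(minBytesForPrecision(38)); // 16 -> a decimal(38,10) value
        // Writing a 16-byte value into a column declared 3 bytes wide yields
        // "Fixed Binary size 16 does not match field type length 3".
    }
}
```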
> Stacktrace:
> {noformat}
> 2022-12-20T07:02:52,237  INFO [2dfbd95a-7553-467b-b9d0-629100785502 Listener 
> at 0.0.0.0/46609] reexec.ReExecuteLostAMQueryPlugin: Got exception message: 
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1671548565336_0001_3_02, 
> diagnostics=[Task failed, taskId=task_1671548565336_0001_3_02_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1671548565336_0001_3_02_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Hive Runtime Error while closing operators: Fixed 
> Binary size 16 does not match field type length 3
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators: Fixed Binary size 16 does not match field type length 3
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:379)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:310)
>   ... 15 more
> Caused by: java.lang.IllegalArgumentException: Fixed Binary size 16 does not 
> match field type length 3
>   at 
> org.apache.parquet.column.values.plain.FixedLenByteArrayPlainValuesWriter.writeBytes(FixedLenByteArrayPlainValuesWriter.java:56)
>   at 
> org.apache.parquet.column.impl.ColumnWriterBase.write(ColumnWriterBase.java:174)
>   at 
> org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.addBinary(MessageColumnIO.java:476)
>   at 
> org.apache.parquet.io.RecordConsumerLoggingWrapper.addBinary(RecordConsumerLoggingWrapper.java:116)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$DecimalDataWriter.write(DataWritableWriter.java:571)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
>   at 
> 

[jira] [Updated] (HIVE-26877) Parquet CTAS with JOIN on decimals with different precision/scale fail

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26877:
--
Target Version/s: 4.1.0

> Parquet CTAS with JOIN on decimals with different precision/scale fail
> --
>
> Key: HIVE-26877
> URL: https://issues.apache.org/jira/browse/HIVE-26877
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0-alpha-2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Critical
> Attachments: ctas_parquet_join.q
>
>
> Creating a Parquet table using CREATE TABLE AS SELECT syntax (CTAS) leads to 
> runtime error when the SELECT statement joins columns with different 
> precision/scale.
> Steps to reproduce:
> {code:sql}
> CREATE TABLE table_a (col_dec decimal(5,0));
> CREATE TABLE table_b(col_dec decimal(38,10));
> INSERT INTO table_a VALUES (1);
> INSERT INTO table_b VALUES (1.00);
> set hive.default.fileformat=parquet;
> create table target as
> select table_a.col_dec
> from table_a
> left outer join table_b on
> table_a.col_dec = table_b.col_dec;
> {code}
> Stacktrace:
> {noformat}
> 2022-12-20T07:02:52,237  INFO [2dfbd95a-7553-467b-b9d0-629100785502 Listener 
> at 0.0.0.0/46609] reexec.ReExecuteLostAMQueryPlugin: Got exception message: 
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1671548565336_0001_3_02, 
> diagnostics=[Task failed, taskId=task_1671548565336_0001_3_02_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1671548565336_0001_3_02_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Hive Runtime Error while closing operators: Fixed 
> Binary size 16 does not match field type length 3
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators: Fixed Binary size 16 does not match field type length 3
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:379)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:310)
>   ... 15 more
> Caused by: java.lang.IllegalArgumentException: Fixed Binary size 16 does not 
> match field type length 3
>   at 
> org.apache.parquet.column.values.plain.FixedLenByteArrayPlainValuesWriter.writeBytes(FixedLenByteArrayPlainValuesWriter.java:56)
>   at 
> org.apache.parquet.column.impl.ColumnWriterBase.write(ColumnWriterBase.java:174)
>   at 
> org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.addBinary(MessageColumnIO.java:476)
>   at 
> org.apache.parquet.io.RecordConsumerLoggingWrapper.addBinary(RecordConsumerLoggingWrapper.java:116)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$DecimalDataWriter.write(DataWritableWriter.java:571)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:228)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:251)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:115)
>   at 
> 
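The "Fixed Binary size 16 does not match field type length 3" error follows from how Parquet sizes decimals: the writer allocates the smallest fixed-length byte array that can hold the declared precision, so a value widened to decimal(38,10) by the join (16 bytes) no longer fits the 3-byte slot declared for decimal(5,0). A minimal sketch of that sizing rule, based on the Parquet format specification rather than Hive's exact code:

```python
def decimal_byte_width(precision: int) -> int:
    """Smallest n such that a signed n-byte integer can hold 10**precision - 1."""
    n = 1
    while 10 ** precision - 1 > 2 ** (8 * n - 1) - 1:
        n += 1
    return n

# decimal(5,0) fits in a 3-byte fixed array, decimal(38,10) needs 16 bytes,
# matching the two sizes named in the stack trace above.
print(decimal_byte_width(5), decimal_byte_width(38))
```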

[jira] [Updated] (HIVE-26505) Case When Some result data is lost when there are common column conditions and partitioned column conditions

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26505:
--
Target Version/s: 4.1.0

> Case When Some result data is lost when there are common column conditions 
> and partitioned column conditions 
> -
>
> Key: HIVE-26505
> URL: https://issues.apache.org/jira/browse/HIVE-26505
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0, 4.0.0
>Reporter: GuangMing Lu
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: check, pull-request-available
>
> {code:java}
> create table test0831 (id string) partitioned by (cp string);
> insert into test0831 values ('a', '2022-08-23'),('c', '2022-08-23'),('d', 
> '2022-08-23');
> insert into test0831 values ('a', '2022-08-24'),('b', '2022-08-24');
> select * from test0831;
> +--------------+--------------+
> | test0831.id  | test0831.cp  |
> +--------------+--------------+
> | a            | 2022-08-23   |
> | b            | 2022-08-23   |
> | a            | 2022-08-23   |
> | c            | 2022-08-24   |
> | d            | 2022-08-24   |
> +--------------+--------------+
> select * from test0831 where (case when id='a' and cp='2022-08-23' then 1 
> else 0 end)=0;  
> +--------------+--------------+
> | test0830.id  | test0830.cp  |
> +--------------+--------------+
> | a            | 2022-08-24   |
> | b            | 2022-08-24   |
> +--------------+--------------+
> {code}
>  
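To see why the two-row result above is wrong, the CASE predicate can be evaluated row by row. A hypothetical faulty plan that pushes the partition condition down as a pruning filter (an assumption about the bug's mechanism, not a confirmed diagnosis) reproduces exactly the missing rows:

```python
rows = [("a", "2022-08-23"), ("c", "2022-08-23"), ("d", "2022-08-23"),
        ("a", "2022-08-24"), ("b", "2022-08-24")]

def keep(id_, cp):
    # (case when id='a' and cp='2022-08-23' then 1 else 0 end) = 0
    return (1 if id_ == "a" and cp == "2022-08-23" else 0) == 0

# Correct answer: every row except ('a', '2022-08-23').
correct = [r for r in rows if keep(*r)]

# Hypothetical bad plan: treat cp='2022-08-23' as a partition filter and
# scan only the 2022-08-24 partition, silently losing 'c' and 'd'.
buggy = [r for r in rows if r[1] != "2022-08-23" and keep(*r)]
print(correct)
print(buggy)
```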



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26505) Case When Some result data is lost when there are common column conditions and partitioned column conditions

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26505:
--
Labels: check hive-4.1.0-must pull-request-available  (was: check 
pull-request-available)

> Case When Some result data is lost when there are common column conditions 
> and partitioned column conditions 
> -
>
> Key: HIVE-26505
> URL: https://issues.apache.org/jira/browse/HIVE-26505
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0, 4.0.0
>Reporter: GuangMing Lu
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: check, hive-4.1.0-must, pull-request-available
>
> {code:java}
> create table test0831 (id string) partitioned by (cp string);
> insert into test0831 values ('a', '2022-08-23'),('c', '2022-08-23'),('d', 
> '2022-08-23');
> insert into test0831 values ('a', '2022-08-24'),('b', '2022-08-24');
> select * from test0831;
> +--------------+--------------+
> | test0831.id  | test0831.cp  |
> +--------------+--------------+
> | a            | 2022-08-23   |
> | b            | 2022-08-23   |
> | a            | 2022-08-23   |
> | c            | 2022-08-24   |
> | d            | 2022-08-24   |
> +--------------+--------------+
> select * from test0831 where (case when id='a' and cp='2022-08-23' then 1 
> else 0 end)=0;  
> +--------------+--------------+
> | test0830.id  | test0830.cp  |
> +--------------+--------------+
> | a            | 2022-08-24   |
> | b            | 2022-08-24   |
> +--------------+--------------+
> {code}
>  





[jira] [Commented] (HIVE-25351) stddev(), stddev_pop() with CBO enable returning null

2024-04-06 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834515#comment-17834515
 ] 

Denys Kuzmenko commented on HIVE-25351:
---

hi [~Dayakar], does it affect Hive-4.0 release?

> stddev(), stddev_pop() with CBO enable returning null
> -
>
> Key: HIVE-25351
> URL: https://issues.apache.org/jira/browse/HIVE-25351
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashish Sharma
>Assignee: Dayakar M
>Priority: Blocker
>  Labels: pull-request-available
>
> *script used to repro*
> create table cbo_test (key string, v1 double, v2 decimal(30,2), v3 
> decimal(30,2));
> insert into cbo_test values ("00140006375905", 10230.72, 
> 10230.72, 10230.69), ("00140006375905", 10230.72, 10230.72, 
> 10230.69), ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69);
> select stddev(v1), stddev(v2), stddev(v3) from cbo_test;
> *Enable CBO*
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=1 width=24) |
> |   Output:["_col0","_col1","_col2"] |
> |   Group By Operator [GBY_11] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(VALUE._col0)","sum(VALUE._col1)","count(VALUE._col2)","sum(VALUE._col3)","sum(VALUE._col4)","count(VALUE._col5)","sum(VALUE._col6)","sum(VALUE._col7)","count(VALUE._col8)"]
>  |
> |   <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized  |
> | PARTITION_ONLY_SHUFFLE [RS_10] |
> |   Group By Operator [GBY_9] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(_col3)","sum(_col0)","count(_col0)","sum(_col5)","sum(_col4)","count(_col1)","sum(_col7)","sum(_col6)","count(_col2)"]
>  |
> | Select Operator [SEL_8] (rows=6 width=232) |
> |   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"] |
> |   TableScan [TS_0] (rows=6 width=232) |
> | default@cbo_test,cbo_test, ACID 
> table,Tbl:COMPLETE,Col:COMPLETE,Output:["v1","v2","v3"] |
> ||
> ++
> *Query Result* 
> _c0   _c1 _c2
> 0.0   NaN NaN
> *Disable CBO*
> ++
> |  Explain   |
> ++
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_11] |
> | Group By Operator [GBY_10] (rows=1 width=24) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(VALUE._col0)","stddev(VALUE._col1)","stddev(VALUE._col2)"]
>  |
> | <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized|
> |   PARTITION_ONLY_SHUFFLE [RS_9]|
> | Group By Operator [GBY_8] (rows=1 width=240) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(v1)","stddev(v2)","stddev(v3)"]
>  |
> |   Select Operator [SEL_7] (rows=6 width=232) |
> | Output:["v1","v2","v3"]|
> | TableScan [TS_0] (rows=6 width=232) |
> |   default@cbo_test,cbo_test, ACID 
> 
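The NaN result is consistent with the CBO plan above: stddev is decomposed into sum/sum-of-squares/count aggregations, and the one-pass formula sqrt(E[x^2] - E[x]^2) is numerically fragile. With limited-precision intermediates, the variance of identical values can come out nonzero or even slightly negative (hence NaN after the square root). A rough simulation of that fragility (the rounding model is an illustration, not Hive's actual decimal arithmetic):

```python
import math
import statistics

def stddev_one_pass(vals, digits):
    """sqrt(sum(x^2)/n - mean^2) with intermediates rounded to `digits` places."""
    n = len(vals)
    s = round(sum(vals), digits)
    sq = round(sum(round(v * v, digits) for v in vals), digits)
    var = sq / n - (s / n) ** 2
    return float("nan") if var < 0 else math.sqrt(var)

vals = [10230.72] * 6                 # the repro data: six identical values
print(statistics.pstdev(vals))        # exact population stddev: 0.0
print(stddev_one_pass(vals, 2))       # decomposed formula with rounding: not 0.0
```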

[jira] [Updated] (HIVE-23586) load data overwrite into bucket table failed

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-23586:
--
Target Version/s: 4.1.0

> load data overwrite into bucket table failed
> 
>
> Key: HIVE-23586
> URL: https://issues.apache.org/jira/browse/HIVE-23586
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 3.1.2, 4.0.0
>Reporter: zhaolong
>Assignee: zhaolong
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HIVE-23586.01.patch, image-2020-06-01-21-40-21-726.png, 
> image-2020-06-01-21-41-28-732.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> load data overwrite into a bucketed table fails to overwrite when the file 
> name is not like 00_0; instead, it inserts new data into the table.
>  
> for example:
> CREATE EXTERNAL TABLE IF NOT EXISTS test_hive2 (name string,account string) 
> PARTITIONED BY (logdate string) CLUSTERED BY (account) INTO 4 BUCKETS row 
> format delimited fields terminated by '|' STORED AS textfile;
>  load data inpath 'hdfs://hacluster/tmp/zltest' overwrite into table 
> default.test_hive2 partition (logdate='20200508');
>  !image-2020-06-01-21-40-21-726.png!
>  load data inpath 'hdfs://hacluster/tmp/zltest' overwrite into table 
> default.test_hive2 partition (logdate='20200508');// should overwrite but 
> insert new data
>  !image-2020-06-01-21-41-28-732.png!





[jira] [Updated] (HIVE-23586) load data overwrite into bucket table failed

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-23586:
--
Component/s: HiveServer2

> load data overwrite into bucket table failed
> 
>
> Key: HIVE-23586
> URL: https://issues.apache.org/jira/browse/HIVE-23586
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 3.1.2, 4.0.0
>Reporter: zhaolong
>Assignee: zhaolong
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HIVE-23586.01.patch, image-2020-06-01-21-40-21-726.png, 
> image-2020-06-01-21-41-28-732.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> load data overwrite into a bucketed table fails to overwrite when the file 
> name is not like 00_0; instead, it inserts new data into the table.
>  
> for example:
> CREATE EXTERNAL TABLE IF NOT EXISTS test_hive2 (name string,account string) 
> PARTITIONED BY (logdate string) CLUSTERED BY (account) INTO 4 BUCKETS row 
> format delimited fields terminated by '|' STORED AS textfile;
>  load data inpath 'hdfs://hacluster/tmp/zltest' overwrite into table 
> default.test_hive2 partition (logdate='20200508');
>  !image-2020-06-01-21-40-21-726.png!
>  load data inpath 'hdfs://hacluster/tmp/zltest' overwrite into table 
> default.test_hive2 partition (logdate='20200508');// should overwrite but 
> insert new data
>  !image-2020-06-01-21-41-28-732.png!





[jira] [Updated] (HIVE-13157) MetaStoreEventListener.onAlter triggered for INSERT and SELECT

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-13157:
--
Priority: Major  (was: Critical)

> MetaStoreEventListener.onAlter triggered for INSERT and SELECT
> --
>
> Key: HIVE-13157
> URL: https://issues.apache.org/jira/browse/HIVE-13157
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 3.1.3, 4.0.0
>Reporter: Eugen Stoianovici
>Priority: Major
>  Labels: obsolete?
>
> The event onAlter from 
> org.apache.hadoop.hive.metastore.MetaStoreEventListener is triggered when 
> INSERT or SELECT statements are executed on the target table.
> Furthermore, the value of transient_lastDdl is updated in table properties 
> for INSERT statements.
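A listener that only wants genuine schema changes can work around this by diffing the old and new table parameters while ignoring transient bookkeeping keys. A minimal sketch under assumptions: the key set below and the plain-dict parameter shape are illustrative, not the Metastore listener API:

```python
# Properties the Metastore may touch on INSERT/SELECT without a real DDL
# change (assumed list for illustration).
TRANSIENT_KEYS = {"transient_lastDdlTime", "numRows", "numFiles", "totalSize"}

def is_real_alter(old_params: dict, new_params: dict) -> bool:
    """True only if something other than transient bookkeeping changed."""
    strip = lambda p: {k: v for k, v in p.items() if k not in TRANSIENT_KEYS}
    return strip(old_params) != strip(new_params)

print(is_real_alter({"transient_lastDdlTime": "1", "owner": "x"},
                    {"transient_lastDdlTime": "2", "owner": "x"}))   # spurious
print(is_real_alter({"owner": "x"}, {"owner": "y"}))                 # real
```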





[jira] [Updated] (HIVE-13484) LLAPTaskScheduler should handle situations where the node may be blacklisted by Tez

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-13484:
--
Priority: Major  (was: Critical)

> LLAPTaskScheduler should handle situations where the node may be blacklisted 
> by Tez
> ---
>
> Key: HIVE-13484
> URL: https://issues.apache.org/jira/browse/HIVE-13484
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
>  Labels: obsolete?
>






[jira] [Updated] (HIVE-13484) LLAPTaskScheduler should handle situations where the node may be blacklisted by Tez

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-13484:
--
Labels: obsolete?  (was: )

> LLAPTaskScheduler should handle situations where the node may be blacklisted 
> by Tez
> ---
>
> Key: HIVE-13484
> URL: https://issues.apache.org/jira/browse/HIVE-13484
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Critical
>  Labels: obsolete?
>






[jira] [Commented] (HIVE-11117) Hive external table - skip header and trailer property issue

2024-04-06 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-11117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834513#comment-17834513
 ] 

Denys Kuzmenko commented on HIVE-11117:
---

MR engine is deprecated in Hive-4.0. 
[~jana_chander], please set the relevant affected versions. 

> Hive external table - skip header and trailer property issue
> 
>
> Key: HIVE-11117
> URL: https://issues.apache.org/jira/browse/HIVE-11117
> Project: Hive
>  Issue Type: Bug
> Environment: Production
>Reporter: Janarthanan
>Priority: Critical
>  Labels: check, mapreduce
>
> I am using an external hive table pointing to a HDFS location. The external 
> table is partitioned on year/mm/dd folders. When there are more than one 
> partition folder (ex: /2015/01/02/file.txt & /2015/01/03/file2.txt), the 
> select on external table, skips the DATA RECORD instead of skipping the 
> header/trailer record from one of the file). 
> tblproperties ("skip.header.line.count"="1");
> Resolution: enabling hive.input.format (instead of the text input format) and 
> executing with the Tez engine instead of MapReduce resolved the issue. 
> How can the problem be resolved without setting these parameters? I don't 
> want to run the Hive query using Tez.
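The symptom matches header skipping applied per *split* rather than per *file*: when input splits are combined across partition files, the skip can land on a data record while a mid-split header survives. A toy model of the two behaviors (file contents are hypothetical):

```python
# Two partition files, each with one header line.
f1 = ["hdr", "r1"]
f2 = ["hdr", "r2", "r3"]

# Correct: skip.header.line.count=1 applied per file.
correct = f1[1:] + f2[1:]

# Buggy: the files are read as splits whose boundaries do not line up with
# file boundaries, and the reader skips the first line of each *split*.
splits = [["hdr", "r1"], ["r2", "hdr", "r3"]]
buggy = [line for s in splits for line in s[1:]]   # drops data record 'r2'

print(correct)
print(buggy)
```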





[jira] [Commented] (HIVE-27744) privileges check is skipped when using partly dynamic partition write.

2024-04-06 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834512#comment-17834512
 ] 

Denys Kuzmenko commented on HIVE-27744:
---

hi [~shuaiqi.guo], is that relevant to Hive-4.0?

> privileges check is skipped when using partly dynamic partition write.
> --
>
> Key: HIVE-27744
> URL: https://issues.apache.org/jira/browse/HIVE-27744
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: All Versions
>Reporter: shuaiqi.guo
>Assignee: shuaiqi.guo
>Priority: Blocker
> Fix For: 2.3.5
>
> Attachments: HIVE-27744.patch
>
>
> The privileges check is skipped when using a dynamic partition write with 
> only part of the partition specified, as in the following example:
> {code:java}
> insert overwrite table test_privilege partition (`date` = '2023-09-27', hour)
> ... {code}
> Hive will execute it directly without checking write privileges.
>  
> The attached patch fixes this bug.
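Conceptually, the fix is to still demand write privileges when any partition column is left dynamic, rather than skipping the check. A sketch of that decision under assumptions: the names and return shapes below are illustrative, not Hive's authorizer API:

```python
def objects_to_authorize(table, partition_spec):
    """partition_spec maps column -> value, with None for dynamic columns."""
    if any(v is None for v in partition_spec.values()):
        # Partly dynamic partitions: fall back to table-level write
        # authorization instead of skipping the check entirely.
        return [table]
    spec = ",".join(f"{k}={v}" for k, v in partition_spec.items())
    return [f"{table}@{spec}"]

print(objects_to_authorize("test_privilege", {"date": "2023-09-27", "hour": None}))
print(objects_to_authorize("test_privilege", {"date": "2023-09-27", "hour": "01"}))
```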





[jira] [Updated] (HIVE-27944) When HIVE-LLAP reads the ICEBERG table, a deadlock may occur.

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27944:
--
Target Version/s: 4.1.0

> When HIVE-LLAP reads the ICEBERG table, a deadlock may occur.
> -
>
> Key: HIVE-27944
> URL: https://issues.apache.org/jira/browse/HIVE-27944
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.3, 4.0.0, 4.0.0-beta-1
>Reporter: yongzhi.shao
>Assignee: yongzhi.shao
>Priority: Critical
>  Labels: hive-4.1.0-must, pull-request-available
> Attachments: image-2023-12-08-14-17-53-822.png, 
> image-2023-12-08-14-22-18-998.png, image-2023-12-10-16-24-34-351.png
>
>
> We found that org.apache.hadoop.hive.ql.plan.PartitionDesc.equals() may 
> deadlock in a multithreaded environment.
> Here's the deadlock information we've gathered: 
> {code:java}
> "DAG44-Input-4-16" Id=161 BLOCKED on 
> org.apache.hadoop.hive.common.CopyOnFirstWriteProperties@44196d35 owned by 
> "DAG44-Input-4-15" Id=160
>     at 
> org.apache.hadoop.hive.common.CopyOnFirstWriteProperties.size(CopyOnFirstWriteProperties.java:315)
>     -  blocked on 
> org.apache.hadoop.hive.common.CopyOnFirstWriteProperties@44196d35
>     at java.util.Hashtable.equals(Hashtable.java:801)
>     -  locked java.util.Properties@77a541be < but blocks 3 other threads!
>     at 
> org.apache.hadoop.hive.common.CopyOnFirstWriteProperties.equals(CopyOnFirstWriteProperties.java:213)
>     -  locked 
> org.apache.hadoop.hive.common.CopyOnFirstWriteProperties@2d973aa3
>     at 
> org.apache.hadoop.hive.ql.plan.PartitionDesc.equals(PartitionDesc.java:327)
>     at java.util.AbstractMap.equals(AbstractMap.java:495)
>     at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:940)
>     at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getFromPathRecursively(HiveFileFormatUtils.java:374)
>     at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getFromPathRecursively(HiveFileFormatUtils.java:359)
>     at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getFromPathRecursively(HiveFileFormatUtils.java:354)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.schemaEvolved(SplitGrouper.java:278)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.generateGroupedSplits(SplitGrouper.java:183)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.generateGroupedSplits(SplitGrouper.java:160)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:287)
>  {code}
> Since Properties extends Hashtable, all of its methods are synchronized.
> In a multi-threaded environment, a deadlock can occur when propA.equals(propB) 
> and propB.equals(propA) run at the same time.
>  
> One idea for a fix: when CopyOnFirstWriteProperties.equals() is called, take a 
> copy of the other object inside the method and compare against that copy. If 
> there are no problems with this solution, I will submit a PR.
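The AB-BA pattern and the copy-based fix can be modeled with explicit locks. This is a Python analogue of Hashtable's synchronized methods; the class is a toy, not the Hive code:

```python
import threading

class SyncProps:
    """Toy java.util.Properties: every operation takes the object's monitor."""
    def __init__(self, data):
        self._lock = threading.Lock()
        self._data = dict(data)

    def equals(self, other):
        with self._lock:          # locks self, then other: two threads running
            with other._lock:     # a.equals(b) and b.equals(a) can deadlock
                return self._data == other._data

    def equals_safe(self, other):
        with other._lock:         # proposed fix: snapshot `other` first,
            snapshot = dict(other._data)
        with self._lock:          # then compare holding only one lock at a time
            return self._data == snapshot

a, b = SyncProps({"k": "v"}), SyncProps({"k": "v"})
print(a.equals_safe(b))   # no nested cross-object lock acquisition
```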





[jira] [Updated] (HIVE-27944) When HIVE-LLAP reads the ICEBERG table, a deadlock may occur.

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27944:
--
Labels: hive-4.1.0-must pull-request-available  (was: 
pull-request-available)

> When HIVE-LLAP reads the ICEBERG table, a deadlock may occur.
> -
>
> Key: HIVE-27944
> URL: https://issues.apache.org/jira/browse/HIVE-27944
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.3, 4.0.0, 4.0.0-beta-1
>Reporter: yongzhi.shao
>Assignee: yongzhi.shao
>Priority: Critical
>  Labels: hive-4.1.0-must, pull-request-available
> Attachments: image-2023-12-08-14-17-53-822.png, 
> image-2023-12-08-14-22-18-998.png, image-2023-12-10-16-24-34-351.png
>
>
> We found that org.apache.hadoop.hive.ql.plan.PartitionDesc.equals() may 
> deadlock in a multithreaded environment.
> Here's the deadlock information we've gathered: 
> {code:java}
> "DAG44-Input-4-16" Id=161 BLOCKED on 
> org.apache.hadoop.hive.common.CopyOnFirstWriteProperties@44196d35 owned by 
> "DAG44-Input-4-15" Id=160
>     at 
> org.apache.hadoop.hive.common.CopyOnFirstWriteProperties.size(CopyOnFirstWriteProperties.java:315)
>     -  blocked on 
> org.apache.hadoop.hive.common.CopyOnFirstWriteProperties@44196d35
>     at java.util.Hashtable.equals(Hashtable.java:801)
>     -  locked java.util.Properties@77a541be < but blocks 3 other threads!
>     at 
> org.apache.hadoop.hive.common.CopyOnFirstWriteProperties.equals(CopyOnFirstWriteProperties.java:213)
>     -  locked 
> org.apache.hadoop.hive.common.CopyOnFirstWriteProperties@2d973aa3
>     at 
> org.apache.hadoop.hive.ql.plan.PartitionDesc.equals(PartitionDesc.java:327)
>     at java.util.AbstractMap.equals(AbstractMap.java:495)
>     at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:940)
>     at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getFromPathRecursively(HiveFileFormatUtils.java:374)
>     at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getFromPathRecursively(HiveFileFormatUtils.java:359)
>     at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getFromPathRecursively(HiveFileFormatUtils.java:354)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.schemaEvolved(SplitGrouper.java:278)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.generateGroupedSplits(SplitGrouper.java:183)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.generateGroupedSplits(SplitGrouper.java:160)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:287)
>  {code}
> Since Properties extends Hashtable, all of its methods are synchronized.
> In a multi-threaded environment, a deadlock can occur when propA.equals(propB) 
> and propB.equals(propA) run at the same time.
>  
> One idea for a fix: when CopyOnFirstWriteProperties.equals() is called, take a 
> copy of the other object inside the method and compare against that copy. If 
> there are no problems with this solution, I will submit a PR.





[jira] [Updated] (HIVE-26986) A DAG created by OperatorGraph is not equal to the Tez DAG.

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26986:
--
Labels: hive-4.1.0-must pull-request-available  (was: 
pull-request-available)

> A DAG created by OperatorGraph is not equal to the Tez DAG.
> ---
>
> Key: HIVE-26986
> URL: https://issues.apache.org/jira/browse/HIVE-26986
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 4.0.0-alpha-2
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.1.0-must, pull-request-available
> Attachments: Query71 OperatorGraph.png, Query71 TezDAG.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> A DAG created by OperatorGraph is not equal to the corresponding DAG that is 
> submitted to Tez.
> Because of this problem, ParallelEdgeFixer reports a pair of normal edges as 
> a parallel edge.
> We observed this problem by comparing the OperatorGraph and the Tez DAG when 
> running TPC-DS query 71 on a 1TB ORC-format managed table.
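ParallelEdgeFixer's trigger condition amounts to spotting duplicate (source, sink) pairs in the DAG; when the OperatorGraph disagrees with the real Tez DAG, this check can fire spuriously. The detection itself is simple:

```python
from collections import Counter

def parallel_edges(edges):
    """Return the (src, dst) pairs that occur more than once in the DAG."""
    counts = Counter(edges)
    return sorted(e for e, n in counts.items() if n > 1)

dag = [("Map 1", "Reducer 2"), ("Map 1", "Reducer 2"), ("Reducer 2", "Reducer 3")]
print(parallel_edges(dag))   # [('Map 1', 'Reducer 2')]
```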





[jira] [Updated] (HIVE-26986) A DAG created by OperatorGraph is not equal to the Tez DAG.

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26986:
--
Target Version/s: 4.1.0

> A DAG created by OperatorGraph is not equal to the Tez DAG.
> ---
>
> Key: HIVE-26986
> URL: https://issues.apache.org/jira/browse/HIVE-26986
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 4.0.0-alpha-2
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: pull-request-available
> Attachments: Query71 OperatorGraph.png, Query71 TezDAG.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> A DAG created by OperatorGraph is not equal to the corresponding DAG that is 
> submitted to Tez.
> Because of this problem, ParallelEdgeFixer reports a pair of normal edges to 
> a parallel edge.
> We observe this problem by comparing OperatorGraph and Tez DAG when running 
> TPC-DS query 71 on 1TB ORC format managed table.





[jira] [Updated] (HIVE-26654) Test with the TPC-DS benchmark

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26654:
--
Target Version/s:   (was: 4.0.0)

> Test with the TPC-DS benchmark 
> ---
>
> Key: HIVE-26654
> URL: https://issues.apache.org/jira/browse/HIVE-26654
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0-alpha-2
>Reporter: Sungwoo Park
>Priority: Major
>  Labels: hive-4.0.0-must
>
> This Jira reports the result of running system tests using the TPC-DS 
> benchmark. The test scenario is:
> 1) create a database consisting of external tables from a 100GB or 1TB TPC-DS 
> text dataset
> 2) load a database consisting of ORC tables
> 3) compute column statistics
> 4) run TPC-DS queries
> 5) check the results for correctness
> For step 5), we will compare the results against Hive 3 (which has been 
> tested against SparkSQL and Presto). We use Hive on Tez.
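Step 5's correctness check needs an order-insensitive comparison with a floating-point tolerance, since row order and float aggregates differ across engines. A minimal comparator sketch (the tolerance policy is an assumption, not the test harness's actual rule):

```python
def results_match(a, b, rel_tol=1e-6):
    """Compare two query result sets ignoring row order, floats approximately."""
    if len(a) != len(b):
        return False
    for row_a, row_b in zip(sorted(a), sorted(b)):
        for x, y in zip(row_a, row_b):
            if isinstance(x, float) or isinstance(y, float):
                if abs(x - y) > rel_tol * max(1.0, abs(x), abs(y)):
                    return False
            elif x != y:
                return False
    return True

print(results_match([(1, 2.0)], [(1, 2.0000001)]))   # tolerated float drift
print(results_match([(1, "a")], [(2, "a")]))          # genuine mismatch
```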





[jira] [Updated] (HIVE-26654) Test with the TPC-DS benchmark

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26654:
--
Priority: Major  (was: Blocker)

> Test with the TPC-DS benchmark 
> ---
>
> Key: HIVE-26654
> URL: https://issues.apache.org/jira/browse/HIVE-26654
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0-alpha-2
>Reporter: Sungwoo Park
>Priority: Major
>  Labels: hive-4.0.0-must
>
> This Jira reports the result of running system tests using the TPC-DS 
> benchmark. The test scenario is:
> 1) create a database consisting of external tables from a 100GB or 1TB TPC-DS 
> text dataset
> 2) load a database consisting of ORC tables
> 3) compute column statistics
> 4) run TPC-DS queries
> 5) check the results for correctness
> For step 5), we will compare the results against Hive 3 (which has been 
> tested against SparkSQL and Presto). We use Hive on Tez.





[jira] [Commented] (HIVE-28120) When insert overwrite the iceberg table, data will loss if the sql contains union all

2024-04-06 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834511#comment-17834511
 ] 

Denys Kuzmenko commented on HIVE-28120:
---

[~xinmingchang], is this ticket still valid, or could be closed?

> When insert overwrite the iceberg table, data will loss if the sql contains 
> union all
> -
>
> Key: HIVE-28120
> URL: https://issues.apache.org/jira/browse/HIVE-28120
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
> Environment: hadoop version: 3.3.1
> hive version: 4.0.0-beta-1
> iceberg version: 1.3.0
>Reporter: xinmingchang
>Priority: Critical
>
> {{(1)}}
> create table tmp.test_iceberg_overwrite_union_all(
> a string
> )
> stored by iceberg
> ;
> {{(2)}}
> insert overwrite table tmp.test_iceberg_overwrite_union_all
> select distinct 'a' union all select distinct 'b';
> {{(3)}}
> select * from tmp.test_iceberg_overwrite_union_all;
>  
> the result only has one record:
> +-------------------------------------+
> | test_iceberg_overwrite_union_all.a  |
> +-------------------------------------+
> | a                                   |
> +-------------------------------------+
> According to the HiveServer2 log, this query starts two jobs, and each job 
> commits. The problem is that the job committed later is also an overwrite, 
> causing the result of the first commit to be overwritten, like this:
> 2024-03-05T22:10:12,995 INFO  [iceberg-commit-table-pool-0]: 
> hive.HiveIcebergOutputCommitter () - Committing job has started for table: 
> default_iceberg.tmp.test_iceberg_overwrite_union_all
> 2024-03-05T22:10:13,081 INFO  [iceberg-commit-table-pool-1]: 
> hive.HiveIcebergOutputCommitter () - Committing job has started for table: 
> default_iceberg.tmp.test_iceberg_overwrite_union_all
> 2024-03-05T22:10:15,152 INFO  [iceberg-commit-table-pool-0]: 
> hive.HiveIcebergOutputCommitter () - Overwrite commit took 2157 ms for table: 
> default_iceberg.tmp.test_iceberg_overwrite_union_all with 1 file(s)
> 2024-03-05T22:10:16,980 INFO  [iceberg-commit-table-pool-1]: 
> hive.HiveIcebergOutputCommitter () - Overwrite commit took 3899 ms for table: 
> default_iceberg.tmp.test_iceberg_overwrite_union_all with 1 file(s)
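As a minimal sketch of the race described in the log above (a toy table model with illustrative names, not the actual Iceberg or HiveIcebergOutputCommitter API): each UNION ALL branch gets its own job, each job commits as an overwrite, and the second overwrite replaces the first branch's files.

```python
# Toy model: an "overwrite" commit replaces the whole table snapshot,
# while an "append" commit would keep earlier files. Names are
# illustrative, not the real Iceberg API.
class ToyTable:
    def __init__(self):
        self.files = []  # data files in the current snapshot

    def overwrite_commit(self, new_files):
        # Overwrite semantics: the new snapshot contains ONLY new_files.
        self.files = list(new_files)

    def append_commit(self, new_files):
        # What the second UNION ALL branch would need instead.
        self.files = self.files + list(new_files)

# What the log shows: two independent overwrite commits to the same table.
table = ToyTable()
table.overwrite_commit(["a.parquet"])  # job for the "select 'a'" branch
table.overwrite_commit(["b.parquet"])  # job for the "select 'b'" branch
print(table.files)  # ['b.parquet'] -> the first branch's data is lost

# With only the first commit overwriting, both branches survive.
fixed = ToyTable()
fixed.overwrite_commit(["a.parquet"])
fixed.append_commit(["b.parquet"])
print(fixed.files)  # ['a.parquet', 'b.parquet']
```

The sketch only demonstrates why two overwrite commits cannot both survive; how Hive should coordinate the two jobs into a single commit is the open question of this ticket.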





[jira] [Updated] (HIVE-28120) When insert overwrite the iceberg table, data will loss if the sql contains union all

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28120:
--
Priority: Major  (was: Critical)

> When insert overwrite the iceberg table, data will loss if the sql contains 
> union all
> -
>
> Key: HIVE-28120
> URL: https://issues.apache.org/jira/browse/HIVE-28120
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
> Environment: hadoop version: 3.3.1
> hive version: 4.0.0-beta-1
> iceberg version: 1.3.0
>Reporter: xinmingchang
>Priority: Major
>
> {{(1)}}
> create table tmp.test_iceberg_overwrite_union_all(
> a string
> )
> stored by iceberg
> ;
> {{(2)}}
> insert overwrite table tmp.test_iceberg_overwrite_union_all
> select distinct 'a' union all select distinct 'b';
> {{(3)}}
> select * from tmp.test_iceberg_overwrite_union_all;
>  
> the result only has one record:
> +-------------------------------------+
> | test_iceberg_overwrite_union_all.a  |
> +-------------------------------------+
> | a                                   |
> +-------------------------------------+
> According to the HiveServer log, this query starts two jobs, and each job 
> is committed separately. The problem is that the job committed later is 
> also an overwrite, so it replaces the result of the first commit, 
> like this:
> 2024-03-05T22:10:12,995 INFO  [iceberg-commit-table-pool-0]: 
> hive.HiveIcebergOutputCommitter () - Committing job has started for table: 
> default_iceberg.tmp.test_iceberg_overwrite_union_all
> 2024-03-05T22:10:13,081 INFO  [iceberg-commit-table-pool-1]: 
> hive.HiveIcebergOutputCommitter () - Committing job has started for table: 
> default_iceberg.tmp.test_iceberg_overwrite_union_all
> 2024-03-05T22:10:15,152 INFO  [iceberg-commit-table-pool-0]: 
> hive.HiveIcebergOutputCommitter () - Overwrite commit took 2157 ms for table: 
> default_iceberg.tmp.test_iceberg_overwrite_union_all with 1 file(s)
> 2024-03-05T22:10:16,980 INFO  [iceberg-commit-table-pool-1]: 
> hive.HiveIcebergOutputCommitter () - Overwrite commit took 3899 ms for table: 
> default_iceberg.tmp.test_iceberg_overwrite_union_all with 1 file(s)





[jira] [Updated] (HIVE-24167) TPC-DS query 14 fails while generating plan for the filter

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-24167:
--
Target Version/s: 4.1.0

> TPC-DS query 14 fails while generating plan for the filter
> --
>
> Key: HIVE-24167
> URL: https://issues.apache.org/jira/browse/HIVE-24167
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.1.0-must, pull-request-available
>
> TPC-DS query 14 (cbo_query14.q and query4.q) fails with an NPE on the metastore 
> with the partitioned TPC-DS 30TB dataset while generating the plan for the 
> filter.
> The problem can be reproduced using the PR in HIVE-23965.
> The current stacktrace shows that the NPE appears while trying to display a 
> debug message, but even if that line didn't exist, it would fail again later on.
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10867)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11765)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11635)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlanForSubQueryPredicate(SemanticAnalyzer.java:3375)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3473)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10819)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11765)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11625)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11625)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11635)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12417)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:718)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12519)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:173)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:414)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:363)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:357)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:129)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:231)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:740)
> at 
> 

[jira] [Updated] (HIVE-27226) FullOuterJoin with filter expressions is not computed correctly

2024-04-06 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27226:
--
Target Version/s: 4.1.0

> FullOuterJoin with filter expressions is not computed correctly
> ---
>
> Key: HIVE-27226
> URL: https://issues.apache.org/jira/browse/HIVE-27226
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-beta-1
>Reporter: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.1.0-must, known_issue
>
> I tested many OuterJoin queries as an extension of HIVE-27138, and I found 
> that Hive returns an incorrect result for a query containing FullOuterJoin 
> with filter expressions. In a nutshell, all JoinOperators that run on the Tez 
> engine return incorrect results for OuterJoin queries, and one of the reasons 
> for the incorrect computation comes from CommonJoinOperator, which is the base 
> of all JoinOperators. I attached the queries and configuration that I used at 
> the bottom of this document. I am still inspecting these problems, and I will 
> share an update once I find another cause. Any comments and opinions would be 
> appreciated.
> First of all, I observed that current Hive ignores filter expressions 
> contained in MapJoinOperator. For example, the attached result of query1 
> shows that MapJoinOperator performs an inner join, not a full outer join. 
> This problem stems from the removal of filterMap. When converting a 
> JoinOperator to a MapJoinOperator, 
> ConvertJoinMapJoin#convertJoinDynamicPartitionedHashJoin() removes the 
> filterMap of the MapJoinOperator. Because MapJoinOperator does not evaluate 
> filter expressions if filterMap is null, this change makes MapJoinOperator 
> ignore filter expressions, so it always joins tables regardless of whether 
> they satisfy the filter expressions. To work around this, I disabled 
> FullOuterMapJoinOptimization and applied the patch for HIVE-27138, which 
> prevents an NPE. (The patch is available at the following link: LINK.) The 
> rest of this document uses this modified Hive, but most of the problems 
> occur in current Hive, too.
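A toy sketch of the effect described above (illustrative names, not the real MapJoinOperator code path): when filterMap is null the residual filter is never evaluated, so filter-rejected pairs are treated as matches and the full outer join loses its non-matching rows.

```python
# Toy full-outer map join. The join-key condition always applies; the
# residual filter is only evaluated when a filterMap is present,
# mimicking MapJoinOperator's "filterMap == null -> skip filters" path.
def map_join(left, right, residual_filter, filter_map_present):
    out = []
    for l in left:
        matched = False
        for r in right:
            if l != r:                      # join-key condition
                continue
            if filter_map_present and not residual_filter(l, r):
                continue                    # filter rejected: not a match
            out.append((l, r))
            matched = True
        if not matched:
            out.append((l, None))           # full-outer: keep unmatched left row
    return out

left, right = [1, 2], [1, 2]
pred = lambda l, r: l != 2                  # residual filter rejects key 2

with_filter = map_join(left, right, pred, filter_map_present=True)
without_filter = map_join(left, right, pred, filter_map_present=False)
print(with_filter)     # [(1, 1), (2, None)] -> correct full outer result
print(without_filter)  # [(1, 1), (2, 2)]    -> filter ignored, rows joined anyway
```

This is only a model of the symptom; in Hive the fix is keeping (or re-deriving) the filterMap during the MapJoin conversion rather than branching like this.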
> The second problem I found is that Hive returns the same left-null or 
> right-null rows multiple times when it uses MapJoinOperator or 
> CommonMergeJoinOperator. This is caused by the logic of the current 
> CommonJoinOperator. Both of these JoinOperators join tables in two steps. 
> First, they create RowContainers, each of which is a group of rows from one 
> table sharing the same key. Second, they call 
> CommonJoinOperator#checkAndGenObject() with the created RowContainers. This 
> method checks the filterTag of each row in the RowContainers and forwards a 
> joined row if all filter conditions are met. For OuterJoin, 
> checkAndGenObject() forwards non-matching rows if there is no matching row 
> in the RowContainer. The problem happens when there are multiple 
> RowContainers for the same key and table. For example, suppose that there 
> are two left RowContainers and one right RowContainer. If none of the rows 
> in the two left RowContainers satisfies the filter condition, then 
> checkAndGenObject() will forward a Left-Null row for each right row. Because 
> checkAndGenObject() is called once per left RowContainer, there will be two 
> duplicated Left-Null rows for every right row.
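The duplication described above can be sketched as follows (a toy model with hypothetical names, not the real checkAndGenObject() implementation): the per-container logic emits left-null rows whenever its container has no filter-passing row, so splitting the left rows of one key across two containers emits the left-null output twice.

```python
# Toy per-container join step, modeled on the description of
# CommonJoinOperator#checkAndGenObject(): forward joined rows for
# filter-passing left rows, else forward a left-null row per right row.
def check_and_gen(left_container, right_container, left_filter):
    out = []
    matched = [l for l in left_container if left_filter(l)]
    if matched:
        for l in matched:
            for r in right_container:
                out.append((l, r))
    else:
        # Outer-join fallback: no match in THIS container -> left-null rows.
        for r in right_container:
            out.append((None, r))
    return out

right = ["r1"]
left_containers = [["l1"], ["l2"]]   # same key, split into two containers
reject_all = lambda row: False       # no left row passes the filter

rows = []
for container in left_containers:    # called once per RowContainer
    rows += check_and_gen(container, right, reject_all)

print(rows)  # [(None, 'r1'), (None, 'r1')] -> duplicated Left-Null row
```

With both left rows in a single container the fallback would fire once, which is why the bug only appears when a key's rows are split across containers.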
> In the case of MapJoinOperator, it always creates a singleton RowContainer 
> for the big table. Therefore, it always produces duplicated non-matching 
> rows. CommonMergeJoinOperator also creates multiple RowContainers for the 
> big table, each holding at most hive.join.emit.interval rows. In the 
> experiment below, I also set hive.join.shortcut.unmatched.rows=false and 
> hive.exec.reducers.max=1 to disable the specialized algorithm for OuterJoin 
> of two tables and to force calling checkAndGenObject() before all rows with 
> the same key are gathered. I didn't observe this problem when using 
> VectorMapJoinOperator, and I will inspect whether the problem can be 
> reproduced with it.
> I think the second problem is not limited to FullOuterJoin, but I couldn't 
> find such a query as of now. I will add one to this issue if I can write a 
> query that reproduces the second problem without FullOuterJoin.
> I also found that Hive returns a wrong result for query2 even when I used 
> VectorMapJoinOperator. I am still inspecting this problem and will add an 
> update when I find the reason.
>  
> Experiment:
>  
> {code:java}
> -- Configuration
> set hive.optimize.shared.work=false;
> -- Std MapJoin
> set hive.auto.convert.join=true;
> set hive.vectorized.execution.enabled=false;
> -- Vec MapJoin
> set hive.auto.convert.join=true;
> set hive.vectorized.execution.enabled=true;
> -- MergeJoin
> set hive.auto.convert.join=false;
> set hive.vectorized.execution.enabled=false;
>