[jira] [Commented] (HIVE-27926) Iceberg: Allow restricting Iceberg data file reads to table location

2023-12-04 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793133#comment-17793133
 ] 

Ayush Saxena commented on HIVE-27926:
-

Committed to master & branch-4.0

Thanks [~dkuzmenko] for the review!

> Iceberg: Allow restricting Iceberg data file reads to table location
> 
>
> Key: HIVE-27926
> URL: https://issues.apache.org/jira/browse/HIVE-27926
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration
>Affects Versions: 4.0.0-alpha-2
>Reporter: Janos Kovacs
>Assignee: Ayush Saxena
>Priority: Blocker
>
> As a first quick solution, there should be a configuration flag that 
> restricts Iceberg reads to data files located inside the table 
> location.
> For example, with the following definition:
> {noformat}
> CREATE EXTERNAL TABLE default.iceloc1 (txt string, secret string)
> STORED BY ICEBERG 
> LOCATION '/data/hive/external/iceloc1/'
> TBLPROPERTIES (
>   'external.table.purge'='true',
>   'write.metadata.path'='/data/ice/meta/iceloc1/',
>   'write.data.path'='/data/ice/data/iceloc1/');
> {noformat}
> The restricted location should be 
> {noformat}
> /data/hive/external/iceloc1/
> {noformat}
> Note: this configuration should not be enabled by default, as it breaks 
> Iceberg's ability to store data files in different locations; it is only 
> useful when Iceberg tables are used as standard external tables with 
> metadata and data under the table location.
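The restriction described above is essentially a path-containment test applied to every data file before it is read. The following is a minimal Python sketch of that test. It assumes that the table location and the data file paths are qualified the same way (same scheme and authority). The function names and the `restrict` switch are hypothetical and are not the Hive configuration property introduced by the fix.

```python
import posixpath
from typing import Iterable, Tuple
from urllib.parse import urlsplit


def _split_location(location: str) -> Tuple[str, str, str]:
    """Split a location into (scheme, authority, normalized path)."""
    parts = urlsplit(location)
    # normpath collapses '.', '..' and duplicate slashes and drops a trailing '/'.
    return parts.scheme, parts.netloc, posixpath.normpath(parts.path or "/")


def is_within_table_location(data_file: str, table_location: str) -> bool:
    """True if data_file lies at or below table_location on the same filesystem."""
    f_scheme, f_auth, f_path = _split_location(data_file)
    t_scheme, t_auth, t_path = _split_location(table_location)
    if (f_scheme, f_auth) != (t_scheme, t_auth):
        return False
    if t_path == "/":
        return f_path.startswith("/")
    # Compare whole path components so 'iceloc10' is not accepted for 'iceloc1'.
    return f_path == t_path or f_path.startswith(t_path + "/")


def check_data_files(data_files: Iterable[str], table_location: str,
                     restrict: bool = False) -> None:
    """Hypothetical guard; `restrict` is off by default, as the report requires."""
    if not restrict:
        return
    outside = [f for f in data_files
               if not is_within_table_location(f, table_location)]
    if outside:
        raise PermissionError(
            f"data files outside table location {table_location}: {outside}")
```

With the table definition above and `restrict=True`, files under `/data/hive/external/iceloc1/` pass the check. A file written under `write.data.path` (`/data/ice/data/iceloc1/`) is rejected. Normalizing `..` segments before the prefix comparison keeps paths such as `/data/hive/external/iceloc1/../x` from escaping the check.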



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27926) Iceberg: Allow restricting Iceberg data file reads to table location

2023-12-04 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-27926.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Iceberg: Allow restricting Iceberg data file reads to table location
> 
>
> Key: HIVE-27926
> URL: https://issues.apache.org/jira/browse/HIVE-27926
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration
>Affects Versions: 4.0.0-alpha-2
>Reporter: Janos Kovacs
>Assignee: Ayush Saxena
>Priority: Blocker
> Fix For: 4.0.0
>
>
> As a first quick solution, there should be a configuration flag that 
> restricts Iceberg reads to data files located inside the table 
> location.
> For example, with the following definition:
> {noformat}
> CREATE EXTERNAL TABLE default.iceloc1 (txt string, secret string)
> STORED BY ICEBERG 
> LOCATION '/data/hive/external/iceloc1/'
> TBLPROPERTIES (
>   'external.table.purge'='true',
>   'write.metadata.path'='/data/ice/meta/iceloc1/',
>   'write.data.path'='/data/ice/data/iceloc1/');
> {noformat}
> The restricted location should be 
> {noformat}
> /data/hive/external/iceloc1/
> {noformat}
> Note: this configuration should not be enabled by default, as it breaks 
> Iceberg's ability to store data files in different locations; it is only 
> useful when Iceberg tables are used as standard external tables with 
> metadata and data under the table location.





[jira] [Updated] (HIVE-27936) Flaky test testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites

2023-12-04 Thread Butao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Butao Zhang updated HIVE-27936:
---
Description: 
[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4917/1/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4909/2/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4905/2/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4913/1/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4894/2/tests]

 

This test seems to be flaky. I have often seen it fail with the following exception:
{code:java}
ORC split generation failed with exception: org.apache.orc.FileFormatException: 
Malformed ORC file 
hdfs://localhost:43317/warehouse1/replicated_testbootstrapacidtablesduringincrementalwithconcurrentwrites_1701444759339.db/t1/-tmp.delta_003_003_/02_0.manifest.
 Invalid postscript. {code}

  was:
[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4917/1/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4909/2/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4905/2/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4913/1/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4894/2/tIt
 
|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4894/2/tests/]

This test seems to be flaky. I have often seen it fail with the following exception:
{code:java}
ORC split generation failed with exception: org.apache.orc.FileFormatException: 
Malformed ORC file 
hdfs://localhost:43317/warehouse1/replicated_testbootstrapacidtablesduringincrementalwithconcurrentwrites_1701444759339.db/t1/-tmp.delta_003_003_/02_0.manifest.
 Invalid postscript. {code}


> Flaky test testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites
> ---
>
> Key: HIVE-27936
> URL: https://issues.apache.org/jira/browse/HIVE-27936
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Butao Zhang
>Priority: Major
>
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4917/1/tests/]
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4909/2/tests/]
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4905/2/tests/]
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4913/1/tests/]
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4894/2/tests]
>  
> This test seems to be flaky. I have often seen it fail with the following exception:
> {code:java}
> ORC split generation failed with exception: 
> org.apache.orc.FileFormatException: Malformed ORC file 
> hdfs://localhost:43317/warehouse1/replicated_testbootstrapacidtablesduringincrementalwithconcurrentwrites_1701444759339.db/t1/-tmp.delta_003_003_/02_0.manifest.
>  Invalid postscript. {code}





[jira] [Updated] (HIVE-27936) Flaky test testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites

2023-12-04 Thread Butao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Butao Zhang updated HIVE-27936:
---
Description: 
[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4917/1/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4909/2/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4905/2/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4913/1/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4894/2/tIt
 
|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4894/2/tests/]

This test seems to be flaky. I have often seen it fail with the following exception:
{code:java}
ORC split generation failed with exception: org.apache.orc.FileFormatException: 
Malformed ORC file 
hdfs://localhost:43317/warehouse1/replicated_testbootstrapacidtablesduringincrementalwithconcurrentwrites_1701444759339.db/t1/-tmp.delta_003_003_/02_0.manifest.
 Invalid postscript. {code}

  was:
[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4917/1/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4909/2/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4905/2/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4913/1/tests/]

 
{code:java}
ORC split generation failed with exception: org.apache.orc.FileFormatException: 
Malformed ORC file 
hdfs://localhost:43317/warehouse1/replicated_testbootstrapacidtablesduringincrementalwithconcurrentwrites_1701444759339.db/t1/-tmp.delta_003_003_/02_0.manifest.
 Invalid postscript. {code}


> Flaky test testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites
> ---
>
> Key: HIVE-27936
> URL: https://issues.apache.org/jira/browse/HIVE-27936
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Butao Zhang
>Priority: Major
>
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4917/1/tests/]
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4909/2/tests/]
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4905/2/tests/]
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4913/1/tests/]
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4894/2/tIt
>  
> |http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4894/2/tests/]
> This test seems to be flaky. I have often seen it fail with the following exception:
> {code:java}
> ORC split generation failed with exception: 
> org.apache.orc.FileFormatException: Malformed ORC file 
> hdfs://localhost:43317/warehouse1/replicated_testbootstrapacidtablesduringincrementalwithconcurrentwrites_1701444759339.db/t1/-tmp.delta_003_003_/02_0.manifest.
>  Invalid postscript. {code}





[jira] [Created] (HIVE-27936) Flaky test testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites

2023-12-04 Thread Butao Zhang (Jira)
Butao Zhang created HIVE-27936:
--

 Summary: Flaky test 
testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites
 Key: HIVE-27936
 URL: https://issues.apache.org/jira/browse/HIVE-27936
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Butao Zhang


[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4917/1/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4909/2/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4905/2/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4913/1/tests/]

 
{code:java}
ORC split generation failed with exception: org.apache.orc.FileFormatException: 
Malformed ORC file 
hdfs://localhost:43317/warehouse1/replicated_testbootstrapacidtablesduringincrementalwithconcurrentwrites_1701444759339.db/t1/-tmp.delta_003_003_/02_0.manifest.
 Invalid postscript. {code}





[jira] [Updated] (HIVE-27935) Add qtest for Avro invalid schema and field names

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27935:
--
Labels: pull-request-available  (was: )

> Add qtest for Avro invalid schema and field names
> -
>
> Key: HIVE-27935
> URL: https://issues.apache.org/jira/browse/HIVE-27935
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0-beta-1
>Reporter: Akshat Mathur
>Assignee: Akshat Mathur
>Priority: Major
>  Labels: pull-request-available
>
> Add a qtest to verify the behavior related to AVRO-3827 and AVRO-3820.





[jira] [Work started] (HIVE-27935) Add qtest for Avro invalid schema and field names

2023-12-04 Thread Akshat Mathur (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-27935 started by Akshat Mathur.

> Add qtest for Avro invalid schema and field names
> -
>
> Key: HIVE-27935
> URL: https://issues.apache.org/jira/browse/HIVE-27935
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0-beta-1
>Reporter: Akshat Mathur
>Assignee: Akshat Mathur
>Priority: Major
>
> Add a qtest to verify the behavior related to AVRO-3827 and AVRO-3820.





[jira] [Created] (HIVE-27935) Add qtest for Avro invalid schema and field names

2023-12-04 Thread Akshat Mathur (Jira)
Akshat Mathur created HIVE-27935:


 Summary: Add qtest for Avro invalid schema and field names
 Key: HIVE-27935
 URL: https://issues.apache.org/jira/browse/HIVE-27935
 Project: Hive
  Issue Type: Improvement
Affects Versions: 4.0.0-beta-1
Reporter: Akshat Mathur
Assignee: Akshat Mathur


Add a qtest to verify the behavior related to AVRO-3827 and AVRO-3820.





[jira] [Updated] (HIVE-27931) Update documentation with new features/improvements

2023-12-04 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-27931:

Description: 
Improve wiki documentation for new features/improvements coming in the 4.0 release

 

https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+4.0.0

  was:Improve wiki documentation for new features/improvements coming in the 4.0 
release


> Update documentation with new features/improvements
> ---
>
> Key: HIVE-27931
> URL: https://issues.apache.org/jira/browse/HIVE-27931
> Project: Hive
>  Issue Type: Task
>Reporter: Ayush Saxena
>Priority: Major
>
> Improve wiki documentation for new features/improvements coming in the 4.0 release
>  
> https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+4.0.0





[jira] [Commented] (HIVE-27931) Update documentation with new features/improvements

2023-12-04 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792990#comment-17792990
 ] 

Ayush Saxena commented on HIVE-27931:
-

I have created a space on the wiki for 4.0.0-specific content. Please add a 
page, link the relevant existing pages, or copy an improved version of the 
older pages into the same space.

[https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+4.0.0]

 

All committers have access and can log in with their Apache credentials. 
Non-committers who want to volunteer to help with the documentation can let me 
or anyone on the PMC know via this ticket or the dev mailing list, and one of 
us will get them write access to the wiki.

> Update documentation with new features/improvements
> ---
>
> Key: HIVE-27931
> URL: https://issues.apache.org/jira/browse/HIVE-27931
> Project: Hive
>  Issue Type: Task
>Reporter: Ayush Saxena
>Priority: Major
>
> Improve wiki documentation for new features/improvements coming in 4.0 release





[jira] [Updated] (HIVE-27801) Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan

2023-12-04 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27801:
--
Labels: Regression hive-4.0.0-must  (was: hive-4.0.0-must regression)

> Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan
> --
>
> Key: HIVE-27801
> URL: https://issues.apache.org/jira/browse/HIVE-27801
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Priority: Critical
>  Labels: Regression, hive-4.0.0-must
> Attachments: Screenshot 2023-10-10 at 20.14.03.png
>
>
> reproduce (no rows should be returned):
> {code}
> set hive.explain.user=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.stats.autogather=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> drop table if exists store_sales;
> create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
> int) stored as orc tblproperties('transactional'='true');
> insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);
> explain cbo 
> select * from store_sales A where exists ( 
> select 1 from store_sales B 
> where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> select * from store_sales A where exists( 
> select 1 from store_sales B 
> where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> explain cbo
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price {code}
> plan diff:
> [^Screenshot 2023-10-10 at 20.14.03.png]
> expectation: query should return 0 rows 
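Both forms of the query in the reproduction should return the same rows, because a correlated EXISTS and a LEFT SEMI JOIN have the same semantics. The following minimal Python sketch spells out those semantics as a naive nested-loop oracle that plan results could be checked against. It is not Hive's implementation. The full join predicate is truncated in this archive, so the usage below applies only the two conjuncts that survive (`A.ss_promo_sk = B.ss_promo_sk` and `A.ss_sales_price > B.ss_list_price`). For that reason it does not reproduce the report's 0-row expectation.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, int, int]  # (ss_promo_sk, ss_sales_price, ss_list_price)


def exists_semi_join(left: Sequence[Row], right: Sequence[Row],
                     pred: Callable[[Row, Row], bool]) -> List[Row]:
    """Reference semantics of `WHERE EXISTS (...)` / LEFT SEMI JOIN.

    A left row is kept iff at least one right row satisfies pred. It is
    emitted at most once, however many right rows match, and no right-side
    columns appear in the result.
    """
    return [a for a in left if any(pred(a, b) for b in right)]


def partial_pred(a: Row, b: Row) -> bool:
    # Only the two conjuncts visible in the archived report:
    # A.ss_promo_sk = B.ss_promo_sk AND A.ss_sales_price > B.ss_list_price
    return a[0] == b[0] and a[1] > b[2]


store_sales = [(1, 20, 15), (1, 15, 20), (1, 10, 15)]
print(exists_semi_join(store_sales, store_sales, partial_pred))  # prints [(1, 20, 15)]
```

With those two conjuncts, only `(1, 20, 15)` qualifies. It is returned once even though two rows of B (`(1, 20, 15)` and `(1, 10, 15)`, both with `ss_list_price = 15`) satisfy the predicate. A correct rewrite must preserve this at-most-once behavior.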





[jira] [Resolved] (HIVE-27890) Tez Progress bar is not displayed in Beeline upon setting session level execution engine to Tez

2023-12-04 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha resolved HIVE-27890.
-
Fix Version/s: 4.1.0
   Resolution: Fixed

> Tez Progress bar is not displayed in Beeline upon setting session level 
> execution engine to Tez
> ---
>
> Key: HIVE-27890
> URL: https://issues.apache.org/jira/browse/HIVE-27890
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Shivangi Jha
>Assignee: Shivangi Jha
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> When queries are executed through Beeline with the server-level execution 
> engine configured as MapReduce (MR) and the session-level execution engine 
> set to Tez, the Tez progress bar is not rendered in the output.
>  # When default engine was set to Tez in Hive conf.
>  ## With no session level changes in execution engine, progress bar is seen.
> Default Engine=Tez, session level=Tez
>  ## When session level execution engine is set to MR, progress bar is not 
> seen.
> Default Engine=Tez, session level=MR
>  # When default engine was set to MR in Hive conf.
>  ## When session level execution engine is set to Tez, progress bar is NOT 
> seen.
> Default Engine=MR, session level=TEZ. 
>  ## With no session level changes in execution engine, progress bar is not 
> seen.
> Default Engine=MR, session level=MR
>  
> Steps to Reproduce:
>  # Set default execution engine to MR.
>  # Start Beeline session for query execution.
>  # Run {{set hive.execution.engine=tez;}}
>  # Upon running a query, the Tez Progress bar is not displayed in the console.
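In the matrix above, the bar appears only when both the server default and the session engine are Tez. That pattern is consistent with the rendering decision depending on the server-level default rather than on the engine actually in effect for the session. The following minimal Python sketch shows the expected rule. It assumes that a session-level `set hive.execution.engine=...` takes precedence over the server configuration. Only the property name `hive.execution.engine` comes from the report; the function names and the dict-based configuration are hypothetical, and the sketch does not describe how Beeline or HiveServer2 actually make this decision.

```python
from typing import Mapping

ENGINE_KEY = "hive.execution.engine"


def effective_engine(server_conf: Mapping[str, str],
                     session_overrides: Mapping[str, str]) -> str:
    """Engine a query will run on: a session-level `set` wins over the server default."""
    engine = session_overrides.get(ENGINE_KEY, server_conf[ENGINE_KEY])
    return engine.strip().lower()  # the report writes both 'tez' and 'TEZ'


def should_render_tez_progress(server_conf: Mapping[str, str],
                               session_overrides: Mapping[str, str]) -> bool:
    """Decide from the effective engine, never from the server default alone."""
    return effective_engine(server_conf, session_overrides) == "tez"
```

Under this rule, the third case in the matrix (default MR, session Tez) shows the progress bar. The other three cases keep their reported behavior.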





[jira] [Updated] (HIVE-27801) Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan

2023-12-04 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27801:
--
Labels: hive-4.0.0-must regression  (was: hive-4.0.0-must)

> Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan
> --
>
> Key: HIVE-27801
> URL: https://issues.apache.org/jira/browse/HIVE-27801
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Priority: Critical
>  Labels: hive-4.0.0-must, regression
> Attachments: Screenshot 2023-10-10 at 20.14.03.png
>
>
> reproduce (no rows should be returned):
> {code}
> set hive.explain.user=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.stats.autogather=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> drop table if exists store_sales;
> create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
> int) stored as orc tblproperties('transactional'='true');
> insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);
> explain cbo 
> select * from store_sales A where exists ( 
> select 1 from store_sales B 
> where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> select * from store_sales A where exists( 
> select 1 from store_sales B 
> where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> explain cbo
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price {code}
> plan diff:
> [^Screenshot 2023-10-10 at 20.14.03.png]
> expectation: query should return 0 rows 





[jira] [Updated] (HIVE-27801) Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan

2023-12-04 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27801:
--
Description: 
reproduce (no rows should be returned):
{code}
set hive.explain.user=false;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.stats.autogather=false;
set hive.exec.dynamic.partition.mode=nonstrict;

drop table if exists store_sales;

create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
int) stored as orc tblproperties('transactional'='true');
insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);

explain cbo 
select * from store_sales A where exists ( 
select 1 from store_sales B 
where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
A.ss_sales_priceB.ss_list_price and 
A.ss_sales_priceB.ss_list_price and 
A.ss_sales_priceB.ss_list_price and 
A.ss_sales_priceB.ss_list_price and 
A.ss_sales_priceB.ss_list_price and 
A.ss_sales_priceB.ss_list_price and 
A.ss_sales_priceB.ss_list_price and 
A.ss_sales_price Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan
> --
>
> Key: HIVE-27801
> URL: https://issues.apache.org/jira/browse/HIVE-27801
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Priority: Critical
>  Labels: hive-4.0.0-must
> Attachments: Screenshot 2023-10-10 at 20.14.03.png
>
>
> reproduce (no rows should be returned):
> {code}
> set hive.explain.user=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.stats.autogather=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> drop table if exists store_sales;
> create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
> int) stored as orc tblproperties('transactional'='true');
> insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);
> explain cbo 
> select * from store_sales A where exists ( 
> select 1 from store_sales B 
> where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> select * from store_sales A where exists( 
> select 1 from store_sales B 
> where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> explain cbo
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price {code}
> plan diff:
> [^Screenshot 2023-10-10 at 20.14.03.png]
> expectation: query should return 0 rows 





[jira] [Commented] (HIVE-27801) Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan

2023-12-04 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792989#comment-17792989
 ] 

Denys Kuzmenko commented on HIVE-27801:
---

This is a regression; it is not reproducible in 3.1.3.

> Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan
> --
>
> Key: HIVE-27801
> URL: https://issues.apache.org/jira/browse/HIVE-27801
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Priority: Critical
>  Labels: hive-4.0.0-must
> Attachments: Screenshot 2023-10-10 at 20.14.03.png
>
>
> reproduce (no rows should be returned):
> {code}
> set hive.explain.user=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.stats.autogather=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> drop table if exists store_sales;
> create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
> int) stored as orc tblproperties('transactional'='true');
> insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);
> explain cbo 
> select * from store_sales A where exists ( 
> select 1 from store_sales B 
> where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> select * from store_sales A where exists( 
> select 1 from store_sales B 
> where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> explain cbo
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price {code}
> plan diff:
> [^Screenshot 2023-10-10 at 20.14.03.png]





[jira] [Work started] (HIVE-27894) Enhance HMS Handler Logs for all 'get_partition' functions.

2023-12-04 Thread Shivangi Jha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-27894 started by Shivangi Jha.
---
> Enhance HMS Handler Logs for all 'get_partition' functions.
> ---
>
> Key: HIVE-27894
> URL: https://issues.apache.org/jira/browse/HIVE-27894
> Project: Hive
>  Issue Type: Improvement
>Reporter: Shivangi Jha
>Assignee: Shivangi Jha
>Priority: Major
>  Labels: pull-request-available
>
> The HMSHandler 
> (standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java)
>  class contains several functions that handle partition information, but 
> it currently does not log the relevant partition details. Logging them 
> would make the logs easier to read and debugging more 
> effective.





[jira] [Updated] (HIVE-27934) Fix doc README.md

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27934:
--
Labels: pull-request-available  (was: )

> Fix doc README.md
> -
>
> Key: HIVE-27934
> URL: https://issues.apache.org/jira/browse/HIVE-27934
> Project: Hive
>  Issue Type: Improvement
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Minor
>  Labels: pull-request-available
>






[jira] [Commented] (HIVE-27890) Tez Progress bar is not displayed in Beeline upon setting session level execution engine to Tez

2023-12-04 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792911#comment-17792911
 ] 

Pravin Sinha commented on HIVE-27890:
-

The Jira ID was left out of the commit message. [~shivijha30], just a 
suggestion: please include the Jira ID in the commit message of future commits.

> Tez Progress bar is not displayed in Beeline upon setting session level 
> execution engine to Tez
> ---
>
> Key: HIVE-27890
> URL: https://issues.apache.org/jira/browse/HIVE-27890
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Shivangi Jha
>Assignee: Shivangi Jha
>Priority: Major
>  Labels: pull-request-available
>
> When queries are executed through Beeline with the server-level execution 
> engine configured as MapReduce (MR) and the session-level execution engine 
> set to Tez, the Tez progress bar is not rendered in the output.
>  # When default engine was set to Tez in Hive conf.
>  ## With no session level changes in execution engine, progress bar is seen.
> Default Engine=Tez, session level=Tez
>  ## When session level execution engine is set to MR, progress bar is not 
> seen.
> Default Engine=Tez, session level=MR
>  # When default engine was set to MR in Hive conf.
>  ## When session level execution engine is set to Tez, progress bar is NOT 
> seen.
> Default Engine=MR, session level=TEZ. 
>  ## With no session level changes in execution engine, progress bar is not 
> seen.
> Default Engine=MR, session level=MR
>  
> Steps to Reproduce:
>  # Set default execution engine to MR.
>  # Start Beeline session for query execution.
>  # Run {{set hive.execution.engine=tez;}}
>  # Upon running a query, the Tez Progress bar is not displayed in the console.





[jira] [Created] (HIVE-27934) Fix doc README.md

2023-12-04 Thread Butao Zhang (Jira)
Butao Zhang created HIVE-27934:
--

 Summary: Fix doc README.md
 Key: HIVE-27934
 URL: https://issues.apache.org/jira/browse/HIVE-27934
 Project: Hive
  Issue Type: Improvement
Reporter: Butao Zhang








[jira] [Assigned] (HIVE-27934) Fix doc README.md

2023-12-04 Thread Butao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Butao Zhang reassigned HIVE-27934:
--

Assignee: Butao Zhang

> Fix doc README.md
> -
>
> Key: HIVE-27934
> URL: https://issues.apache.org/jira/browse/HIVE-27934
> Project: Hive
>  Issue Type: Improvement
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Minor
>






[jira] [Created] (HIVE-27933) Update documentation about supported execution engine(s)

2023-12-04 Thread László Bodor (Jira)
László Bodor created HIVE-27933:
---

 Summary: Update documentation about supported execution engine(s)
 Key: HIVE-27933
 URL: https://issues.apache.org/jira/browse/HIVE-27933
 Project: Hive
  Issue Type: Sub-task
Reporter: László Bodor








[jira] [Work started] (HIVE-27890) Tez Progress bar is not displayed in Beeline upon setting session level execution engine to Tez

2023-12-04 Thread Shivangi Jha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-27890 started by Shivangi Jha.
---
> Tez Progress bar is not displayed in Beeline upon setting session level 
> execution engine to Tez
> ---
>
> Key: HIVE-27890
> URL: https://issues.apache.org/jira/browse/HIVE-27890
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Shivangi Jha
>Assignee: Shivangi Jha
>Priority: Major
>  Labels: pull-request-available
>
> When queries are executed through Beeline and the server-level execution 
> engine is configured to MapReduce (MR), while the session-level execution 
> engine is set to Tez, it has been observed that the Tez Progress bar is not 
> rendered in the output.
>  # When the default engine was set to Tez in the Hive conf:
>  ## With no session-level change to the execution engine, the progress bar is 
> seen.
> Default Engine=Tez, session level=Tez
>  ## When the session-level execution engine is set to MR, the progress bar is 
> not seen.
> Default Engine=Tez, session level=MR
>  # When the default engine was set to MR in the Hive conf:
>  ## When the session-level execution engine is set to Tez, the progress bar is 
> NOT seen.
> Default Engine=MR, session level=Tez
>  ## With no session-level change to the execution engine, the progress bar is 
> not seen.
> Default Engine=MR, session level=MR
>  
> Steps to Reproduce:
>  # Set default execution engine to MR.
>  # Start Beeline session for query execution.
>  # Run {{set hive.execution.engine=tez;}}
>  # Upon running a query, the Tez Progress bar is not displayed in the console.
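The four reported cases can be condensed into a small truth table. The Python sketch below is not Hive code. The hypothetical `observed_progress_bar` encodes what the description reports, while the hypothetical `expected_progress_bar` assumes that the bar should depend only on the session-level engine, since that is the engine that actually runs the query. Under that assumption, only the default=MR, session=Tez case disagrees, which is the bug in the summary.

```python
# (server default engine, session-level engine) as listed in the report.
CASES = [("tez", "tez"), ("tez", "mr"), ("mr", "tez"), ("mr", "mr")]


def observed_progress_bar(default_engine: str, session_engine: str) -> bool:
    # As reported: the bar only showed up when both engines were Tez.
    return default_engine == "tez" and session_engine == "tez"


def expected_progress_bar(default_engine: str, session_engine: str) -> bool:
    # Assumption: the session-level setting overrides the server default,
    # so only the engine that actually runs the query should matter.
    return session_engine == "tez"


if __name__ == "__main__":
    for default_engine, session_engine in CASES:
        obs = observed_progress_bar(default_engine, session_engine)
        exp = expected_progress_bar(default_engine, session_engine)
        note = "  <- bug" if obs != exp else ""
        print(f"default={default_engine:<3} session={session_engine:<3} "
              f"observed={obs!s:<5} expected={exp!s:<5}{note}")
```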





[jira] [Resolved] (HIVE-27662) Incorrect parsing of nested complex types containing map during vectorized text processing

2023-12-04 Thread Chinna Rao Lalam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam resolved HIVE-27662.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Incorrect parsing of nested complex types containing map during vectorized 
> text processing
> --
>
> Key: HIVE-27662
> URL: https://issues.apache.org/jira/browse/HIVE-27662
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When reading a text table with vectorization enabled and 
> hive.fetch.task.conversion set to none, delimiters are parsed incorrectly 
> in nested complex types that contain a map. For example, if a column's 
> schema is a map whose values are structs (like testmarks below), the \u0004 
> character appears in the output. Here is an example:
>  
> Sample q file:
>  
> {code:java}
> set hive.fetch.task.conversion=none;
> set hive.vectorized.execution.enabled=true;
> create EXTERNAL table `table4` as
> select
>   'bob' as name,
>   map(
>       "Map_Key1",
>         named_struct(
>             'Id',
>             'Id_Value1',
>             'Name',
>             'Name_Value1'
>         ),
>       "Map_Key2",
>         named_struct(
>             'Id',
>             'Id_Value2',
>             'Name',
>             'Name_Value2'
>         )
>   ) as testmarks;
> select * from table4;
> set hive.vectorized.execution.enabled=false;
> select * from table4;
> {code}
> Output of 1st select statement:
> {code:java}
> bob·    
> {"Map_Key1":{"id":"Id_Value1\u0004Name_Value1","name":null},"Map_Key2":{"id":"Id_Value2\u0004Name_Value2","name":null}}{code}
> Output of 2nd select statement:
> {code:java}
> bob·    
> {"Map_Key1":{"id":"Id_Value1","name":"Name_Value1"},"Map_Key2":{"id":"Id_Value2","name":"Name_Value2"}}{code}
>  
> The MAP complex type does not handle the case where it contains a nested 
> complex type such as STRUCT, ARRAY, or UNION.
>  
> *To reproduce this issue:*
> *mvn test -Dtest=TestCliDriver -Pitests -Dqfile=`qfile_name` -pl itests/qtest 
> -Dtest.output.overwrite*
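To make the expected layout concrete, here is a minimal Python sketch (not Hive's reader) of how the `testmarks` value from the example is delimited in a text file. It assumes Hive's default LazySimpleSerDe separators, where a top-level map uses \x02 between entries and \x03 between key and value, and a struct nested in the map value uses the next level, \x04. The helper `parse_map_of_struct` and its `split_struct` flag are hypothetical; turning the flag off reproduces the shape of the faulty vectorized output, with \x04 left inside the `id` field and `name` set to null.

```python
# Assumed default LazySimpleSerDe separators for a top-level map of structs.
MAP_ENTRY_SEP = "\x02"     # between map entries
MAP_KEY_SEP = "\x03"       # between a key and its value
STRUCT_FIELD_SEP = "\x04"  # between fields of the struct nested in the value
STRUCT_FIELDS = ("id", "name")


def parse_map_of_struct(raw: str, split_struct: bool = True) -> dict:
    """Parse a delimited map-of-struct value into {key: {"id": ..., "name": ...}}.

    With split_struct=False the nested struct is not split on the fourth-level
    separator, which mimics the faulty vectorized output in the report.
    """
    result = {}
    for entry in raw.split(MAP_ENTRY_SEP):
        key, value = entry.split(MAP_KEY_SEP, 1)
        fields = value.split(STRUCT_FIELD_SEP) if split_struct else [value]
        # Missing trailing fields become None (shown as null in the report).
        fields += [None] * (len(STRUCT_FIELDS) - len(fields))
        result[key] = dict(zip(STRUCT_FIELDS, fields))
    return result


# The testmarks value from the example, as assumed to be stored in the file.
RAW = ("Map_Key1\x03Id_Value1\x04Name_Value1"
       "\x02Map_Key2\x03Id_Value2\x04Name_Value2")

if __name__ == "__main__":
    print(parse_map_of_struct(RAW))                      # correct parse
    print(parse_map_of_struct(RAW, split_struct=False))  # \x04 leaks into "id"
```

Only the struct level differs between the two calls; map entries and keys are split the same way, which matches the report, where the map keys come out correctly.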





[jira] [Commented] (HIVE-27226) FullOuterJoin with filter expressions is not computed correctly

2023-12-04 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792904#comment-17792904
 ] 

Denys Kuzmenko commented on HIVE-27226:
---

[~seonggon], how hard would it be to disable the optimization from HIVE-18908 
when we have filter expressions?

> FullOuterJoin with filter expressions is not computed correctly
> ---
>
> Key: HIVE-27226
> URL: https://issues.apache.org/jira/browse/HIVE-27226
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.0-must
>
> I tested many OuterJoin queries as an extension of HIVE-27138 and found 
> that Hive returns incorrect results for queries containing a FullOuterJoin with 
> filter expressions. In a nutshell, all JoinOperators that run on the Tez engine 
> return incorrect results for OuterJoin queries, and one of the reasons for the 
> incorrect computation lies in CommonJoinOperator, which is the common base of 
> all JoinOperators. I attached the queries and configuration that I used at the 
> bottom of this description. I am still investigating these problems and will 
> share an update once I find another cause. Any comments and opinions 
> would be appreciated.
> First of all, I observed that current Hive ignores filter expressions 
> contained in MapJoinOperator. For example, the attached result of query1 
> shows that MapJoinOperator performs an inner join, not a full outer join. This 
> problem stems from the removal of filterMap. When converting JoinOperator to 
> MapJoinOperator, ConvertJoinMapJoin#convertJoinDynamicPartitionedHashJoin() 
> removes the filterMap of MapJoinOperator. Because MapJoinOperator does not 
> evaluate filter expressions if filterMap is null, this change makes 
> MapJoinOperator ignore filter expressions, so it always joins tables 
> regardless of whether the rows satisfy the filter expressions. To solve this 
> problem, I disabled FullOuterMapJoinOptimization and applied the patch for 
> HIVE-27138, which prevents an NPE. (The patch is available at the following 
> link: LINK.) The rest of this description uses this modified Hive, but most of 
> the problems occur in current Hive, too.
> The second problem I found is that Hive returns the same left-null or 
> right-null rows multiple times when it uses MapJoinOperator or 
> CommonMergeJoinOperator. This is caused by the logic of the current 
> CommonJoinOperator. Both JoinOperators join tables in two steps. 
> First, they create RowContainers, each of which holds rows from one 
> table that share the same key. Second, they call 
> CommonJoinOperator#checkAndGenObject() with the created RowContainers. This 
> method checks the filterTag of each row in the RowContainers and forwards a 
> joined row if it meets all filter conditions. For OuterJoin, checkAndGenObject() 
> forwards non-matching rows if there is no matching row in the RowContainer. The 
> problem happens when there are multiple RowContainers for the same key and 
> table. For example, suppose that there are two left RowContainers and one 
> right RowContainer. If none of the rows in the two left RowContainers satisfies 
> the filter condition, then checkAndGenObject() forwards a Left-Null row for 
> each right row. Because checkAndGenObject() is called once per left 
> RowContainer, every right row yields two duplicated Left-Null rows.
> MapJoinOperator always creates a singleton RowContainer for the big table, 
> so it always produces duplicated non-matching rows. 
> CommonMergeJoinOperator also creates multiple RowContainers for the big table, 
> each of size hive.join.emit.interval. In the experiment below, I also set 
> hive.join.shortcut.unmatched.rows=false and hive.exec.reducers.max=1 to 
> disable the specialized algorithm for OuterJoin of two tables and to force 
> checkAndGenObject() to be called before all rows with the same key are 
> gathered. I didn't observe this problem when using VectorMapJoinOperator, and 
> I will check whether the problem can be reproduced with VectorMapJoinOperator.
> I suspect the second problem is not limited to FullOuterJoin, but I haven't 
> found such a query yet. I will add it to this issue if I can 
> write a query that reproduces the second problem without FullOuterJoin.
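The duplication mechanism described above can be reproduced with a toy model. The Python sketch below is not Hive code: `full_outer_for_key` is a hypothetical stand-in for one checkAndGenObject() call on a single key, simplified to two tables and a per-pair filter. Calling it once with all left rows emits one Left-Null row for the right row; calling it once per left RowContainer, as in the two-container example, emits that row twice.

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple[str, int]                        # (row id, value used by the filter)
Joined = Tuple[Optional[Row], Optional[Row]]


def full_outer_for_key(left: List[Row], right: List[Row],
                       cond: Callable[[Row, Row], bool]) -> List[Joined]:
    """Hypothetical stand-in for one checkAndGenObject() call on one key.

    Emits joined pairs that pass `cond`, a Right-Null row (left, None) for
    each left row with no match, and a Left-Null row (None, right) for each
    right row with no match.
    """
    out: List[Joined] = []
    for lrow in left:
        if not any(cond(lrow, rrow) for rrow in right):
            out.append((lrow, None))
    for rrow in right:
        matched = [lrow for lrow in left if cond(lrow, rrow)]
        if matched:
            out.extend((lrow, rrow) for lrow in matched)
        else:
            out.append((None, rrow))
    return out


def never_matches(lrow: Row, rrow: Row) -> bool:
    # A filter condition that no left row satisfies.
    return lrow[1] > 100


left_rows = [("L1", 1), ("L2", 2), ("L3", 3), ("L4", 4)]
right_rows = [("R1", 10)]

# All left rows for the key in one RowContainer: one Left-Null row.
once = full_outer_for_key(left_rows, right_rows, never_matches)

# Same rows split into two left RowContainers, one call per container.
split = (full_outer_for_key(left_rows[:2], right_rows, never_matches)
         + full_outer_for_key(left_rows[2:], right_rows, never_matches))

if __name__ == "__main__":
    print(once.count((None, ("R1", 10))))   # 1
    print(split.count((None, ("R1", 10))))  # 2: duplicated Left-Null row
```

Every left row is still emitted exactly once as a Right-Null row in both variants, so only the Left-Null rows are duplicated, which is the symptom described above.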
> I also found that Hive returns wrong results for query2 even when using 
> VectorMapJoinOperator. I am still investigating this problem and will post an 
> update once I find the cause.
>  
> Experiment:
>  
> {code:java}
> -- Configuration
> set hive.optimize.shared.work=false;
> -- Std MapJoin
> set hive.auto.convert.join=true;
> set hive.vectorized.execution.enabled=false;
> -- Vec MapJoin
> set hive.auto.convert.join=true;
> set hive.vectorized.execution.enabled=true;
> -- MergeJoin
> set hive.auto.convert.join=

[jira] [Updated] (HIVE-27890) Tez Progress bar is not displayed in Beeline upon setting session level execution engine to Tez

2023-12-04 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-27890:

Summary: Tez Progress bar is not displayed in Beeline upon setting session 
level execution engine to Tez  (was: Tez Progress Bar does not appear while 
setting session execution engine to Tez in Beeline)

> Tez Progress bar is not displayed in Beeline upon setting session level 
> execution engine to Tez
> ---
>
> Key: HIVE-27890
> URL: https://issues.apache.org/jira/browse/HIVE-27890
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Shivangi Jha
>Assignee: Shivangi Jha
>Priority: Major
>  Labels: pull-request-available
>
> When queries are executed through Beeline and the server-level execution 
> engine is configured to MapReduce (MR), while the session-level execution 
> engine is set to Tez, it has been observed that the Tez Progress bar is not 
> rendered in the output.
>  # When the default engine was set to Tez in the Hive conf:
>  ## With no session-level change to the execution engine, the progress bar is 
> seen.
> Default Engine=Tez, session level=Tez
>  ## When the session-level execution engine is set to MR, the progress bar is 
> not seen.
> Default Engine=Tez, session level=MR
>  # When the default engine was set to MR in the Hive conf:
>  ## When the session-level execution engine is set to Tez, the progress bar is 
> NOT seen.
> Default Engine=MR, session level=Tez
>  ## With no session-level change to the execution engine, the progress bar is 
> not seen.
> Default Engine=MR, session level=MR
>  
> Steps to Reproduce:
>  # Set default execution engine to MR.
>  # Start Beeline session for query execution.
>  # Run {{set hive.execution.engine=tez;}}
>  # Upon running a query, the Tez Progress bar is not displayed in the console.





[jira] [Created] (HIVE-27932) Update documentation for Hive-Iceberg integration

2023-12-04 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-27932:
---

 Summary: Update documentation for Hive-Iceberg integration
 Key: HIVE-27932
 URL: https://issues.apache.org/jira/browse/HIVE-27932
 Project: Hive
  Issue Type: Sub-task
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Add wiki documentation for the Hive-Iceberg integration.





[jira] [Updated] (HIVE-27856) Change the default value of hive.optimize.cte.materialize.threshold to -1

2023-12-04 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27856:
--
Affects Version/s: 4.0.0

> Change the default value of hive.optimize.cte.materialize.threshold to -1
> -
>
> Key: HIVE-27856
> URL: https://issues.apache.org/jira/browse/HIVE-27856
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: pull-request-available
>
> Change the default value of 'hive.optimize.cte.materialize.threshold' to -1 
> to prevent an NPE when compiling TPC-DS query14.
> See HIVE-24167 for more details.





[jira] [Updated] (HIVE-27856) Change the default value of hive.optimize.cte.materialize.threshold to -1

2023-12-04 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27856:
--
Status: Patch Available  (was: Open)

> Change the default value of hive.optimize.cte.materialize.threshold to -1
> -
>
> Key: HIVE-27856
> URL: https://issues.apache.org/jira/browse/HIVE-27856
> Project: Hive
>  Issue Type: Improvement
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: pull-request-available
>
> Change the default value of 'hive.optimize.cte.materialize.threshold' to -1 
> to prevent an NPE when compiling TPC-DS query14.
> See HIVE-24167 for more details.





[jira] [Updated] (HIVE-27931) Update documentation with new features/improvements

2023-12-04 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-27931:

Parent: (was: HIVE-27921)
Issue Type: Task  (was: Sub-task)

> Update documentation with new features/improvements
> ---
>
> Key: HIVE-27931
> URL: https://issues.apache.org/jira/browse/HIVE-27931
> Project: Hive
>  Issue Type: Task
>Reporter: Ayush Saxena
>Priority: Major
>
> Improve wiki documentation for new features/improvements coming in the 4.0 release





[jira] [Commented] (HIVE-27662) Incorrect parsing of nested complex types containing map during vectorized text processing

2023-12-04 Thread Chinna Rao Lalam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792899#comment-17792899
 ] 

Chinna Rao Lalam commented on HIVE-27662:
-

Merged to master !! Thanks for the patch [~Aggarwal_Raghav] 

> Incorrect parsing of nested complex types containing map during vectorized 
> text processing
> --
>
> Key: HIVE-27662
> URL: https://issues.apache.org/jira/browse/HIVE-27662
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Major
>  Labels: pull-request-available
>
> When reading a text table with vectorization enabled and 
> hive.fetch.task.conversion set to none, delimiters are parsed incorrectly 
> in nested complex types that contain a map. For example, if a column's 
> schema is a map whose values are structs (like testmarks below), the \u0004 
> character appears in the output. Here is an example:
>  
> Sample q file:
>  
> {code:java}
> set hive.fetch.task.conversion=none;
> set hive.vectorized.execution.enabled=true;
> create EXTERNAL table `table4` as
> select
>   'bob' as name,
>   map(
>       "Map_Key1",
>         named_struct(
>             'Id',
>             'Id_Value1',
>             'Name',
>             'Name_Value1'
>         ),
>       "Map_Key2",
>         named_struct(
>             'Id',
>             'Id_Value2',
>             'Name',
>             'Name_Value2'
>         )
>   ) as testmarks;
> select * from table4;
> set hive.vectorized.execution.enabled=false;
> select * from table4;
> {code}
> Output of 1st select statement:
> {code:java}
> bob·    
> {"Map_Key1":{"id":"Id_Value1\u0004Name_Value1","name":null},"Map_Key2":{"id":"Id_Value2\u0004Name_Value2","name":null}}{code}
> Output of 2nd select statement:
> {code:java}
> bob·    
> {"Map_Key1":{"id":"Id_Value1","name":"Name_Value1"},"Map_Key2":{"id":"Id_Value2","name":"Name_Value2"}}{code}
>  
> The MAP complex type does not handle the case where it contains a nested 
> complex type such as STRUCT, ARRAY, or UNION.
>  
> *To reproduce this issue:*
> *mvn test -Dtest=TestCliDriver -Pitests -Dqfile=`qfile_name` -pl itests/qtest 
> -Dtest.output.overwrite*





[jira] [Created] (HIVE-27931) Update documentation with new features/improvements

2023-12-04 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-27931:
---

 Summary: Update documentation with new features/improvements
 Key: HIVE-27931
 URL: https://issues.apache.org/jira/browse/HIVE-27931
 Project: Hive
  Issue Type: Sub-task
Reporter: Ayush Saxena


Improve wiki documentation for new features/improvements coming in the 4.0 release





[jira] [Commented] (HIVE-27890) Tez Progress Bar does not appear while setting session execution engine to Tez in Beeline

2023-12-04 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792889#comment-17792889
 ] 

Pravin Sinha commented on HIVE-27890:
-

Merged to master !! Thanks for the patch [~shivijha30] !!

> Tez Progress Bar does not appear while setting session execution engine to 
> Tez in Beeline
> -
>
> Key: HIVE-27890
> URL: https://issues.apache.org/jira/browse/HIVE-27890
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Shivangi Jha
>Assignee: Shivangi Jha
>Priority: Major
>  Labels: pull-request-available
>
> When queries are executed through Beeline and the server-level execution 
> engine is configured to MapReduce (MR), while the session-level execution 
> engine is set to Tez, it has been observed that the Tez Progress bar is not 
> rendered in the output.
>  # When the default engine was set to Tez in the Hive conf:
>  ## With no session-level change to the execution engine, the progress bar is 
> seen.
> Default Engine=Tez, session level=Tez
>  ## When the session-level execution engine is set to MR, the progress bar is 
> not seen.
> Default Engine=Tez, session level=MR
>  # When the default engine was set to MR in the Hive conf:
>  ## When the session-level execution engine is set to Tez, the progress bar is 
> NOT seen.
> Default Engine=MR, session level=Tez
>  ## With no session-level change to the execution engine, the progress bar is 
> not seen.
> Default Engine=MR, session level=MR
>  
> Steps to Reproduce:
>  # Set default execution engine to MR.
>  # Start Beeline session for query execution.
>  # Run {{set hive.execution.engine=tez;}}
>  # Upon running a query, the Tez Progress bar is not displayed in the console.





[jira] [Updated] (HIVE-27801) Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan

2023-12-04 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27801:
--
Affects Version/s: 4.0.0
   (was: 4.0.0-beta-1)

> Exists subquery rewrite into LEFT SEMI JOIN produce incorrect plan
> --
>
> Key: HIVE-27801
> URL: https://issues.apache.org/jira/browse/HIVE-27801
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Priority: Critical
>  Labels: hive-4.0.0-must
> Attachments: Screenshot 2023-10-10 at 20.14.03.png
>
>
> reproduce (no rows should be returned):
> {code}
> set hive.explain.user=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.stats.autogather=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> drop table if exists store_sales;
> create table store_sales (ss_promo_sk int, ss_sales_price int, ss_list_price 
> int) stored as orc tblproperties('transactional'='true');
> insert into store_sales values (1, 20, 15), (1, 15, 20), (1, 10, 15);
> explain cbo 
> select * from store_sales A where exists ( 
> select 1 from store_sales B 
> where a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> select * from store_sales A where exists( 
> select 1 from store_sales B 
> where A.ss_promo_sk=B.ss_promo_sk and A.ss_sales_price>B.ss_list_price 
> and A.ss_sales_price 
> explain cbo
> select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price select * from store_sales A 
> LEFT SEMI JOIN store_sales B 
> ON a.ss_promo_sk=b.ss_promo_sk and A.ss_sales_price>B.ss_list_price and 
> A.ss_sales_price {code}
> plan diff:
> [^Screenshot 2023-10-10 at 20.14.03.png]





[jira] [Updated] (HIVE-26505) Case When Some result data is lost when there are common column conditions and partitioned column conditions

2023-12-04 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26505:
--
Labels: check  (was: check hive-4.0.0-must)

> Case When Some result data is lost when there are common column conditions 
> and partitioned column conditions 
> -
>
> Key: HIVE-26505
> URL: https://issues.apache.org/jira/browse/HIVE-26505
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0, 4.0.0-alpha-1
>Reporter: GuangMing Lu
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: check
>
> {code:java}
> create table test0831 (id string) partitioned by (cp string);
> insert into test0831 values ('a', '2022-08-23'),('c', '2022-08-23'),('d', 
> '2022-08-23');
> insert into test0831 values ('a', '2022-08-24'),('b', '2022-08-24');
> select * from test0831;
> +--------------+--------------+
> | test0831.id  | test0831.cp  |
> +--------------+--------------+
> | a            | 2022-08-23   |
> | b            | 2022-08-23   |
> | a            | 2022-08-23   |
> | c            | 2022-08-24   |
> | d            | 2022-08-24   |
> +--------------+--------------+
> select * from test0831 where (case when id='a' and cp='2022-08-23' then 1 
> else 0 end)=0;  
> +--------------+--------------+
> | test0830.id  | test0830.cp  |
> +--------------+--------------+
> | a            | 2022-08-24   |
> | b            | 2022-08-24   |
> +--------------+--------------+
> {code}
>  
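Evaluating the predicate by hand shows which rows are lost. The Python sketch below takes the rows from the two INSERT statements as ground truth (an assumption, since the SELECT * listing in the report does not match them) and applies the same CASE WHEN filter; `ROWS` and `case_when` are illustrative names. It keeps four rows, while Hive returned only the two 2022-08-24 rows, so the c and d rows of partition 2022-08-23 are missing.

```python
# Rows written by the two INSERT statements in the report, as (id, cp).
ROWS = [
    ("a", "2022-08-23"), ("c", "2022-08-23"), ("d", "2022-08-23"),
    ("a", "2022-08-24"), ("b", "2022-08-24"),
]


def case_when(id_: str, cp: str) -> int:
    """CASE WHEN id='a' AND cp='2022-08-23' THEN 1 ELSE 0 END (no NULLs here)."""
    return 1 if id_ == "a" and cp == "2022-08-23" else 0


# WHERE (CASE ... END) = 0 keeps every row except ('a', '2022-08-23').
expected = [row for row in ROWS if case_when(*row) == 0]

if __name__ == "__main__":
    for row in expected:
        print(row)
```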





[jira] [Updated] (HIVE-26505) Case When Some result data is lost when there are common column conditions and partitioned column conditions

2023-12-04 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26505:
--
Affects Version/s: 4.0.0
   (was: 4.0.0-alpha-1)

> Case When Some result data is lost when there are common column conditions 
> and partitioned column conditions 
> -
>
> Key: HIVE-26505
> URL: https://issues.apache.org/jira/browse/HIVE-26505
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0, 4.0.0
>Reporter: GuangMing Lu
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: check
>
> {code:java}
> create table test0831 (id string) partitioned by (cp string);
> insert into test0831 values ('a', '2022-08-23'),('c', '2022-08-23'),('d', 
> '2022-08-23');
> insert into test0831 values ('a', '2022-08-24'),('b', '2022-08-24');
> select * from test0831;
> +--------------+--------------+
> | test0831.id  | test0831.cp  |
> +--------------+--------------+
> | a            | 2022-08-23   |
> | b            | 2022-08-23   |
> | a            | 2022-08-23   |
> | c            | 2022-08-24   |
> | d            | 2022-08-24   |
> +--------------+--------------+
> select * from test0831 where (case when id='a' and cp='2022-08-23' then 1 
> else 0 end)=0;  
> +--------------+--------------+
> | test0830.id  | test0830.cp  |
> +--------------+--------------+
> | a            | 2022-08-24   |
> | b            | 2022-08-24   |
> +--------------+--------------+
> {code}
>  





[jira] [Updated] (HIVE-27905) Some GenericUDFs wrongly cast ObjectInspectors

2023-12-04 Thread okumin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

okumin updated HIVE-27905:
--
Description: 
For example, GenericUDFSplit throws ClassCastException when a non-primitive 
type is given.
{code:java}
0: jdbc:hive2://hive-hiveserver2:1/defaul> select split(array('a,b,c'), 
',');
Error: Error while compiling statement: FAILED: ClassCastException 
org.apache.hadoop.hive.serde2.objectinspector.StandardConstantListObjectInspector
 cannot be cast to 
org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector 
(state=42000,code=4) {code}
 

  was:
GenericUDFSplit throws ClassCastException when a non-primitive type is given.
{code:java}
0: jdbc:hive2://hive-hiveserver2:1/defaul> select split(array('a,b,c'), 
',');
Error: Error while compiling statement: FAILED: ClassCastException 
org.apache.hadoop.hive.serde2.objectinspector.StandardConstantListObjectInspector
 cannot be cast to 
org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector 
(state=42000,code=4) {code}

Summary: Some GenericUDFs wrongly cast ObjectInspectors  (was: SPLIT 
throws ClassCastException)

> Some GenericUDFs wrongly cast ObjectInspectors
> --
>
> Key: HIVE-27905
> URL: https://issues.apache.org/jira/browse/HIVE-27905
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0-beta-1
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>
> For example, GenericUDFSplit throws ClassCastException when a non-primitive 
> type is given.
> {code:java}
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> select split(array('a,b,c'), 
> ',');
> Error: Error while compiling statement: FAILED: ClassCastException 
> org.apache.hadoop.hive.serde2.objectinspector.StandardConstantListObjectInspector
>  cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector 
> (state=42000,code=4) {code}
>  





[jira] [Updated] (HIVE-27929) Run TPC-DS queries and validate results correctness

2023-12-04 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27929:
--
Description: 
release branch: *branch-4.0*
https://github.com/apache/hive/tree/branch-4.0

  was:test branch: *branch-4.0*


> Run TPC-DS queries and validate results correctness
> ---
>
> Key: HIVE-27929
> URL: https://issues.apache.org/jira/browse/HIVE-27929
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Dmitriy Fingerman
>Priority: Major
>
> release branch: *branch-4.0*
> https://github.com/apache/hive/tree/branch-4.0





[jira] [Updated] (HIVE-27930) Insert overwrite table partition does not clean up directory before overwriting

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27930:
--
Labels: pull-request-available  (was: )

> Insert overwrite table partition does not clean up directory before 
> overwriting
> ---
>
> Key: HIVE-27930
> URL: https://issues.apache.org/jira/browse/HIVE-27930
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0-beta-1
>Reporter: Kiran Velumuri
>Assignee: Kiran Velumuri
>Priority: Major
>  Labels: pull-request-available
>
> When an INSERT OVERWRITE TABLE statement contains static partitions, the table 
> directory is not cleaned up for partitions that are not in the metastore but 
> exist in the filesystem. This happens when a partition is dropped from the 
> table but its files remain in the filesystem, or when partition files are 
> added to the filesystem manually.
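The expected behaviour can be sketched as a toy model in Python, using plain local directories in place of HDFS. The helper `insert_overwrite_partition` and the file names are hypothetical and do not reflect Hive's code path; the sketch assumes the desired semantics are that the target static-partition directory (named key=value, as Hive lays out partitions) is emptied before new files are written, whether or not the metastore knows the partition.

```python
import os
import shutil
import tempfile


def insert_overwrite_partition(table_dir: str, partition: str,
                               new_files: dict) -> None:
    """Hypothetical: overwrite one static partition directory.

    The directory is cleared unconditionally, so stale files left behind
    by a dropped partition (or copied in by hand) cannot survive.
    """
    part_dir = os.path.join(table_dir, partition)
    if os.path.isdir(part_dir):
        shutil.rmtree(part_dir)  # remove stale data, known to the metastore or not
    os.makedirs(part_dir)
    for name, data in new_files.items():
        with open(os.path.join(part_dir, name), "w") as f:
            f.write(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table_dir:
        # Simulate a partition dropped from the metastore whose files remain.
        stale_dir = os.path.join(table_dir, "dt=2023-12-04")
        os.makedirs(stale_dir)
        with open(os.path.join(stale_dir, "stale_file"), "w") as f:
            f.write("old row\n")

        insert_overwrite_partition(table_dir, "dt=2023-12-04",
                                   {"new_file": "new row\n"})
        print(sorted(os.listdir(stale_dir)))  # ['new_file']
```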


