[jira] [Work logged] (HIVE-25550) Increase the RM_PROGRESS column max length to fit metrics stat

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25550?focusedWorklogId=657013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-657013
 ]

ASF GitHub Bot logged work on HIVE-25550:
-

Author: ASF GitHub Bot
Created on: 29/Sep/21 05:52
Start Date: 29/Sep/21 05:52
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2668:
URL: https://github.com/apache/hive/pull/2668#discussion_r718173826



##
File path: standalone-metastore/metastore-server/src/main/resources/package.jdo
##
@@ -1556,7 +1556,7 @@
 
   
   
-
+

Review comment:
   will this work for oracle?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 657013)
Time Spent: 0.5h  (was: 20m)

> Increase the RM_PROGRESS column max length to fit metrics stat
> --
>
> Key: HIVE-25550
> URL: https://issues.apache.org/jira/browse/HIVE-25550
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Presently it fails with the following trace:
> {noformat}
> [[Event Name: EVENT_ALLOC_WRITE_ID; Total Number: 213; Total Time: 85347.0; 
> Mean: 400.6901408450704; Median: 392.0; Standard Deviation: 
> 33.99178239314741; Variance: 1155.4412702630862; Kurtosis: 83.69411620601193; 
> Skewness: 83.69411620601193; 25th Percentile: 384.0; 50th Percentile: 392.0; 
> 75th Percentile: 408.0; 90th Percentile: 417.0; Top 5 EventIds(EventId=Time) 
> {1498476=791, 1498872=533, 1497805=508, 1498808=500, 1499027=492};]]}"}]}" in 
> column ""RM_PROGRESS"" that has maximum length of 4000. Please correct your 
> data!
> at 
> org.datanucleus.store.rdbms.mapping.datastore.CharRDBMSMapping.setString(CharRDBMSMapping.java:254)
>  ~[datanucleus-rdbms-4.1.19.jar:?]
> at 
> org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.setString(SingleFieldMapping.java:180)
>  ~{noformat}
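The fix tracked here widens the RM_PROGRESS column past the 4000-character limit shown in the trace. A minimal sketch of what such a schema change could look like per backing database follows; the table name REPLICATION_METRICS and the target sizes are assumptions for illustration, not the committed upgrade script.

```sql
-- Hypothetical sketch: widening RM_PROGRESS beyond 4000 characters.

-- MySQL / MariaDB
ALTER TABLE REPLICATION_METRICS MODIFY RM_PROGRESS varchar(10000);

-- PostgreSQL
ALTER TABLE "REPLICATION_METRICS" ALTER COLUMN "RM_PROGRESS" TYPE varchar(10000);

-- Oracle: VARCHAR2 is capped at 4000 bytes (without extended data types),
-- which is the concern raised in the review comment above. A VARCHAR2 column
-- also cannot be altered to CLOB in place, so a copy is needed:
ALTER TABLE REPLICATION_METRICS ADD RM_PROGRESS_NEW CLOB;
UPDATE REPLICATION_METRICS SET RM_PROGRESS_NEW = RM_PROGRESS;
ALTER TABLE REPLICATION_METRICS DROP COLUMN RM_PROGRESS;
ALTER TABLE REPLICATION_METRICS RENAME COLUMN RM_PROGRESS_NEW TO RM_PROGRESS;
```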



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25571) Fix Metastore script for Oracle Database

2021-09-28 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-25571:

Labels: pull-request-available  (was: )

> Fix Metastore script for Oracle Database
> 
>
> Key: HIVE-25571
> URL: https://issues.apache.org/jira/browse/HIVE-25571
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>
> Error 1:
> {noformat}
> 354/359      CREATE UNIQUE INDEX DBPRIVILEGEINDEX ON DC_PRIVS 
> (AUTHORIZER,NAME,PRINCIPAL_NAME,PRINCIPAL_TYPE,DC_PRIV,GRANTOR,GRANTOR_TYPE);
> Error: ORA-00955: name is already used by an existing object 
> (state=42000,code=955)
> Aborting command set because "force" is false and command failed: "CREATE 
> UNIQUE INDEX DBPRIVILEGEINDEX ON DC_PRIVS 
> (AUTHORIZER,NAME,PRINCIPAL_NAME,PRINCIPAL_TYPE,DC_PRIV,GRANTOR,GRANTOR_TYPE);"
> [ERROR] 2021-09-29 09:18:59.075 [main] MetastoreSchemaTool - Schema 
> initialization FAILED! Metastore state would be inconsistent!
> Schema initialization FAILED! Metastore state would be inconsistent!{noformat}
> Error 2:
> {noformat}
> Error: ORA-00900: invalid SQL statement (state=42000,code=900)
> Aborting command set because "force" is false and command failed: "===
> -- HIVE-24396
> -- Create DataCo{noformat}
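The two failures point at different script problems: ORA-00955 means the upgrade script re-creates an index that already exists, and ORA-00900 suggests a separator line (the quoted "===") was copied into the Oracle script even though it is not valid SQL there. A hedged sketch of making the index creation idempotent on Oracle, where CREATE INDEX has no IF NOT EXISTS clause, is a PL/SQL guard:

```sql
-- Sketch under assumptions: checks the current schema's index list before
-- creating DBPRIVILEGEINDEX, so re-running the script does not raise ORA-00955.
DECLARE
  idx_count NUMBER;
BEGIN
  SELECT COUNT(*) INTO idx_count
    FROM user_indexes
   WHERE index_name = 'DBPRIVILEGEINDEX';
  IF idx_count = 0 THEN
    EXECUTE IMMEDIATE 'CREATE UNIQUE INDEX DBPRIVILEGEINDEX ON DC_PRIVS '
      || '(AUTHORIZER,NAME,PRINCIPAL_NAME,PRINCIPAL_TYPE,DC_PRIV,GRANTOR,GRANTOR_TYPE)';
  END IF;
END;
/
```

For the second error, Oracle scripts should use `--` comments rather than `===`-style separators.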



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25571) Fix Metastore script for Oracle Database

2021-09-28 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HIVE-25571:
---


> Fix Metastore script for Oracle Database
> 
>
> Key: HIVE-25571
> URL: https://issues.apache.org/jira/browse/HIVE-25571
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> Error 1:
> {noformat}
> 354/359      CREATE UNIQUE INDEX DBPRIVILEGEINDEX ON DC_PRIVS 
> (AUTHORIZER,NAME,PRINCIPAL_NAME,PRINCIPAL_TYPE,DC_PRIV,GRANTOR,GRANTOR_TYPE);
> Error: ORA-00955: name is already used by an existing object 
> (state=42000,code=955)
> Aborting command set because "force" is false and command failed: "CREATE 
> UNIQUE INDEX DBPRIVILEGEINDEX ON DC_PRIVS 
> (AUTHORIZER,NAME,PRINCIPAL_NAME,PRINCIPAL_TYPE,DC_PRIV,GRANTOR,GRANTOR_TYPE);"
> [ERROR] 2021-09-29 09:18:59.075 [main] MetastoreSchemaTool - Schema 
> initialization FAILED! Metastore state would be inconsistent!
> Schema initialization FAILED! Metastore state would be inconsistent!{noformat}
> Error 2:
> {noformat}
> Error: ORA-00900: invalid SQL statement (state=42000,code=900)
> Aborting command set because "force" is false and command failed: "===
> -- HIVE-24396
> -- Create DataCo{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25570) Hive should send full URL path for authorization for the command insert overwrite location

2021-09-28 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala reassigned HIVE-25570:



> Hive should send full URL path for authorization for the command insert 
> overwrite location
> --
>
> Key: HIVE-25570
> URL: https://issues.apache.org/jira/browse/HIVE-25570
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, HiveServer2
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>
> For authorization, Hive currently sends the path exactly as given as input by
> the user in the command, e.g. for
> {code:java}
> insert overwrite directory 
> '/user/warehouse/tablespace/external/something/new/test_new_tb1' select * 
> from test_tb1;
> {code}
> Hive sends the path as
> '/user/warehouse/tablespace/external/something/new/test_new_tb1'.
> Instead, Hive should send a fully qualified path for authorization, e.g.:
> 'hdfs://hostname:port_name/user/warehouse/tablespace/external/something/new/test_new_tb1'
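The difference can be illustrated with the statement from the description; the host and port below are placeholders, not values from the issue:

```sql
-- As written, the authorizer receives the scheme-less path:
INSERT OVERWRITE DIRECTORY
  '/user/warehouse/tablespace/external/something/new/test_new_tb1'
SELECT * FROM test_tb1;

-- The proposed behavior is for Hive to qualify the path against the default
-- filesystem before authorization, so the authorizer would instead see e.g.:
--   hdfs://namenode:8020/user/warehouse/tablespace/external/something/new/test_new_tb1
```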



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-28 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-25546:
--
Status: Patch Available  (was: Open)

> Enable incremental rebuild of Materialized views with insert only source 
> tables
> ---
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='insert_only');
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select a, b, c from t1 where a > 10;
> {code}
> Currently materialized view *mat1* cannot be rebuilt incrementally because 
> it has an insert-only source table (t1). Such tables do not have 
> ROW_ID.write_id, which is required to identify newly inserted records since 
> the last rebuild.
> HIVE-25406 adds the ability to query write_id.
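Given the tables in the snippet above, the statement this improvement targets is the standard rebuild command; the comment on planner behavior is a summary of the description, not output from a run:

```sql
-- With the definitions above, the rebuild statement is:
ALTER MATERIALIZED VIEW mat1 REBUILD;
-- Before this change, the planner has to fall back to a full rebuild for mat1,
-- because the insert-only source table t1 lacks ROW_ID.write_id to identify
-- the rows inserted since the last rebuild.
```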



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25557) Hive 3.1.2 with Tez is slow to count data in parquet format

2021-09-28 Thread katty he (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421905#comment-17421905
 ] 

katty he commented on HIVE-25557:
-

count(*) on MR is faster than on Tez. Normally, a count operation can read only 
the parquet metadata, but in this case it reads all the data and computes the 
count, so I am confused. Here is the plan:

!image-2021-09-29-11-07-04-118.png!

> Hive 3.1.2 with Tez is slow to count data in parquet format
> 
>
> Key: HIVE-25557
> URL: https://issues.apache.org/jira/browse/HIVE-25557
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.1.2
> Environment: Tez *0.10.1*
>Reporter: katty he
>Priority: Major
> Attachments: image-2021-09-29-11-07-04-118.png
>
>
> Recently I tested a SQL query like select count(*) from table in Hive 3.1.2 with 
> Tez; the table is in parquet format. Normally, when counting, the query 
> engine can read metadata instead of reading the full data, but in my case 
> Tez cannot get the count from metadata only; it reads the data, so it's slow. 
> When counting 2 billion rows, Tez takes 500s, and spends 60s initializing. 
> Is that a problem?
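A common reason for this behavior is stale or missing table statistics. A hedged workaround sketch, assuming the table name `my_parquet_table` stands in for the reporter's table: when basic statistics are current and stats-based answering is enabled, Hive can answer COUNT(*) from metadata instead of scanning the Parquet files.

```sql
-- Assumption: the table has no concurrent writes invalidating its stats.
SET hive.compute.query.using.stats=true;
ANALYZE TABLE my_parquet_table COMPUTE STATISTICS;  -- refresh basic stats
SELECT COUNT(*) FROM my_parquet_table;  -- may now be answered from stats
```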



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25557) Hive 3.1.2 with Tez is slow to count data in parquet format

2021-09-28 Thread katty he (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

katty he updated HIVE-25557:

Attachment: image-2021-09-29-11-07-04-118.png

> Hive 3.1.2 with Tez is slow to count data in parquet format
> 
>
> Key: HIVE-25557
> URL: https://issues.apache.org/jira/browse/HIVE-25557
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.1.2
> Environment: Tez *0.10.1*
>Reporter: katty he
>Priority: Major
> Attachments: image-2021-09-29-11-07-04-118.png
>
>
> Recently I tested a SQL query like select count(*) from table in Hive 3.1.2 with 
> Tez; the table is in parquet format. Normally, when counting, the query 
> engine can read metadata instead of reading the full data, but in my case 
> Tez cannot get the count from metadata only; it reads the data, so it's slow. 
> When counting 2 billion rows, Tez takes 500s, and spends 60s initializing. 
> Is that a problem?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25541) JsonSerDe: TBLPROPERTY treating nested json as String

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25541?focusedWorklogId=656970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656970
 ]

ASF GitHub Bot logged work on HIVE-25541:
-

Author: ASF GitHub Bot
Created on: 29/Sep/21 01:38
Start Date: 29/Sep/21 01:38
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #2664:
URL: https://github.com/apache/hive/pull/2664#discussion_r718082739



##
File path: serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java
##
@@ -393,7 +402,16 @@ private Object visitLeafNode(final JsonNode leafNode,
 case DOUBLE:
   return Double.valueOf(leafNode.asDouble());
 case STRING:
-  return leafNode.asText();
+  if (leafNode.isValueNode()) {
+return leafNode.asText();
+  } else {
+if (isEnabled(Feature.STRINGIFY_COMPLEX_FIELDS)) {
+  return leafNode.toString();
+} else {
+  throw new SerDeException(
+  "Complex field found in JSON does not match table definition: " 
+ typeInfo.getTypeName());

Review comment:
   Sorry for this. I wonder: if the column is defined as varchar or 
char in the Hive schema, but corresponds to a complex field in the JSON, should we 
do something for such cases?
   Thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656970)
Time Spent: 1h 50m  (was: 1h 40m)

> JsonSerDe: TBLPROPERTY treating nested json as String
> -
>
> Key: HIVE-25541
> URL: https://issues.apache.org/jira/browse/HIVE-25541
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The native JsonSerDe 'org.apache.hive.hcatalog.data.JsonSerDe' currently does not 
> support loading nested JSON into a string-typed column directly. It requires 
> declaring the column as a complex type (struct, map, array) to unpack nested 
> JSON data.
> Even though such a data field is not a valid JSON string type, there is value in 
> treating it as a plain string instead of throwing an exception as we currently 
> do.
> {code:java}
> create table json_table(data string, messageid string, publish_time bigint, 
> attributes string);
> {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}}
> {code}
> This JIRA introduces an extra table property that allows stringifying complex 
> JSON values instead of forcing the user to define the complete nested 
> structure.
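A sketch of how the new property might be used with the example table from the description. The property name below is illustrative only; the patch may name it differently.

```sql
-- Assumption: 'json.stringify.complex.fields' is a stand-in for the actual
-- property name introduced by this patch.
CREATE TABLE json_table (data string, messageid string,
                         publish_time bigint, attributes string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
TBLPROPERTIES ('json.stringify.complex.fields'='true');
-- With the property enabled, the nested object under "data" would be returned
-- as its raw JSON text instead of raising a SerDeException.
```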



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25324) Add option to disable PartitionManagementTask

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25324?focusedWorklogId=656955&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656955
 ]

ASF GitHub Bot logged work on HIVE-25324:
-

Author: ASF GitHub Bot
Created on: 29/Sep/21 00:10
Start Date: 29/Sep/21 00:10
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2470:
URL: https://github.com/apache/hive/pull/2470


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656955)
Time Spent: 50m  (was: 40m)

> Add option to disable PartitionManagementTask
> -
>
> Key: HIVE-25324
> URL: https://issues.apache.org/jira/browse/HIVE-25324
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When a large number of tables (e.g. 2000) and databases are present, 
> PartitionManagementTask scans all tables and partitions, putting pressure on 
> HMS.
> Currently there is no way to disable PartitionManagementTask either. A 
> roundabout option is to provide a pattern via 
> "metastore.partition.management.database.pattern / 
> metastore.partition.management.table.pattern".
> It would be good to provide an option to disable it completely.
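The roundabout workaround mentioned above can be sketched as a metastore config fragment; the pattern value is an assumption, chosen only to match no real database:

```xml
<!-- Sketch (metastore-site.xml): a pattern that matches nothing, effectively
     disabling PartitionManagementTask until a real disable flag exists. -->
<property>
  <name>metastore.partition.management.database.pattern</name>
  <value>no_such_database</value>
</property>
```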



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25343) Create or replace view should clean the old table properties

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25343?focusedWorklogId=656953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656953
 ]

ASF GitHub Bot logged work on HIVE-25343:
-

Author: ASF GitHub Bot
Created on: 29/Sep/21 00:10
Start Date: 29/Sep/21 00:10
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2492:
URL: https://github.com/apache/hive/pull/2492


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656953)
Time Spent: 40m  (was: 0.5h)

> Create or replace view should clean the old table properties
> 
>
> Key: HIVE-25343
> URL: https://issues.apache.org/jira/browse/HIVE-25343
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.3, 3.2.0
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2021-07-19 at 15.36.29.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In many cases, users use Spark and Hive together. When a user creates a view 
> via Spark, the table output columns are stored in table properties, as shown in 
>  !Screen Shot 2021-07-19 at 15.36.29.png|width=80%!
> After that, if the user runs the command "create or replace view" via Hive 
> to change the schema, the old table properties added by Spark are not cleaned 
> up by Hive. Then, when users read the table via Spark, the schema appears 
> unchanged. This is very confusing for users.
> How to reproduce:
> {code}
> spark-sql>create table lajin_table (a int, b int) stored as parquet;
> spark-sql>create view lajin_view as select * from lajin_table;
> spark-sql> desc lajin_view;
> a   int NULLNULL
> b   int NULLNULL
> hive>desc lajin_view;
> a   int 
> b   int
> hive>create or replace view lajin_view as select a, b, 3 as c from 
> lajin_table;
> hive>desc lajin_view;
> a   int 
> b   int 
> c   int
> spark-sql> desc lajin_view; -- not changed
> a   int NULLNULL
> b   int NULLNULL
> {code}
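The stale state can be observed directly after the Hive-side replace. The property names in the comment are the kind Spark writes for view schemas and are examples here, not an exhaustive list:

```sql
-- Inspect the view's properties after "create or replace view" in Hive:
SHOW TBLPROPERTIES lajin_view;
-- Expected leftovers (assumption): spark.sql.sources.schema.numParts,
-- spark.sql.sources.schema.part.0, ... These Spark-written properties should
-- be cleared by the replace once this issue is fixed, so Spark re-derives
-- the schema and sees the new column c.
```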



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23756) Added more constraints to the package.jdo file

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23756?focusedWorklogId=656954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656954
 ]

ASF GitHub Bot logged work on HIVE-23756:
-

Author: ASF GitHub Bot
Created on: 29/Sep/21 00:10
Start Date: 29/Sep/21 00:10
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2254:
URL: https://github.com/apache/hive/pull/2254


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656954)
Time Spent: 1h 40m  (was: 1.5h)

> Added more constraints to the package.jdo file
> --
>
> Key: HIVE-23756
> URL: https://issues.apache.org/jira/browse/HIVE-23756
> Project: Hive
>  Issue Type: Bug
>Reporter: Ganesha Shreedhara
>Assignee: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23756.1.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Drop table command fails intermittently with the following exception.
> {code:java}
> Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent 
> row: a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT 
> "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID")) App > at 
> com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)at
>  com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277) 
> Appat 
> org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372)
> at 
> org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207)
> at 
> org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179)
> at 
> org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901)
> ... 36 more 
> Caused by: 
> com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: 
> Cannot delete or update a parent row: a foreign key constraint fails 
> ("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") 
> REFERENCES "CDS" ("CD_ID"))
> at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
> at com.mysql.jdbc.Util.getInstance(Util.java:360)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823){code}
> Although HIVE-19994 resolves this issue, the FK constraint name of the COLUMNS_V2 
> table specified in the package.jdo file is not the same as the FK constraint name 
> used while creating the COLUMNS_V2 table ([Ref|#L60]]). 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25517) Follow up on HIVE-24951: External Table created with Uppercase name using CTAS does not produce result for select queries

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25517?focusedWorklogId=656928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656928
 ]

ASF GitHub Bot logged work on HIVE-25517:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 21:36
Start Date: 28/Sep/21 21:36
Worklog Time Spent: 10m 
  Work Description: sourabh912 commented on pull request #2638:
URL: https://github.com/apache/hive/pull/2638#issuecomment-929643289


   Thank you @nrg4878 for the review. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656928)
Time Spent: 2h  (was: 1h 50m)

> Follow up on HIVE-24951: External Table created with Uppercase name using 
> CTAS does not produce result for select queries
> -
>
> Key: HIVE-25517
> URL: https://issues.apache.org/jira/browse/HIVE-25517
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Sourabh Goyal
>Assignee: Sourabh Goyal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> In [PR|https://github.com/apache/hive/pull/2125] for HIVE-24951, the 
> recommendation was to use getDefaultTablePath() to set the location for an 
> external table. This Jira addresses that and makes getDefaultTablePath() more 
> generic.
>  
> cc - [~ngangam]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25517) Follow up on HIVE-24951: External Table created with Uppercase name using CTAS does not produce result for select queries

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25517?focusedWorklogId=656927&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656927
 ]

ASF GitHub Bot logged work on HIVE-25517:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 21:36
Start Date: 28/Sep/21 21:36
Worklog Time Spent: 10m 
  Work Description: sourabh912 closed pull request #2638:
URL: https://github.com/apache/hive/pull/2638


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656927)
Time Spent: 1h 50m  (was: 1h 40m)

> Follow up on HIVE-24951: External Table created with Uppercase name using 
> CTAS does not produce result for select queries
> -
>
> Key: HIVE-25517
> URL: https://issues.apache.org/jira/browse/HIVE-25517
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Sourabh Goyal
>Assignee: Sourabh Goyal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In [PR|https://github.com/apache/hive/pull/2125] for HIVE-24951, the 
> recommendation was to use getDefaultTablePath() to set the location for an 
> external table. This Jira addresses that and makes getDefaultTablePath() more 
> generic.
>  
> cc - [~ngangam]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25349) Skip password authentication when a trusted header is present in the Http request

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25349?focusedWorklogId=656889&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656889
 ]

ASF GitHub Bot logged work on HIVE-25349:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 20:28
Start Date: 28/Sep/21 20:28
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on pull request #2496:
URL: https://github.com/apache/hive/pull/2496#issuecomment-929587566


   recheck


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656889)
Time Spent: 40m  (was: 0.5h)

> Skip password authentication when a trusted header is present in the Http 
> request
> -
>
> Key: HIVE-25349
> URL: https://issues.apache.org/jira/browse/HIVE-25349
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, HiveServer2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available, security-review-needed
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Whenever a trusted header is present in the HTTP servlet request, skip 
> password-based authentication, since the user is pre-authorized, and extract 
> the user name from the Authorization header.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25538) CommitTxn replay failing during incremental run

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25538?focusedWorklogId=656863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656863
 ]

ASF GitHub Bot logged work on HIVE-25538:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 20:25
Start Date: 28/Sep/21 20:25
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2655:
URL: https://github.com/apache/hive/pull/2655#discussion_r716418809



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java
##
@@ -243,6 +248,53 @@ public void testReplCM() throws Throwable {
 Lists.newArrayList(result, result));
   }
 
+  @Test
+  public void testReplCommitTransactionOnSourceDelete() throws Throwable {
+String tableName = "testReplCommitTransactionOnSourceDelete";
+String[] result = new String[] { "5" };
+
+// Do a bootstrap dump.
+WarehouseInstance.Tuple bootStrapDump = primary.dump(primaryDbName);
+replica.load(replicatedDbName, primaryDbName).run("REPL STATUS " + 
replicatedDbName)
+.verifyResult(bootStrapDump.lastReplicationId);
+
+// Add some data to the table & do a incremental dump.
+ReplicationTestUtils.insertRecords(primary, primaryDbName, 
primaryDbNameExtra, tableName, null, false,
+ReplicationTestUtils.OperationType.REPL_TEST_ACID_INSERT);
+WarehouseInstance.Tuple incrementalDump = primary.dump(primaryDbName);

Review comment:
   Can you please add tables with the following properties:
   - ORC format (I think covered)
   - bucketed
   - text input format
   All these tables should have a drop-table use case like the one you are targeting 
now.

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java
##
@@ -243,6 +248,64 @@ public void testReplCM() throws Throwable {
 Lists.newArrayList(result, result));
   }
 
+  @Test
+  public void testReplCommitTransactionOnSourceDeleteORC() throws Throwable {
+// Run test with ORC format & with transactional true.
+testReplCommitTransactionOnSourceDelete("STORED AS ORC", 
"'transactional'='true'");
+  }
+
+  @Test
+  public void testReplCommitTransactionOnSourceDeleteText() throws Throwable {
+// Run test with TEXT format & with transactional true.

Review comment:
   false?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##
@@ -401,6 +404,18 @@ private boolean isSourceFileMismatch(FileSystem sourceFs, 
ReplChangeManager.File
 return false;
   }
 
+  @VisibleForTesting
+  private void runTestOnlyExecutions() throws IOException {

Review comment:
   Wondering if this logic can be moved to the test itself.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##
@@ -65,6 +65,8 @@
   private final String copyAsUser;
   private FileSystem destinationFs;
   private final int maxParallelCopyTask;
+  @VisibleForTesting

Review comment:
   If the method is public, does the annotation VisibleForTesting have any 
impact?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656863)
Time Spent: 1h 20m  (was: 1h 10m)

> CommitTxn replay failing during incremental run
> ---
>
> Key: HIVE-25538
> URL: https://issues.apache.org/jira/browse/HIVE-25538
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> CommitTxn fails during an incremental run if the source file is deleted 
> after copy and before checksum validation.
> {noformat}
> 2021-09-21T07:53:40,898 ERROR [TThreadPoolServer WorkerProcess-%d] 
> thrift.ProcessFunction: Internal error processing commit_txn
> org.apache.thrift.TException: 
> /warehouse1/replicated_testreplcommittransactiononsourcedelete_1632235978675.db/testreplcommittransactiononsourcedelete/load_date=2016-03-01/delta_002_002_
>  (is not a directory)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolvePath(FSDirectory.java:677)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:151)
>   at 
> 

[jira] [Work logged] (HIVE-25545) Add/Drop constraints events on table should create authorizable events in HS2

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25545?focusedWorklogId=656797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656797
 ]

ASF GitHub Bot logged work on HIVE-25545:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 20:20
Start Date: 28/Sep/21 20:20
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2665:
URL: https://github.com/apache/hive/pull/2665#discussion_r717176887



##
File path: ql/src/test/queries/clientnegative/groupby_join_pushdown.q
##
@@ -22,45 +22,45 @@ FROM src f JOIN src g ON(f.key = g.key)
 GROUP BY f.key, g.key;
 
 EXPLAIN
-SELECT  f.ctinyint, g.ctinyint, SUM(f.cbigint)  
+SELECT  f.ctinyint, g.ctinyint, SUM(f.cbigint)

Review comment:
   did you remove the spaces on purpose, or is it a consequence of the IDE? 

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/constraint/drop/AlterTableDropConstraintAnalyzer.java
##
@@ -47,11 +51,18 @@ protected void analyzeCommand(TableName tableName, 
Map partition
 String constraintName = unescapeIdentifier(command.getChild(0).getText());
 
 AlterTableDropConstraintDesc desc = new 
AlterTableDropConstraintDesc(tableName, null, constraintName);
-rootTasks.add(TaskFactory.get(new DDLWork(getInputs(), getOutputs(), 
desc)));
 
 Table table = getTable(tableName);
+WriteEntity.WriteType writeType = null;
 if (AcidUtils.isTransactionalTable(table)) {
   setAcidDdlDesc(desc);
+  writeType = WriteType.DDL_EXCLUSIVE;
+} else {
+  writeType = 
WriteEntity.determineAlterTableWriteType(AlterTableType.DROP_CONSTRAINT);
 }
+inputs.add(new ReadEntity(table));

Review comment:
   can we not call addInputsOutputsAlterTable() like we did for ADD 
CONSTRAINT? It seems like all alter commands can use this method.

##
File path: ql/src/test/results/clientnegative/groupby_join_pushdown.q.out
##
@@ -1358,249 +1358,15 @@ STAGE PLANS:
 
 PREHOOK: query: ALTER TABLE alltypesorc ADD CONSTRAINT pk_alltypesorc_1 
PRIMARY KEY (ctinyint) DISABLE RELY
 PREHOOK: type: ALTERTABLE_ADDCONSTRAINT
-POSTHOOK: query: ALTER TABLE alltypesorc ADD CONSTRAINT pk_alltypesorc_1 
PRIMARY KEY (ctinyint) DISABLE RELY
-POSTHOOK: type: ALTERTABLE_ADDCONSTRAINT
-PREHOOK: query: explain
-SELECT sum(f.cint), f.ctinyint
-FROM alltypesorc f JOIN alltypesorc g ON(f.ctinyint = g.ctinyint)
-GROUP BY f.ctinyint, g.ctinyint
-PREHOOK: type: QUERY
-PREHOOK: Input: default@alltypesorc
- A masked pattern was here 
-POSTHOOK: query: explain
-SELECT sum(f.cint), f.ctinyint
-FROM alltypesorc f JOIN alltypesorc g ON(f.ctinyint = g.ctinyint)
-GROUP BY f.ctinyint, g.ctinyint
-POSTHOOK: type: QUERY
-POSTHOOK: Input: default@alltypesorc
- A masked pattern was here 
-STAGE DEPENDENCIES:
-  Stage-1 is a root stage
-  Stage-0 depends on stages: Stage-1
-
-STAGE PLANS:
-  Stage: Stage-1
-Tez
- A masked pattern was here 
-  Edges:
-Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
-Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
- A masked pattern was here 
-  Vertices:
-Map 1 
-Map Operator Tree:
-TableScan
-  alias: f
-  Statistics: Num rows: 12288 Data size: 73392 Basic stats: 
COMPLETE Column stats: COMPLETE
-  Select Operator
-expressions: ctinyint (type: tinyint), cint (type: int)
-outputColumnNames: _col0, _col1
-Statistics: Num rows: 12288 Data size: 73392 Basic stats: 
COMPLETE Column stats: COMPLETE
-Reduce Output Operator
-  key expressions: _col0 (type: tinyint)
-  null sort order: z
-  sort order: +
-  Map-reduce partition columns: _col0 (type: tinyint)
-  Statistics: Num rows: 12288 Data size: 73392 Basic 
stats: COMPLETE Column stats: COMPLETE
-  value expressions: _col1 (type: int)
-Execution mode: vectorized, llap
-LLAP IO: all inputs
-Map 4 
-Map Operator Tree:
-TableScan
-  alias: g
-  Statistics: Num rows: 12288 Data size: 36696 Basic stats: 
COMPLETE Column stats: COMPLETE
-  Select Operator
-expressions: ctinyint (type: tinyint)
-outputColumnNames: _col0
-Statistics: Num rows: 12288 Data size: 36696 Basic stats: 
COMPLETE Column stats: COMPLETE
-Reduce Output Operator
-  key expressions: _col0 (type: tinyint)
-  null sort order: z
-  sort order: +
-  Map-reduce partition columns: _col0 (type: tinyint)
-   

[jira] [Work logged] (HIVE-25566) Show column constraints for "DESC FORMATTED TABLE"

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25566?focusedWorklogId=656767&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656767
 ]

ASF GitHub Bot logged work on HIVE-25566:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 20:17
Start Date: 28/Sep/21 20:17
Worklog Time Spent: 10m 
  Work Description: soumyakanti3578 opened a new pull request #2678:
URL: https://github.com/apache/hive/pull/2678


   
   
   ### What changes were proposed in this pull request?
   Column constraints are added with the data type to increase readability.
   
   
   ### Why are the changes needed?
   Improves readability.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   mvn test  -Dtest=TestMiniLlapLocalCliDriver -Dtest.output.overwrite=true 
-Dqfile=show_create_table.q
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656767)
Time Spent: 20m  (was: 10m)

> Show column constraints for "DESC FORMATTED TABLE"
> --
>
> Key: HIVE-25566
> URL: https://issues.apache.org/jira/browse/HIVE-25566
> Project: Hive
>  Issue Type: New Feature
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, column constraints are not shown with the data type of columns. 
> They are shown all together at the end, but showing them with the data type 
> will make the description more readable.
>  
> Example:
> Create table
>   
> {code:java}
> CREATE TABLE TEST(
>   col1 varchar(100) NOT NULL COMMENT "comment for column 1",
>   col2 timestamp DEFAULT CURRENT_TIMESTAMP() COMMENT "comment for column 2",
>   col3 decimal,
>   col4 varchar(512) NOT NULL,
>   col5 varchar(100),
>   primary key(col1, col2) disable novalidate)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
> {code}
>  
> Currently, {{DESC FORMATTED TABLE }} returns,
> {code:java}
> # col_namedata_type   comment 
> col1  varchar(100)comment for column 1
> col2  timestamp   comment for column 2
> col3  decimal(10,0)   
> col4  varchar(512)
> col5  varchar(100)
> # Detailed Table Information   
> Database: default  
>  A masked pattern was here 
> Retention:0
>  A masked pattern was here 
> Table Type:   MANAGED_TABLE
> Table Parameters:  
>   COLUMN_STATS_ACCURATE   
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\",\"col3\":\"true\",\"col4\":\"true\",\"col5\":\"true\"}}
>   bucketing_version   2   
>   numFiles0   
>   numRows 0   
>   rawDataSize 0   
>   totalSize   0   
>  A masked pattern was here 
> # Storage Information  
> SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat  
> OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>  
> Compressed:   No   
> Num Buckets:  -1   
> Bucket Columns:   []   
> Sort Columns: []   
> Storage Desc Params:   
>   serialization.format1   
> # Constraints  
> # Primary Key  
> Table:default.test 
> Constraint Name:   A masked pattern was here   
> Column Name:  col1 
> Column Name:  col2 
> # Not Null Constraints 
> Table:default.test 
> Constraint Name:   A masked pattern was here   
> Column Name:  col1 
> Constraint Name:   A masked pattern was here   
> Column Name:  col4  

[jira] [Work logged] (HIVE-25561) Killed task should not commit file.

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25561?focusedWorklogId=656719&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656719
 ]

ASF GitHub Bot logged work on HIVE-25561:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 20:13
Start Date: 28/Sep/21 20:13
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on pull request #2674:
URL: https://github.com/apache/hive/pull/2674#issuecomment-928998346


   @abstractdog  Can you help me review it, or give me some suggestions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656719)
Time Spent: 0.5h  (was: 20m)

> Killed task should not commit file.
> ---
>
> Key: HIVE-25561
> URL: https://issues.apache.org/jira/browse/HIVE-25561
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.2.1, 2.3.8, 2.4.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For the Tez engine in our cluster, I found some duplicate rows, especially when 
> Tez speculation is enabled. In the partition dir, I found that both 02_0 and 02_1 
> exist.
> It's a very low-probability event. HIVE-10429 fixed some bugs around 
> interrupts, but some exceptions were not caught.
> In our cluster, the task receives SIGTERM, then ClientFinalizer (a Hadoop class) is 
> called and the HDFS client closes. This raises an exception, but abort may not 
> be set to true.
> Then removeTempOrDuplicateFiles may fail because of the inconsistency, and the duplicate 
> file is retained. 
> (Note: the Driver first lists the dir, then the Task commits the file, then the Driver removes 
> duplicate files. This is an inconsistency case.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
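The list → commit → remove sequence described in HIVE-25561 can be illustrated with a toy Python sketch (the helper is modeled loosely on removeTempOrDuplicateFiles and is hypothetical, not Hive's actual code): a dedup pass driven by a stale directory listing misses an attempt committed after the listing was taken, so both task attempts survive.

```python
import re

def remove_temp_or_duplicate_files(snapshot, committed):
    """Keep one attempt per task id, but dedup against the directory
    listing the driver took *before* the late task committed -- this
    mirrors the inconsistency described in the report above."""
    def task_id(name):
        # attempt files are named <taskId>_<attemptId>
        return re.match(r"(\d+)_(\d+)", name).group(1)
    keep = {}
    for name in snapshot:          # stale listing: misses late commits
        keep.setdefault(task_id(name), name)
    removed = set(snapshot) - set(keep.values())
    return set(committed) - removed  # files left in the partition dir

# Driver lists the dir, then a speculative attempt commits 000002_1.
stale  = remove_temp_or_duplicate_files(
    ["000002_0"], ["000002_0", "000002_1"])
# With a fresh listing the duplicate attempt would be removed.
fresh  = remove_temp_or_duplicate_files(
    ["000002_0", "000002_1"], ["000002_0", "000002_1"])
print(stale)   # both attempts survive -> duplicate rows
print(fresh)   # only one attempt kept
```

The sketch only shows the ordering hazard; the real code path also involves the abort flag not being set when the HDFS client is already closed.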


[jira] [Work logged] (HIVE-25550) Increase the RM_PROGRESS column max length to fit metrics stat

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25550?focusedWorklogId=656677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656677
 ]

ASF GitHub Bot logged work on HIVE-25550:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 20:10
Start Date: 28/Sep/21 20:10
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2668:
URL: https://github.com/apache/hive/pull/2668#discussion_r717396745



##
File path: metastore/scripts/upgrade/derby/058-HIVE-23516.derby.sql
##
@@ -4,7 +4,7 @@ CREATE TABLE "APP"."REPLICATION_METRICS" (
   "RM_POLICY" varchar(256) NOT NULL,
   "RM_DUMP_EXECUTION_ID" bigint NOT NULL,
   "RM_METADATA" varchar(4000),
-  "RM_PROGRESS" varchar(4000),
+  "RM_PROGRESS" varchar(24000),

Review comment:
   these files are older ones. You can skip updating them. Only the scripts 
inside standalone-metastore should be updated.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656677)
Time Spent: 20m  (was: 10m)

> Increase the RM_PROGRESS column max length to fit metrics stat
> --
>
> Key: HIVE-25550
> URL: https://issues.apache.org/jira/browse/HIVE-25550
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Presently it fails with the following trace:
> {noformat}
> [[Event Name: EVENT_ALLOC_WRITE_ID; Total Number: 213; Total Time: 85347.0; 
> Mean: 400.6901408450704; Median: 392.0; Standard Deviation: 
> 33.99178239314741; Variance: 1155.4412702630862; Kurtosis: 83.69411620601193; 
> Skewness: 83.69411620601193; 25th Percentile: 384.0; 50th Percentile: 392.0; 
> 75th Percentile: 408.0; 90th Percentile: 417.0; Top 5 EventIds(EventId=Time) 
> {1498476=791, 1498872=533, 1497805=508, 1498808=500, 1499027=492};]]}"}]}" in 
> column ""RM_PROGRESS"" that has maximum length of 4000. Please correct your 
> data!
> at 
> org.datanucleus.store.rdbms.mapping.datastore.CharRDBMSMapping.setString(CharRDBMSMapping.java:254)
>  ~[datanucleus-rdbms-4.1.19.jar:?]
> at 
> org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.setString(SingleFieldMapping.java:180)
>  ~{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
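The truncation error quoted above is DataNucleus rejecting a value longer than the column's declared varchar(4000) limit. A toy Python sketch of that length check, with illustrative numbers (the entry text is abbreviated from the stack trace; per-event sizes are assumptions):

```python
RM_PROGRESS_MAX = 4000  # old column limit; HIVE-25550 raises it

def check_fits(value, max_len=RM_PROGRESS_MAX):
    # Mirrors the check that fails in CharRDBMSMapping.setString:
    # the serialized metrics string must fit the declared column size.
    if len(value) > max_len:
        raise ValueError(
            "value of length %d exceeds column max %d" % (len(value), max_len))

# One per-event stat entry like the one in the trace is ~100+ characters;
# a few dozen events per replication policy easily overflow 4000.
entry = ("Event Name: EVENT_ALLOC_WRITE_ID; Total Number: 213; "
         "Total Time: 85347.0; Mean: 400.69; Median: 392.0; ...")
stats = "; ".join([entry] * 40)
try:
    check_fits(stats)
except ValueError as e:
    print(e)
```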


[jira] [Work logged] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?focusedWorklogId=656624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656624
 ]

ASF GitHub Bot logged work on HIVE-24579:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 20:05
Start Date: 28/Sep/21 20:05
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged pull request #2656:
URL: https://github.com/apache/hive/pull/2656


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656624)
Time Spent: 1h 40m  (was: 1.5h)

> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN in the map phase, which causes incorrect results.
> {code:sql}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   GatherStats: false
>   Select Operator
> expressions: id (type: int)
> outputColumnNames: id
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count()
>   keys: id (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> tag: -1
> TopN: 10
> TopN Hash Memory Usage: 0.1
> value expressions: _col1 (type: bigint)
> auto parallelism: true
> Execution mode: vectorized
> Path -> Alias:
>   file:/user/hive/warehouse/test [test]
> Path -> Partition:
>   file:/user/hive/warehouse/test 
> Partition
>   base file name: test
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   properties:
> COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
> bucket_count -1
> bucketing_version 2
> column.name.delimiter ,
> columns id
> columns.comments 
> columns.types int
> file.inputformat org.apache.hadoop.mapred.TextInputFormat
> file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> location file:/user/hive/warehouse/test
> name default.test
> numFiles 0
> numRows 0
> rawDataSize 0
> serialization.ddl struct test { i32 id}
> serialization.format 1
> serialization.lib 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 0
> transient_lastDdlTime 1609730190
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

[jira] [Work logged] (HIVE-25541) JsonSerDe: TBLPROPERTY treating nested json as String

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25541?focusedWorklogId=656623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656623
 ]

ASF GitHub Bot logged work on HIVE-25541:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 20:05
Start Date: 28/Sep/21 20:05
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #2664:
URL: https://github.com/apache/hive/pull/2664#discussion_r717160062



##
File path: serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java
##
@@ -393,7 +402,16 @@ private Object visitLeafNode(final JsonNode leafNode,
 case DOUBLE:
   return Double.valueOf(leafNode.asDouble());
 case STRING:
-  return leafNode.asText();
+  if (leafNode.isValueNode()) {
+return leafNode.asText();
+  } else {
+if (isEnabled(Feature.STRINGIFY_COMPLEX_FIELDS)) {
+  return leafNode.toString();
+} else {
+  throw new SerDeException(
+  "Complex field found in JSON does not match table definition: " 
+ typeInfo.getTypeName());

Review comment:
   could we do the same for the input of varchars or chars? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656623)
Time Spent: 1h 40m  (was: 1.5h)

> JsonSerDe: TBLPROPERTY treating nested json as String
> -
>
> Key: HIVE-25541
> URL: https://issues.apache.org/jira/browse/HIVE-25541
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The native JsonSerDe 'org.apache.hive.hcatalog.data.JsonSerDe' currently does not 
> support loading nested JSON into a string type directly. It requires 
> declaring the column as a complex type (struct, map, array) to unpack nested 
> JSON data.
> Even though the data field is not a valid JSON String type, there is value in 
> treating it as a plain String instead of throwing an exception as we currently 
> do.
> {code:java}
> create table json_table(data string, messageid string, publish_time bigint, 
> attributes string);
> {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}}
> {code}
> This JIRA introduces an extra table property that allows stringifying complex 
> JSON values instead of forcing the user to define the complete nested 
> structure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25569) Enable table definition over a single file

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25569?focusedWorklogId=656497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656497
 ]

ASF GitHub Bot logged work on HIVE-25569:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 19:54
Start Date: 28/Sep/21 19:54
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk opened a new pull request #2680:
URL: https://github.com/apache/hive/pull/2680


   Change-Id: I6e8afa3463951c5b4e032df390df06a0d634fde7
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656497)
Time Spent: 20m  (was: 10m)

> Enable table definition over a single file
> --
>
> Key: HIVE-25569
> URL: https://issues.apache.org/jira/browse/HIVE-25569
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Suppose there is a directory where multiple files are present - and for a 3rd 
> party database system this is perfectly normal - because it treats a 
> single file as the contents of the table.
> Tables defined in the metastore follow a different principle - tables are 
> considered to be under a directory - and all files under that directory are 
> the contents of that table.
> To enable seamless migration/evaluation of Hive and other databases using HMS 
> as a metadata backend, the ability to define a table over a single file would 
> be useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20303) INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws InvalidTableException

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20303?focusedWorklogId=656500&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656500
 ]

ASF GitHub Bot logged work on HIVE-20303:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 19:54
Start Date: 28/Sep/21 19:54
Worklog Time Spent: 10m 
  Work Description: zabetak opened a new pull request #2679:
URL: https://github.com/apache/hive/pull/2679


   ### What changes were proposed in this pull request?
   Extract the full table reference (DB name + table name) from the AST.
   
   ### Why are the changes needed?
   Without these changes queries fail with `InvalidTableException`.
   
   ### Does this PR introduce _any_ user-facing change?
   Queries will not fail.
   
   ### How was this patch tested?
   `mvn test -Dtest=TestMiniLlapLocalCliDriver 
-Dqfile=insert2_overwrite_partitions.q -Dtest.output.overwrite`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656500)
Time Spent: 20m  (was: 10m)

> INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws 
> InvalidTableException
> --
>
> Key: HIVE-20303
> URL: https://issues.apache.org/jira/browse/HIVE-20303
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.2.0
>Reporter: xhmz
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The following scenario reproduces the problem:
> {code:sql}
> CREATE DATABASE db2;
> CREATE TABLE db2.destinTable (one STRING, two STRING) PARTITIONED BY (ds 
> STRING);
> INSERT OVERWRITE TABLE db2.destinTable PARTITION (ds='2011-11-11') IF NOT 
> EXISTS SELECT 100, 200;
> {code}
> The last query ({{INSERT OVERWRITE ...}}) fails with the following stack 
> trace:
> {noformat}
> 2021-09-28T04:25:47,330 ERROR [e3399094-860f-4381-bfd3-d2acfa8a885d main] 
> ql.Driver: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2
> org.apache.hadoop.hive.ql.parse.SemanticException: 
> org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1918)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1959)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12393)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12506)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:454)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:804)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:774)
> at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:175)
> at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
> at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
> {noformat}
> The problem does not reproduce when the 

[jira] [Work logged] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?focusedWorklogId=656481&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656481
 ]

ASF GitHub Bot logged work on HIVE-24579:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 19:52
Start Date: 28/Sep/21 19:52
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2656:
URL: https://github.com/apache/hive/pull/2656#discussion_r717361733



##
File path: ql/src/test/results/clientpositive/llap/groupby1_limit.q.out
##
@@ -71,33 +71,34 @@ STAGE PLANS:
 mode: mergepartial
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 316 Data size: 30020 Basic stats: 
COMPLETE Column stats: COMPLETE
-Limit
-  Number of rows: 5
-  Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE 
Column stats: COMPLETE
-  Reduce Output Operator
-null sort order: 
-sort order: 
-Statistics: Num rows: 5 Data size: 475 Basic stats: 
COMPLETE Column stats: COMPLETE
-TopN Hash Memory Usage: 0.1
-value expressions: _col0 (type: string), _col1 (type: 
double)
+Reduce Output Operator

Review comment:
   we lost the `Limit` operator from here - as a result we will be 
shuffling all input rows.
   I think this could become more costly for larger tables than the old plan.
   
   I don't see TopN hash enabled on the reduce operator - which could possibly 
save the day in this case; why did we lose that as well?

##
File path: ql/src/test/results/clientpositive/llap/limit_pushdown.q.out
##
@@ -1075,6 +1072,13 @@ STAGE PLANS:
   Map-reduce partition columns: _col0 (type: string)
   Statistics: Num rows: 316 Data size: 30020 Basic 
stats: COMPLETE Column stats: COMPLETE
   value expressions: _col1 (type: bigint)
+Execution mode: vectorized, llap
+LLAP IO: all inputs
+Map 3 
+Map Operator Tree:
+TableScan
+  alias: src
+  Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
   Top N Key Operator

Review comment:
   in the old plan: did we have 2 TopN key operators in this plan which are 
equal?
   this is unrelated to this patch, but we may have an issue with their 
comparison - and because of that SWO is not able to simplify them

##
File path: ql/src/test/results/clientpositive/llap/groupby1_limit.q.out
##
@@ -71,33 +71,34 @@ STAGE PLANS:
 mode: mergepartial
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 316 Data size: 30020 Basic stats: 
COMPLETE Column stats: COMPLETE
-Limit
-  Number of rows: 5
-  Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE 
Column stats: COMPLETE
-  Reduce Output Operator
-null sort order: 
-sort order: 
-Statistics: Num rows: 5 Data size: 475 Basic stats: 
COMPLETE Column stats: COMPLETE
-TopN Hash Memory Usage: 0.1
-value expressions: _col0 (type: string), _col1 (type: 
double)
+Reduce Output Operator

Review comment:
   I missed that - most likely because the row estimate was >100.
   In that case this doesn't seem to be a problem; however we should fix the 
stat estimate for the TNKO - could you open a ticket?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656481)
Time Spent: 1.5h  (was: 1h 20m)

> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an TopN unexpectly for map phase, which 

[jira] [Work logged] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?focusedWorklogId=656446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656446
 ]

ASF GitHub Bot logged work on HIVE-24579:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 19:50
Start Date: 28/Sep/21 19:50
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2656:
URL: https://github.com/apache/hive/pull/2656#discussion_r717404007



##
File path: ql/src/test/results/clientpositive/llap/groupby1_limit.q.out
##
@@ -71,33 +71,34 @@ STAGE PLANS:
 mode: mergepartial
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 316 Data size: 30020 Basic stats: 
COMPLETE Column stats: COMPLETE
-Limit
-  Number of rows: 5
-  Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE 
Column stats: COMPLETE
-  Reduce Output Operator
-null sort order: 
-sort order: 
-Statistics: Num rows: 5 Data size: 475 Basic stats: 
COMPLETE Column stats: COMPLETE
-TopN Hash Memory Usage: 0.1
-value expressions: _col0 (type: string), _col1 (type: 
double)
+Reduce Output Operator

Review comment:
   But we still have the TopNKey operator in the Mapper (both old and new plans); 
it filters out the majority of the rows.
   
   This query has the same issue as the example in the jira: it has a group-by with 
limit + an aggregate function in the projection:
   ```
   SELECT src.key, sum(substr(src.value,5)) FROM src GROUP BY src.key LIMIT 5
   ``` 
   If no ordering is specified we may end up with incorrect aggregations.
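   A toy illustration (pure Python, not Hive code) of how truncating partial 
aggregates with a map-side top-N before the reduce-side merge can silently 
drop groups:

```python
from collections import Counter

rows = [1, 1, 1, 2, 2, 3] * 3          # 18 rows, 3 group-by keys
splits = [rows[:9], rows[9:]]          # two "mappers"

def partial_counts(split, top_n=None):
    c = Counter(split)
    if top_n is not None:              # map-side TopN on *partial* counts
        c = Counter(dict(c.most_common(top_n)))
    return c

# reduce-side merge of the per-mapper partial aggregates
correct = sum((partial_counts(s) for s in splits), Counter())
wrong   = sum((partial_counts(s, top_n=2) for s in splits), Counter())
print(correct)   # all three groups present with full counts
print(wrong)     # group 3 was truncated on both mappers before the merge
```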




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656446)
Time Spent: 1h 20m  (was: 1h 10m)

> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN for the map phase, which causes an incorrect result.
> {code:sql}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   GatherStats: false
>   Select Operator
> expressions: id (type: int)
> outputColumnNames: id
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count()
>   keys: id (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> tag: -1
> TopN: 10
> TopN Hash Memory Usage: 0.1
> value expressions: _col1 (type: bigint)
> auto parallelism: true
> Execution mode: vectorized
> Path -> Alias:
>   file:/user/hive/warehouse/test [test]
> Path -> Partition:
>   file:/user/hive/warehouse/test 
> 

[jira] [Work logged] (HIVE-25517) Follow up on HIVE-24951: External Table created with Uppercase name using CTAS does not produce result for select queries

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25517?focusedWorklogId=656453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656453
 ]

ASF GitHub Bot logged work on HIVE-25517:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 19:50
Start Date: 28/Sep/21 19:50
Worklog Time Spent: 10m 
  Work Description: sourabh912 commented on pull request #2638:
URL: https://github.com/apache/hive/pull/2638#issuecomment-928142540


   The test failure does not seem related to this patch. 
   ```
   [2021-09-22T22:28:25.641Z] [INFO]  T E S T S
   [2021-09-22T22:28:25.641Z] [INFO] 
---
   [2021-09-22T22:28:26.707Z] [INFO] Running 
org.apache.hadoop.hive.metastore.dbinstall.ITestPostgres
   [2021-09-22T22:29:05.786Z] [ERROR] Tests run: 2, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 32.595 s <<< FAILURE! - in 
org.apache.hadoop.hive.metastore.dbinstall.ITestPostgres
   [2021-09-22T22:29:05.786Z] [ERROR] 
install(org.apache.hadoop.hive.metastore.dbinstall.ITestPostgres)  Time 
elapsed: 6.768 s  <<< FAILURE!
   [2021-09-22T22:29:05.786Z] java.lang.AssertionError: expected:<0> but was:<1>
   [2021-09-22T22:29:05.786Z] 
   [2021-09-22T22:29:05.786Z] [INFO] 
   [2021-09-22T22:29:05.786Z] [INFO] Results:
   [2021-09-22T22:29:05.786Z] [INFO] 
   [2021-09-22T22:29:05.786Z] [ERROR] Failures: 
   [2021-09-22T22:29:05.786Z] [ERROR]   ITestPostgres>DbInstallBase.install:30 
expected:<0> but was:<1>
   [2021-09-22T22:29:05.786Z] [INFO] 
   [2021-09-22T22:29:05.786Z] [ERROR] Tests run: 2, Failures: 1, Errors: 0, 
Skipped: 0
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656453)
Time Spent: 1h 40m  (was: 1.5h)

> Follow up on HIVE-24951: External Table created with Uppercase name using 
> CTAS does not produce result for select queries
> -
>
> Key: HIVE-25517
> URL: https://issues.apache.org/jira/browse/HIVE-25517
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Sourabh Goyal
>Assignee: Sourabh Goyal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In [PR|https://github.com/apache/hive/pull/2125] for HIVE-24951, the 
> recommendation was to use getDefaultTablePath() to set the location for an 
> external table. This Jira addresses that and makes getDefaultTablePath() more 
> generic.
>  
> cc - [~ngangam]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25517) Follow up on HIVE-24951: External Table created with Uppercase name using CTAS does not produce result for select queries

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25517?focusedWorklogId=656447&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656447
 ]

ASF GitHub Bot logged work on HIVE-25517:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 19:50
Start Date: 28/Sep/21 19:50
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on pull request #2638:
URL: https://github.com/apache/hive/pull/2638#issuecomment-929434572


   Fix has been merged to master. Please close the PR. Thank you for the work 
on this. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656447)
Time Spent: 1.5h  (was: 1h 20m)

> Follow up on HIVE-24951: External Table created with Uppercase name using 
> CTAS does not produce result for select queries
> -
>
> Key: HIVE-25517
> URL: https://issues.apache.org/jira/browse/HIVE-25517
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Sourabh Goyal
>Assignee: Sourabh Goyal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In [PR|https://github.com/apache/hive/pull/2125] for HIVE-24951, the 
> recommendation was to use getDefaultTablePath() to set the location for an 
> external table. This Jira addresses that and makes getDefaultTablePath() more 
> generic.
>  
> cc - [~ngangam]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25541) JsonSerDe: TBLPROPERTY treating nested json as String

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25541?focusedWorklogId=656422&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656422
 ]

ASF GitHub Bot logged work on HIVE-25541:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 19:47
Start Date: 28/Sep/21 19:47
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2664:
URL: https://github.com/apache/hive/pull/2664#discussion_r717554831



##
File path: serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java
##
@@ -393,7 +402,16 @@ private Object visitLeafNode(final JsonNode leafNode,
 case DOUBLE:
   return Double.valueOf(leafNode.asDouble());
 case STRING:
-  return leafNode.asText();
+  if (leafNode.isValueNode()) {
+return leafNode.asText();
+  } else {
+if (isEnabled(Feature.STRINGIFY_COMPLEX_FIELDS)) {
+  return leafNode.toString();
+} else {
+  throw new SerDeException(
+  "Complex field found in JSON does not match table definition: " 
+ typeInfo.getTypeName());

Review comment:
   Hey @dengzhhu653 not sure what you are referring to here -- this PR is 
targeting complex fields whose Hive schema is not fully defined (like a map of 
maps that is declared as a simple map).
   
   Enabling this feature will cause the JSON reader to treat the above complex 
field as a String (the input type is not important here) -- does that make sense?
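
   An illustrative sketch of the idea (assumed names, in Python rather than the 
actual HiveJsonReader Java API): when the table schema declares a column as 
STRING but the JSON node is a nested object or array, either stringify it or 
fail, depending on a feature flag.

```python
import json

# Sketch only: the function name and flag are hypothetical stand-ins for
# the STRINGIFY_COMPLEX_FIELDS behavior discussed above.
def read_string_field(node, stringify_complex=False):
    if not isinstance(node, (dict, list)):   # plain value node
        return str(node)
    if stringify_complex:                    # feature enabled
        return json.dumps(node)              # keep the nested JSON as text
    raise ValueError(
        "Complex field found in JSON does not match table definition: string")

row = json.loads('{"data": {"H": {"event": "track_active"}}, "messageId": "42"}')
print(read_string_field(row["data"], stringify_complex=True))
# {"H": {"event": "track_active"}}
```

   Without the flag, the same nested value raises an error, matching the current 
behavior the patch makes configurable.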




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656422)
Time Spent: 1.5h  (was: 1h 20m)

> JsonSerDe: TBLPROPERTY treating nested json as String
> -
>
> Key: HIVE-25541
> URL: https://issues.apache.org/jira/browse/HIVE-25541
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Native Jsonserde 'org.apache.hive.hcatalog.data.JsonSerDe' currently does not 
> support loading nested json into a string type directly. It requires 
> declaring the column as a complex type (struct, map, array) to unpack nested 
> json data.
> Even though the data field is not a valid JSON String type, there is value in 
> treating it as a plain String instead of throwing an exception, as we currently 
> do.
> {code:java}
> create table json_table(data string, messageid string, publish_time bigint, 
> attributes string);
> {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}
> {code}
> This JIRA introduces an extra Table Property allowing to Stringify Complex 
> JSON values instead of forcing the User to define the complete nested 
> structure



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25538) CommitTxn replay failing during incremental run

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25538?focusedWorklogId=656416&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656416
 ]

ASF GitHub Bot logged work on HIVE-25538:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 19:47
Start Date: 28/Sep/21 19:47
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2655:
URL: https://github.com/apache/hive/pull/2655#discussion_r717329217



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##
@@ -65,6 +65,8 @@
   private final String copyAsUser;
   private FileSystem destinationFs;
   private final int maxParallelCopyTask;
+  @VisibleForTesting

Review comment:
   Functionality-wise, I think no. It is most probably for the devs.
   
   #Copied ->
   The point of an annotation is that it's a convention and could be used in 
static code analysis, whereas a simple comment could not.
   
   It serves the same purpose as the normal annotations like LimitedPrivate and 
the InterfaceStability ones.

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java
##
@@ -243,6 +248,64 @@ public void testReplCM() throws Throwable {
 Lists.newArrayList(result, result));
   }
 
+  @Test
+  public void testReplCommitTransactionOnSourceDeleteORC() throws Throwable {
+// Run test with ORC format & with transactional true.
+testReplCommitTransactionOnSourceDelete("STORED AS ORC", 
"'transactional'='true'");
+  }
+
+  @Test
+  public void testReplCommitTransactionOnSourceDeleteText() throws Throwable {
+// Run test with TEXT format & with transactional true.

Review comment:
   Yep, thanks. Corrected.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##
@@ -401,6 +404,18 @@ private boolean isSourceFileMismatch(FileSystem sourceFs, 
ReplChangeManager.File
 return false;
   }
 
+  @VisibleForTesting
+  private void runTestOnlyExecutions() throws IOException {

Review comment:
   Yeah, my first try was to do so. I thought of using PowerMock, but 
MiniDfs has an issue with it, and since that is part of Hadoop we can't change it.
   The most I could pull out is a Callable into the test, so as to avoid the 
delete or FS operations here; in the future it can be reused as 
well. Let me know if there is any other way out. :-)
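
   The test-hook idea can be sketched as follows (a hypothetical Python 
stand-in with assumed names, not Hive's actual CopyUtils): a 
@VisibleForTesting-style seam runs an injected callable between the copy and 
the checksum validation, so a test can delete the source file at exactly that 
point.

```python
# Hypothetical sketch of a test-only execution seam (not Hive code).
class Copier:
    def __init__(self, test_hook=None):
        self._test_hook = test_hook      # injected only by tests

    def copy(self, files):
        copied = list(files)             # pretend the copy happened
        if self._test_hook:
            self._test_hook()            # a test may delete the source here
        return self.verify(copied)

    def verify(self, files):
        # stand-in for checksum validation against the source files
        return all(f.get("exists", True) for f in files)

files = [{"name": "delta_0000002", "exists": True}]
copier = Copier(test_hook=lambda: files[0].update(exists=False))
print(copier.copy(files))                               # False: source vanished
print(Copier().copy([{"name": "f1", "exists": True}]))  # True: normal path
```

   In production the hook is simply absent, so the seam adds no behavior outside 
tests.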




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656416)
Time Spent: 1h 10m  (was: 1h)

> CommitTxn replay failing during incremental run
> ---
>
> Key: HIVE-25538
> URL: https://issues.apache.org/jira/browse/HIVE-25538
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> CommitTxn fails during an incremental run in case the source file is deleted 
> after the copy and before checksum validation.
> {noformat}
> 2021-09-21T07:53:40,898 ERROR [TThreadPoolServer WorkerProcess-%d] 
> thrift.ProcessFunction: Internal error processing commit_txn
> org.apache.thrift.TException: 
> /warehouse1/replicated_testreplcommittransactiononsourcedelete_1632235978675.db/testreplcommittransactiononsourcedelete/load_date=2016-03-01/delta_002_002_
>  (is not a directory)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolvePath(FSDirectory.java:677)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:151)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1927)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:424)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)

[jira] [Commented] (HIVE-25566) Show column constraints for "DESC FORMATTED TABLE"

2021-09-28 Thread Soumyakanti Das (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421574#comment-17421574
 ] 

Soumyakanti Das commented on HIVE-25566:


Done! 

> Show column constraints for "DESC FORMATTED TABLE"
> --
>
> Key: HIVE-25566
> URL: https://issues.apache.org/jira/browse/HIVE-25566
> Project: Hive
>  Issue Type: New Feature
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, column constraints are not shown with the data type of columns. 
> They are shown all together at the end, but showing them with the data type 
> will make the description more readable.
>  
> Example:
> Create table
>   
> {code:java}
> CREATE TABLE TEST(
>   col1 varchar(100) NOT NULL COMMENT "comment for column 1",
>   col2 timestamp DEFAULT CURRENT_TIMESTAMP() COMMENT "comment for column 2",
>   col3 decimal,
>   col4 varchar(512) NOT NULL,
>   col5 varchar(100),
>   primary key(col1, col2) disable novalidate)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
> {code}
>  
> Currently, {{DESC FORMATTED TABLE }} returns,
> {code:java}
> # col_namedata_type   comment 
> col1  varchar(100)comment for column 1
> col2  timestamp   comment for column 2
> col3  decimal(10,0)   
> col4  varchar(512)
> col5  varchar(100)
> # Detailed Table Information   
> Database: default  
>  A masked pattern was here 
> Retention:0
>  A masked pattern was here 
> Table Type:   MANAGED_TABLE
> Table Parameters:  
>   COLUMN_STATS_ACCURATE   
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\",\"col3\":\"true\",\"col4\":\"true\",\"col5\":\"true\"}}
>   bucketing_version   2   
>   numFiles0   
>   numRows 0   
>   rawDataSize 0   
>   totalSize   0   
>  A masked pattern was here 
> # Storage Information  
> SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat  
> OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>  
> Compressed:   No   
> Num Buckets:  -1   
> Bucket Columns:   []   
> Sort Columns: []   
> Storage Desc Params:   
>   serialization.format1   
> # Constraints  
> # Primary Key  
> Table:default.test 
> Constraint Name:   A masked pattern was here   
> Column Name:  col1 
> Column Name:  col2 
> # Not Null Constraints 
> Table:default.test 
> Constraint Name:   A masked pattern was here   
> Column Name:  col1 
> Constraint Name:   A masked pattern was here   
> Column Name:  col4 
> # Default Constraints  
> Table:default.test 
> Constraint Name:   A masked pattern was here   
> Column Name:col2  Default Value:CURRENT_TIMESTAMP()   
> {code}
>  
> Adding the column constraints will look something like,
> {code:java}
> # col_namedata_type   
> comment 
> col1  varchar(100) PRIMARY KEY NOT NULL   
> comment for column 1
> col2  timestamp PRIMARY KEY DEFAULT CURRENT_TIMESTAMP()   
> comment for column 2
> col3  decimal(10,0)   
> col4  varchar(512) NOT NULL   
> col5  varchar(100)
> # Detailed Table Information   
> Database: default  
>  A masked pattern was here 
> Retention:0
>  A masked pattern was here 
> Table Type:   MANAGED_TABLE
> Table Parameters:  

[jira] [Updated] (HIVE-25566) Show column constraints for "DESC FORMATTED TABLE"

2021-09-28 Thread Soumyakanti Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyakanti Das updated HIVE-25566:
---
Description: 
Currently, column constraints are not shown with the data type of columns. They 
are shown all together at the end, but showing them with the data type will 
make the description more readable.

 

Example:

Create table
  
{code:java}
CREATE TABLE TEST(
  col1 varchar(100) NOT NULL COMMENT "comment for column 1",
  col2 timestamp DEFAULT CURRENT_TIMESTAMP() COMMENT "comment for column 2",
  col3 decimal,
  col4 varchar(512) NOT NULL,
  col5 varchar(100),
  primary key(col1, col2) disable novalidate)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
{code}
 

Currently, {{DESC FORMATTED TABLE }} returns,
{code:java}
# col_name  data_type   comment 
col1varchar(100)comment for column 1
col2timestamp   comment for column 2
col3decimal(10,0)   
col4varchar(512)
col5varchar(100)

# Detailed Table Information 
Database:   default  
 A masked pattern was here 
Retention:  0
 A masked pattern was here 
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE   
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\",\"col3\":\"true\",\"col4\":\"true\",\"col5\":\"true\"}}
bucketing_version   2   
numFiles0   
numRows 0   
rawDataSize 0   
totalSize   0   
 A masked pattern was here 

# Storage Information
SerDe Library:  org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat  
OutputFormat:   org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
 
Compressed: No   
Num Buckets:-1   
Bucket Columns: []   
Sort Columns:   []   
Storage Desc Params: 
serialization.format1   

# Constraints

# Primary Key
Table:  default.test 
Constraint Name: A masked pattern was here   
Column Name:col1 
Column Name:col2 

# Not Null Constraints   
Table:  default.test 
Constraint Name: A masked pattern was here   
Column Name:col1 

Constraint Name: A masked pattern was here   
Column Name:col4 


# Default Constraints
Table:  default.test 
Constraint Name: A masked pattern was here   
Column Name:col2Default Value:CURRENT_TIMESTAMP()   
{code}
 

Adding the column constraints will look something like,
{code:java}
# col_name  data_type   
comment 
col1varchar(100) PRIMARY KEY NOT NULL   
comment for column 1
col2timestamp PRIMARY KEY DEFAULT CURRENT_TIMESTAMP()   
comment for column 2
col3decimal(10,0)   
col4varchar(512) NOT NULL   
col5varchar(100)

# Detailed Table Information 
Database:   default  
 A masked pattern was here 
Retention:  0
 A masked pattern was here 
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE   
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\",\"col3\":\"true\",\"col4\":\"true\",\"col5\":\"true\"}}
bucketing_version   2   
numFiles0   
numRows 0   
rawDataSize 0   
totalSize   0   
 A masked pattern was here 

# Storage Information
SerDe Library:  org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:

[jira] [Updated] (HIVE-25566) Show column constraints for "DESC FORMATTED TABLE"

2021-09-28 Thread Soumyakanti Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyakanti Das updated HIVE-25566:
---
Issue Type: New Feature  (was: Improvement)

> Show column constraints for "DESC FORMATTED TABLE"
> --
>
> Key: HIVE-25566
> URL: https://issues.apache.org/jira/browse/HIVE-25566
> Project: Hive
>  Issue Type: New Feature
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, column constraints are not shown with the data type of columns. 
> They are shown all together at the end, but showing them with the data type 
> will make the description more readable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25517) Follow up on HIVE-24951: External Table created with Uppercase name using CTAS does not produce result for select queries

2021-09-28 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-25517.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Fix has been committed to master. Closing the jira. Thank you for the fix 
[~sourabh912]

> Follow up on HIVE-24951: External Table created with Uppercase name using 
> CTAS does not produce result for select queries
> -
>
> Key: HIVE-25517
> URL: https://issues.apache.org/jira/browse/HIVE-25517
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Sourabh Goyal
>Assignee: Sourabh Goyal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In [PR|https://github.com/apache/hive/pull/2125] for HIVE-24951, the 
> recommendation was to use getDefaultTablePath() to set the location for an 
> external table. This Jira addresses that and makes getDefaultTablePath() more 
> generic.
>  
> cc - [~ngangam]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25517) Follow up on HIVE-24951: External Table created with Uppercase name using CTAS does not produce result for select queries

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25517?focusedWorklogId=656260&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656260
 ]

ASF GitHub Bot logged work on HIVE-25517:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 17:18
Start Date: 28/Sep/21 17:18
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on pull request #2638:
URL: https://github.com/apache/hive/pull/2638#issuecomment-929434572


   Fix has been merged to master. Please close the PR. Thank you for the work 
on this. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656260)
Time Spent: 1h 20m  (was: 1h 10m)

> Follow up on HIVE-24951: External Table created with Uppercase name using 
> CTAS does not produce result for select queries
> -
>
> Key: HIVE-25517
> URL: https://issues.apache.org/jira/browse/HIVE-25517
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Sourabh Goyal
>Assignee: Sourabh Goyal
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In [PR|https://github.com/apache/hive/pull/2125] for HIVE-24951, the 
> recommendation was to use getDefaultTablePath() to set the location for an 
> external table. This Jira addresses that and makes getDefaultTablePath() more 
> generic.
>  
> cc - [~ngangam]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25568) Estimate TopNKey operator statistics.

2021-09-28 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-25568:
--
Description: 
Currently the TopNKey operator has the same statistics as its parent operator:
{code}
TableScan
  alias: src
  Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column 
stats: COMPLETE
  Top N Key Operator
sort order: +
keys: key (type: string)
null sort order: z
Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column 
stats: COMPLETE
top n: 5
{code}
This operator filters out rows and this should be indicated in statistics.

  was:
Currently the TopNKey operator has the same statistics as its parent operator:
{code}
 TableScan
  alias: src
  Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
  Top N Key Operator
sort order: +
keys: key (type: string)
null sort order: z
Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
top n: 5
{code}
This operator filters out rows and this should be indicated in statistics.


> Estimate TopNKey operator statistics.
> -
>
> Key: HIVE-25568
> URL: https://issues.apache.org/jira/browse/HIVE-25568
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Priority: Major
>
> Currently the TopNKey operator has the same statistics as its parent operator:
> {code}
> TableScan
>   alias: src
>   Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column 
> stats: COMPLETE
>   Top N Key Operator
> sort order: +
> keys: key (type: string)
> null sort order: z
> Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column 
> stats: COMPLETE
> top n: 5
> {code}
> This operator filters out rows and this should be indicated in statistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25569) Enable table definition over a single file

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25569:
--
Labels: pull-request-available  (was: )

> Enable table definition over a single file
> --
>
> Key: HIVE-25569
> URL: https://issues.apache.org/jira/browse/HIVE-25569
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Suppose there is a directory where multiple files are present - and for a 3rd 
> party database system this is perfectly normal, because it treats a 
> single file as the contents of the table.
> Tables defined in the metastore follow a different principle - tables are 
> considered to be under a directory - and all files under that directory are 
> the contents of that table.
> To enable seamless migration/evaluation of Hive and other databases using HMS 
> as a metadata backend, the ability to define a table over a single file would 
> be useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25569) Enable table definition over a single file

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25569?focusedWorklogId=656191&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656191
 ]

ASF GitHub Bot logged work on HIVE-25569:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 14:42
Start Date: 28/Sep/21 14:42
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk opened a new pull request #2680:
URL: https://github.com/apache/hive/pull/2680


   Change-Id: I6e8afa3463951c5b4e032df390df06a0d634fde7
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656191)
Remaining Estimate: 0h
Time Spent: 10m

> Enable table definition over a single file
> --
>
> Key: HIVE-25569
> URL: https://issues.apache.org/jira/browse/HIVE-25569
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Suppose there is a directory where multiple files are present - and for a 3rd 
> party database system this is perfectly normal, because it treats a 
> single file as the contents of the table.
> Tables defined in the metastore follow a different principle - tables are 
> considered to be under a directory - and all files under that directory are 
> the contents of that table.
> To enable seamless migration/evaluation of Hive and other databases using HMS 
> as a metadata backend, the ability to define a table over a single file would 
> be useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25569) Enable table definition over a single file

2021-09-28 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421431#comment-17421431
 ] 

Zoltan Haindrich commented on HIVE-25569:
-

Proposed solution: SingleFileSystem

Suppose we have a file in a regular filesystem (hdfs://tmp/f1.txt) over which we 
want to define a table. To avoid the problems we could get into by setting its 
parent directory as the table's dir, an sfs-wrapped URI could be used: 
sfs+hdfs://tmp/f1.txt/SINGLEFILE.
Specifying the SINGLEFILE path element instructs this filesystem to show only 
f1.txt under that directory.

{code}
$ hdfs dfs -find 'hdfs://localhost:20500/tmp/d1/'
hdfs://localhost:20500/tmp/d1
hdfs://localhost:20500/tmp/d1/f1
hdfs://localhost:20500/tmp/d1/f2
$ hdfs dfs -find 'sfs+hdfs://localhost:20500/tmp/d1/'
sfs+hdfs://localhost:20500/tmp/d1
sfs+hdfs://localhost:20500/tmp/d1/f1
sfs+hdfs://localhost:20500/tmp/d1/f1/SINGLEFILE
sfs+hdfs://localhost:20500/tmp/d1/f1/SINGLEFILE/f1
sfs+hdfs://localhost:20500/tmp/d1/f2
sfs+hdfs://localhost:20500/tmp/d1/f2/SINGLEFILE
sfs+hdfs://localhost:20500/tmp/d1/f2/SINGLEFILE/f2
{code}
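The path mapping shown in the listing above can be sketched as pure string logic (an illustrative Python sketch of the URI scheme's semantics, not the actual Hadoop FileSystem implementation; the function names are hypothetical):

```python
def sfs_to_underlying(sfs_path):
    """Map an sfs+ wrapped URI back to the single file it wraps.

    'sfs+hdfs://host/tmp/f1/SINGLEFILE/f1' -> 'hdfs://host/tmp/f1'
    """
    scheme, rest = sfs_path.split("+", 1)
    assert scheme == "sfs", "not an sfs-wrapped URI"
    idx = rest.find("/SINGLEFILE")
    if idx == -1:
        return rest  # plain listing request, no marker present
    return rest[:idx]


def sfs_listing(underlying_file):
    """Synthetic entries sfs exposes for one wrapped file, as in the
    `hdfs dfs -find` output quoted above."""
    name = underlying_file.rsplit("/", 1)[1]
    base = "sfs+" + underlying_file
    return [base, base + "/SINGLEFILE", base + "/SINGLEFILE/" + name]
```

Each wrapped file thus appears as a one-entry directory, so a table location pointing at `.../f1/SINGLEFILE` only ever sees f1.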


> Enable table definition over a single file
> --
>
> Key: HIVE-25569
> URL: https://issues.apache.org/jira/browse/HIVE-25569
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> Suppose there is a directory where multiple files are present - and for a 3rd 
> party database system this is perfectly normal - because it treats a 
> single file as the contents of a table.
> Tables defined in the metastore follow a different principle: a table is 
> considered to be a directory, and all files under that directory are 
> the contents of the table.
> To enable seamless migration/evaluation of Hive and other databases using HMS 
> as a metadata backend, the ability to define a table over a single file would 
> be useful.





[jira] [Assigned] (HIVE-25569) Enable table definition over a single file

2021-09-28 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-25569:
---


> Enable table definition over a single file
> --
>
> Key: HIVE-25569
> URL: https://issues.apache.org/jira/browse/HIVE-25569
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> Suppose there is a directory where multiple files are present - and for a 3rd 
> party database system this is perfectly normal - because it treats a 
> single file as the contents of a table.
> Tables defined in the metastore follow a different principle: a table is 
> considered to be a directory, and all files under that directory are 
> the contents of the table.
> To enable seamless migration/evaluation of Hive and other databases using HMS 
> as a metadata backend, the ability to define a table over a single file would 
> be useful.





[jira] [Updated] (HIVE-20303) INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws InvalidTableException

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-20303:
--
Labels: pull-request-available  (was: )

> INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws 
> InvalidTableException
> --
>
> Key: HIVE-20303
> URL: https://issues.apache.org/jira/browse/HIVE-20303
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.2.0
>Reporter: xhmz
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following scenario reproduces the problem:
> {code:sql}
> CREATE DATABASE db2;
> CREATE TABLE db2.destinTable (one STRING, two STRING) PARTITIONED BY (ds 
> STRING);
> INSERT OVERWRITE TABLE db2.destinTable PARTITION (ds='2011-11-11') IF NOT 
> EXISTS SELECT 100, 200;
> {code}
> The last query ({{INSERT OVERWRITE ...}}) fails with the following stack 
> trace:
> {noformat}
> 2021-09-28T04:25:47,330 ERROR [e3399094-860f-4381-bfd3-d2acfa8a885d main] 
> ql.Driver: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2
> org.apache.hadoop.hive.ql.parse.SemanticException: 
> org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1918)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1959)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12393)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12506)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:454)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:804)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:774)
> at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:175)
> at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
> at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
> {noformat}
> The problem does not reproduce when the {{IF NOT EXISTS}} clause is not 
> present in the query.





[jira] [Assigned] (HIVE-20303) INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws InvalidTableException

2021-09-28 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-20303:
--

Assignee: Stamatis Zampetakis

> INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws 
> InvalidTableException
> --
>
> Key: HIVE-20303
> URL: https://issues.apache.org/jira/browse/HIVE-20303
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.2.0
>Reporter: xhmz
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following scenario reproduces the problem:
> {code:sql}
> CREATE DATABASE db2;
> CREATE TABLE db2.destinTable (one STRING, two STRING) PARTITIONED BY (ds 
> STRING);
> INSERT OVERWRITE TABLE db2.destinTable PARTITION (ds='2011-11-11') IF NOT 
> EXISTS SELECT 100, 200;
> {code}
> The last query ({{INSERT OVERWRITE ...}}) fails with the following stack 
> trace:
> {noformat}
> 2021-09-28T04:25:47,330 ERROR [e3399094-860f-4381-bfd3-d2acfa8a885d main] 
> ql.Driver: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2
> org.apache.hadoop.hive.ql.parse.SemanticException: 
> org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1918)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1959)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12393)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12506)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:454)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:804)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:774)
> at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:175)
> at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
> at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
> {noformat}
> The problem does not reproduce when the {{IF NOT EXISTS}} clause is not 
> present in the query.





[jira] [Work logged] (HIVE-20303) INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws InvalidTableException

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20303?focusedWorklogId=656167=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656167
 ]

ASF GitHub Bot logged work on HIVE-20303:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 14:08
Start Date: 28/Sep/21 14:08
Worklog Time Spent: 10m 
  Work Description: zabetak opened a new pull request #2679:
URL: https://github.com/apache/hive/pull/2679


   ### What changes were proposed in this pull request?
   Extract the full table reference (DB name + table name) from the AST.
   
   ### Why are the changes needed?
   Without these changes queries fail with `InvalidTableException`.
   
   ### Does this PR introduce _any_ user-facing change?
   Queries will not fail.
   
   ### How was this patch tested?
   `mvn test -Dtest=TestMiniLlapLocalCliDriver 
-Dqfile=insert2_overwrite_partitions.q -Dtest.output.overwrite`
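The fix described above - extracting the full table reference (DB name + table name) from the AST rather than only the first identifier - can be illustrated with a small sketch (a hypothetical helper in Python, not Hive's actual AST-walking code):

```python
def split_table_ref(name, current_db="default"):
    """Resolve a possibly qualified table name into (db, table).

    A buggy resolver that keeps only the first identifier would look up
    'db2' as a table name and fail with "Table not found db2", which is
    the error seen with INSERT OVERWRITE ... IF NOT EXISTS.
    """
    parts = name.split(".")
    if len(parts) == 2:
        return parts[0], parts[1]
    # Unqualified name: fall back to the session's current database.
    return current_db, parts[0]
```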




Issue Time Tracking
---

Worklog Id: (was: 656167)
Remaining Estimate: 0h
Time Spent: 10m

> INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws 
> InvalidTableException
> --
>
> Key: HIVE-20303
> URL: https://issues.apache.org/jira/browse/HIVE-20303
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.2.0
>Reporter: xhmz
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following scenario reproduces the problem:
> {code:sql}
> CREATE DATABASE db2;
> CREATE TABLE db2.destinTable (one STRING, two STRING) PARTITIONED BY (ds 
> STRING);
> INSERT OVERWRITE TABLE db2.destinTable PARTITION (ds='2011-11-11') IF NOT 
> EXISTS SELECT 100, 200;
> {code}
> The last query ({{INSERT OVERWRITE ...}}) fails with the following stack 
> trace:
> {noformat}
> 2021-09-28T04:25:47,330 ERROR [e3399094-860f-4381-bfd3-d2acfa8a885d main] 
> ql.Driver: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2
> org.apache.hadoop.hive.ql.parse.SemanticException: 
> org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1918)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1959)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12393)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12506)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:454)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:804)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:774)
> at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:175)
> at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
> at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
> {noformat}
> The problem does not reproduce when the {{IF NOT EXISTS}} clause is not 
> present in the query.



[jira] [Work logged] (HIVE-25541) JsonSerDe: TBLPROPERTY treating nested json as String

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25541?focusedWorklogId=656135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656135
 ]

ASF GitHub Bot logged work on HIVE-25541:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 13:05
Start Date: 28/Sep/21 13:05
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2664:
URL: https://github.com/apache/hive/pull/2664#discussion_r717554831



##
File path: serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java
##
@@ -393,7 +402,16 @@ private Object visitLeafNode(final JsonNode leafNode,
 case DOUBLE:
   return Double.valueOf(leafNode.asDouble());
 case STRING:
-  return leafNode.asText();
+  if (leafNode.isValueNode()) {
+return leafNode.asText();
+  } else {
+if (isEnabled(Feature.STRINGIFY_COMPLEX_FIELDS)) {
+  return leafNode.toString();
+} else {
+  throw new SerDeException(
+  "Complex field found in JSON does not match table definition: " 
+ typeInfo.getTypeName());

Review comment:
   Hey @dengzhhu653 not sure what you are referring to here -- this PR 
targets complex fields that have no matching Hive schema definition (like a 
map of maps which is declared as a simple map).
   
   Enabling this feature will cause the JSON reader to treat such a complex 
field as a String (the input type is not important here) -- does that make sense?






Issue Time Tracking
---

Worklog Id: (was: 656135)
Time Spent: 1h 20m  (was: 1h 10m)

> JsonSerDe: TBLPROPERTY treating nested json as String
> -
>
> Key: HIVE-25541
> URL: https://issues.apache.org/jira/browse/HIVE-25541
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Native JsonSerDe 'org.apache.hive.hcatalog.data.JsonSerDe' currently does not 
> support loading nested JSON into a string type directly. It requires 
> declaring the column as a complex type (struct, map, array) to unpack nested 
> JSON data.
> Even though the data field is not a valid JSON String type, there is value in 
> treating it as a plain String instead of throwing an exception as we 
> currently do.
> {code:java}
> create table json_table(data string, messageid string, publish_time bigint, 
> attributes string);
> {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}
> {code}
> This JIRA introduces an extra table property that allows stringifying complex 
> JSON values instead of forcing the user to define the complete nested 
> structure.
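The proposed behavior - stringifying a nested JSON value when the table schema declares the column as a plain string - can be sketched as follows (an illustrative Python model of the semantics, not the Java serde itself; the function name and flag are hypothetical):

```python
import json


def read_field(node, declared_type, stringify_complex=True):
    """Return a column value for a JSON node whose column is declared 'string'.

    If the node is itself complex (dict/list), either serialize it back to
    its JSON text or raise, mirroring the proposed table property.
    """
    if declared_type != "string":
        raise NotImplementedError("sketch only covers string columns")
    if not isinstance(node, (dict, list)):
        return str(node)
    if stringify_complex:
        # Treat the nested structure as its raw JSON text.
        return json.dumps(node, separators=(",", ":"))
    raise ValueError(
        "Complex field found in JSON does not match table definition: string")


# With the property enabled, a nested object lands in the string column intact:
nested = {"H": {"event": "track_active", "platform": "Android"}}
stringified = read_field(nested, "string")
```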





[jira] [Updated] (HIVE-20303) INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws InvalidTableException

2021-09-28 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-20303:
---
Summary: INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS 
throws InvalidTableException  (was: INSERT OVERWRITE TABLE   db.table PARTITION 
()  if not exists . will error as Table not found db (state=42000,code=4) )

> INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws 
> InvalidTableException
> --
>
> Key: HIVE-20303
> URL: https://issues.apache.org/jira/browse/HIVE-20303
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.2.0
>Reporter: xhmz
>Priority: Major
>
> The following scenario reproduces the problem:
> {code:sql}
> CREATE DATABASE db2;
> CREATE TABLE db2.destinTable (one STRING, two STRING) PARTITIONED BY (ds 
> STRING);
> INSERT OVERWRITE TABLE db2.destinTable PARTITION (ds='2011-11-11') IF NOT 
> EXISTS SELECT 100, 200;
> {code}
> The last query ({{INSERT OVERWRITE ...}}) fails with the following stack 
> trace:
> {noformat}
> 2021-09-28T04:25:47,330 ERROR [e3399094-860f-4381-bfd3-d2acfa8a885d main] 
> ql.Driver: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2
> org.apache.hadoop.hive.ql.parse.SemanticException: 
> org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1918)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1959)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12393)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12506)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:454)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:804)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:774)
> at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:175)
> at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
> at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
> {noformat}
> The problem does not reproduce when the {{IF NOT EXISTS}} clause is not 
> present in the query.





[jira] [Updated] (HIVE-20303) INSERT OVERWRITE TABLE db.table PARTITION () if not exists . will error as Table not found db (state=42000,code=40000)

2021-09-28 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-20303:
---
Description: 
The following scenario reproduces the problem:

{code:sql}
CREATE DATABASE db2;
CREATE TABLE db2.destinTable (one STRING, two STRING) PARTITIONED BY (ds 
STRING);
INSERT OVERWRITE TABLE db2.destinTable PARTITION (ds='2011-11-11') IF NOT 
EXISTS SELECT 100, 200;
{code}

The last query ({{INSERT OVERWRITE ...}}) fails with the following stack trace:

{noformat}
2021-09-28T04:25:47,330 ERROR [e3399094-860f-4381-bfd3-d2acfa8a885d main] 
ql.Driver: FAILED: SemanticException 
org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2
org.apache.hadoop.hive.ql.parse.SemanticException: 
org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1918)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1959)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12393)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12506)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:454)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:804)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:774)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:175)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
at 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
{noformat}

The problem does not reproduce when the {{IF NOT EXISTS}} clause is not present 
in the query.

  was:
if i use INSERT OVERWRITE TABLE   db.table PARTITION ()  if not exists select 
xx, it wii error

as 

Error: Error while compiling statement: FAILED: SemanticException 
org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db 
(state=42000,code=4)

 

but   INSERT OVERWRITE TABLE   db.table PARTITION ()  select,  do not use [if 
not exists], it is ok


> INSERT OVERWRITE TABLE   db.table PARTITION ()  if not exists . will error as 
> Table not found db (state=42000,code=4) 
> --
>
> Key: HIVE-20303
> URL: https://issues.apache.org/jira/browse/HIVE-20303
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.2.0
>Reporter: xhmz
>Priority: Major
>
> The following scenario reproduces the problem:
> {code:sql}
> CREATE DATABASE db2;
> CREATE TABLE db2.destinTable (one STRING, two STRING) PARTITIONED BY (ds 
> STRING);
> INSERT OVERWRITE TABLE db2.destinTable PARTITION (ds='2011-11-11') IF NOT 
> EXISTS SELECT 100, 200;
> {code}
> The last query ({{INSERT OVERWRITE ...}}) fails with the following stack 
> trace:
> {noformat}
> 2021-09-28T04:25:47,330 ERROR [e3399094-860f-4381-bfd3-d2acfa8a885d main] 
> ql.Driver: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2
> org.apache.hadoop.hive.ql.parse.SemanticException: 
> org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found db2
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1918)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1959)
> at 
> 

[jira] [Resolved] (HIVE-25378) Enable removal of old builds on hive ci

2021-09-28 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-25378.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Krisztian for reviewing the changes!

> Enable removal of old builds on hive ci
> ---
>
> Key: HIVE-25378
> URL: https://issues.apache.org/jira/browse/HIVE-25378
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We are using the github plugin to run builds on PRs
> However, to remove old builds that plugin needs periodic branch 
> scanning enabled - and since we also use the plugin's merge mechanism, 
> this would cause it to rediscover all open PRs after every new commit on the 
> target branch. 





[jira] [Updated] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-28 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-24579:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to master. Thanks [~kgyrtkirk] and [~nemon] for review.

> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN in the map phase, which causes an incorrect result.
> {code:sql}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   GatherStats: false
>   Select Operator
> expressions: id (type: int)
> outputColumnNames: id
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count()
>   keys: id (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> tag: -1
> TopN: 10
> TopN Hash Memory Usage: 0.1
> value expressions: _col1 (type: bigint)
> auto parallelism: true
> Execution mode: vectorized
> Path -> Alias:
>   file:/user/hive/warehouse/test [test]
> Path -> Partition:
>   file:/user/hive/warehouse/test 
> Partition
>   base file name: test
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   properties:
> COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
> bucket_count -1
> bucketing_version 2
> column.name.delimiter ,
> columns id
> columns.comments 
> columns.types int
> file.inputformat org.apache.hadoop.mapred.TextInputFormat
> file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> location file:/user/hive/warehouse/test
> name default.test
> numFiles 0
> numRows 0
> rawDataSize 0
> serialization.ddl struct test { i32 id}
> serialization.format 1
> serialization.lib 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 0
> transient_lastDdlTime 1609730190
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> 
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> properties:
>   COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
>   bucket_count -1
>   bucketing_version 2
>   column.name.delimiter ,
>   columns id
>   columns.comments 
>   columns.types int
>   
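The failure mode quoted in the plan above can be illustrated with a small simulation (hypothetical data, Python sketch; Hive's TopN is a key-based hash with a memory bound rather than a plain row cutoff, but the effect is the same: rows are dropped before aggregation, so surviving groups are undercounted):

```python
from collections import Counter

# Hypothetical input: key 1 occurs 4 times in total, but three of its rows
# arrive after the first `limit` rows have been seen.
rows = list(range(1, 11)) + [1] * 3


def groupby_count(data, limit):
    """Correct plan: aggregate all rows first, apply the limit afterwards."""
    counts = Counter(data)
    return dict(sorted(counts.items())[:limit])


def buggy_groupby_count(data, limit):
    """Simplified model of the buggy plan: the map-side TopN discards rows
    before the reducer aggregates, undercounting the surviving groups."""
    return dict(Counter(data[:limit]))
```

Here `groupby_count(rows, 10)` reports key 1 with count 4, while the buggy variant reports count 1, matching the incorrect results described in the issue.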

[jira] [Work logged] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?focusedWorklogId=656081=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656081
 ]

ASF GitHub Bot logged work on HIVE-24579:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 10:40
Start Date: 28/Sep/21 10:40
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged pull request #2656:
URL: https://github.com/apache/hive/pull/2656


   




Issue Time Tracking
---

Worklog Id: (was: 656081)
Time Spent: 1h 10m  (was: 1h)

> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN for the map phase, which causes an incorrect result.
> {code:sql}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   GatherStats: false
>   Select Operator
> expressions: id (type: int)
> outputColumnNames: id
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count()
>   keys: id (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> tag: -1
> TopN: 10
> TopN Hash Memory Usage: 0.1
> value expressions: _col1 (type: bigint)
> auto parallelism: true
> Execution mode: vectorized
> Path -> Alias:
>   file:/user/hive/warehouse/test [test]
> Path -> Partition:
>   file:/user/hive/warehouse/test 
> Partition
>   base file name: test
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   properties:
> COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
> bucket_count -1
> bucketing_version 2
> column.name.delimiter ,
> columns id
> columns.comments 
> columns.types int
> file.inputformat org.apache.hadoop.mapred.TextInputFormat
> file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> location file:/user/hive/warehouse/test
> name default.test
> numFiles 0
> numRows 0
> rawDataSize 0
> serialization.ddl struct test { i32 id}
> serialization.format 1
> serialization.lib 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 0
> transient_lastDdlTime 1609730190
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> 

[jira] [Commented] (HIVE-25566) Show column constraints for "DESC FORMATTED TABLE"

2021-09-28 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421307#comment-17421307
 ] 

Stamatis Zampetakis commented on HIVE-25566:


[~soumyakanti.das] Can you include a sample query with before and after output 
in the description to better understand the benefit of this change?

> Show column constraints for "DESC FORMATTED TABLE"
> --
>
> Key: HIVE-25566
> URL: https://issues.apache.org/jira/browse/HIVE-25566
> Project: Hive
>  Issue Type: Improvement
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, column constraints are not shown with the data type of columns. 
> They are shown all together at the end, but showing them with the data type 
> will make the description more readable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25561) Killed task should not commit file.

2021-09-28 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421306#comment-17421306
 ] 

Stamatis Zampetakis commented on HIVE-25561:


[~zhengchenyu] Did you mean to write "duplicate file" instead of "duplicate 
line"? Are the contents of the files identical? 

> Killed task should not commit file.
> ---
>
> Key: HIVE-25561
> URL: https://issues.apache.org/jira/browse/HIVE-25561
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.2.1, 2.3.8, 2.4.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For the tez engine in our cluster, I found some duplicate lines, especially 
> when tez speculation is enabled. In the partition dir, I found that both 
> 02_0 and 02_1 exist.
> It's a very low probability event. HIVE-10429 fixed some bugs around 
> interrupts, but some exceptions were not caught.
> In our cluster, the Task receives SIGTERM, then ClientFinalizer (a Hadoop 
> class) is called and the hdfs client closes. An exception is then raised, but 
> abort may not be set to true.
> removeTempOrDuplicateFiles may then fail because of the inconsistency, and 
> the duplicate file is retained.
> (Note: the Driver first lists the dir, then the Task commits its file, then 
> the Driver removes duplicate files. This sequence is inconsistent.)
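The list -> commit -> remove sequence described above can be sketched as follows. This is a hypothetical illustration, not Hive code; the file names and the helper function are made up for the example.

```python
# Hypothetical sketch of the Driver/Task race: the Driver deduplicates
# against a stale directory listing, so a late-committed attempt survives.

def remove_temp_or_duplicate_files(snapshot):
    """Driver-side dedup: keep only one attempt per task id, judged
    against the directory listing the Driver took earlier."""
    keep = {}
    for name in sorted(snapshot):
        task_id = name.split("_")[0]
        keep.setdefault(task_id, name)  # first attempt seen wins
    return set(keep.values())

# 1. The Driver lists the partition directory first.
directory = {"000002_0"}
snapshot = set(directory)

# 2. A killed/speculative task attempt commits its output afterwards.
directory.add("000002_1")

# 3. Dedup runs on the stale snapshot, so the late file is never examined
#    and both attempts remain in the partition directory.
survivors = remove_temp_or_duplicate_files(snapshot) | (directory - snapshot)
print(sorted(survivors))  # ['000002_0', '000002_1']
```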



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?focusedWorklogId=656073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656073
 ]

ASF GitHub Bot logged work on HIVE-24579:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 09:47
Start Date: 28/Sep/21 09:47
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2656:
URL: https://github.com/apache/hive/pull/2656#discussion_r717408925



##
File path: ql/src/test/results/clientpositive/llap/groupby1_limit.q.out
##
@@ -71,33 +71,34 @@ STAGE PLANS:
 mode: mergepartial
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 316 Data size: 30020 Basic stats: 
COMPLETE Column stats: COMPLETE
-Limit
-  Number of rows: 5
-  Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE 
Column stats: COMPLETE
-  Reduce Output Operator
-null sort order: 
-sort order: 
-Statistics: Num rows: 5 Data size: 475 Basic stats: 
COMPLETE Column stats: COMPLETE
-TopN Hash Memory Usage: 0.1
-value expressions: _col0 (type: string), _col1 (type: 
double)
+Reduce Output Operator

Review comment:
   I missed that - most likely because the row estimate was >100.
   In that case this doesn't seem to be a problem; however we should fix the 
stat estimate for the TNKO - could you open a ticket?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656073)
Time Spent: 1h  (was: 50m)

> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN for the map phase, which causes an incorrect result.
> {code:sql}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   GatherStats: false
>   Select Operator
> expressions: id (type: int)
> outputColumnNames: id
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count()
>   keys: id (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> tag: -1
> TopN: 10
> TopN Hash Memory Usage: 0.1
> value expressions: _col1 (type: bigint)
> auto parallelism: true
> Execution mode: vectorized
> Path -> Alias:
>   file:/user/hive/warehouse/test [test]
> Path -> Partition:
>   file:/user/hive/warehouse/test 
> Partition
>   base file name: test
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> 

[jira] [Work logged] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?focusedWorklogId=656071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656071
 ]

ASF GitHub Bot logged work on HIVE-24579:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 09:40
Start Date: 28/Sep/21 09:40
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2656:
URL: https://github.com/apache/hive/pull/2656#discussion_r717404007



##
File path: ql/src/test/results/clientpositive/llap/groupby1_limit.q.out
##
@@ -71,33 +71,34 @@ STAGE PLANS:
 mode: mergepartial
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 316 Data size: 30020 Basic stats: 
COMPLETE Column stats: COMPLETE
-Limit
-  Number of rows: 5
-  Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE 
Column stats: COMPLETE
-  Reduce Output Operator
-null sort order: 
-sort order: 
-Statistics: Num rows: 5 Data size: 475 Basic stats: 
COMPLETE Column stats: COMPLETE
-TopN Hash Memory Usage: 0.1
-value expressions: _col0 (type: string), _col1 (type: 
double)
+Reduce Output Operator

Review comment:
   But we still have the TopNKey operator in the Mapper (in both the old and 
new plan); it filters out the majority of the rows.
   
   This query has the same issue as the example in the jira: it has a group by 
with limit + an aggregate function in the projection:
   ```
   SELECT src.key, sum(substr(src.value,5)) FROM src GROUP BY src.key LIMIT 5
   ```
   If no ordering is specified we may end up with incorrect aggregations.
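The mechanism behind the incorrect aggregations can be sketched like this. This is a hedged illustration, not Hive internals: a hash-mode map-side aggregation may flush the same key more than once, so a TopN that truncates *rows* on the reduce sink can drop some partial aggregates of a surviving key.

```python
from collections import Counter

# (group key, partial count) rows emitted by a map-side hash GROUP BY
# that flushed twice, so key 3 appears as two partial aggregates.
partial_rows = [(1, 2), (3, 1), (4, 1), (3, 1)]

def reduce_sink(rows, topn=None):
    rows = sorted(rows)                  # sort order: + on the key
    return rows[:topn] if topn is not None else rows

def reducer(rows):
    merged = Counter()                   # mode: mergepartial
    for key, cnt in rows:
        merged[key] += cnt
    return dict(merged)

correct = reducer(reduce_sink(partial_rows))            # {1: 2, 3: 2, 4: 1}
with_topn = reducer(reduce_sink(partial_rows, topn=2))  # {1: 2, 3: 1}

assert correct[3] == 2     # true count for key 3
assert with_topn[3] == 1   # one partial row of key 3 was cut by TopN
```

Key 3 still reaches the reducer, but with only one of its two partial counts, so the final aggregate is wrong rather than merely truncated.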




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656071)
Time Spent: 50m  (was: 40m)

> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN for the map phase, which causes an incorrect result.
> {code:sql}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   GatherStats: false
>   Select Operator
> expressions: id (type: int)
> outputColumnNames: id
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count()
>   keys: id (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> tag: -1
> TopN: 10
> TopN Hash Memory Usage: 0.1
> value expressions: _col1 (type: bigint)
> auto parallelism: true
> Execution mode: vectorized
> Path -> Alias:
>   file:/user/hive/warehouse/test [test]
> Path -> Partition:
>   file:/user/hive/warehouse/test 
> Partition
>   

[jira] [Work logged] (HIVE-25550) Increase the RM_PROGRESS column max length to fit metrics stat

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25550?focusedWorklogId=656067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656067
 ]

ASF GitHub Bot logged work on HIVE-25550:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 09:30
Start Date: 28/Sep/21 09:30
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2668:
URL: https://github.com/apache/hive/pull/2668#discussion_r717396745



##
File path: metastore/scripts/upgrade/derby/058-HIVE-23516.derby.sql
##
@@ -4,7 +4,7 @@ CREATE TABLE "APP"."REPLICATION_METRICS" (
   "RM_POLICY" varchar(256) NOT NULL,
   "RM_DUMP_EXECUTION_ID" bigint NOT NULL,
   "RM_METADATA" varchar(4000),
-  "RM_PROGRESS" varchar(4000),
+  "RM_PROGRESS" varchar(24000),

Review comment:
   these files are older ones. You can skip updating them. Only the scripts 
inside standalone-metastore should be updated.
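For reference, the widening in this diff corresponds to standard Derby DDL. This is only a sketch of the Derby syntax, not an official upgrade step; the actual upgrade scripts live under standalone-metastore:

```sql
-- Sketch only: Derby supports widening a VARCHAR column in place
ALTER TABLE "APP"."REPLICATION_METRICS"
  ALTER COLUMN "RM_PROGRESS" SET DATA TYPE VARCHAR(24000);
```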




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656067)
Remaining Estimate: 0h
Time Spent: 10m

> Increase the RM_PROGRESS column max length to fit metrics stat
> --
>
> Key: HIVE-25550
> URL: https://issues.apache.org/jira/browse/HIVE-25550
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Presently it fails with the following trace:
> {noformat}
> [[Event Name: EVENT_ALLOC_WRITE_ID; Total Number: 213; Total Time: 85347.0; 
> Mean: 400.6901408450704; Median: 392.0; Standard Deviation: 
> 33.99178239314741; Variance: 1155.4412702630862; Kurtosis: 83.69411620601193; 
> Skewness: 83.69411620601193; 25th Percentile: 384.0; 50th Percentile: 392.0; 
> 75th Percentile: 408.0; 90th Percentile: 417.0; Top 5 EventIds(EventId=Time) 
> {1498476=791, 1498872=533, 1497805=508, 1498808=500, 1499027=492};]]}"}]}" in 
> column ""RM_PROGRESS"" that has maximum length of 4000. Please correct your 
> data!
> at 
> org.datanucleus.store.rdbms.mapping.datastore.CharRDBMSMapping.setString(CharRDBMSMapping.java:254)
>  ~[datanucleus-rdbms-4.1.19.jar:?]
> at 
> org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.setString(SingleFieldMapping.java:180)
>  ~{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25550) Increase the RM_PROGRESS column max length to fit metrics stat

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25550:
--
Labels: pull-request-available  (was: )

> Increase the RM_PROGRESS column max length to fit metrics stat
> --
>
> Key: HIVE-25550
> URL: https://issues.apache.org/jira/browse/HIVE-25550
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Presently it fails with the following trace:
> {noformat}
> [[Event Name: EVENT_ALLOC_WRITE_ID; Total Number: 213; Total Time: 85347.0; 
> Mean: 400.6901408450704; Median: 392.0; Standard Deviation: 
> 33.99178239314741; Variance: 1155.4412702630862; Kurtosis: 83.69411620601193; 
> Skewness: 83.69411620601193; 25th Percentile: 384.0; 50th Percentile: 392.0; 
> 75th Percentile: 408.0; 90th Percentile: 417.0; Top 5 EventIds(EventId=Time) 
> {1498476=791, 1498872=533, 1497805=508, 1498808=500, 1499027=492};]]}"}]}" in 
> column ""RM_PROGRESS"" that has maximum length of 4000. Please correct your 
> data!
> at 
> org.datanucleus.store.rdbms.mapping.datastore.CharRDBMSMapping.setString(CharRDBMSMapping.java:254)
>  ~[datanucleus-rdbms-4.1.19.jar:?]
> at 
> org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.setString(SingleFieldMapping.java:180)
>  ~{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-22098) Data loss occurs when multiple tables are join with different bucket_version

2021-09-28 Thread hengtao tantai (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421271#comment-17421271
 ] 

hengtao tantai edited comment on HIVE-22098 at 9/28/21, 9:24 AM:
-

Hi [~brahmareddy], I found this issue in non-transactional tables


was (Author: zergtant):
hi [~brahmareddy]  i found this issus in non transactional

> Data loss occurs when multiple tables are join with different bucket_version
> 
>
> Key: HIVE-22098
> URL: https://issues.apache.org/jira/browse/HIVE-22098
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Priority: Blocker
>  Labels: data-loss, wrongresults
> Attachments: HIVE-22098.1.patch, image-2019-08-12-18-45-15-771.png, 
> join_test.sql, table_a_data.orc, table_b_data.orc, table_c_data.orc
>
>
> When tables with different bucketVersions are joined and the number of 
> reducers is greater than 2, the result is incorrect (*data loss*).
>  *Scenario 1*: Three tables are joined. The temporary result of joining 
> table_a (the first table) with table_b (the second table) is recorded as 
> tmp_a_b. When it is joined with the third table, the third table created by 
> default after hive-3.0.0 has bucket_version=2, while the temporary data 
> tmp_a_b is initialized with bucketVersion=-1, and the ReduceSinkOperator then 
> joins with bucketVersion=-1. In the init method, the hash algorithm for the 
> join column is selected according to bucketVersion: if bucketVersion = 2 and 
> it is not an acid operation, the new hash algorithm is used; otherwise, the 
> old hash algorithm is used. Because the hash algorithms are inconsistent, 
> rows are allocated to different partitions. At the Reducer stage, data with 
> the same key cannot be paired, resulting in data loss.
> *Scenario 2*: create two test tables: create table 
> table_bucketversion_1(col_1 string, col_2 string) TBLPROPERTIES 
> ('bucketing_version'='1'); table_bucketversion_2(col_1 string, col_2 string) 
> TBLPROPERTIES ('bucketing_version'='2');
>  When table_bucketversion_1 is joined with table_bucketversion_2, partial 
> result data is lost because the bucketVersions are different.
>  
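The partitioning mismatch described in the issue can be sketched as follows. This is a hedged illustration with made-up hash functions, not Hive's actual bucketing algorithms: when the two join sides are partitioned with different hash functions, equal keys land on different reducers and can never pair up.

```python
REDUCERS = 3

def hash_v1(key):                 # stand-in for the bucketing_version=1 hash
    return key % REDUCERS

def hash_v2(key):                 # stand-in for the bucketing_version=2 hash
    return (key * 31 + 7) % REDUCERS

keys = [0, 1, 2]                  # both join sides contain exactly these keys
side_a = {r: {k for k in keys if hash_v1(k) == r} for r in range(REDUCERS)}
side_b = {r: {k for k in keys if hash_v2(k) == r} for r in range(REDUCERS)}

# Each reducer joins only the rows routed to it.
joined = [k for r in range(REDUCERS) for k in side_a[r] & side_b[r]]
assert joined == []  # every key is lost despite matching values on both sides
```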



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22098) Data loss occurs when multiple tables are join with different bucket_version

2021-09-28 Thread hengtao tantai (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421271#comment-17421271
 ] 

hengtao tantai commented on HIVE-22098:
---

Hi [~brahmareddy], I found this issue in non-transactional tables

> Data loss occurs when multiple tables are join with different bucket_version
> 
>
> Key: HIVE-22098
> URL: https://issues.apache.org/jira/browse/HIVE-22098
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Priority: Blocker
>  Labels: data-loss, wrongresults
> Attachments: HIVE-22098.1.patch, image-2019-08-12-18-45-15-771.png, 
> join_test.sql, table_a_data.orc, table_b_data.orc, table_c_data.orc
>
>
> When tables with different bucketVersions are joined and the number of 
> reducers is greater than 2, the result is incorrect (*data loss*).
>  *Scenario 1*: Three tables are joined. The temporary result of joining 
> table_a (the first table) with table_b (the second table) is recorded as 
> tmp_a_b. When it is joined with the third table, the third table created by 
> default after hive-3.0.0 has bucket_version=2, while the temporary data 
> tmp_a_b is initialized with bucketVersion=-1, and the ReduceSinkOperator then 
> joins with bucketVersion=-1. In the init method, the hash algorithm for the 
> join column is selected according to bucketVersion: if bucketVersion = 2 and 
> it is not an acid operation, the new hash algorithm is used; otherwise, the 
> old hash algorithm is used. Because the hash algorithms are inconsistent, 
> rows are allocated to different partitions. At the Reducer stage, data with 
> the same key cannot be paired, resulting in data loss.
> *Scenario 2*: create two test tables: create table 
> table_bucketversion_1(col_1 string, col_2 string) TBLPROPERTIES 
> ('bucketing_version'='1'); table_bucketversion_2(col_1 string, col_2 string) 
> TBLPROPERTIES ('bucketing_version'='2');
>  When table_bucketversion_1 is joined with table_bucketversion_2, partial 
> result data is lost because the bucketVersions are different.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25559) to_unix_timestamp udf result incorrect

2021-09-28 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421266#comment-17421266
 ] 

Stamatis Zampetakis commented on HIVE-25559:


[~zengxl] there are many reported issues around the {{UNIX_TIMESTAMP}} 
function. Have you checked whether this has already been reported? I guess 
this was caused by HIVE-20007 and HIVE-12192. This problem may also affect 
master (Hive 4); can you verify?

> to_unix_timestamp udf result incorrect
> --
>
> Key: HIVE-25559
> URL: https://issues.apache.org/jira/browse/HIVE-25559
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 3.1.2
>Reporter: zengxl
>Assignee: zengxl
>Priority: Critical
> Attachments: HIVE-25559.1.branch-3.1.2patch
>
>
> When I use the *unix_timestamp* udf, what it actually calls is the 
> *to_unix_timestamp* udf. The returned result is incorrect. Here is my SQL:
> {code:java}
> // code placeholder
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/local/hive/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/local/hadoop-3.2.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Hive Session ID = 3a04a9cf-1fdb-4017-a4bb-14763a3163c7Logging initialized 
> using configuration in file:/usr/local/hive/conf/hive-log4j2.properties 
> Async: true
> Hive Session ID = 92ca916b-cfde-43b5-bd86-10d50ff7d861
> Hive-on-MR is deprecated in Hive 2 and may not be available in the future 
> versions. Consider using a different execution engine (i.e. spark, tez) or 
> using Hive 1.X releases.
> hive> select unix_timestamp('2021-09-24 00:00:00');
> OK
> 1632441600
> Time taken: 3.729 seconds, Fetched: 1 row(s)
> {code}
> Looking at the GenericUDFToUnixTimeStamp class code, I found that the time 
> zone is fixed to {color:#de350b}UTC{color}, not taken from the user's time 
> zone. Time zones vary with users; my time zone is 
> {color:#de350b}Asia/Shanghai{color}. Therefore, the function should use the 
> user's time zone. Here is the code I modified:
> {code:java}
> // code placeholder
> SessionState ss = SessionState.get();
> String timeZoneStr = ss.getConf().get("hive.local.time.zone");
> if (timeZoneStr == null || timeZoneStr.trim().isEmpty()
>     || timeZoneStr.toLowerCase().equals("local")) {
>   timeZoneStr = System.getProperty("user.timezone");
> }
> formatter.setTimeZone(TimeZone.getTimeZone(timeZoneStr));
> {code}
>  
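The reported mismatch can be reproduced outside Hive. This is a hedged illustration of the arithmetic only: the same wall-clock string maps to different epoch seconds depending on which zone is used for parsing (Asia/Shanghai is modelled here as a fixed UTC+8 offset, since it has no DST).

```python
from datetime import datetime, timedelta, timezone

ts = datetime(2021, 9, 24, 0, 0, 0)  # the literal '2021-09-24 00:00:00'

utc_epoch = int(ts.replace(tzinfo=timezone.utc).timestamp())
shanghai = timezone(timedelta(hours=8))  # fixed-offset stand-in for Asia/Shanghai
shanghai_epoch = int(ts.replace(tzinfo=shanghai).timestamp())

assert utc_epoch == 1632441600       # value the reporter got back
assert shanghai_epoch == 1632412800  # value a UTC+8 user would expect
assert utc_epoch - shanghai_epoch == 8 * 3600  # exactly the zone offset
```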



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25561) Killed task should not commit file.

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25561?focusedWorklogId=656061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656061
 ]

ASF GitHub Bot logged work on HIVE-25561:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 09:03
Start Date: 28/Sep/21 09:03
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on pull request #2674:
URL: https://github.com/apache/hive/pull/2674#issuecomment-928998346


   @abstractdog  Can you help me review it, or give me some suggestion?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656061)
Time Spent: 20m  (was: 10m)

> Killed task should not commit file.
> ---
>
> Key: HIVE-25561
> URL: https://issues.apache.org/jira/browse/HIVE-25561
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.2.1, 2.3.8, 2.4.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For the tez engine in our cluster, I found some duplicate lines, especially 
> when tez speculation is enabled. In the partition dir, I found that both 
> 02_0 and 02_1 exist.
> It's a very low probability event. HIVE-10429 fixed some bugs around 
> interrupts, but some exceptions were not caught.
> In our cluster, the Task receives SIGTERM, then ClientFinalizer (a Hadoop 
> class) is called and the hdfs client closes. An exception is then raised, but 
> abort may not be set to true.
> removeTempOrDuplicateFiles may then fail because of the inconsistency, and 
> the duplicate file is retained.
> (Note: the Driver first lists the dir, then the Task commits its file, then 
> the Driver removes duplicate files. This sequence is inconsistent.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?focusedWorklogId=656060&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656060
 ]

ASF GitHub Bot logged work on HIVE-24579:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 08:59
Start Date: 28/Sep/21 08:59
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2656:
URL: https://github.com/apache/hive/pull/2656#discussion_r717361733



##
File path: ql/src/test/results/clientpositive/llap/groupby1_limit.q.out
##
@@ -71,33 +71,34 @@ STAGE PLANS:
 mode: mergepartial
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 316 Data size: 30020 Basic stats: 
COMPLETE Column stats: COMPLETE
-Limit
-  Number of rows: 5
-  Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE 
Column stats: COMPLETE
-  Reduce Output Operator
-null sort order: 
-sort order: 
-Statistics: Num rows: 5 Data size: 475 Basic stats: 
COMPLETE Column stats: COMPLETE
-TopN Hash Memory Usage: 0.1
-value expressions: _col0 (type: string), _col1 (type: 
double)
+Reduce Output Operator

Review comment:
   we lost the `Limit` operator from here - as a result we will be 
shuffling all input rows.
   I think this could become more costly for larger tables than the old plan.
   
   I don't see TopN hash enabled on the reduce operator - which could possibly 
save the day in this case; why did we lose that as well?

##
File path: ql/src/test/results/clientpositive/llap/limit_pushdown.q.out
##
@@ -1075,6 +1072,13 @@ STAGE PLANS:
   Map-reduce partition columns: _col0 (type: string)
   Statistics: Num rows: 316 Data size: 30020 Basic 
stats: COMPLETE Column stats: COMPLETE
   value expressions: _col1 (type: bigint)
+Execution mode: vectorized, llap
+LLAP IO: all inputs
+Map 3 
+Map Operator Tree:
+TableScan
+  alias: src
+  Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
   Top N Key Operator

Review comment:
   in the old plan: did we have 2 TopN key operators in this plan which are 
equal?
   this is unrelated to this patch; but we may have an issue with their 
comparison - and because of that SWO is not able to simplify them




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656060)
Time Spent: 40m  (was: 0.5h)

> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN for the map phase, which causes an incorrect result.
> {code:sql}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   GatherStats: false
>   Select Operator
> expressions: id (type: int)
> outputColumnNames: id
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count()
>   keys: id (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> 

[jira] [Resolved] (HIVE-25558) create two tables and want to make some partitions on the table ,joins,union also

2021-09-28 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-25558.

Resolution: Incomplete

Closing this as incomplete. The summary is not enough to understand the 
context and the description is empty, so it is impossible to understand the 
issue.

[~shivanagaraju] If it is a question please send an email to the user@hive list 
with adequate information. If you would like to report a bug or a feature make 
sure the summary is clear and the description has all the necessary details. 

> create two tables and want to make some partitions on the table ,joins,union 
> also 
> --
>
> Key: HIVE-25558
> URL: https://issues.apache.org/jira/browse/HIVE-25558
> Project: Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 3.1.1
>Reporter: shiva
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25557) Hive 3.1.2 with Tez is slow to count data in parquet format

2021-09-28 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421247#comment-17421247
 ] 

Stamatis Zampetakis commented on HIVE-25557:


I am not sure I understand whether the problem is in Tez, Parquet, or the 
combination. Is the COUNT query fast with MR and Parquet? Is the COUNT query 
fast with Tez and another format, e.g., ORC? 

Please also include the plans ({{EXPLAIN}}) for the queries you are testing.
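For reference, a minimal way to collect that comparison might look like the following (table names are hypothetical; adjust to the real schema):

{code:sql}
-- Compare engines on the same Parquet table (names are hypothetical).
set hive.execution.engine=mr;
explain select count(*) from parquet_table;
select count(*) from parquet_table;

set hive.execution.engine=tez;
explain select count(*) from parquet_table;
select count(*) from parquet_table;

-- And another format under Tez, e.g. an ORC copy of the same data.
explain select count(*) from orc_table;
select count(*) from orc_table;
{code}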

> Hive 3.1.2 with Tez is slow to count data in parquet format
> 
>
> Key: HIVE-25557
> URL: https://issues.apache.org/jira/browse/HIVE-25557
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.1.2
> Environment: Tez *0.10.1*
>Reporter: katty he
>Priority: Major
>
> Recently I tested a SQL query like select count(*) from table in Hive 3.1.2 
> with Tez, where the table is in Parquet format. Normally, when counting, the 
> query engine can read metadata instead of reading the full data, but in my 
> case Tez cannot get the count from metadata only; it reads the full data, so 
> it is slow. Counting 2 billion rows takes Tez about 500s, plus 60s to 
> initialize. Is that a problem?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25538) CommitTxn replay failing during incremental run

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25538?focusedWorklogId=656041&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656041
 ]

ASF GitHub Bot logged work on HIVE-25538:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 08:10
Start Date: 28/Sep/21 08:10
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2655:
URL: https://github.com/apache/hive/pull/2655#discussion_r717331035



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##
@@ -401,6 +404,18 @@ private boolean isSourceFileMismatch(FileSystem sourceFs, 
ReplChangeManager.File
 return false;
   }
 
+  @VisibleForTesting
+  private void runTestOnlyExecutions() throws IOException {

Review comment:
   Yahh, my first try was to do so. I thought of using PowerMock, but MiniDfs 
has an issue with it, and since MiniDfs is part of Hadoop we can't change that.
   The most I could pull out into the test is a Callable, so as to avoid the 
delete and FS operations here; it can also be reused in the future. Let me know 
if there is any other way out. :-)
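The test-hook approach described above boils down to the following sketch (all names are hypothetical, a minimal illustration of the pattern rather than the actual CopyUtils code):

```python
# Minimal sketch of the test-hook pattern: production code exposes an
# injectable hook that only tests set, letting a test trigger a failure
# (e.g. deleting the source file) at an exact point mid-copy.
# All names here are hypothetical, not the real CopyUtils API.
class CopyUtilsSketch:
    def __init__(self):
        self.test_hook = None  # left unset in production

    def copy(self, src, dst):
        result = f"copied {src} -> {dst}"
        if self.test_hook is not None:
            self.test_hook()  # test-only: e.g. delete src before checksum check
        return result

events = []
sketch = CopyUtilsSketch()
sketch.test_hook = lambda: events.append("source deleted")
print(sketch.copy("a.txt", "b.txt"))  # copied a.txt -> b.txt
print(events)                         # ['source deleted']
```

In production the hook stays `None` and adds no behavior; the test injects a callable to simulate the source file disappearing between copy and checksum validation, without mocking the filesystem itself.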




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656041)
Time Spent: 1h  (was: 50m)

> CommitTxn replay failing during incremental run
> ---
>
> Key: HIVE-25538
> URL: https://issues.apache.org/jira/browse/HIVE-25538
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> CommitTxn Fails during incremental run, in case the source file is deleted 
> post copy & before checksum validation.
> {noformat}
> 2021-09-21T07:53:40,898 ERROR [TThreadPoolServer WorkerProcess-%d] 
> thrift.ProcessFunction: Internal error processing commit_txn
> org.apache.thrift.TException: 
> /warehouse1/replicated_testreplcommittransactiononsourcedelete_1632235978675.db/testreplcommittransactiononsourcedelete/load_date=2016-03-01/delta_002_002_
>  (is not a directory)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolvePath(FSDirectory.java:677)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:151)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1927)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:424)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
>   at 
> org.apache.hadoop.hive.metastore.HMSHandler.commit_txn(HMSHandler.java:8652) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at sun.reflect.GeneratedMethodAccessor118.invoke(Unknown Source) ~[?:?]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_261]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261]
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at com.sun.proxy.$Proxy55.commit_txn(Unknown Source) ~[?:?]
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$commit_txn.getResult(ThriftHiveMetastore.java:23159)
>  

[jira] [Work logged] (HIVE-25538) CommitTxn replay failing during incremental run

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25538?focusedWorklogId=656039&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656039
 ]

ASF GitHub Bot logged work on HIVE-25538:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 08:08
Start Date: 28/Sep/21 08:08
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2655:
URL: https://github.com/apache/hive/pull/2655#discussion_r717329217



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##
@@ -65,6 +65,8 @@
   private final String copyAsUser;
   private FileSystem destinationFs;
   private final int maxParallelCopyTask;
+  @VisibleForTesting

Review comment:
   Functionality-wise, I think no. It is mostly for the devs.
   
   #Copied ->
   The point of an annotation is that it is a convention and can be used in 
static code analysis, whereas a simple comment cannot.
   
   It serves the same purpose as the usual annotations like LimitedPrivate and 
the InterfaceStability ones

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java
##
@@ -243,6 +248,64 @@ public void testReplCM() throws Throwable {
 Lists.newArrayList(result, result));
   }
 
+  @Test
+  public void testReplCommitTransactionOnSourceDeleteORC() throws Throwable {
+// Run test with ORC format & with transactional true.
+testReplCommitTransactionOnSourceDelete("STORED AS ORC", 
"'transactional'='true'");
+  }
+
+  @Test
+  public void testReplCommitTransactionOnSourceDeleteText() throws Throwable {
+// Run test with TEXT format & with transactional true.

Review comment:
   Yeps, Thanx Corrected




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656039)
Time Spent: 50m  (was: 40m)

> CommitTxn replay failing during incremental run
> ---
>
> Key: HIVE-25538
> URL: https://issues.apache.org/jira/browse/HIVE-25538
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> CommitTxn Fails during incremental run, in case the source file is deleted 
> post copy & before checksum validation.

[jira] [Work logged] (HIVE-25538) CommitTxn replay failing during incremental run

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25538?focusedWorklogId=656031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656031
 ]

ASF GitHub Bot logged work on HIVE-25538:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 07:53
Start Date: 28/Sep/21 07:53
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2655:
URL: https://github.com/apache/hive/pull/2655#discussion_r717313439



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java
##
@@ -243,6 +248,64 @@ public void testReplCM() throws Throwable {
 Lists.newArrayList(result, result));
   }
 
+  @Test
+  public void testReplCommitTransactionOnSourceDeleteORC() throws Throwable {
+// Run test with ORC format & with transactional true.
+testReplCommitTransactionOnSourceDelete("STORED AS ORC", 
"'transactional'='true'");
+  }
+
+  @Test
+  public void testReplCommitTransactionOnSourceDeleteText() throws Throwable {
+// Run test with TEXT format & with transactional true.

Review comment:
   false?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##
@@ -401,6 +404,18 @@ private boolean isSourceFileMismatch(FileSystem sourceFs, 
ReplChangeManager.File
 return false;
   }
 
+  @VisibleForTesting
+  private void runTestOnlyExecutions() throws IOException {

Review comment:
   Wondering if this logic can be moved to the test itself

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##
@@ -65,6 +65,8 @@
   private final String copyAsUser;
   private FileSystem destinationFs;
   private final int maxParallelCopyTask;
+  @VisibleForTesting

Review comment:
   If the method is public, does the annotation VisibleForTesting have any 
impact?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656031)
Time Spent: 40m  (was: 0.5h)

> CommitTxn replay failing during incremental run
> ---
>
> Key: HIVE-25538
> URL: https://issues.apache.org/jira/browse/HIVE-25538
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> CommitTxn Fails during incremental run, in case the source file is deleted 
> post copy & before checksum validation.

[jira] [Work logged] (HIVE-25538) CommitTxn replay failing during incremental run

2021-09-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25538?focusedWorklogId=656027&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-656027
 ]

ASF GitHub Bot logged work on HIVE-25538:
-

Author: ASF GitHub Bot
Created on: 28/Sep/21 07:47
Start Date: 28/Sep/21 07:47
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2655:
URL: https://github.com/apache/hive/pull/2655#discussion_r716418809



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java
##
@@ -243,6 +248,53 @@ public void testReplCM() throws Throwable {
 Lists.newArrayList(result, result));
   }
 
+  @Test
+  public void testReplCommitTransactionOnSourceDelete() throws Throwable {
+String tableName = "testReplCommitTransactionOnSourceDelete";
+String[] result = new String[] { "5" };
+
+// Do a bootstrap dump.
+WarehouseInstance.Tuple bootStrapDump = primary.dump(primaryDbName);
+replica.load(replicatedDbName, primaryDbName).run("REPL STATUS " + 
replicatedDbName)
+.verifyResult(bootStrapDump.lastReplicationId);
+
+// Add some data to the table & do a incremental dump.
+ReplicationTestUtils.insertRecords(primary, primaryDbName, 
primaryDbNameExtra, tableName, null, false,
+ReplicationTestUtils.OperationType.REPL_TEST_ACID_INSERT);
+WarehouseInstance.Tuple incrementalDump = primary.dump(primaryDbName);

Review comment:
   Can you please add tables with the following properties:
   - ORC format (I think covered)
   - bucketed 
   - text input format
   All these tables should have a drop-table use case like the one you are 
targeting now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 656027)
Time Spent: 0.5h  (was: 20m)

> CommitTxn replay failing during incremental run
> ---
>
> Key: HIVE-25538
> URL: https://issues.apache.org/jira/browse/HIVE-25538
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> CommitTxn Fails during incremental run, in case the source file is deleted 
> post copy & before checksum validation.

[jira] [Updated] (HIVE-25565) Materialized view Rebuild issue Aws EMR

2021-09-28 Thread Vipin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vipin updated HIVE-25565:
-
Description: 
We have Materialized views built on top of Hudi tables which are hive-sync'd.
 Hive uses AWS Glue for its metastore catalog. 

We are running into an issue whenever we try to "**rebuild**" Hive 
materialized views.

Please note, creation of materialized views works fine. It's only the rebuild 
that is failing.

However, the rebuild does seem to work behind the scenes, but it throws an 
exception that causes the EMR steps to fail. 
 Can anyone please guide us here about any config changes we need to make, or 
anything else. Any help will be great.  

 

The stack trace of the exception - 
{quote} FAILED: Hive Internal Error: 
org.apache.hadoop.hive.ql.metadata.HiveException(Error while invoking 
FailureHook. hooks: java.lang.NullPointerException at 
org.apache.hadoop.hive.ql.reexec.ReExecutionOverlayPlugin$LocalHook.run(ReExecutionOverlayPlugin.java:45)
 at org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296) 
at org.apache.hadoop.hive.ql.HookRunner.runFailureHooks(HookRunner.java:283) at 
org.apache.hadoop.hive.ql.Driver.invokeFailureHooks(Driver.java:2616) at 
org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2386) at 
org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703) at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224)
 at 
org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
 at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316)
 at java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
 at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:330)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
at java.lang.Thread.run(Thread.java:748)) > 
org.apache.hadoop.hive.ql.metadata.HiveException: Error while invoking 
FailureHook. hooks:  > java.lang.NullPointerException > at 
org.apache.hadoop.hive.ql.reexec.ReExecutionOverlayPlugin$LocalHook.run(ReExecutionOverlayPlugin.java:45)>
 at 
org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296)> at 
org.apache.hadoop.hive.ql.HookRunner.runFailureHooks(HookRunner.java:283)> at 
org.apache.hadoop.hive.ql.Driver.invokeFailureHooks(Driver.java:2616)> at 
org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2386)> at 
org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)> at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)> at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)> at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)> at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224)>
 at 
org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)>
 at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316)>
 at java.security.AccessController.doPrivileged(Native Method)> at 
javax.security.auth.Subject.doAs(Subject.java:422)> at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)>
 at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:330)>
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)> at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)> at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)>
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)>
 at java.lang.Thread.run(Thread.java:748)> > at 
org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:302)> at 
org.apache.hadoop.hive.ql.HookRunner.runFailureHooks(HookRunner.java:283)> at 
org.apache.hadoop.hive.ql.Driver.invokeFailureHooks(Driver.java:2616)> at 
org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2386)> at 
org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)> at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)> at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)> at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)> at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224)>
 at