[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=627046=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-627046
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 23/Jul/21 07:52
Start Date: 23/Jul/21 07:52
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #2510:
URL: https://github.com/apache/hive/pull/2510


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 627046)
Time Spent: 5h 50m  (was: 5h 40m)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=625784=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-625784
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 20/Jul/21 20:11
Start Date: 20/Jul/21 20:11
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #2505:
URL: https://github.com/apache/hive/pull/2505#issuecomment-883663529


   Thanks @kuczoram for noticing and taking care of this!!! 




Issue Time Tracking
---

Worklog Id: (was: 625784)
Time Spent: 5h 40m  (was: 5.5h)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics.





[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=625783=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-625783
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 20/Jul/21 20:10
Start Date: 20/Jul/21 20:10
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #2505:
URL: https://github.com/apache/hive/pull/2505


   




Issue Time Tracking
---

Worklog Id: (was: 625783)
Time Spent: 5.5h  (was: 5h 20m)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics.





[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=625780=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-625780
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 20/Jul/21 20:06
Start Date: 20/Jul/21 20:06
Worklog Time Spent: 10m 
  Work Description: kuczoram opened a new pull request #2505:
URL: https://github.com/apache/hive/pull/2505


   Reverts apache/hive#2419
   Unfortunately this commit broke the build because of some stylecheck issue:
   
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/771/pipeline/
   It most probably caused some additional test failures as well.
   Reverting this commit until we find out exactly what caused the issue, as the PR for this commit had green runs before.




Issue Time Tracking
---

Worklog Id: (was: 625780)
Time Spent: 5h 20m  (was: 5h 10m)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics.





[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=625609=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-625609
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 20/Jul/21 13:52
Start Date: 20/Jul/21 13:52
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #2419:
URL: https://github.com/apache/hive/pull/2419


   




Issue Time Tracking
---

Worklog Id: (was: 625609)
Time Spent: 5h 10m  (was: 5h)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics.





[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=625608=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-625608
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 20/Jul/21 13:49
Start Date: 20/Jul/21 13:49
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r673140199



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -233,15 +237,21 @@ public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, E
   preAlterTableProperties.tableLocation = sd.getLocation();
   preAlterTableProperties.format = sd.getInputFormat();
   preAlterTableProperties.schema = schema(catalogProperties, hmsTable);
-  preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, catalogProperties, hmsTable);
   preAlterTableProperties.partitionKeys = hmsTable.getPartitionKeys();
 
   context.getProperties().put(HiveMetaHook.ALLOW_PARTITION_KEY_CHANGE, "true");
   // If there are partition keys specified remove them from the HMS table and add them to the column list
-  if (hmsTable.isSetPartitionKeys()) {
+  if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+List spec = PartitionTransform.getPartitionTransformSpec(hmsTable.getPartitionKeys());
+if (!SessionStateUtil.addResource(conf, hive_metastoreConstants.PARTITION_TRANSFORM_SPEC, spec)) {
+  throw new MetaException("Query state attached to Session state must be not null. " +
+  "Partition transform metadata cannot be saved.");
+}
 hmsTable.getSd().getCols().addAll(hmsTable.getPartitionKeys());
 hmsTable.setPartitionKeysIsSet(false);
   }
+  preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, hmsTable);

Review comment:
   Right, thanks






Issue Time Tracking
---

Worklog Id: (was: 625608)
Time Spent: 5h  (was: 4h 50m)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics.





[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=625511=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-625511
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 20/Jul/21 12:20
Start Date: 20/Jul/21 12:20
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r672435549



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -233,15 +237,21 @@ public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, E
   preAlterTableProperties.tableLocation = sd.getLocation();
   preAlterTableProperties.format = sd.getInputFormat();
   preAlterTableProperties.schema = schema(catalogProperties, hmsTable);
-  preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, catalogProperties, hmsTable);
   preAlterTableProperties.partitionKeys = hmsTable.getPartitionKeys();
 
   context.getProperties().put(HiveMetaHook.ALLOW_PARTITION_KEY_CHANGE, "true");
   // If there are partition keys specified remove them from the HMS table and add them to the column list
-  if (hmsTable.isSetPartitionKeys()) {
+  if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+List spec = PartitionTransform.getPartitionTransformSpec(hmsTable.getPartitionKeys());
+if (!SessionStateUtil.addResource(conf, hive_metastoreConstants.PARTITION_TRANSFORM_SPEC, spec)) {
+  throw new MetaException("Query state attached to Session state must be not null. " +
+  "Partition transform metadata cannot be saved.");
+}
 hmsTable.getSd().getCols().addAll(hmsTable.getPartitionKeys());
 hmsTable.setPartitionKeysIsSet(false);
   }
+  preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, hmsTable);

Review comment:
   This was moved from line 236. We need it to be set, but we have to do it after we have the correct spec.

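A minimal Java sketch of the ordering constraint described in this comment, under stated assumptions: it supposes that the new `spec(...)` reads the partition transform spec from per-query state (as the changed `spec(conf, schema, hmsTable)` signature suggests) rather than from the table's partition keys. `SketchTable`, `PreAlterOrderSketch`, `SPEC_KEY` and the map are hypothetical stand-ins for the HMS table and `SessionStateUtil`, not the merged code. It shows that calling such a `spec()` at the old position would see no spec yet.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the HMS table; only the fields this sketch needs.
final class SketchTable {
  final List<String> cols = new ArrayList<>();
  final List<String> partitionKeys = new ArrayList<>();
  boolean partitionKeysSet = true;
}

final class PreAlterOrderSketch {
  // Hypothetical key; the diff uses hive_metastoreConstants.PARTITION_TRANSFORM_SPEC.
  static final String SPEC_KEY = "partition.transform.spec";

  // Hypothetical per-query resource map standing in for the SessionState query state.
  final Map<String, Object> queryState = new HashMap<>();

  // Same guard as the diff: a key list that is set but empty means "no partition keys".
  void movePartitionKeys(SketchTable table) {
    if (table.partitionKeysSet && !table.partitionKeys.isEmpty()) {
      queryState.put(SPEC_KEY, new ArrayList<>(table.partitionKeys));
      table.cols.addAll(table.partitionKeys);
      table.partitionKeysSet = false;
    }
  }

  // Reads the spec from the query state, not from the table.
  @SuppressWarnings("unchecked")
  List<String> spec() {
    Object stored = queryState.get(SPEC_KEY);
    if (stored == null) {
      return new ArrayList<>();
    }
    return (List<String>) stored;
  }

  // Old position: the spec is computed before the resource exists, so it comes back empty.
  List<String> specBeforeMove(SketchTable table) {
    List<String> spec = spec();
    movePartitionKeys(table);
    return spec;
  }

  // New position: store the resource first, then compute the spec.
  List<String> specAfterMove(SketchTable table) {
    movePartitionKeys(table);
    return spec();
  }
}
```

The empty-list guard mirrors the `isSetPartitionKeys() && !getPartitionKeys().isEmpty()` condition in the diff, so an unpartitioned table is left untouched.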
##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -233,15 +237,21 @@ public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, E
   preAlterTableProperties.tableLocation = sd.getLocation();
   preAlterTableProperties.format = sd.getInputFormat();
   preAlterTableProperties.schema = schema(catalogProperties, hmsTable);
-  preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, catalogProperties, hmsTable);
   preAlterTableProperties.partitionKeys = hmsTable.getPartitionKeys();
 
   context.getProperties().put(HiveMetaHook.ALLOW_PARTITION_KEY_CHANGE, "true");
   // If there are partition keys specified remove them from the HMS table and add them to the column list
-  if (hmsTable.isSetPartitionKeys()) {
+  if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+List spec = PartitionTransform.getPartitionTransformSpec(hmsTable.getPartitionKeys());
+if (!SessionStateUtil.addResource(conf, hive_metastoreConstants.PARTITION_TRANSFORM_SPEC, spec)) {

Review comment:
   This is for migrating non-Iceberg tables to Iceberg tables. Previously we just depended on the partition columns; from now on we need to have the data in the `SessionState` instead, so we put it there.

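The `SessionState` hand-off can be pictured as a per-query resource map whose `addResource` call reports failure with a boolean, which the caller turns into an exception (the diff throws `MetaException`). This is a hypothetical Java sketch: `SketchSession`, `MigrationSketch` and the key string are invented for illustration and are not the `SessionStateUtil` implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical session: per-query state exists only while a query is running.
final class SketchSession {
  private final Map<String, Map<String, Object>> queryStates = new HashMap<>();
  private String currentQueryId; // null when no query state is attached

  void startQuery(String queryId) {
    currentQueryId = queryId;
    queryStates.put(queryId, new HashMap<>());
  }

  private Map<String, Object> currentState() {
    if (currentQueryId == null) {
      return null;
    }
    return queryStates.get(currentQueryId);
  }

  // Returns false (instead of throwing) when no query state is attached.
  boolean addResource(String key, Object value) {
    Map<String, Object> state = currentState();
    if (state == null) {
      return false;
    }
    state.put(key, value);
    return true;
  }

  Optional<Object> getResource(String key) {
    Map<String, Object> state = currentState();
    if (state == null) {
      return Optional.empty();
    }
    return Optional.ofNullable(state.get(key));
  }
}

final class MigrationSketch {
  static final String SPEC_KEY = "partition.transform.spec"; // hypothetical key

  // Caller side: a false return becomes a hard failure, like the MetaException in the diff.
  static void saveSpec(SketchSession session, Object spec) {
    if (!session.addResource(SPEC_KEY, spec)) {
      throw new IllegalStateException(
          "No query state attached to the session; partition transform metadata cannot be saved.");
    }
  }
}
```

Returning a boolean keeps the store itself side-effect free on failure and leaves the choice of exception type to the caller.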
##
File path: iceberg/iceberg-handler/src/test/results/positive/vectorized_iceberg_read.q.out
##
@@ -129,17 +129,17 @@ Stage-0
 Stage-1
   Reducer 2 vectorized
   File Output Operator [FS_11]
-Select Operator [SEL_10] (rows=1 width=564)
+Select Operator [SEL_10] (rows=1 width=372)

Review comment:
   TBH I am not sure, but I expect it has something to do with the new statistics.

##
File path: ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezOutputCommitter.java
##
@@ -122,6 +122,7 @@ private IDriver getDriverWithCommitter(String committerClass) {
 conf.setVar(HiveConf.ConfVars.HIVE_AUTHORIZATION_MANAGER,
     "org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory");
 conf.setBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY, false);
+conf.setBoolVar(HiveConf.ConfVars.HIVESTATSCOLAUTOGATHER, false);

Review comment:
   Otherwise the tests fail, because with stats turned on we generate 2 tasks instead of 1 (which changes the execution plans that contain a stage).

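A toy Java model of the plan difference described here: with column statistics autogathering enabled, an insert yields one extra task. `InsertPlanSketch` and its task names are hypothetical, and the property string is assumed to be the one backing `HiveConf.ConfVars.HIVESTATSCOLAUTOGATHER`; the real Hive compiler is far more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical planner: only models the "1 task vs 2 tasks" difference.
final class InsertPlanSketch {
  // Assumed to be the property behind HiveConf.ConfVars.HIVESTATSCOLAUTOGATHER.
  static final String COL_STATS_AUTOGATHER = "hive.stats.column.autogather";

  static List<String> planInsert(Map<String, String> conf) {
    List<String> tasks = new ArrayList<>();
    tasks.add("write-rows"); // illustrative task name
    // Treated as enabled when unset (an assumption about the default).
    boolean gatherColStats = Boolean.parseBoolean(conf.getOrDefault(COL_STATS_AUTOGATHER, "true"));
    if (gatherColStats) {
      tasks.add("gather-column-stats"); // illustrative task name
    }
    return tasks;
  }
}
```

The real test change pins the plan shape by setting the flag to false on the `HiveConf`, exactly as in the diff.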




[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=625465=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-625465
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 20/Jul/21 12:13
Start Date: 20/Jul/21 12:13
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r672325104



##
File path: ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezOutputCommitter.java
##
@@ -122,6 +122,7 @@ private IDriver getDriverWithCommitter(String committerClass) {
 conf.setVar(HiveConf.ConfVars.HIVE_AUTHORIZATION_MANAGER,
     "org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory");
 conf.setBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY, false);
+conf.setBoolVar(HiveConf.ConfVars.HIVESTATSCOLAUTOGATHER, false);

Review comment:
   Why is this required here?






Issue Time Tracking
---

Worklog Id: (was: 625465)
Time Spent: 4h 40m  (was: 4.5h)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics.





[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=625136=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-625136
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 20/Jul/21 10:10
Start Date: 20/Jul/21 10:10
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r672164500



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -233,15 +237,21 @@ public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, E
   preAlterTableProperties.tableLocation = sd.getLocation();
   preAlterTableProperties.format = sd.getInputFormat();
   preAlterTableProperties.schema = schema(catalogProperties, hmsTable);
-  preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, catalogProperties, hmsTable);
   preAlterTableProperties.partitionKeys = hmsTable.getPartitionKeys();
 
   context.getProperties().put(HiveMetaHook.ALLOW_PARTITION_KEY_CHANGE, "true");
   // If there are partition keys specified remove them from the HMS table and add them to the column list
-  if (hmsTable.isSetPartitionKeys()) {
+  if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+List spec = PartitionTransform.getPartitionTransformSpec(hmsTable.getPartitionKeys());
+if (!SessionStateUtil.addResource(conf, hive_metastoreConstants.PARTITION_TRANSFORM_SPEC, spec)) {
+  throw new MetaException("Query state attached to Session state must be not null. " +
+  "Partition transform metadata cannot be saved.");
+}
 hmsTable.getSd().getCols().addAll(hmsTable.getPartitionKeys());
 hmsTable.setPartitionKeysIsSet(false);
   }
+  preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, hmsTable);

Review comment:
   Is this needed here? Or, if we use it only for the validation inside `spec()`, maybe remove the local variable, or better yet, extract the validation logic into a separate method we can call here?

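One way to read the suggestion to extract the validation logic, as a hypothetical Java sketch in which a spec is modelled as a plain list of column names. According to the reply in this thread the spec field is still needed, so this only illustrates the suggested alternative, not the merged code; `SpecValidationSketch`, `validateSpec` and `buildSpec` are invented names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical: a spec is modelled as the list of source column names it partitions by.
final class SpecValidationSketch {
  // Validation only: a caller that just needs the check keeps no unused local variable.
  static void validateSpec(List<String> schemaColumns, List<String> specColumns) {
    for (String col : specColumns) {
      if (!schemaColumns.contains(col)) {
        throw new IllegalArgumentException("Partition column not found in schema: " + col);
      }
    }
  }

  // The builder reuses the same check before returning an immutable copy.
  static List<String> buildSpec(List<String> schemaColumns, List<String> specColumns) {
    validateSpec(schemaColumns, specColumns);
    return Collections.unmodifiableList(new ArrayList<>(specColumns));
  }
}
```

Splitting the check from the builder lets the alter-table path call `validateSpec` alone when it only needs the validation side effect.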
##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -233,15 +237,21 @@ public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, E
   preAlterTableProperties.tableLocation = sd.getLocation();
   preAlterTableProperties.format = sd.getInputFormat();
   preAlterTableProperties.schema = schema(catalogProperties, hmsTable);
-  preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, catalogProperties, hmsTable);
   preAlterTableProperties.partitionKeys = hmsTable.getPartitionKeys();
 
   context.getProperties().put(HiveMetaHook.ALLOW_PARTITION_KEY_CHANGE, "true");
   // If there are partition keys specified remove them from the HMS table and add them to the column list
-  if (hmsTable.isSetPartitionKeys()) {
+  if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+List spec = PartitionTransform.getPartitionTransformSpec(hmsTable.getPartitionKeys());
+if (!SessionStateUtil.addResource(conf, hive_metastoreConstants.PARTITION_TRANSFORM_SPEC, spec)) {

Review comment:
   Where did we save this into the session conf before this change?

##
File path: iceberg/iceberg-handler/src/test/results/positive/vectorized_iceberg_read.q.out
##
@@ -129,17 +129,17 @@ Stage-0
 Stage-1
   Reducer 2 vectorized
   File Output Operator [FS_11]
-Select Operator [SEL_10] (rows=1 width=564)
+Select Operator [SEL_10] (rows=1 width=372)

Review comment:
   Out of curiosity: do you know why the width has decreased so much?






Issue Time Tracking
---

Worklog Id: (was: 625136)
Time Spent: 4.5h  (was: 4h 20m)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics.




[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=625124=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-625124
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 20/Jul/21 10:09
Start Date: 20/Jul/21 10:09
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r672435549



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -233,15 +237,21 @@ public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, E
   preAlterTableProperties.tableLocation = sd.getLocation();
   preAlterTableProperties.format = sd.getInputFormat();
   preAlterTableProperties.schema = schema(catalogProperties, hmsTable);
-  preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, catalogProperties, hmsTable);
   preAlterTableProperties.partitionKeys = hmsTable.getPartitionKeys();
 
   context.getProperties().put(HiveMetaHook.ALLOW_PARTITION_KEY_CHANGE, "true");
   // If there are partition keys specified remove them from the HMS table and add them to the column list
-  if (hmsTable.isSetPartitionKeys()) {
+  if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+List spec = PartitionTransform.getPartitionTransformSpec(hmsTable.getPartitionKeys());
+if (!SessionStateUtil.addResource(conf, hive_metastoreConstants.PARTITION_TRANSFORM_SPEC, spec)) {
+  throw new MetaException("Query state attached to Session state must be not null. " +
+  "Partition transform metadata cannot be saved.");
+}
 hmsTable.getSd().getCols().addAll(hmsTable.getPartitionKeys());
 hmsTable.setPartitionKeysIsSet(false);
   }
+  preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, hmsTable);

Review comment:
   This was moved from line 236. We need it to be set, but we have to do it after we have the correct spec.

##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -233,15 +237,21 @@ public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, E
   preAlterTableProperties.tableLocation = sd.getLocation();
   preAlterTableProperties.format = sd.getInputFormat();
   preAlterTableProperties.schema = schema(catalogProperties, hmsTable);
-  preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, catalogProperties, hmsTable);
   preAlterTableProperties.partitionKeys = hmsTable.getPartitionKeys();
 
   context.getProperties().put(HiveMetaHook.ALLOW_PARTITION_KEY_CHANGE, "true");
   // If there are partition keys specified remove them from the HMS table and add them to the column list
-  if (hmsTable.isSetPartitionKeys()) {
+  if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+List spec = PartitionTransform.getPartitionTransformSpec(hmsTable.getPartitionKeys());
+if (!SessionStateUtil.addResource(conf, hive_metastoreConstants.PARTITION_TRANSFORM_SPEC, spec)) {

Review comment:
   This is for migrating non-Iceberg tables to Iceberg tables. Previously we just depended on the partition columns; from now on we need to have the data in the `SessionState` instead, so we put it there.

##
File path: iceberg/iceberg-handler/src/test/results/positive/vectorized_iceberg_read.q.out
##
@@ -129,17 +129,17 @@ Stage-0
 Stage-1
   Reducer 2 vectorized
   File Output Operator [FS_11]
-Select Operator [SEL_10] (rows=1 width=564)
+Select Operator [SEL_10] (rows=1 width=372)

Review comment:
   TBH I am not sure, but I expect it has something to do with the new statistics.

##
File path: ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezOutputCommitter.java
##
@@ -122,6 +122,7 @@ private IDriver getDriverWithCommitter(String committerClass) {
 conf.setVar(HiveConf.ConfVars.HIVE_AUTHORIZATION_MANAGER,
     "org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory");
 conf.setBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY, false);
+conf.setBoolVar(HiveConf.ConfVars.HIVESTATSCOLAUTOGATHER, false);

Review comment:
   Otherwise the tests fail, because with stats turned on we generate 2 tasks instead of 1 (which changes the execution plans that contain a stage).





[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=625098=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-625098
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 20/Jul/21 10:05
Start Date: 20/Jul/21 10:05
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r672325104



##
File path: ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezOutputCommitter.java
##
@@ -122,6 +122,7 @@ private IDriver getDriverWithCommitter(String committerClass) {
 conf.setVar(HiveConf.ConfVars.HIVE_AUTHORIZATION_MANAGER,
     "org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory");
 conf.setBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY, false);
+conf.setBoolVar(HiveConf.ConfVars.HIVESTATSCOLAUTOGATHER, false);

Review comment:
   Why is this required here?






Issue Time Tracking
---

Worklog Id: (was: 625098)
Time Spent: 4h 10m  (was: 4h)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics.





[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=624468=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-624468
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 19/Jul/21 16:14
Start Date: 19/Jul/21 16:14
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r672441145



##
File path: ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezOutputCommitter.java
##
@@ -122,6 +122,7 @@ private IDriver getDriverWithCommitter(String committerClass) {
 conf.setVar(HiveConf.ConfVars.HIVE_AUTHORIZATION_MANAGER,
     "org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory");
 conf.setBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY, false);
+conf.setBoolVar(HiveConf.ConfVars.HIVESTATSCOLAUTOGATHER, false);

Review comment:
   Otherwise the tests fail, because with stats turned on we generate 2 tasks instead of 1 (which changes the execution plans that contain a stage).






Issue Time Tracking
---

Worklog Id: (was: 624468)
Time Spent: 4h  (was: 3h 50m)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics.





[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=624466=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-624466
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 19/Jul/21 16:12
Start Date: 19/Jul/21 16:12
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r672439783



##
File path: iceberg/iceberg-handler/src/test/results/positive/vectorized_iceberg_read.q.out
##
@@ -129,17 +129,17 @@ Stage-0
 Stage-1
   Reducer 2 vectorized
   File Output Operator [FS_11]
-Select Operator [SEL_10] (rows=1 width=564)
+Select Operator [SEL_10] (rows=1 width=372)

Review comment:
   TBH I am not sure, but I expect it has something to do with the new statistics.






Issue Time Tracking
---

Worklog Id: (was: 624466)
Time Spent: 3h 50m  (was: 3h 40m)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics




[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=624465=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-624465
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 19/Jul/21 16:12
Start Date: 19/Jul/21 16:12
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r672439401



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -233,15 +237,21 @@ public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, E
   preAlterTableProperties.tableLocation = sd.getLocation();
   preAlterTableProperties.format = sd.getInputFormat();
   preAlterTableProperties.schema = schema(catalogProperties, hmsTable);
-  preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, catalogProperties, hmsTable);
   preAlterTableProperties.partitionKeys = hmsTable.getPartitionKeys();
 
   context.getProperties().put(HiveMetaHook.ALLOW_PARTITION_KEY_CHANGE, "true");
   // If there are partition keys specified remove them from the HMS table and add them to the column list
-  if (hmsTable.isSetPartitionKeys()) {
+  if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+List spec = PartitionTransform.getPartitionTransformSpec(hmsTable.getPartitionKeys());
+if (!SessionStateUtil.addResource(conf, hive_metastoreConstants.PARTITION_TRANSFORM_SPEC, spec)) {

Review comment:
   This is for migrating non-Iceberg tables to Iceberg tables. Previously we just
depended on the partition cols; from now on we need to have the data in the
`SessionState` instead, so we put it there.





Issue Time Tracking
---

Worklog Id: (was: 624465)
Time Spent: 3h 40m  (was: 3.5h)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics




[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=624463=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-624463
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 19/Jul/21 16:07
Start Date: 19/Jul/21 16:07
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r672435549



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -233,15 +237,21 @@ public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, E
   preAlterTableProperties.tableLocation = sd.getLocation();
   preAlterTableProperties.format = sd.getInputFormat();
   preAlterTableProperties.schema = schema(catalogProperties, hmsTable);
-  preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, catalogProperties, hmsTable);
   preAlterTableProperties.partitionKeys = hmsTable.getPartitionKeys();
 
   context.getProperties().put(HiveMetaHook.ALLOW_PARTITION_KEY_CHANGE, "true");
   // If there are partition keys specified remove them from the HMS table and add them to the column list
-  if (hmsTable.isSetPartitionKeys()) {
+  if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+List spec = PartitionTransform.getPartitionTransformSpec(hmsTable.getPartitionKeys());
+if (!SessionStateUtil.addResource(conf, hive_metastoreConstants.PARTITION_TRANSFORM_SPEC, spec)) {
+  throw new MetaException("Query state attached to Session state must be not null. " +
+  "Partition transform metadata cannot be saved.");
+}
 hmsTable.getSd().getCols().addAll(hmsTable.getPartitionKeys());
 hmsTable.setPartitionKeysIsSet(false);
   }
+  preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, hmsTable);

Review comment:
   This is moved from line 236. We need it to be set, but we have to do it
after we have the correct spec.
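
A minimal Java sketch of the ordering described above. It assumes, as the comments suggest, that `spec()` now reads the partition transform spec from the session rather than from the table. Everything in it is a simplified, hypothetical stand-in: columns and partition keys are plain `List<String>` names, the session is a `Map<String, Object>` that is `null` when no query state is attached, `IllegalStateException` replaces `MetaException`, and `MigrationOrderSketch`, `addResource`, `spec` and `migrate` are illustrative names, not the Hive or Iceberg API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the preAlterTable() ordering; these are not Hive or Iceberg classes.
public class MigrationOrderSketch {

  // Hypothetical key standing in for hive_metastoreConstants.PARTITION_TRANSFORM_SPEC.
  static final String PARTITION_TRANSFORM_SPEC = "partition_transform_spec";

  // Stand-in for a session resource store: returns false when no session is attached.
  static boolean addResource(Map<String, Object> session, String key, Object value) {
    if (session == null) {
      return false;
    }
    session.put(key, value);
    return true;
  }

  // Assumed behaviour: the spec is read back from the session, not from the table.
  @SuppressWarnings("unchecked")
  static List<String> spec(Map<String, Object> session) {
    Object stored = session == null ? null : session.get(PARTITION_TRANSFORM_SPEC);
    if (stored == null) {
      return new ArrayList<>();
    }
    return new ArrayList<>((List<String>) stored);
  }

  // Folds partition keys into the column list, then computes the spec.
  static List<String> migrate(List<String> cols, List<String> partitionKeys,
      Map<String, Object> session) {
    if (partitionKeys != null && !partitionKeys.isEmpty()) {
      // Stand-in for PartitionTransform.getPartitionTransformSpec(): identity transforms only.
      List<String> transformSpec = new ArrayList<>(partitionKeys);
      if (!addResource(session, PARTITION_TRANSFORM_SPEC, transformSpec)) {
        // Stand-in for the MetaException thrown in the real hook.
        throw new IllegalStateException(
            "No session state attached; partition transform metadata cannot be saved.");
      }
      cols.addAll(partitionKeys);
      partitionKeys.clear(); // stand-in for hmsTable.setPartitionKeysIsSet(false)
    }
    // Must run after the block above: only the session still knows the old partition keys.
    return spec(session);
  }

  public static void main(String[] args) {
    Map<String, Object> session = new HashMap<>();
    List<String> cols = new ArrayList<>(List.of("id", "name"));
    List<String> partitionKeys = new ArrayList<>(List.of("dept"));
    List<String> spec = migrate(cols, partitionKeys, session);
    System.out.println(cols + " " + partitionKeys + " " + spec);
  }
}
```

Running `main` prints `[id, name, dept] [] [dept]`. The partition key has been folded into the column list, and the spec can still be recovered from the session. If the `spec(session)` call were moved above the `if` block, it would return an empty list because nothing would have been stored yet. That is why the real line had to move.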





Issue Time Tracking
---

Worklog Id: (was: 624463)
Time Spent: 3.5h  (was: 3h 20m)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics




[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=624362=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-624362
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 19/Jul/21 14:01
Start Date: 19/Jul/21 14:01
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r672325104



##
File path: ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezOutputCommitter.java
##
@@ -122,6 +122,7 @@ private IDriver getDriverWithCommitter(String committerClass) {
 conf.setVar(HiveConf.ConfVars.HIVE_AUTHORIZATION_MANAGER,
     "org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory");
 conf.setBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY, false);
+conf.setBoolVar(HiveConf.ConfVars.HIVESTATSCOLAUTOGATHER, false);

Review comment:
   Why is this required here?





Issue Time Tracking
---

Worklog Id: (was: 624362)
Time Spent: 3h 20m  (was: 3h 10m)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics




[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=624261=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-624261
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 19/Jul/21 10:30
Start Date: 19/Jul/21 10:30
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r672180879



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -233,15 +237,21 @@ public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, E
   preAlterTableProperties.tableLocation = sd.getLocation();
   preAlterTableProperties.format = sd.getInputFormat();
   preAlterTableProperties.schema = schema(catalogProperties, hmsTable);
-  preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, catalogProperties, hmsTable);
   preAlterTableProperties.partitionKeys = hmsTable.getPartitionKeys();
 
   context.getProperties().put(HiveMetaHook.ALLOW_PARTITION_KEY_CHANGE, "true");
   // If there are partition keys specified remove them from the HMS table and add them to the column list
-  if (hmsTable.isSetPartitionKeys()) {
+  if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+List spec = PartitionTransform.getPartitionTransformSpec(hmsTable.getPartitionKeys());
+if (!SessionStateUtil.addResource(conf, hive_metastoreConstants.PARTITION_TRANSFORM_SPEC, spec)) {

Review comment:
   Where did we save this into the session conf before this change?





Issue Time Tracking
---

Worklog Id: (was: 624261)
Time Spent: 3h  (was: 2h 50m)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics




[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=624263=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-624263
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 19/Jul/21 10:31
Start Date: 19/Jul/21 10:31
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r672181449



##
File path: iceberg/iceberg-handler/src/test/results/positive/vectorized_iceberg_read.q.out
##
@@ -129,17 +129,17 @@ Stage-0
 Stage-1
   Reducer 2 vectorized
   File Output Operator [FS_11]
-Select Operator [SEL_10] (rows=1 width=564)
+Select Operator [SEL_10] (rows=1 width=372)

Review comment:
   Out of curiosity: do you know why the width has decreased so much?





Issue Time Tracking
---

Worklog Id: (was: 624263)
Time Spent: 3h 10m  (was: 3h)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics




[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-07-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=624257=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-624257
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 19/Jul/21 10:05
Start Date: 19/Jul/21 10:05
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r672164500



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -233,15 +237,21 @@ public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, E
   preAlterTableProperties.tableLocation = sd.getLocation();
   preAlterTableProperties.format = sd.getInputFormat();
   preAlterTableProperties.schema = schema(catalogProperties, hmsTable);
-  preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, catalogProperties, hmsTable);
   preAlterTableProperties.partitionKeys = hmsTable.getPartitionKeys();
 
   context.getProperties().put(HiveMetaHook.ALLOW_PARTITION_KEY_CHANGE, "true");
   // If there are partition keys specified remove them from the HMS table and add them to the column list
-  if (hmsTable.isSetPartitionKeys()) {
+  if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+List spec = PartitionTransform.getPartitionTransformSpec(hmsTable.getPartitionKeys());
+if (!SessionStateUtil.addResource(conf, hive_metastoreConstants.PARTITION_TRANSFORM_SPEC, spec)) {
+  throw new MetaException("Query state attached to Session state must be not null. " +
+  "Partition transform metadata cannot be saved.");
+}
 hmsTable.getSd().getCols().addAll(hmsTable.getPartitionKeys());
 hmsTable.setPartitionKeysIsSet(false);
   }
+  preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, hmsTable);

Review comment:
   Is this needed here? Or if we use it only for the validation inside 
`spec()`, then maybe remove the local variable, or better yet, extract the 
validation logic into a different method we can call here?





Issue Time Tracking
---

Worklog Id: (was: 624257)
Time Spent: 2h 50m  (was: 2h 40m)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics




[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-06-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=615148=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-615148
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 25/Jun/21 18:53
Start Date: 25/Jun/21 18:53
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658973349



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -1314,6 +1314,104 @@ public void testScanTableCaseInsensitive() throws IOException {
 Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testStatWithInsert() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of());
+
+if (testTableType != TestTables.TestTableType.HIVE_CATALOG) {
+  // If the location is set and we have to gather stats, then we have to update the table stats now
+  shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS");
+}
+
+String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false);
+shell.executeStatement(insert);
+
+checkColStat(identifier.name(), "customer_id");
+  }
+
+  @Test
+  public void testStatWithInsertOverwrite() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of());
+
+String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, true);
+shell.executeStatement(insert);
+
+checkColStat(identifier.name(), "customer_id");
+  }
+
+  @Test
+  public void testStatWithPartitionedInsert() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+PartitionSpec spec = PartitionSpec.builderFor(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+.identity("last_name").build();
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, spec,
+fileFormat, ImmutableList.of());
+
+if (testTableType != TestTables.TestTableType.HIVE_CATALOG) {
+  // If the location is set and we have to gather stats, then we have to update the table stats now
+  shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS");
+}
+
+String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false);
+shell.executeStatement(insert);
+
+checkColStat("customers", "customer_id");
+checkColStat("customers", "first_name");
+  }
+
+  @Test
+  public void testStatWithCTAS() {
+Assume.assumeTrue(HiveIcebergSerDe.CTAS_EXCEPTION_MSG, testTableType == TestTables.TestTableType.HIVE_CATALOG);
+
+shell.executeStatement("CREATE TABLE source (id bigint, name string) PARTITIONED BY (dept string) STORED AS ORC");
+shell.executeStatement(testTables.getInsertQuery(
+HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, TableIdentifier.of("default", "source"), false));
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+shell.executeStatement(String.format(
+"CREATE TABLE target STORED BY ICEBERG %s TBLPROPERTIES ('%s'='%s') AS SELECT * FROM source",
+testTables.locationForCreateTableSQL(TableIdentifier.of("default", "target")),
+TableProperties.DEFAULT_FILE_FORMAT, fileFormat));
+
+checkColStat("target", "id");
+  }
+
+  @Test
+  public void testStatWithPartitionedCTAS() {
+Assume.assumeTrue(HiveIcebergSerDe.CTAS_EXCEPTION_MSG, testTableType == TestTables.TestTableType.HIVE_CATALOG);
+
+shell.executeStatement("CREATE TABLE source (id bigint, name string) PARTITIONED BY (dept string) STORED AS ORC");
+shell.executeStatement(testTables.getInsertQuery(
+HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, TableIdentifier.of("default", "source"), false));
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+

Review comment:
   Done





[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-06-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=615147=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-615147
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 25/Jun/21 18:52
Start Date: 25/Jun/21 18:52
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658972546



##
File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableDesc.java
##
@@ -804,9 +801,18 @@ public Table toTable(HiveConf conf) throws HiveException {
   }
 }
 
-if (getCols() != null) {
-  tbl.setFields(getCols());
+Optional<List<FieldSchema>> cols = Optional.of(getCols());
+Optional<List<FieldSchema>> partCols = Optional.of(getPartCols());
+
+if (storageHandler !=null && storageHandler.alwaysUnpartitioned()) {

Review comment:
   Done





Issue Time Tracking
---

Worklog Id: (was: 615147)
Time Spent: 2.5h  (was: 2h 20m)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics




[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-06-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=615146=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-615146
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 25/Jun/21 18:51
Start Date: 25/Jun/21 18:51
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658972416



##
File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableDesc.java
##
@@ -804,9 +801,18 @@ public Table toTable(HiveConf conf) throws HiveException {
   }
 }
 
-if (getCols() != null) {
-  tbl.setFields(getCols());
+Optional<List<FieldSchema>> cols = Optional.of(getCols());

Review comment:
   Thanks!
   Done





Issue Time Tracking
---

Worklog Id: (was: 615146)
Time Spent: 2h 20m  (was: 2h 10m)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics




[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-06-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=615035=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-615035
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 25/Jun/21 14:35
Start Date: 25/Jun/21 14:35
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658813552



##
File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableDesc.java
##
@@ -804,9 +801,18 @@ public Table toTable(HiveConf conf) throws HiveException {
   }
 }
 
-if (getCols() != null) {
-  tbl.setFields(getCols());
+Optional<List<FieldSchema>> cols = Optional.of(getCols());
+Optional<List<FieldSchema>> partCols = Optional.of(getPartCols());
+
+if (storageHandler !=null && storageHandler.alwaysUnpartitioned()) {

Review comment:
   nit: `!= null`





Issue Time Tracking
---

Worklog Id: (was: 615035)
Time Spent: 2h 10m  (was: 2h)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics




[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-06-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=615032=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-615032
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 25/Jun/21 14:29
Start Date: 25/Jun/21 14:29
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658808816



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -1314,6 +1314,104 @@ public void testScanTableCaseInsensitive() throws IOException {
 Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testStatWithInsert() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of());
+
+if (testTableType != TestTables.TestTableType.HIVE_CATALOG) {
+  // If the location is set and we have to gather stats, then we have to update the table stats now
+  shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS");
+}
+
+String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false);
+shell.executeStatement(insert);
+
+checkColStat(identifier.name(), "customer_id");
+  }
+
+  @Test
+  public void testStatWithInsertOverwrite() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of());
+
+String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, true);
+shell.executeStatement(insert);
+
+checkColStat(identifier.name(), "customer_id");
+  }
+
+  @Test
+  public void testStatWithPartitionedInsert() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+PartitionSpec spec = PartitionSpec.builderFor(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+.identity("last_name").build();
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, spec,
+fileFormat, ImmutableList.of());
+
+if (testTableType != TestTables.TestTableType.HIVE_CATALOG) {
+  // If the location is set and we have to gather stats, then we have to update the table stats now
+  shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS");
+}
+
+String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false);
+shell.executeStatement(insert);
+
+checkColStat("customers", "customer_id");
+checkColStat("customers", "first_name");
+  }
+
+  @Test
+  public void testStatWithCTAS() {
+Assume.assumeTrue(HiveIcebergSerDe.CTAS_EXCEPTION_MSG, testTableType == TestTables.TestTableType.HIVE_CATALOG);
+
+shell.executeStatement("CREATE TABLE source (id bigint, name string) PARTITIONED BY (dept string) STORED AS ORC");
+shell.executeStatement(testTables.getInsertQuery(
+HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, TableIdentifier.of("default", "source"), false));
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+shell.executeStatement(String.format(
+"CREATE TABLE target STORED BY ICEBERG %s TBLPROPERTIES ('%s'='%s') AS SELECT * FROM source",
+testTables.locationForCreateTableSQL(TableIdentifier.of("default", "target")),
+TableProperties.DEFAULT_FILE_FORMAT, fileFormat));
+
+checkColStat("target", "id");
+  }
+
+  @Test
+  public void testStatWithPartitionedCTAS() {
+Assume.assumeTrue(HiveIcebergSerDe.CTAS_EXCEPTION_MSG, testTableType == TestTables.TestTableType.HIVE_CATALOG);
+
+shell.executeStatement("CREATE TABLE source (id bigint, name string) PARTITIONED BY (dept string) STORED AS ORC");
+shell.executeStatement(testTables.getInsertQuery(
+HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, TableIdentifier.of("default", "source"), false));
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+

Review comment:
   nit: extra new line





[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-06-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=615030=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-615030
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 25/Jun/21 14:26
Start Date: 25/Jun/21 14:26
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658806362



##
File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableDesc.java
##
@@ -804,9 +801,18 @@ public Table toTable(HiveConf conf) throws HiveException {
   }
 }
 
-if (getCols() != null) {
-  tbl.setFields(getCols());
+Optional<List<FieldSchema>> cols = Optional.of(getCols());

Review comment:
   It seems like `getCols()` can return null, based on the previous null
check. Since `Optional.of(null)` throws a `NullPointerException`, this would fail
here as well, unless we wrap it using `Optional.ofNullable()`.
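
Here is a small, self-contained illustration of the difference. It uses `List<String>` as a stand-in for the real column list type, which is an assumption. `OptionalNullDemo` and its methods are hypothetical names used only for this example. `Optional.of` and `Optional.ofNullable` are the standard `java.util.Optional` factories.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class OptionalNullDemo {

  // Same shape as the line under review: throws NullPointerException if cols is null.
  static Optional<List<String>> wrapWithOf(List<String> cols) {
    return Optional.of(cols);
  }

  // Null-safe alternative from the review comment: returns Optional.empty() for null.
  static Optional<List<String>> wrapWithOfNullable(List<String> cols) {
    return Optional.ofNullable(cols);
  }

  public static void main(String[] args) {
    // A null column list becomes an empty Optional instead of an exception.
    System.out.println(wrapWithOfNullable(null).isPresent());

    try {
      wrapWithOf(null);
    } catch (NullPointerException e) {
      System.out.println("Optional.of(null) threw NullPointerException");
    }

    // Only copy the columns when they are present, like the old null check did.
    List<String> fields = new ArrayList<>();
    wrapWithOfNullable(List.of("id", "name")).ifPresent(fields::addAll);
    System.out.println(fields);
  }
}
```

Running `main` prints `false`, then `Optional.of(null) threw NullPointerException`, then `[id, name]`. The `ifPresent` call preserves the behaviour of the removed `if (getCols() != null)` check.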





Issue Time Tracking
---

Worklog Id: (was: 615030)
Time Spent: 1h 50m  (was: 1h 40m)

> Enable automatic statistics generation for Iceberg tables
> -
>
> Key: HIVE-25276
> URL: https://issues.apache.org/jira/browse/HIVE-25276
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-06-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614981=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614981
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 25/Jun/21 12:58
Start Date: 25/Jun/21 12:58
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658743756



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsAutoGatherContext.java
##
@@ -71,13 +71,21 @@
   private Context origCtx;
   
   public ColumnStatsAutoGatherContext(SemanticAnalyzer sa, HiveConf conf,
-  Operator op, Table tbl, Map partSpec,
+  Operator op, Table origTbl, Map partSpec,
   boolean isInsertInto, Context ctx) throws SemanticException {
 super();
 this.sa = sa;
 this.conf = conf;
 this.op = op;
-this.tbl = tbl;
+try {
+  this.tbl = origTbl.copy();
+} catch (HiveException he) {
+  throw new SemanticException(he);
+}
+if (tbl.getStorageHandler().alwaysUnpartitioned()) {

Review comment:
   Not sure, so I have added the `null` check
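A sketch of the kind of `null` guard mentioned above, assuming `getStorageHandler()` may return `null` for tables without a storage handler (for example, native tables). `StorageHandler` and `TableStub` are hypothetical stand-ins, not Hive's actual `HiveStorageHandler` or `Table` classes; only the method name `alwaysUnpartitioned()` is taken from the quoted diff.

```java
class NullSafeHandlerCheck {
  // Hypothetical stand-in for a storage handler exposing alwaysUnpartitioned().
  interface StorageHandler {
    boolean alwaysUnpartitioned();
  }

  // Hypothetical stand-in for a table whose storage handler may be absent.
  static final class TableStub {
    private final StorageHandler handler;

    TableStub(StorageHandler handler) {
      this.handler = handler;
    }

    StorageHandler getStorageHandler() {
      return handler;
    }
  }

  // Short-circuit && ensures alwaysUnpartitioned() is only called on a non-null handler.
  static boolean isAlwaysUnpartitioned(TableStub tbl) {
    StorageHandler handler = tbl.getStorageHandler();
    return handler != null && handler.alwaysUnpartitioned();
  }

  public static void main(String[] args) {
    TableStub nativeTable = new TableStub(null);       // no storage handler
    TableStub icebergLike = new TableStub(() -> true); // handler reports "always unpartitioned"
    System.out.println(isAlwaysUnpartitioned(nativeTable)); // false
    System.out.println(isAlwaysUnpartitioned(icebergLike)); // true
  }
}
```

Reading the handler into a local variable once avoids calling `getStorageHandler()` twice, which matters if that getter does any lookup work.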




Issue Time Tracking
---

Worklog Id: (was: 614981)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-06-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614980=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614980
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 25/Jun/21 12:58
Start Date: 25/Jun/21 12:58
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658743466



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -1314,6 +1314,85 @@ public void testScanTableCaseInsensitive() throws IOException {
 Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testStatWithInsert() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of());
+
+if (testTableType != TestTables.TestTableType.HIVE_CATALOG) {
+  // If the location is set and we have to gather stats, then we have to update the table stats now
+  shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS");
+}
+
+String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false);
+shell.executeStatement(insert);
+
+checkColStat(identifier.name(), "customer_id");

Review comment:
   Added to a partitioned CTAS test




Issue Time Tracking
---

Worklog Id: (was: 614980)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-06-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614979=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614979
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 25/Jun/21 12:57
Start Date: 25/Jun/21 12:57
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658743168



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -1314,6 +1314,85 @@ public void testScanTableCaseInsensitive() throws IOException {
 Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testStatWithInsert() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of());
+
+if (testTableType != TestTables.TestTableType.HIVE_CATALOG) {
+  // If the location is set and we have to gather stats, then we have to update the table stats now
+  shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS");
+}
+
+String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false);
+shell.executeStatement(insert);
+
+checkColStat(identifier.name(), "customer_id");
+  }
+
+  @Test
+  public void testStatWithInsertOverwrite() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,

Review comment:
   Added a partitioned CTAS test




Issue Time Tracking
---

Worklog Id: (was: 614979)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-06-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614973=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614973
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 25/Jun/21 12:52
Start Date: 25/Jun/21 12:52
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658739899



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsAutoGatherContext.java
##
@@ -71,13 +71,21 @@
   private Context origCtx;
   
   public ColumnStatsAutoGatherContext(SemanticAnalyzer sa, HiveConf conf,
-  Operator op, Table tbl, Map partSpec,
+  Operator op, Table origTbl, Map partSpec,
   boolean isInsertInto, Context ctx) throws SemanticException {
 super();
 this.sa = sa;
 this.conf = conf;
 this.op = op;
-this.tbl = tbl;
+try {
+  this.tbl = origTbl.copy();
+} catch (HiveException he) {
+  throw new SemanticException(he);
+}
+if (tbl.getStorageHandler().alwaysUnpartitioned()) {

Review comment:
   Can the storage handler be null?
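The constructor change this question refers to copies the table (`origTbl.copy()`) before consulting its storage handler, wrapping the checked `HiveException` in a `SemanticException`. A plausible reason, not confirmed in this thread, is that any adjustment made for statistics gathering then touches only the copy. The sketch below shows that copy-then-mutate pattern under stated assumptions: `TableCopy`, `CopyException`, `AnalysisException`, and the step that clears partition keys are all hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

class CopyBeforeMutate {
  // Hypothetical checked exceptions standing in for HiveException / SemanticException.
  static class CopyException extends Exception {
    CopyException(String msg) { super(msg); }
  }

  static class AnalysisException extends Exception {
    AnalysisException(Throwable cause) { super(cause); }
  }

  // Hypothetical minimal table model whose copy() declares a checked exception.
  static final class TableCopy {
    final List<String> partitionKeys;
    final boolean alwaysUnpartitioned;

    TableCopy(List<String> partitionKeys, boolean alwaysUnpartitioned) {
      this.partitionKeys = new ArrayList<>(partitionKeys); // mutable, independent list
      this.alwaysUnpartitioned = alwaysUnpartitioned;
    }

    TableCopy copy() throws CopyException {
      return new TableCopy(partitionKeys, alwaysUnpartitioned);
    }
  }

  // Copy first, translate the checked exception, then mutate only the copy.
  static TableCopy prepareForStats(TableCopy origTbl) throws AnalysisException {
    TableCopy tbl;
    try {
      tbl = origTbl.copy();
    } catch (CopyException e) {
      throw new AnalysisException(e);
    }
    if (tbl.alwaysUnpartitioned) {
      tbl.partitionKeys.clear(); // assumption: stats are gathered as if the table were unpartitioned
    }
    return tbl;
  }

  public static void main(String[] args) throws AnalysisException {
    TableCopy orig = new TableCopy(List.of("dept"), true);
    TableCopy forStats = prepareForStats(orig);
    System.out.println(orig.partitionKeys + " " + forStats.partitionKeys); // [dept] []
  }
}
```

The key property is that the caller's table object is left untouched, so later compilation steps still see the original partitioning.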




Issue Time Tracking
---

Worklog Id: (was: 614973)
Time Spent: 1h 10m  (was: 1h)



[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614598=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614598
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 24/Jun/21 16:11
Start Date: 24/Jun/21 16:11
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658090073



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -1314,6 +1314,85 @@ public void testScanTableCaseInsensitive() throws IOException {
 Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testStatWithInsert() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of());
+
+if (testTableType != TestTables.TestTableType.HIVE_CATALOG) {
+  // If the location is set and we have to gather stats, then we have to update the table stats now
+  shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS");
+}
+
+String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false);
+shell.executeStatement(insert);
+
+checkColStat(identifier.name(), "customer_id");

Review comment:
   Ok, thanks, just wanted to double-check my understanding.




Issue Time Tracking
---

Worklog Id: (was: 614598)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614593=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614593
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 24/Jun/21 16:05
Start Date: 24/Jun/21 16:05
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658085550



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -1314,6 +1314,85 @@ public void testScanTableCaseInsensitive() throws IOException {
 Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testStatWithInsert() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of());
+
+if (testTableType != TestTables.TestTableType.HIVE_CATALOG) {
+  // If the location is set and we have to gather stats, then we have to update the table stats now
+  shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS");
+}
+
+String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false);
+shell.executeStatement(insert);
+
+checkColStat(identifier.name(), "customer_id");
+  }
+
+  @Test
+  public void testStatWithInsertOverwrite() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,

Review comment:
   My thought process was the following:
   - We need a basic test, so I created an unpartitioned insert test
   - We need to test partitioned tables (what happens with the partition columns), so I created a partitioned test (not sure that this is strictly needed)
   - We need to test IOW, so I created an IOW test
   
   I am open to discussion either way
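The coverage reasoning above can be laid out as a small matrix of table layout (unpartitioned or partitioned) by write mode (INSERT or INSERT OVERWRITE). The sketch below enumerates the combinations and reports which ones are not yet exercised. The covered set mirrors the cases named in this review thread (unpartitioned insert, partitioned insert, unpartitioned IOW); the enum and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class StatsTestMatrix {
  enum Layout { UNPARTITIONED, PARTITIONED }

  enum WriteMode { INSERT, INSERT_OVERWRITE }

  // Builds a readable key such as "PARTITIONED INSERT_OVERWRITE".
  static String key(Layout layout, WriteMode mode) {
    return layout + " " + mode;
  }

  // Walks the full layout x write-mode matrix and returns the combinations not yet covered.
  static List<String> missing(Set<String> covered) {
    List<String> result = new ArrayList<>();
    for (Layout layout : Layout.values()) {
      for (WriteMode mode : WriteMode.values()) {
        String k = key(layout, mode);
        if (!covered.contains(k)) {
          result.add(k);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Cases named in the review thread.
    Set<String> covered = Set.of(
        key(Layout.UNPARTITIONED, WriteMode.INSERT),
        key(Layout.PARTITIONED, WriteMode.INSERT),
        key(Layout.UNPARTITIONED, WriteMode.INSERT_OVERWRITE));
    System.out.println(missing(covered)); // [PARTITIONED INSERT_OVERWRITE]
  }
}
```

Adding a further dimension (for example CTAS versus INSERT) is a matter of adding another enum and loop, which makes gaps like the one raised in this thread easy to spot.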




Issue Time Tracking
---

Worklog Id: (was: 614593)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614590=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614590
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 24/Jun/21 16:04
Start Date: 24/Jun/21 16:04
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658084643



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -1314,6 +1314,85 @@ public void testScanTableCaseInsensitive() throws IOException {
 Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testStatWithInsert() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of());
+
+if (testTableType != TestTables.TestTableType.HIVE_CATALOG) {
+  // If the location is set and we have to gather stats, then we have to update the table stats now
+  shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS");
+}
+
+String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false);
+shell.executeStatement(insert);
+
+checkColStat(identifier.name(), "customer_id");

Review comment:
   They are basically the same, and they are generated as well.
   Only in the partitioned case did I go for checking the partition columns too, just to be sure.




Issue Time Tracking
---

Worklog Id: (was: 614590)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614577=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614577
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 24/Jun/21 15:38
Start Date: 24/Jun/21 15:38
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658062987



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -1314,6 +1314,85 @@ public void testScanTableCaseInsensitive() throws IOException {
 Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testStatWithInsert() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of());
+
+if (testTableType != TestTables.TestTableType.HIVE_CATALOG) {
+  // If the location is set and we have to gather stats, then we have to update the table stats now
+  shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS");
+}
+
+String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false);
+shell.executeStatement(insert);
+
+checkColStat(identifier.name(), "customer_id");

Review comment:
   Are the column stats gathered for `first_name` and `last_name` as well, and we're just saving on the number of describe calls?
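If checking only `customer_id` feels too narrow, one option is to assert on every column of the schema. The sketch below assumes the statistics have already been collected into a map from column name to a simple stats holder (for example, by parsing `DESCRIBE FORMATTED` output, which is not shown). `ColStats` and `columnsMissingStats` are hypothetical helpers, not part of the test utilities quoted above, and the numbers in `main` are illustrative values only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AllColumnStatsCheck {
  // Hypothetical per-column statistics; a null field means "not gathered".
  static final class ColStats {
    final Long numNulls;
    final Long distinctCount;

    ColStats(Long numNulls, Long distinctCount) {
      this.numNulls = numNulls;
      this.distinctCount = distinctCount;
    }

    boolean isPresent() {
      return numNulls != null && distinctCount != null;
    }
  }

  // Returns the columns whose stats are absent or incomplete, in schema order.
  static List<String> columnsMissingStats(List<String> columns, Map<String, ColStats> stats) {
    List<String> missing = new ArrayList<>();
    for (String column : columns) {
      ColStats s = stats.get(column);
      if (s == null || !s.isPresent()) {
        missing.add(column);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    List<String> columns = List.of("customer_id", "first_name", "last_name");
    Map<String, ColStats> stats = new LinkedHashMap<>();
    stats.put("customer_id", new ColStats(0L, 3L)); // illustrative values
    stats.put("first_name", new ColStats(0L, 3L));  // illustrative values
    // last_name intentionally left out to show how a gap is reported.
    System.out.println(columnsMissingStats(columns, stats)); // [last_name]
  }
}
```

Collecting all gaps before failing gives a single assertion message that names every column without statistics, rather than stopping at the first one.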




Issue Time Tracking
---

Worklog Id: (was: 614577)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=614574=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-614574
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 24/Jun/21 15:36
Start Date: 24/Jun/21 15:36
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r658061851



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -1314,6 +1314,85 @@ public void testScanTableCaseInsensitive() throws IOException {
 Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testStatWithInsert() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+PartitionSpec.unpartitioned(), fileFormat, ImmutableList.of());
+
+if (testTableType != TestTables.TestTableType.HIVE_CATALOG) {
+  // If the location is set and we have to gather stats, then we have to update the table stats now
+  shell.executeStatement("ANALYZE TABLE " + identifier + " COMPUTE STATISTICS FOR COLUMNS");
+}
+
+String insert = testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, identifier, false);
+shell.executeStatement(insert);
+
+checkColStat(identifier.name(), "customer_id");
+  }
+
+  @Test
+  public void testStatWithInsertOverwrite() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+
+shell.setHiveSessionValue(HiveConf.ConfVars.HIVESTATSAUTOGATHER.varname, true);
+testTables.createTable(shell, identifier.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,

Review comment:
   If we have test cases for unpartitioned insert, partitioned insert, 
unpartitioned IOW, should we have a test case for partitioned IOW as well?




Issue Time Tracking
---

Worklog Id: (was: 614574)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (HIVE-25276) Enable automatic statistics generation for Iceberg tables

2021-06-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=613498=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-613498
 ]

ASF GitHub Bot logged work on HIVE-25276:
-

Author: ASF GitHub Bot
Created on: 22/Jun/21 14:02
Start Date: 22/Jun/21 14:02
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request #2419:
URL: https://github.com/apache/hive/pull/2419


   ### What changes were proposed in this pull request?
   Allow column stat generation when we can commit in a move task
   
   ### Why are the changes needed?
   So that we have column statistics after inserting into an Iceberg table
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added unit tests
   
   Also added #2359 as it is required for this patch


Issue Time Tracking
---

Worklog Id: (was: 613498)
Remaining Estimate: 0h
Time Spent: 10m
