[jira] [Work logged] (HIVE-25673) Column pruning fix for MR tasks

2021-11-08 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25673?focusedWorklogId=678397&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-678397
 ]

ASF GitHub Bot logged work on HIVE-25673:
-

Author: ASF GitHub Bot
Created on: 08/Nov/21 10:01
Start Date: 08/Nov/21 10:01
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2765:
URL: https://github.com/apache/hive/pull/2765#discussion_r744568098



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergSelects.java
##
@@ -203,4 +204,29 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {0L, "Alice", "Brown"}, rows.get(0));
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
+
+  /**
+   * Column pruning could become problematic when a single Map Task contains multiple TableScan operators where
+   * different columns are pruned. This only occurs on MR, as Tez initializes a single Map task for every TableScan
+   * operator.
+   */
+  @Test
+  public void testMultiColumnPruning() throws IOException {
+    shell.setHiveSessionValue("hive.cbo.enable", true);
+
+    Schema schema1 = new Schema(optional(1, "fk", Types.StringType.get()));
+    List records1 = TestHelper.RecordsBuilder.newInstance(schema1).add("fk1").build();
+    testTables.createTable(shell, "table1", schema1, fileFormat, records1);
+
+    Schema schema2 = new Schema(optional(1, "fk", Types.StringType.get()), optional(2, "val", Types.StringType.get()));
+    List records2 = TestHelper.RecordsBuilder.newInstance(schema2).add("fk1", "val").build();
+    testTables.createTable(shell, "table2", schema2, fileFormat, records2);
+
+    // MR is needed for the reproduction
+    shell.setHiveSessionValue("hive.execution.engine", "mr");

Review comment:
   I am not sure whether Hive has removed MR support (I think not yet, but it is no longer actively maintained), but Hive-Iceberg definitely needs Tez.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 678397)
Time Spent: 1h 40m  (was: 1.5h)

> Column pruning fix for MR tasks
> ---
>
> Key: HIVE-25673
> URL: https://issues.apache.org/jira/browse/HIVE-25673
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When running join tests for Iceberg tables, we got the following exception:
> {code}
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:131)
>   ... 23 more
> Caused by: java.lang.RuntimeException: cannot find field val from [org.apache.iceberg.mr.hive.serde.objectinspector.IcebergRecordObjectInspector$IcebergRecordStructField@45f29d]
>   at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:550)
>   at org.apache.iceberg.mr.hive.serde.objectinspector.IcebergRecordObjectInspector.getStructFieldRef(IcebergRecordObjectInspector.java:70)
>   at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:56)
>   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:1073)
>   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1099)
>   at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:74)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549)
>   at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:549)
>   at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:503)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:369)
>   at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:505)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:110)
>   ... 23 more
> {code}

[jira] [Work logged] (HIVE-25673) Column pruning fix for MR tasks

2021-11-08 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25673?focusedWorklogId=678391&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-678391
 ]

ASF GitHub Bot logged work on HIVE-25673:
-

Author: ASF GitHub Bot
Created on: 08/Nov/21 09:56
Start Date: 08/Nov/21 09:56
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #2765:
URL: https://github.com/apache/hive/pull/2765


   




Issue Time Tracking
---

Worklog Id: (was: 678391)
Time Spent: 1h 20m  (was: 1h 10m)




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Work logged] (HIVE-25673) Column pruning fix for MR tasks

2021-11-08 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25673?focusedWorklogId=678392&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-678392
 ]

ASF GitHub Bot logged work on HIVE-25673:
-

Author: ASF GitHub Bot
Created on: 08/Nov/21 09:56
Start Date: 08/Nov/21 09:56
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2765:
URL: https://github.com/apache/hive/pull/2765#discussion_r744563870



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergSelects.java
##
@@ -203,4 +204,29 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {0L, "Alice", "Brown"}, rows.get(0));
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
+
+  /**
+   * Column pruning could become problematic when a single Map Task contains multiple TableScan operators where
+   * different columns are pruned. This only occurs on MR, as Tez initializes a single Map task for every TableScan
+   * operator.
+   */
+  @Test
+  public void testMultiColumnPruning() throws IOException {
+    shell.setHiveSessionValue("hive.cbo.enable", true);
+
+    Schema schema1 = new Schema(optional(1, "fk", Types.StringType.get()));
+    List records1 = TestHelper.RecordsBuilder.newInstance(schema1).add("fk1").build();
+    testTables.createTable(shell, "table1", schema1, fileFormat, records1);
+
+    Schema schema2 = new Schema(optional(1, "fk", Types.StringType.get()), optional(2, "val", Types.StringType.get()));
+    List records2 = TestHelper.RecordsBuilder.newInstance(schema2).add("fk1", "val").build();
+    testTables.createTable(shell, "table2", schema2, fileFormat, records2);
+
+    // MR is needed for the reproduction
+    shell.setHiveSessionValue("hive.execution.engine", "mr");

Review comment:
   Got it. By the way, did you mean "With Hive 4.0.0 we do not support MR"?






Issue Time Tracking
---

Worklog Id: (was: 678392)
Time Spent: 1.5h  (was: 1h 20m)


[jira] [Work logged] (HIVE-25673) Column pruning fix for MR tasks

2021-11-08 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25673?focusedWorklogId=678389&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-678389
 ]

ASF GitHub Bot logged work on HIVE-25673:
-

Author: ASF GitHub Bot
Created on: 08/Nov/21 09:55
Start Date: 08/Nov/21 09:55
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2765:
URL: https://github.com/apache/hive/pull/2765#discussion_r744563434



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestTables.java
##
@@ -437,7 +437,7 @@ public String identifier(String tableIdentifier) {
   }
 
   Assert.assertTrue(location.delete());
-  return location.toString();
+  return "file://" + location;

Review comment:
   If the table location does not match the file locations, we fail to match the `alias` to the `path`, and the configuration values are not added to the HiveConf.
   
   We need to use the full path in the table location to match the file names.
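
   To see why the scheme matters, here is a small sketch. It assumes that the alias lookup compares a data file's path against the table location by prefix; `belongsTo` is a hypothetical helper written for this illustration, not a Hive method, and the paths are made up.

```java
public class LocationMatchSketch {
  /** Hypothetical prefix check standing in for the path-to-alias lookup. */
  static boolean belongsTo(String filePath, String tableLocation) {
    String prefix = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
    return filePath.startsWith(prefix);
  }

  public static void main(String[] args) {
    // Made-up data file path that carries the file:// scheme.
    String dataFile = "file:///tmp/warehouse/table2/data/00000-0.parquet";

    // Table location without the scheme: the prefix check fails.
    System.out.println(belongsTo(dataFile, "/tmp/warehouse/table2"));

    // Table location with the scheme, as returned after the change: the check succeeds.
    System.out.println(belongsTo(dataFile, "file://" + "/tmp/warehouse/table2"));
  }
}
```

   Under that assumption, a bare `location.toString()` never lines up with the scheme-qualified file names, while `"file://" + location` does.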






Issue Time Tracking
---

Worklog Id: (was: 678389)
Time Spent: 1h 10m  (was: 1h)


[jira] [Work logged] (HIVE-25673) Column pruning fix for MR tasks

2021-11-08 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25673?focusedWorklogId=678383&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-678383
 ]

ASF GitHub Bot logged work on HIVE-25673:
-

Author: ASF GitHub Bot
Created on: 08/Nov/21 09:52
Start Date: 08/Nov/21 09:52
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2765:
URL: https://github.com/apache/hive/pull/2765#discussion_r744560669



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergSelects.java
##
@@ -203,4 +204,29 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {0L, "Alice", "Brown"}, rows.get(0));
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
+
+  /**
+   * Column pruning could become problematic when a single Map Task contains multiple TableScan operators where
+   * different columns are pruned. This only occurs on MR, as Tez initializes a single Map task for every TableScan
+   * operator.
+   */
+  @Test
+  public void testMultiColumnPruning() throws IOException {
+    shell.setHiveSessionValue("hive.cbo.enable", true);
+
+    Schema schema1 = new Schema(optional(1, "fk", Types.StringType.get()));
+    List records1 = TestHelper.RecordsBuilder.newInstance(schema1).add("fk1").build();
+    testTables.createTable(shell, "table1", schema1, fileFormat, records1);
+
+    Schema schema2 = new Schema(optional(1, "fk", Types.StringType.get()), optional(2, "val", Types.StringType.get()));
+    List records2 = TestHelper.RecordsBuilder.newInstance(schema2).add("fk1", "val").build();
+    testTables.createTable(shell, "table2", schema2, fileFormat, records2);
+
+    // MR is needed for the reproduction
+    shell.setHiveSessionValue("hive.execution.engine", "mr");

Review comment:
   With Hive 4.0.0 we do not support Iceberg. When I tried to run the tests with MR, the inserts were not working. So I had to run the inserts with Tez, and then the test query with MR.
   
   OTOH this is a valid issue with MR, and with older versions of Hive, where MR is supported. (Maybe on newer versions as well.)






Issue Time Tracking
---

Worklog Id: (was: 678383)
Time Spent: 1h  (was: 50m)


[jira] [Work logged] (HIVE-25673) Column pruning fix for MR tasks

2021-11-08 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25673?focusedWorklogId=678367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-678367
 ]

ASF GitHub Bot logged work on HIVE-25673:
-

Author: ASF GitHub Bot
Created on: 08/Nov/21 08:42
Start Date: 08/Nov/21 08:42
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2765:
URL: https://github.com/apache/hive/pull/2765#discussion_r744505963



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestTables.java
##
@@ -437,7 +437,7 @@ public String identifier(String tableIdentifier) {
   }
 
   Assert.assertTrue(location.delete());
-  return location.toString();
+  return "file://" + location;

Review comment:
   Why was this needed?






Issue Time Tracking
---

Worklog Id: (was: 678367)
Time Spent: 50m  (was: 40m)


[jira] [Work logged] (HIVE-25673) Column pruning fix for MR tasks

2021-11-08 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25673?focusedWorklogId=678366&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-678366
 ]

ASF GitHub Bot logged work on HIVE-25673:
-

Author: ASF GitHub Bot
Created on: 08/Nov/21 08:40
Start Date: 08/Nov/21 08:40
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2765:
URL: https://github.com/apache/hive/pull/2765#discussion_r744503927



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergSelects.java
##
@@ -203,4 +204,29 @@ public void testScanTableCaseInsensitive() throws IOException {
     Assert.assertArrayEquals(new Object[] {0L, "Alice", "Brown"}, rows.get(0));
     Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
+
+  /**
+   * Column pruning could become problematic when a single Map Task contains multiple TableScan operators where
+   * different columns are pruned. This only occurs on MR, as Tez initializes a single Map task for every TableScan
+   * operator.
+   */
+  @Test
+  public void testMultiColumnPruning() throws IOException {
+    shell.setHiveSessionValue("hive.cbo.enable", true);
+
+    Schema schema1 = new Schema(optional(1, "fk", Types.StringType.get()));
+    List records1 = TestHelper.RecordsBuilder.newInstance(schema1).add("fk1").build();
+    testTables.createTable(shell, "table1", schema1, fileFormat, records1);
+
+    Schema schema2 = new Schema(optional(1, "fk", Types.StringType.get()), optional(2, "val", Types.StringType.get()));
+    List records2 = TestHelper.RecordsBuilder.newInstance(schema2).add("fk1", "val").build();
+    testTables.createTable(shell, "table2", schema2, fileFormat, records2);
+
+    // MR is needed for the reproduction
+    shell.setHiveSessionValue("hive.execution.engine", "mr");

Review comment:
   How's the test coverage for this on Tez? Is it already covered, or do we need to run this for Tez as well to avoid future regressions?
   
   And a minor nit: we're setting the engine to MR but the test output will still show the parameter engine=Tez, which is not a big deal, it just hurts my OCD a bit :)






Issue Time Tracking
---

Worklog Id: (was: 678366)
Time Spent: 40m  (was: 0.5h)


[jira] [Work logged] (HIVE-25673) Column pruning fix for MR tasks

2021-11-05 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25673?focusedWorklogId=677786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-677786
 ]

ASF GitHub Bot logged work on HIVE-25673:
-

Author: ASF GitHub Bot
Created on: 05/Nov/21 20:37
Start Date: 05/Nov/21 20:37
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request #2765:
URL: https://github.com/apache/hive/pull/2765


   ### What changes were proposed in this pull request?
   When updating the column pruning information in `READ_NESTED_COLUMN_PATH_CONF_STR`, update `READ_COLUMN_NAMES_CONF_STR` and `READ_COLUMN_IDS_CONF_STR` as well.
   
   ### Why are the changes needed?
   Iceberg MR queries fail if multiple tables are queried and several columns are pruned.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added unit test
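
   The idea behind the change can be illustrated with a minimal, self-contained sketch. It is not the Hive patch itself: a plain `Map<String, String>` stands in for the job configuration, `PruningInfo` and `applyPruning` are hypothetical names, and the three key strings are assumed to mirror the values of the constants named in the PR description. The point is that all three settings are rewritten together whenever the pruning information for a TableScan is applied, so column names and ids from another scan are not left behind.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PruningConfSketch {
  // Assumed to mirror the constants named in the PR description; the literal values are illustrative.
  static final String READ_COLUMN_IDS_CONF_STR = "hive.io.file.readcolumn.ids";
  static final String READ_COLUMN_NAMES_CONF_STR = "hive.io.file.readcolumn.names";
  static final String READ_NESTED_COLUMN_PATH_CONF_STR = "hive.io.file.readNestedColumn.paths";

  /** Hypothetical holder for the columns one TableScan still needs after pruning. */
  static final class PruningInfo {
    final List<Integer> ids;
    final List<String> names;
    final List<String> nestedPaths;

    PruningInfo(List<Integer> ids, List<String> names, List<String> nestedPaths) {
      this.ids = ids;
      this.names = names;
      this.nestedPaths = nestedPaths;
    }
  }

  /** Overwrites all three pruning settings so that none is left over from another scan. */
  static void applyPruning(Map<String, String> conf, PruningInfo info) {
    conf.put(READ_NESTED_COLUMN_PATH_CONF_STR, String.join(",", info.nestedPaths));
    conf.put(READ_COLUMN_NAMES_CONF_STR, String.join(",", info.names));
    conf.put(READ_COLUMN_IDS_CONF_STR,
        info.ids.stream().map(String::valueOf).collect(Collectors.joining(",")));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // table1 only has "fk"; table2 has "fk" and "val", as in the unit test.
    applyPruning(conf, new PruningInfo(
        Arrays.asList(0), Arrays.asList("fk"), Arrays.asList("fk")));
    applyPruning(conf, new PruningInfo(
        Arrays.asList(0, 1), Arrays.asList("fk", "val"), Arrays.asList("fk", "val")));
    System.out.println(conf.get(READ_COLUMN_NAMES_CONF_STR));
    System.out.println(conf.get(READ_COLUMN_IDS_CONF_STR));
  }
}
```

   If only the nested-path key were updated in `applyPruning`, the names and ids would still describe the first table, which is the kind of mismatch that can surface as `cannot find field val`.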




Issue Time Tracking
---

Worklog Id: (was: 677786)
Time Spent: 0.5h  (was: 20m)




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (HIVE-25673) Column pruning fix for MR tasks

2021-11-05 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25673?focusedWorklogId=677412&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-677412
 ]

ASF GitHub Bot logged work on HIVE-25673:
-

Author: ASF GitHub Bot
Created on: 05/Nov/21 19:49
Start Date: 05/Nov/21 19:49
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request #2765:
URL: https://github.com/apache/hive/pull/2765






Issue Time Tracking
---

Worklog Id: (was: 677412)
Time Spent: 20m  (was: 10m)

> Column pruning fix for MR tasks
> ---
>
> Key: HIVE-25673
> URL: https://issues.apache.org/jira/browse/HIVE-25673
> Project: Hive
> Issue Type: Bug
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>




[jira] [Work logged] (HIVE-25673) Column pruning fix for MR tasks

2021-11-04 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25673?focusedWorklogId=676719&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-676719
 ]

ASF GitHub Bot logged work on HIVE-25673:
-

Author: ASF GitHub Bot
Created on: 04/Nov/21 20:43
Start Date: 04/Nov/21 20:43
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request #2765:
URL: https://github.com/apache/hive/pull/2765






Issue Time Tracking
---

Worklog Id: (was: 676719)
Remaining Estimate: 0h
Time Spent: 10m

> Column pruning fix for MR tasks
> ---
>
> Key: HIVE-25673
> URL: https://issues.apache.org/jira/browse/HIVE-25673
> Project: Hive
> Issue Type: Bug
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>



