[jira] [Work logged] (HIVE-26261) Fix some issues with Spark engine removal

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26261?focusedWorklogId=774900&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774900
 ]

ASF GitHub Bot logged work on HIVE-26261:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 26/May/22 04:45
Start Date: 26/May/22 04:45
Worklog Time Spent: 10m 
  Work Description: pvary commented on PR #3320:
URL: https://github.com/apache/hive/pull/3320#issuecomment-1138149053

   @kgyrtkirk: Could you please review?




Issue Time Tracking
-------------------

Worklog Id: (was: 774900)
Time Spent: 20m  (was: 10m)

> Fix some issues with Spark engine removal
> ------------------------------------------
>
> Key: HIVE-26261
> URL: https://issues.apache.org/jira/browse/HIVE-26261
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I have made some mistakes when removing the Spark code:
>  * CommandAuthorizerV2.java - should check the properties. At that stage the 
> authorizer was referring to tables created by Spark as an HMS client, and not 
> as an engine
>  * There is one unused method left in MapJoinTableContainerSerDe.java



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26235) OR Condition on binary column is returning empty result

2022-05-25 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26235.
-------------------------------
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~abstractdog]

> OR Condition on binary column is returning empty result
> --------------------------------------------------------
>
> Key: HIVE-26235
> URL: https://issues.apache.org/jira/browse/HIVE-26235
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Repro steps:
> {code:java}
> create table test_binary(data_col timestamp, binary_col binary) partitioned by (ts string);
> insert into test_binary partition(ts='20220420') values ('2022-04-20 00:00:00.0', 'a'), ('2022-04-20 00:00:00.0', 'b'), ('2022-04-20 00:00:00.0', 'c');
> -- Works
> select * from test_binary where ts='20220420' and binary_col = unhex('61');
> select * from test_binary where ts='20220420' and binary_col between unhex('61') and unhex('62');
> -- Returns empty result
> select * from test_binary where binary_col = unhex('61') or binary_col = unhex('62');
> select * from test_binary where ts='20220420' and (binary_col = unhex('61') or binary_col = unhex('62'));
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26235) OR Condition on binary column is returning empty result

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26235?focusedWorklogId=774898&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774898
 ]

ASF GitHub Bot logged work on HIVE-26235:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 26/May/22 04:42
Start Date: 26/May/22 04:42
Worklog Time Spent: 10m 
  Work Description: pvary merged PR #3305:
URL: https://github.com/apache/hive/pull/3305




Issue Time Tracking
-------------------

Worklog Id: (was: 774898)
Time Spent: 40m  (was: 0.5h)

> OR Condition on binary column is returning empty result
> --------------------------------------------------------
>
> Key: HIVE-26235
> URL: https://issues.apache.org/jira/browse/HIVE-26235
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Repro steps:
> {code:java}
> create table test_binary(data_col timestamp, binary_col binary) partitioned by (ts string);
> insert into test_binary partition(ts='20220420') values ('2022-04-20 00:00:00.0', 'a'), ('2022-04-20 00:00:00.0', 'b'), ('2022-04-20 00:00:00.0', 'c');
> -- Works
> select * from test_binary where ts='20220420' and binary_col = unhex('61');
> select * from test_binary where ts='20220420' and binary_col between unhex('61') and unhex('62');
> -- Returns empty result
> select * from test_binary where binary_col = unhex('61') or binary_col = unhex('62');
> select * from test_binary where ts='20220420' and (binary_col = unhex('61') or binary_col = unhex('62'));
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774820&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774820
 ]

ASF GitHub Bot logged work on HIVE-26233:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 21:26
Start Date: 25/May/22 21:26
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r882128671


##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampsHive2Compatibility.java:
##
@@ -79,6 +79,18 @@ void testWriteHive2ReadHive4UsingLegacyConversion(String timestampString) {
     assertEquals(timestampString, ts.toString());
   }
 
+  /**
+   * Tests that timestamps written using Hive2 APIs are read correctly by Hive4 APIs when legacy conversion is on.
+   */
+  @ParameterizedTest(name = "{0}")
+  @MethodSource("generateTimestamps")
+  void testWriteHive2ReadHive4UsingLegacyConversionWithZone(String timestampString) {
+    String zoneId = "US/Pacific";
+    NanoTime nt = writeHive2(timestampString);

Review Comment:
   Since there is no parameter in `writeHive2` for specifying the timezone, I 
think you will need to call `TimeZone.setDefault()` explicitly; otherwise it 
will not work.
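
   A rough sketch of that suggestion, assuming the test's existing
`writeHive2` helper (an illustration of the comment, not the actual patch):

```java
// Pin the default JVM time zone (java.util.TimeZone) around the write,
// then restore it so other tests are unaffected; writeHive2 takes no
// zone parameter.
TimeZone original = TimeZone.getDefault();
try {
  TimeZone.setDefault(TimeZone.getTimeZone("US/Pacific"));
  NanoTime nt = writeHive2(timestampString);
  // ... read back via the Hive4 legacy-conversion path and assert ...
} finally {
  TimeZone.setDefault(original);
}
```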





Issue Time Tracking
-------------------

Worklog Id: (was: 774820)
Time Spent: 2h 10m  (was: 2h)

> Problems reading back PARQUET timestamps above 10000 years
> -----------------------------------------------------------
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.
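
A minimal illustration of the formatter behavior described above, assuming
java.time: a year field built with SignStyle.EXCEEDS_PAD prints 5-digit years
with a leading {{+}}, while SignStyle.NEVER drops the sign.

{code:java}
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;

public class PrintFormatterSketch {
  public static void main(String[] args) {
    // SignStyle.NEVER keeps 5-digit years unsigned: prints "10000-01-01 00:00:00"
    DateTimeFormatter noPlus = new DateTimeFormatterBuilder()
        .appendValue(ChronoField.YEAR, 4, 10, SignStyle.NEVER)
        .appendPattern("-MM-dd HH:mm:ss")
        .toFormatter();
    System.out.println(LocalDateTime.of(10000, 1, 1, 0, 0).format(noPlus));
  }
}
{code}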



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-26265) REPL DUMP should filter out OpenXacts and unneeded CommitXact/Abort.

2022-05-25 Thread francis pang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

francis pang reassigned HIVE-26265:
-----------------------------------


> REPL DUMP should filter out OpenXacts and unneeded CommitXact/Abort.
> ---------------------------------------------------------------------
>
> Key: HIVE-26265
> URL: https://issues.apache.org/jira/browse/HIVE-26265
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: francis pang
>Assignee: francis pang
>Priority: Major
>
> REPL DUMP is replicating all OpenXacts, even when they are from other 
> non-replicated databases. This wastes space in the dump, and ends up opening 
> unneeded transactions during REPL LOAD.
>  
> Add a config property for replication that filters out OpenXact events during 
> REPL DUMP. During REPL LOAD, the txns can be implicitly opened when the 
> ALLOC_WRITE_ID is processed. For CommitTxn and AbortTxn, dump only if a WRITE 
> ID was allocated.
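
A minimal sketch of the proposed filtering rule (the event names and the
write-id lookup are assumptions for illustration, not the actual patch):

{code:java}
// Skip OpenXact events entirely; REPL LOAD can re-open txns implicitly when
// ALLOC_WRITE_ID is processed. Dump CommitTxn/AbortTxn only if a write id
// was allocated for the replicated database.
static boolean shouldDump(String eventType, boolean writeIdAllocated) {
  switch (eventType) {
    case "OPEN_TXN":
      return false;
    case "COMMIT_TXN":
    case "ABORT_TXN":
      return writeIdAllocated;
    default:
      return true;
  }
}
{code}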



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=774692&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774692
 ]

ASF GitHub Bot logged work on HIVE-26264:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 16:57
Start Date: 25/May/22 16:57
Worklog Time Spent: 10m 
  Work Description: pvary commented on PR #3324:
URL: https://github.com/apache/hive/pull/3324#issuecomment-1137555779

   Overall, I like this change very much!
   Much more readable and clean!
   Left some comments.
   
   Thanks,
   Peter




Issue Time Tracking
-------------------

Worklog Id: (was: 774692)
Time Spent: 1.5h  (was: 1h 20m)

> Iceberg integration: Fetch virtual columns on demand
> -----------------------------------------------------
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from iceberg tables if the statement 
> being executed is a delete or update statement and the setting is global. It 
> means it affects all tables affected by the statement. Also the read and 
> write schema depends on the operation setting.
> Some statements fail due to invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:574)
>   at 
> 

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=774691&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774691
 ]

ASF GitHub Bot logged work on HIVE-26264:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 16:56
Start Date: 25/May/22 16:56
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r881902028


##
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##
@@ -15005,6 +15004,12 @@ private AcidUtils.Operation getAcidType(String destination) {
         AcidUtils.Operation.INSERT);
   }
 
+  private Context.Operation getWriteOperation(String destination) {

Review Comment:
   Maybe a public method where we set the operation? Also, for the conversion 
we could use an enum.
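
   Something along these lines, perhaps (a hypothetical sketch of the
enum-based conversion; the constants and mapping are assumptions):

```java
// Hypothetical mapping from the ACID operation to the context operation;
// the actual constants used by the patch may differ.
private static Context.Operation toWriteOperation(AcidUtils.Operation op) {
  switch (op) {
    case UPDATE:
      return Context.Operation.UPDATE;
    case DELETE:
      return Context.Operation.DELETE;
    default:
      return Context.Operation.OTHER;
  }
}
```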





Issue Time Tracking
-------------------

Worklog Id: (was: 774691)
Time Spent: 1h 20m  (was: 1h 10m)

> Iceberg integration: Fetch virtual columns on demand
> -----------------------------------------------------
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from iceberg tables if the statement 
> being executed is a delete or update statement and the setting is global. It 
> means it affects all tables affected by the statement. Also the read and 
> write schema depends on the operation setting.
> Some statements fail due to invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> 

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=774690&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774690
 ]

ASF GitHub Bot logged work on HIVE-26264:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 16:54
Start Date: 25/May/22 16:54
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r881900188


##
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##
@@ -11433,6 +11435,7 @@ private Operator genTablePlan(String alias, QB qb) throws SemanticException {
       // Determine row schema for TSOP.
       // Include column names from SerDe, the partition and virtual columns.
       rwsch = new RowResolver();
+

Review Comment:
   nit: maybe just a mistake?





Issue Time Tracking
-------------------

Worklog Id: (was: 774690)
Time Spent: 1h 10m  (was: 1h)

> Iceberg integration: Fetch virtual columns on demand
> -----------------------------------------------------
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from iceberg tables if the statement 
> being executed is a delete or update statement and the setting is global. It 
> means it affects all tables affected by the statement. Also the read and 
> write schema depends on the operation setting.
> Some statements fail due to invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> 

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=774689&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774689
 ]

ASF GitHub Bot logged work on HIVE-26264:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 16:53
Start Date: 25/May/22 16:53
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r881899259


##
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:
##
@@ -932,7 +940,9 @@ protected void createBucketForFileIdx(FSPaths fsp, int filesIdx)
         && !FileUtils.mkdir(fs, outPath.getParent(), hconf)) {
       LOG.warn("Unable to create directory with inheritPerms: " + outPath);
     }
-    fsp.outWriters[filesIdx] = HiveFileFormatUtils.getHiveRecordWriter(jc, conf.getTableInfo(),
+    JobConf jobConf = new JobConf(jc);
+    setWriteOperation(jobConf);
+    fsp.outWriters[filesIdx] = HiveFileFormatUtils.getHiveRecordWriter(jobConf, conf.getTableInfo(),

Review Comment:
   Why not just add the operation as a parameter to the `getHiveRecordWriter` 
method? Passing parameters in a conf seems like a bad practice, if not 
necessary.
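
   To illustrate the contrast (both signatures below are simplified
assumptions for the sake of the example, not Hive's actual API):

```java
// (a) as in the patch: the operation travels hidden inside the conf
jobConf.set("file.sink.write.operation", writeOperation.name());
writer = getHiveRecordWriter(jobConf, tableInfo);

// (b) as suggested: the operation is an explicit, visible argument
writer = getHiveRecordWriter(jobConf, tableInfo, writeOperation);
```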





Issue Time Tracking
-------------------

Worklog Id: (was: 774689)
Time Spent: 1h  (was: 50m)

> Iceberg integration: Fetch virtual columns on demand
> -----------------------------------------------------
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from iceberg tables if the statement 
> being executed is a delete or update statement and the setting is global. It 
> means it affects all tables affected by the statement. Also the read and 
> write schema depends on the operation setting.
> Some statements fail due to invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=774688&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774688
 ]

ASF GitHub Bot logged work on HIVE-26264:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 16:52
Start Date: 25/May/22 16:52
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r881898206


##
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:
##
@@ -932,7 +940,9 @@ protected void createBucketForFileIdx(FSPaths fsp, int filesIdx)
         && !FileUtils.mkdir(fs, outPath.getParent(), hconf)) {
       LOG.warn("Unable to create directory with inheritPerms: " + outPath);
     }
-    fsp.outWriters[filesIdx] = HiveFileFormatUtils.getHiveRecordWriter(jc, conf.getTableInfo(),
+    JobConf jobConf = new JobConf(jc);

Review Comment:
   Why do we need to copy the jobConf?





Issue Time Tracking
-------------------

Worklog Id: (was: 774688)
Time Spent: 50m  (was: 40m)

> Iceberg integration: Fetch virtual columns on demand
> -----------------------------------------------------
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from iceberg tables if the statement 
> being executed is a delete or update statement and the setting is global. It 
> means it affects all tables affected by the statement. Also the read and 
> write schema depends on the operation setting.
> Some statements fail due to invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> 

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=774687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774687
 ]

ASF GitHub Bot logged work on HIVE-26264:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 16:51
Start Date: 25/May/22 16:51
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r881897659


##
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:
##
@@ -739,6 +742,11 @@ protected void initializeOp(Configuration hconf) throws HiveException {
     }
   }
 
+  private void setWriteOperation(Configuration conf) {

Review Comment:
   This is the same as the way we write out/read back this information in 
`HiveIcebergStorageHandler.operation`.
   If it is not removed from there, we should create commonly used methods for it.
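
   For example, a shared helper of roughly this shape (the class and key
names here are assumptions, not the actual code):

```java
// One place that both FileSinkOperator and HiveIcebergStorageHandler could
// use to serialize and parse the write operation, instead of duplicating it.
final class WriteOperationConfig {
  private static final String KEY_PREFIX = "file.sink.write.operation.";

  static void set(Configuration conf, String tableName, Context.Operation op) {
    conf.set(KEY_PREFIX + tableName, op.name());
  }

  static Context.Operation get(Configuration conf, String tableName) {
    String value = conf.get(KEY_PREFIX + tableName);
    return value == null ? Context.Operation.OTHER : Context.Operation.valueOf(value);
  }
}
```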





Issue Time Tracking
-------------------

Worklog Id: (was: 774687)
Time Spent: 40m  (was: 0.5h)

> Iceberg integration: Fetch virtual columns on demand
> -----------------------------------------------------
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from iceberg tables if the statement 
> being executed is a delete or update statement and the setting is global. It 
> means it affects all tables affected by the statement. Also the read and 
> write schema depends on the operation setting.
> Some statements fail due to invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> 

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=774686&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774686
 ]

ASF GitHub Bot logged work on HIVE-26264:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 16:47
Start Date: 25/May/22 16:47
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r881893733


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -259,22 +258,27 @@ public void initialize(InputSplit split, TaskAttemptContext newContext) {
       this.inMemoryDataModel = conf.getEnum(InputFormatConfig.IN_MEMORY_DATA_MODEL,
           InputFormatConfig.InMemoryDataModel.GENERIC);
       this.currentIterator = open(tasks.next(), expectedSchema).iterator();
-      Operation operation = HiveIcebergStorageHandler.operation(conf, conf.get(Catalogs.NAME));
-      this.updateOrDelete = Operation.DELETE.equals(operation) || Operation.UPDATE.equals(operation);
+      this.fetchVirtualColumns = InputFormatConfig.fetchVirtualColumns(conf);
     }
 
     @Override
     public boolean nextKeyValue() throws IOException {
       while (true) {
         if (currentIterator.hasNext()) {
           current = currentIterator.next();
-          if (updateOrDelete) {
+          if (fetchVirtualColumns) {
             GenericRecord rec = (GenericRecord) current;
             PositionDeleteInfo.setIntoConf(conf,
                 IcebergAcidUtil.parseSpecId(rec),
                 IcebergAcidUtil.computePartitionHash(rec),
                 IcebergAcidUtil.parseFilePath(rec),
                 IcebergAcidUtil.parseFilePosition(rec));
+            GenericRecord tmp = GenericRecord.create(
+                new Schema(expectedSchema.columns().subList(4, expectedSchema.columns().size())));
+            for (int i = 4; i < expectedSchema.columns().size(); ++i) {

Review Comment:
   For this kind of conversion we usually create a method in `IcebergAcidUtil`
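
   E.g. a helper of roughly this shape (the method name and the fixed
virtual-column offset are assumptions based on the diff above):

```java
// Copy the data columns (everything after the leading virtual/ACID metadata
// columns) of rec into a fresh record with the data-only schema.
static GenericRecord stripVirtualColumns(GenericRecord rec, Schema expectedSchema, int virtualColCount) {
  Schema dataSchema = new Schema(
      expectedSchema.columns().subList(virtualColCount, expectedSchema.columns().size()));
  GenericRecord result = GenericRecord.create(dataSchema);
  for (int i = virtualColCount; i < expectedSchema.columns().size(); ++i) {
    result.set(i - virtualColCount, rec.get(i, Object.class));
  }
  return result;
}
```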





Issue Time Tracking
-------------------

Worklog Id: (was: 774686)
Time Spent: 0.5h  (was: 20m)

> Iceberg integration: Fetch virtual columns on demand
> -----------------------------------------------------
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from iceberg tables if the statement 
> being executed is a delete or update statement and the setting is global. It 
> means it affects all tables affected by the statement. Also the read and 
> write schema depends on the operation setting.
> Some statements fail due to invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> 

[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=774685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774685
 ]

ASF GitHub Bot logged work on HIVE-26264:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 16:46
Start Date: 25/May/22 16:46
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3324:
URL: https://github.com/apache/hive/pull/3324#discussion_r881892934


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -259,22 +258,27 @@ public void initialize(InputSplit split, TaskAttemptContext newContext) {
       this.inMemoryDataModel = conf.getEnum(InputFormatConfig.IN_MEMORY_DATA_MODEL,
           InputFormatConfig.InMemoryDataModel.GENERIC);
       this.currentIterator = open(tasks.next(), expectedSchema).iterator();
-      Operation operation = HiveIcebergStorageHandler.operation(conf, conf.get(Catalogs.NAME));
-      this.updateOrDelete = Operation.DELETE.equals(operation) || Operation.UPDATE.equals(operation);
+      this.fetchVirtualColumns = InputFormatConfig.fetchVirtualColumns(conf);
     }
 
     @Override
     public boolean nextKeyValue() throws IOException {
       while (true) {
         if (currentIterator.hasNext()) {
           current = currentIterator.next();
-          if (updateOrDelete) {
+          if (fetchVirtualColumns) {
             GenericRecord rec = (GenericRecord) current;
             PositionDeleteInfo.setIntoConf(conf,
                 IcebergAcidUtil.parseSpecId(rec),
                 IcebergAcidUtil.computePartitionHash(rec),
                 IcebergAcidUtil.parseFilePath(rec),
                 IcebergAcidUtil.parseFilePosition(rec));
+            GenericRecord tmp = GenericRecord.create(

Review Comment:
   We should not create the `tmp` record for every value. We should reuse a 
previously created record
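
   Roughly like this (an illustration of the reuse, not the actual change):

```java
// Allocate the data-only record once, e.g. in initialize() ...
this.reused = GenericRecord.create(
    new Schema(expectedSchema.columns().subList(4, expectedSchema.columns().size())));

// ... then per row, overwrite its fields instead of creating a new record.
for (int i = 4; i < expectedSchema.columns().size(); ++i) {
  reused.set(i - 4, rec.get(i, Object.class));
}
current = reused;
```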





Issue Time Tracking
-------------------

Worklog Id: (was: 774685)
Time Spent: 20m  (was: 10m)

> Iceberg integration: Fetch virtual columns on demand
> -----------------------------------------------------
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from iceberg tables if the statement 
> being executed is a delete or update statement and the setting is global. It 
> means it affects all tables affected by the statement. Also the read and 
> write schema depends on the operation setting.
> Some statements fail due to invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> 

[jira] [Work logged] (HIVE-26260) Use `Reader.getSchema` instead of deprecated `Reader.getTypes`

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26260?focusedWorklogId=774671&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774671
 ]

ASF GitHub Bot logged work on HIVE-26260:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 16:29
Start Date: 25/May/22 16:29
Worklog Time Spent: 10m 
  Work Description: dongjoon-hyun commented on PR #3318:
URL: https://github.com/apache/hive/pull/3318#issuecomment-1137510045

   Thank you, @sunchao !




Issue Time Tracking
-------------------

Worklog Id: (was: 774671)
Time Spent: 1h 10m  (was: 1h)

> Use `Reader.getSchema` instead of deprecated `Reader.getTypes`
> ---------------------------------------------------------------
>
> Key: HIVE-26260
> URL: https://issues.apache.org/jira/browse/HIVE-26260
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
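
A minimal sketch of the migration in question, assuming the org.apache.orc
Reader API:

{code:java}
import org.apache.orc.Reader;
import org.apache.orc.TypeDescription;

class SchemaAccess {
  // Preferred: the structured schema, rather than the deprecated flattened
  // list from reader.getTypes().
  static TypeDescription schemaOf(Reader reader) {
    return reader.getSchema();
  }
}
{code}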




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?focusedWorklogId=774662&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774662
 ]

ASF GitHub Bot logged work on HIVE-26264:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 16:12
Start Date: 25/May/22 16:12
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request, #3324:
URL: https://github.com/apache/hive/pull/3324

   
   
   ### What changes were proposed in this pull request?
   * Add new boolean configurations to the jobconf: `iceberg.mr.fetch.virtual.columns` and `hive.io.file.fetch.virtual.columns`
   * Populate the config value from the TableScanDesc related to the TS operators in the plan
   * In case of an iceberg read operation, initialize the expected schema from the table's defined columns, and add the virtual columns if the setting `iceberg.mr.fetch.virtual.columns` is true
   * Add a new config setting to the jobconf which is passed to serde init from the FS operator: `file.sink.write.operation.`
   * In case of the iceberg serde, initialize the schema based on `file.sink.write.operation.`
   
   ### Why are the changes needed?
   An execution plan can have multiple TS and FS operators, and some of them need virtual columns while others do not.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestIcebergCliDriver -Dqfile=query_iceberg_virtualcol.q -pl itests/qtest-iceberg -Pitests -Piceberg
   ```
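
   For illustration, the new switch behaves like any boolean jobconf flag
(only the key name comes from the list above; the surrounding wiring is an
assumption):

```java
// org.apache.hadoop.conf.Configuration
Configuration conf = new Configuration();
// populated per TableScanDesc during planning:
conf.setBoolean("hive.io.file.fetch.virtual.columns", true);
// read back on the execution side, defaulting to false:
boolean fetchVirtual = conf.getBoolean("hive.io.file.fetch.virtual.columns", false);
```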




Issue Time Tracking
-------------------

Worklog Id: (was: 774662)
Remaining Estimate: 0h
Time Spent: 10m

> Iceberg integration: Fetch virtual columns on demand
> -----------------------------------------------------
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from iceberg tables if the statement 
> being executed is a delete or update statement and the setting is global. It 
> means it affects all tables affected by the statement. Also the read and 
> write schema depends on the operation setting.
> Some statements fail due to invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> 

[jira] [Updated] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26264:
----------------------------------
Labels: pull-request-available  (was: )

> Iceberg integration: Fetch virtual columns on demand
> -----------------------------------------------------
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently virtual columns are fetched from iceberg tables if the statement 
> being executed is a delete or update statement and the setting is global. It 
> means it affects all tables affected by the statement. Also the read and 
> write schema depends on the operation setting.
> Some statements fail due to invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:574)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>   ... 18 more
> Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
> java.lang.Integer
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaIntObjectInspector.get(JavaIntObjectInspector.java:40)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPLessThan.evaluate(GenericUDFOPLessThan.java:127)
>   at 
> 

[jira] [Work logged] (HIVE-26260) Use `Reader.getSchema` instead of deprecated `Reader.getTypes`

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26260?focusedWorklogId=774659&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774659
 ]

ASF GitHub Bot logged work on HIVE-26260:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 16:05
Start Date: 25/May/22 16:05
Worklog Time Spent: 10m 
  Work Description: dongjoon-hyun commented on PR #3318:
URL: https://github.com/apache/hive/pull/3318#issuecomment-1137480247

   Thank you for review, @prasanthj .




Issue Time Tracking
-------------------

Worklog Id: (was: 774659)
Time Spent: 1h  (was: 50m)

> Use `Reader.getSchema` instead of deprecated `Reader.getTypes`
> ---------------------------------------------------------------
>
> Key: HIVE-26260
> URL: https://issues.apache.org/jira/browse/HIVE-26260
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26260) Use `Reader.getSchema` instead of deprecated `Reader.getTypes`

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26260?focusedWorklogId=774651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774651
 ]

ASF GitHub Bot logged work on HIVE-26260:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 25/May/22 15:57
Start Date: 25/May/22 15:57
Worklog Time Spent: 10m 
  Work Description: prasanthj commented on PR #3318:
URL: https://github.com/apache/hive/pull/3318#issuecomment-1137470896

   lgtm, +1




Issue Time Tracking
---

Worklog Id: (was: 774651)
Time Spent: 50m  (was: 40m)

> Use `Reader.getSchema` instead of deprecated `Reader.getTypes`
> --
>
> Key: HIVE-26260
> URL: https://issues.apache.org/jira/browse/HIVE-26260
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-26264) Iceberg integration: Fetch virtual columns on demand

2022-05-25 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-26264:
-


> Iceberg integration: Fetch virtual columns on demand
> 
>
> Key: HIVE-26264
> URL: https://issues.apache.org/jira/browse/HIVE-26264
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
>
> Currently virtual columns are fetched from Iceberg tables if the statement 
> being executed is a delete or update statement, and the setting is global: it 
> affects all tables touched by the statement. The read and write schema also 
> depends on the operation setting.
> Some statements fail due to an invalid schema:
> {code}
> create external table tbl_ice(a int, b string, c int) stored by iceberg 
> stored as orc tblproperties ('format-version'='2');
> insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), 
> (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
> update tbl_ice set b='Changed' where b in (select b from tbl_ice where a < 4);
> {code}
> {code}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Map 3, vertexId=vertex_1653493839723_0001_3_01, diagnostics=[Task 
> failed, taskId=task_1653493839723_0001_3_01_00, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1653493839723_0001_3_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:574)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>   ... 18 more
> Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
> java.lang.Integer
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaIntObjectInspector.get(JavaIntObjectInspector.java:40)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPLessThan.evaluate(GenericUDFOPLessThan.java:127)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:235)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
>   at 
> 

[jira] [Commented] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542086#comment-17542086
 ] 

Stamatis Zampetakis commented on HIVE-26233:


For the record, I think this problem affects AVRO format as well. Thankfully 
the fix in https://github.com/apache/hive/pull/3295 seems sufficient to cover 
all cases.

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774593&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774593
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 14:42
Start Date: 25/May/22 14:42
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881743843


##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java:
##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
 verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", 
LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
 
+  @Test
+  public void testTimestamp() {

Review Comment:
   `TestParquetTimestampsHive2Compatibility` also uses multiple timezones so I 
think it should be fine and adding another one should be trivial.





Issue Time Tracking
---

Worklog Id: (was: 774593)
Time Spent: 2h  (was: 1h 50m)

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774587&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774587
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 14:40
Start Date: 25/May/22 14:40
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881741459


##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java:
##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
 verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", 
LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
 
+  @Test
+  public void testTimestamp() {

Review Comment:
   How about something like the following:
   
   ```java
  private static Stream<String> generateTimestamps() {
    return Stream.concat(Stream.of("9999-12-31 23:59:59.999"),
        Stream.generate(new Supplier<String>() {
   ```
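
For illustration, a self-contained sketch of the Stream.concat/Stream.generate 
idea suggested above; the generic type parameters and the "9999-..." edge-case 
literal are assumptions here, since the quoted snippet is only a fragment:

```java
import java.util.function.Supplier;
import java.util.stream.Stream;

public class TimestampGeneratorSketch {
  // One hand-picked edge-case timestamp concatenated with a generated sequence.
  private static Stream<String> generateTimestamps() {
    Supplier<String> supplier = new Supplier<String>() {
      private int year = 1;

      @Override
      public String get() {
        // Produces "0001-01-01 00:00:00", "0002-01-01 00:00:00", ...
        return String.format("%04d-01-01 00:00:00", year++);
      }
    };
    return Stream.concat(Stream.of("9999-12-31 23:59:59.999"),
        Stream.generate(supplier).limit(3));
  }

  public static void main(String[] args) {
    generateTimestamps().forEach(System.out::println);
  }
}
```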





Issue Time Tracking
---

Worklog Id: (was: 774587)
Time Spent: 1h 50m  (was: 1h 40m)

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774581&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774581
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 14:18
Start Date: 25/May/22 14:18
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881712501


##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java:
##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
 verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", 
LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
 
+  @Test
+  public void testTimestamp() {

Review Comment:
   I have never seen `TestParquetTimestampsHive2Compatibility`, so it is good 
that you highlighted it here. OTOH that is a parameterized test, without TZ, so I 
think it would be a full rewrite to get this to work for this specific TZ and TS 





Issue Time Tracking
---

Worklog Id: (was: 774581)
Time Spent: 1h 40m  (was: 1.5h)

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774576
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 14:09
Start Date: 25/May/22 14:09
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881702228


##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java:
##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
 verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", 
LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
 
+  @Test
+  public void testTimestamp() {

Review Comment:
   Or rather enrich the tests inside `TestParquetTimestampsHive2Compatibility` 
to account for this edge case.





Issue Time Tracking
---

Worklog Id: (was: 774576)
Time Spent: 1.5h  (was: 1h 20m)

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774574
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 14:07
Start Date: 25/May/22 14:07
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881700144


##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java:
##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
 verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", 
LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
 
+  @Test
+  public void testTimestamp() {

Review Comment:
   I was thinking to move it inside the existing class, not create a new one.





Issue Time Tracking
---

Worklog Id: (was: 774574)
Time Spent: 1h 20m  (was: 1h 10m)

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774535
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 12:52
Start Date: 25/May/22 12:52
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881615549


##
common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java:
##
@@ -128,6 +138,10 @@ public String toString() {
 return localDateTime.format(PRINT_FORMATTER);
   }
 
+  public String toStingWithLenientFormatter() {
+return localDateTime.format(PRINT_LENIENT_FORMATTER);
+  }
+

Review Comment:
   Good idea with the `format()` method





Issue Time Tracking
---

Worklog Id: (was: 774535)
Time Spent: 1h 10m  (was: 1h)

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774534
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 12:52
Start Date: 25/May/22 12:52
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881615152


##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java:
##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
 verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", 
LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
 
+  @Test
+  public void testTimestamp() {

Review Comment:
   I think for a single test we should not create a separate test class.
   Added a comment to make clear what we are testing here.





Issue Time Tracking
---

Worklog Id: (was: 774534)
Time Spent: 1h  (was: 50m)

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774533
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 12:51
Start Date: 25/May/22 12:51
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881614470


##
common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java:
##
@@ -101,6 +101,16 @@ public class Timestamp implements Comparable<Timestamp> {
   // Fractional Part (Optional)
   .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, 
true).optionalEnd().toFormatter();
 
+  private static final DateTimeFormatter PRINT_LENIENT_FORMATTER = new 
DateTimeFormatterBuilder()
+  // Date and Time Parts
+  .appendValue(YEAR, 4, 10, 
SignStyle.NORMAL).appendLiteral('-').appendValue(MONTH_OF_YEAR, 2, 2, 
SignStyle.NORMAL)
+  .appendLiteral('-').appendValue(DAY_OF_MONTH, 2, 2, SignStyle.NORMAL)
+  .appendLiteral(" ").appendValue(HOUR_OF_DAY, 2, 2, 
SignStyle.NORMAL).appendLiteral(':')
+  .appendValue(MINUTE_OF_HOUR, 2, 2, SignStyle.NORMAL).appendLiteral(':')
+  .appendValue(SECOND_OF_MINUTE, 2, 2, SignStyle.NORMAL)
+  // Fractional Part (Optional)
+  .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, 
true).optionalEnd().toFormatter();
+

Review Comment:
   Moved to TimestampTZUtil, and renamed
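
As background, a minimal standalone sketch (not part of the patch) of why the 
SignStyle choice matters here: EXCEEDS_PAD, used by ISO-style year fields, 
prints a leading + once the year outgrows the 4-digit pad width, while 
SignStyle.NORMAL signs only negative years:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;

import static java.time.temporal.ChronoField.YEAR;

public class SignStyleSketch {
  public static void main(String[] args) {
    // EXCEEDS_PAD emits '+' once the value needs more than the 4 padded digits.
    DateTimeFormatter exceedsPad = new DateTimeFormatterBuilder()
        .appendValue(YEAR, 4, 10, SignStyle.EXCEEDS_PAD).toFormatter();
    // NORMAL emits a sign only for negative values.
    DateTimeFormatter normal = new DateTimeFormatterBuilder()
        .appendValue(YEAR, 4, 10, SignStyle.NORMAL).toFormatter();

    LocalDateTime t = LocalDateTime.of(10000, 1, 1, 0, 0);
    System.out.println(exceedsPad.format(t)); // +10000
    System.out.println(normal.format(t));     // 10000
  }
}
```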





Issue Time Tracking
---

Worklog Id: (was: 774533)
Time Spent: 50m  (was: 40m)

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26136) Implement UPDATE statements for Iceberg tables

2022-05-25 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26136.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, but forgot to close the jira

> Implement UPDATE statements for Iceberg tables
> --
>
> Key: HIVE-26136
> URL: https://issues.apache.org/jira/browse/HIVE-26136
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26092) Fix javadoc errors for the 4.0.0 release

2022-05-25 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26092.
---
Resolution: Fixed

Thanks [~slachiewicz]!

Forgot to close the jira.

> Fix javadoc errors for the 4.0.0 release
> 
>
> Key: HIVE-26092
> URL: https://issues.apache.org/jira/browse/HIVE-26092
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently there are plenty of errors in the javadoc.
> We should fix those before a final release



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26084) Oracle metastore init tests are flaky

2022-05-25 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542011#comment-17542011
 ] 

Zoltan Haindrich commented on HIVE-26084:
-

hmm..last time I checked I've only seen oracle-11g in xe - I'm so happy to see 
18 and 21 :D

> Oracle metastore init tests are flaky
> -
>
> Key: HIVE-26084
> URL: https://issues.apache.org/jira/browse/HIVE-26084
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Peter Vary
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> After HIVE-26022 we started to run the oracle metastore init tests, but they 
> seem to be flaky.
> I see this issue quite often: 
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-3147/1/pipeline/551
> We might have to increase the timeout, or use another oracle image for more 
> consistent tests.
> The error in the logs for future reference
> {code}
> [2022-03-28T14:10:07.804Z] + echo 127.0.0.1 dev_oracle
> [2022-03-28T14:10:07.804Z] + sudo tee -a /etc/hosts
> [2022-03-28T14:10:07.804Z] 127.0.0.1 dev_oracle
> [2022-03-28T14:10:07.804Z] + . /etc/profile.d/confs.sh
> [2022-03-28T14:10:07.804Z] ++ export MAVEN_OPTS=-Xmx2g
> [2022-03-28T14:10:07.804Z] ++ MAVEN_OPTS=-Xmx2g
> [2022-03-28T14:10:07.804Z] ++ export HADOOP_CONF_DIR=/etc/hadoop
> [2022-03-28T14:10:07.804Z] ++ HADOOP_CONF_DIR=/etc/hadoop
> [2022-03-28T14:10:07.804Z] ++ export HADOOP_LOG_DIR=/data/log
> [2022-03-28T14:10:07.804Z] ++ HADOOP_LOG_DIR=/data/log
> [2022-03-28T14:10:07.804Z] ++ export 
> 'HADOOP_CLASSPATH=/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*'
> [2022-03-28T14:10:07.804Z] ++ 
> HADOOP_CLASSPATH='/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*'
> [2022-03-28T14:10:07.804Z] ++ export HIVE_CONF_DIR=/etc/hive/
> [2022-03-28T14:10:07.804Z] ++ HIVE_CONF_DIR=/etc/hive/
> [2022-03-28T14:10:07.804Z] ++ export 
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin
> [2022-03-28T14:10:07.804Z] ++ 
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin
> [2022-03-28T14:10:07.804Z] ++ . /etc/profile.d/java.sh
> [2022-03-28T14:10:07.804Z] +++ export JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/
> [2022-03-28T14:10:07.804Z] +++ JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/
> [2022-03-28T14:10:07.804Z] + sw hive-dev 
> /home/jenkins/agent/workspace/hive-precommit_PR-3147
> [2022-03-28T14:10:07.804Z] @ activating: 
> /home/jenkins/agent/workspace/hive-precommit_PR-3147/packaging/target/apache-hive-4.0.0-alpha-1-SNAPSHOT-bin/apache-hive-4.0.0-alpha-1-SNAPSHOT-bin/
>  for hive
> [2022-03-28T14:10:07.804Z] + ping -c2 dev_oracle
> [2022-03-28T14:10:07.804Z] PING dev_oracle (127.0.0.1) 56(84) bytes of data.
> [2022-03-28T14:10:07.804Z] 64 bytes from localhost (127.0.0.1): icmp_seq=1 
> ttl=64 time=0.082 ms
> [2022-03-28T14:10:08.795Z] 64 bytes from localhost (127.0.0.1): icmp_seq=2 
> ttl=64 time=0.087 ms
> [2022-03-28T14:10:08.795Z] 
> [2022-03-28T14:10:08.795Z] --- dev_oracle ping statistics ---
> [2022-03-28T14:10:08.795Z] 2 packets transmitted, 2 received, 0% packet loss, 
> time 51ms
> [2022-03-28T14:10:08.795Z] rtt min/avg/max/mdev = 0.082/0.084/0.087/0.009 ms
> [2022-03-28T14:10:08.795Z] + export DOCKER_NETWORK=host
> [2022-03-28T14:10:08.795Z] + DOCKER_NETWORK=host
> [2022-03-28T14:10:08.795Z] + export DBNAME=metastore
> [2022-03-28T14:10:08.795Z] + DBNAME=metastore
> [2022-03-28T14:10:08.795Z] + reinit_metastore oracle
> [2022-03-28T14:10:08.795Z] @ initializing: oracle
> [2022-03-28T14:10:08.795Z] metastore database name: metastore
> [2022-03-28T14:10:09.135Z] @ starting dev_oracle...
> [2022-03-28T14:10:09.445Z] Unable to find image 
> 'quay.io/maksymbilenko/oracle-12c:latest' locally
> [2022-03-28T14:10:10.407Z] latest: Pulling from maksymbilenko/oracle-12c
> [2022-03-28T14:10:10.407Z] 8ba884070f61: Pulling fs layer
> [2022-03-28T14:10:10.407Z] ef9513b81046: Pulling fs layer
> [2022-03-28T14:10:10.407Z] 6f1de349e202: Pulling fs layer
> [2022-03-28T14:10:10.407Z] 5376ebfa0fa3: Pulling fs layer
> [2022-03-28T14:10:10.407Z] 5f632c3633d2: Pulling fs layer
> [2022-03-28T14:10:10.407Z] 3e74293031d2: Pulling fs layer
> [2022-03-28T14:10:10.407Z] 5376ebfa0fa3: Waiting
> [2022-03-28T14:10:10.407Z] 5f632c3633d2: Waiting
> [2022-03-28T14:10:10.407Z] 3e74293031d2: Waiting
> [2022-03-28T14:10:10.407Z] 6f1de349e202: Download complete
> [2022-03-28T14:10:11.365Z] ef9513b81046: Download complete
> [2022-03-28T14:10:11.365Z] 5f632c3633d2: 

[jira] [Commented] (HIVE-26263) Mysql metastore init tests are flaky

2022-05-25 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542008#comment-17542008
 ] 

Zoltan Haindrich commented on HIVE-26263:
-

I've [disabled the mysql/metastore test for 
now|https://github.com/apache/hive/commit/34b24d55ade393673424f077b69add43bad9f731]

it's strange that this happens so frequently and only for this database type...


> Mysql metastore init tests are flaky
> 
>
> Key: HIVE-26263
> URL: https://issues.apache.org/jira/browse/HIVE-26263
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Zoltan Haindrich
>Priority: Major
>
> Similarly to HIVE-26084 (Oracle tests), the Mysql tests are also failing.
> In both cases we use _:latest_ as docker image version, which is probably not 
> ideal.
> Reporting the error for future reference:
> {noformat}
> [2022-05-24T14:07:52.127Z] + sudo tee -a /etc/hosts
> [2022-05-24T14:07:52.127Z] + echo 127.0.0.1 dev_mysql
> [2022-05-24T14:07:52.127Z] 127.0.0.1 dev_mysql
> [2022-05-24T14:07:52.127Z] + . /etc/profile.d/confs.sh
> [2022-05-24T14:07:52.127Z] ++ export MAVEN_OPTS=-Xmx2g
> [2022-05-24T14:07:52.127Z] ++ MAVEN_OPTS=-Xmx2g
> [2022-05-24T14:07:52.127Z] ++ export HADOOP_CONF_DIR=/etc/hadoop
> [2022-05-24T14:07:52.127Z] ++ HADOOP_CONF_DIR=/etc/hadoop
> [2022-05-24T14:07:52.127Z] ++ export HADOOP_LOG_DIR=/data/log
> [2022-05-24T14:07:52.127Z] ++ HADOOP_LOG_DIR=/data/log
> [2022-05-24T14:07:52.127Z] ++ export 
> 'HADOOP_CLASSPATH=/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*'
> [2022-05-24T14:07:52.127Z] ++ 
> HADOOP_CLASSPATH='/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*'
> [2022-05-24T14:07:52.127Z] ++ export HIVE_CONF_DIR=/etc/hive/
> [2022-05-24T14:07:52.127Z] ++ HIVE_CONF_DIR=/etc/hive/
> [2022-05-24T14:07:52.127Z] ++ export 
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin
> [2022-05-24T14:07:52.127Z] ++ 
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin
> [2022-05-24T14:07:52.127Z] ++ . /etc/profile.d/java.sh
> [2022-05-24T14:07:52.127Z] +++ export JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/
> [2022-05-24T14:07:52.127Z] +++ JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/
> [2022-05-24T14:07:52.127Z] + sw hive-dev 
> /home/jenkins/agent/workspace/hive-precommit_PR-3317
> [2022-05-24T14:07:52.127Z] @ activating: 
> /home/jenkins/agent/workspace/hive-precommit_PR-3317/packaging/target/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/
>  for hive
> [2022-05-24T14:07:52.127Z] + ping -c2 dev_mysql
> [2022-05-24T14:07:52.127Z] PING dev_mysql (127.0.0.1) 56(84) bytes of data.
> [2022-05-24T14:07:52.127Z] 64 bytes from localhost (127.0.0.1): icmp_seq=1 
> ttl=64 time=0.114 ms
> [2022-05-24T14:07:53.107Z] 64 bytes from localhost (127.0.0.1): icmp_seq=2 
> ttl=64 time=0.123 ms
> [2022-05-24T14:07:53.107Z] 
> [2022-05-24T14:07:53.107Z] --- dev_mysql ping statistics ---
> [2022-05-24T14:07:53.107Z] 2 packets transmitted, 2 received, 0% packet loss, 
> time 49ms
> [2022-05-24T14:07:53.107Z] rtt min/avg/max/mdev = 0.114/0.118/0.123/0.011 ms
> [2022-05-24T14:07:53.107Z] + export DOCKER_NETWORK=host
> [2022-05-24T14:07:53.107Z] + DOCKER_NETWORK=host
> [2022-05-24T14:07:53.107Z] + export DBNAME=metastore
> [2022-05-24T14:07:53.107Z] + DBNAME=metastore
> [2022-05-24T14:07:53.107Z] + reinit_metastore mysql
> [2022-05-24T14:07:53.107Z] @ initializing: mysql
> [2022-05-24T14:07:53.107Z] metastore database name: metastore
> [2022-05-24T14:07:53.381Z] @ starting dev_mysql...
> [2022-05-24T14:07:53.382Z] Unable to find image 'mariadb:latest' locally
> [2022-05-24T14:07:54.354Z] latest: Pulling from library/mariadb
> [2022-05-24T14:07:54.354Z] 125a6e411906: Pulling fs layer
> [2022-05-24T14:07:54.354Z] a28b55cc656d: Pulling fs layer
> [2022-05-24T14:07:54.354Z] f2325f4e25a1: Pulling fs layer
> [2022-05-24T14:07:54.354Z] c6c2d09f748d: Pulling fs layer
> [2022-05-24T14:07:54.354Z] af2b4ed853d2: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 8394ac6b401e: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 5b150cf0c5a7: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 1b11b2e20899: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 3d35790a91d9: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 5e73c7793365: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 3d34b9f14ede: Pulling fs layer
> [2022-05-24T14:07:54.354Z] c6c2d09f748d: Waiting
> [2022-05-24T14:07:54.354Z] 8394ac6b401e: 

[jira] [Assigned] (HIVE-26263) Mysql metastore init tests are flaky

2022-05-25 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-26263:
---

Assignee: Zoltan Haindrich

> Mysql metastore init tests are flaky
> 
>
> Key: HIVE-26263
> URL: https://issues.apache.org/jira/browse/HIVE-26263
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Zoltan Haindrich
>Priority: Major
>
> Similarly to HIVE-26084 (Oracle tests), the Mysql tests are also failing.
> In both cases we use _:latest_ as docker image version, which is probably not 
> ideal.
> Reporting the error for future reference:
> {noformat}
> [2022-05-24T14:07:52.127Z] + sudo tee -a /etc/hosts
> [2022-05-24T14:07:52.127Z] + echo 127.0.0.1 dev_mysql
> [2022-05-24T14:07:52.127Z] 127.0.0.1 dev_mysql
> [2022-05-24T14:07:52.127Z] + . /etc/profile.d/confs.sh
> [2022-05-24T14:07:52.127Z] ++ export MAVEN_OPTS=-Xmx2g
> [2022-05-24T14:07:52.127Z] ++ MAVEN_OPTS=-Xmx2g
> [2022-05-24T14:07:52.127Z] ++ export HADOOP_CONF_DIR=/etc/hadoop
> [2022-05-24T14:07:52.127Z] ++ HADOOP_CONF_DIR=/etc/hadoop
> [2022-05-24T14:07:52.127Z] ++ export HADOOP_LOG_DIR=/data/log
> [2022-05-24T14:07:52.127Z] ++ HADOOP_LOG_DIR=/data/log
> [2022-05-24T14:07:52.127Z] ++ export 
> 'HADOOP_CLASSPATH=/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*'
> [2022-05-24T14:07:52.127Z] ++ 
> HADOOP_CLASSPATH='/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*'
> [2022-05-24T14:07:52.127Z] ++ export HIVE_CONF_DIR=/etc/hive/
> [2022-05-24T14:07:52.127Z] ++ HIVE_CONF_DIR=/etc/hive/
> [2022-05-24T14:07:52.127Z] ++ export 
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin
> [2022-05-24T14:07:52.127Z] ++ 
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin
> [2022-05-24T14:07:52.127Z] ++ . /etc/profile.d/java.sh
> [2022-05-24T14:07:52.127Z] +++ export JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/
> [2022-05-24T14:07:52.127Z] +++ JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/
> [2022-05-24T14:07:52.127Z] + sw hive-dev 
> /home/jenkins/agent/workspace/hive-precommit_PR-3317
> [2022-05-24T14:07:52.127Z] @ activating: 
> /home/jenkins/agent/workspace/hive-precommit_PR-3317/packaging/target/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/
>  for hive
> [2022-05-24T14:07:52.127Z] + ping -c2 dev_mysql
> [2022-05-24T14:07:52.127Z] PING dev_mysql (127.0.0.1) 56(84) bytes of data.
> [2022-05-24T14:07:52.127Z] 64 bytes from localhost (127.0.0.1): icmp_seq=1 
> ttl=64 time=0.114 ms
> [2022-05-24T14:07:53.107Z] 64 bytes from localhost (127.0.0.1): icmp_seq=2 
> ttl=64 time=0.123 ms
> [2022-05-24T14:07:53.107Z] 
> [2022-05-24T14:07:53.107Z] --- dev_mysql ping statistics ---
> [2022-05-24T14:07:53.107Z] 2 packets transmitted, 2 received, 0% packet loss, 
> time 49ms
> [2022-05-24T14:07:53.107Z] rtt min/avg/max/mdev = 0.114/0.118/0.123/0.011 ms
> [2022-05-24T14:07:53.107Z] + export DOCKER_NETWORK=host
> [2022-05-24T14:07:53.107Z] + DOCKER_NETWORK=host
> [2022-05-24T14:07:53.107Z] + export DBNAME=metastore
> [2022-05-24T14:07:53.107Z] + DBNAME=metastore
> [2022-05-24T14:07:53.107Z] + reinit_metastore mysql
> [2022-05-24T14:07:53.107Z] @ initializing: mysql
> [2022-05-24T14:07:53.107Z] metastore database name: metastore
> [2022-05-24T14:07:53.381Z] @ starting dev_mysql...
> [2022-05-24T14:07:53.382Z] Unable to find image 'mariadb:latest' locally
> [2022-05-24T14:07:54.354Z] latest: Pulling from library/mariadb
> [2022-05-24T14:07:54.354Z] 125a6e411906: Pulling fs layer
> [2022-05-24T14:07:54.354Z] a28b55cc656d: Pulling fs layer
> [2022-05-24T14:07:54.354Z] f2325f4e25a1: Pulling fs layer
> [2022-05-24T14:07:54.354Z] c6c2d09f748d: Pulling fs layer
> [2022-05-24T14:07:54.354Z] af2b4ed853d2: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 8394ac6b401e: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 5b150cf0c5a7: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 1b11b2e20899: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 3d35790a91d9: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 5e73c7793365: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 3d34b9f14ede: Pulling fs layer
> [2022-05-24T14:07:54.354Z] c6c2d09f748d: Waiting
> [2022-05-24T14:07:54.354Z] 8394ac6b401e: Waiting
> [2022-05-24T14:07:54.354Z] 5b150cf0c5a7: Waiting
> [2022-05-24T14:07:54.354Z] 3d35790a91d9: Waiting
> [2022-05-24T14:07:54.354Z] 5e73c7793365: Waiting
> [2022-05-24T14:07:54.354Z] 3d34b9f14ede: Waiting
> 

[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774483&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774483
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 11:12
Start Date: 25/May/22 11:12
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881514185


##
common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java:
##
@@ -101,6 +101,16 @@ public class Timestamp implements Comparable<Timestamp> {
   // Fractional Part (Optional)
   .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, 
true).optionalEnd().toFormatter();
 
+  private static final DateTimeFormatter PRINT_LENIENT_FORMATTER = new 
DateTimeFormatterBuilder()
+  // Date and Time Parts
+  .appendValue(YEAR, 4, 10, 
SignStyle.NORMAL).appendLiteral('-').appendValue(MONTH_OF_YEAR, 2, 2, 
SignStyle.NORMAL)
+  .appendLiteral('-').appendValue(DAY_OF_MONTH, 2, 2, SignStyle.NORMAL)
+  .appendLiteral(" ").appendValue(HOUR_OF_DAY, 2, 2, 
SignStyle.NORMAL).appendLiteral(':')
+  .appendValue(MINUTE_OF_HOUR, 2, 2, SignStyle.NORMAL).appendLiteral(':')
+  .appendValue(SECOND_OF_MINUTE, 2, 2, SignStyle.NORMAL)
+  // Fractional Part (Optional)
+  .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, 
true).optionalEnd().toFormatter();
+

Review Comment:
   Maybe it would be better to move this formatter in `TimestampTZUtil` since 
it should be strictly used for `LEGACY` purposes and since there is another 
formatter there as well.



##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java:
##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
 verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", 
LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
 
+  @Test
+  public void testTimestamp() {

Review Comment:
   What do you think of refactoring and moving the test in 
`TestParquetTimestampsHive2Compatibility` which exactly about compatibility 
with Hive 2? 



##
common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java:
##
@@ -101,6 +101,16 @@ public class Timestamp implements Comparable<Timestamp> {
   // Fractional Part (Optional)
   .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, 
true).optionalEnd().toFormatter();
 
+  private static final DateTimeFormatter PRINT_LENIENT_FORMATTER = new 
DateTimeFormatterBuilder()
+  // Date and Time Parts
+  .appendValue(YEAR, 4, 10, 
SignStyle.NORMAL).appendLiteral('-').appendValue(MONTH_OF_YEAR, 2, 2, 
SignStyle.NORMAL)
+  .appendLiteral('-').appendValue(DAY_OF_MONTH, 2, 2, SignStyle.NORMAL)
+  .appendLiteral(" ").appendValue(HOUR_OF_DAY, 2, 2, 
SignStyle.NORMAL).appendLiteral(':')
+  .appendValue(MINUTE_OF_HOUR, 2, 2, SignStyle.NORMAL).appendLiteral(':')
+  .appendValue(SECOND_OF_MINUTE, 2, 2, SignStyle.NORMAL)
+  // Fractional Part (Optional)
+  .optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, 
true).optionalEnd().toFormatter();
+

Review Comment:
   Also using `LENIENT` in the name is a bit misleading since it implies that 
`DateTimeFormatterBuilder#parseLenient` is in use which is not the case here.



##
common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java:
##
@@ -128,6 +138,10 @@ public String toString() {
 return localDateTime.format(PRINT_FORMATTER);
   }
 
+  public String toStingWithLenientFormatter() {
+return localDateTime.format(PRINT_LENIENT_FORMATTER);
+  }
+

Review Comment:
   The use of `Lenient` is a bit misleading as I wrote previously. Also there 
is a small typo in the method name `toSting` vs `toString`.
   
   Instead of adding a new method we could use `Timestamp#format` passing in 
the desired formatter. With the right naming for the formatter parameter it 
would make the intention more clear.
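
   A rough sketch of that suggestion, assuming the wrapper keeps delegating to 
`localDateTime` as in the diff above (class and formatter names here are 
illustrative, not the final API):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampFormatSketch {
  private final LocalDateTime localDateTime;

  public TimestampFormatSketch(LocalDateTime localDateTime) {
    this.localDateTime = localDateTime;
  }

  // One generic entry point instead of one method per formatter; callers pass
  // the formatter that matches their intent, e.g. a legacy Hive2 formatter.
  public String format(DateTimeFormatter formatter) {
    return localDateTime.format(formatter);
  }
}
```

   A call site would then read `ts.format(SOME_LEGACY_FORMATTER)` (name 
illustrative) instead of a dedicated `toString`-style variant per formatter.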





Issue Time Tracking
---

Worklog Id: (was: 774483)
Time Spent: 40m  (was: 0.5h)

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we 

[jira] [Comment Edited] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541969#comment-17541969
 ] 

Stamatis Zampetakis edited comment on HIVE-26233 at 5/25/22 10:50 AM:
--

Based on the git history I suspect the problem here starts appearing after 
HIVE-20007 although I haven't tried to repeat the test on that commit.


was (Author: zabetak):
I suspect the problem here starts appearing after HIVE-20007 although I haven't 
confirmed this.

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541969#comment-17541969
 ] 

Stamatis Zampetakis commented on HIVE-26233:


I suspect the problem here starts appearing after HIVE-20007 although I haven't 
confirmed this.

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541968#comment-17541968
 ] 

Stamatis Zampetakis commented on HIVE-26233:


It's clear that people who need HIVE-24074 should also include this fix if they 
want to avoid this problem.

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541951#comment-17541951
 ] 

Stamatis Zampetakis commented on HIVE-26233:


I was wrong to believe that the problem here is a result of the proleptic 
calendar changes in HIVE-22589. The test in the PR consistently fails with or 
without HIVE-22589 so my previous observation was wrong. (I suspect my 
workspace was stale while I was doing the tests which led me to the wrong 
conclusions). 

{quote}I did a quick test and the test case you included in the PR passes 
successfully by reverting these changes in NanoTimeUtils{quote}

There may be other problems with the proleptic calendar changes but it's 
definitely not related with the problem described here so I will remove the 
direct link to HIVE-22589 to avoid further confusion.

Apart from that I did some additional experiments writing timestamps in Hive 2 
and I confirm that what Peter is saying is true. Writing timestamps in any UTC 
- X timezone can result in a date greater than 10000 to be stored in the 
Parquet file and the latter cannot be handled by the current code. 

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26233) Problems reading back PARQUET timestamps above 10000 years

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26233?focusedWorklogId=774462&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774462
 ]

ASF GitHub Bot logged work on HIVE-26233:
-

Author: ASF GitHub Bot
Created on: 25/May/22 10:00
Start Date: 25/May/22 10:00
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3295:
URL: https://github.com/apache/hive/pull/3295#discussion_r881466988


##
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java:
##
@@ -389,6 +390,26 @@ public void testIllegalInt64TimestampStrings() {
 verifyInt64TimestampValue("2262-04-11 23:47:16.854775808", 
LogicalTypeAnnotation.TimeUnit.NANOS, false);
   }
 
+  @Test

Review Comment:
   What I wrote above is not true. Reverting the changes about the proleptic 
calendar in `NanoTimeUtils` does not fix this test. I don't know why the test 
was passing before but I suspect a "stale" workspace.





Issue Time Tracking
---

Worklog Id: (was: 774462)
Time Spent: 0.5h  (was: 20m)

> Problems reading back PARQUET timestamps above 10000 years
> --
>
> Key: HIVE-26233
> URL: https://issues.apache.org/jira/browse/HIVE-26233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available, 
> timestamp
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Timestamp values above year 10000 are not supported, but during the migration 
> from Hive2 to Hive3 some might appear because of TZ issues. We should be able 
> to at least read these tables before rewriting the data.
> For this we need to change the Timestamp.PRINT_FORMATTER, so no {{+}} sign is 
> appended to the timestamp if the year exceeds 4 digits.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26228) Implement Iceberg table rollback feature

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26228?focusedWorklogId=774440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774440
 ]

ASF GitHub Bot logged work on HIVE-26228:
-

Author: ASF GitHub Bot
Created on: 25/May/22 09:11
Start Date: 25/May/22 09:11
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3287:
URL: https://github.com/apache/hive/pull/3287#discussion_r881416545


##
ql/src/java/org/apache/hadoop/hive/ql/parse/AlterTableExecuteSpec.java:
##
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.parse;
+
+/**
+ * Execute operation specification. It stores the type of the operation and 
its parameters.
+ * The following operation are supported
+ * <ul>
+ *   <li>Rollback</li>
+ * </ul>
+ * @param <T> Value object class to store the operation specific parameters
+ */
+public class AlterTableExecuteSpec<T> {
+
+  public enum ExecuteOperationType {
+ROLLBACK
+  }
+
+  private final ExecuteOperationType operationType;
+  private final T operationParams;
+
+  public AlterTableExecuteSpec(ExecuteOperationType type, T value) {
+this.operationType = type;
+this.operationParams = value;
+  }
+
+  public ExecuteOperationType getOperationType() {
+return operationType;
+  }
+
+  public T getOperationParams() {
+return operationParams;
+  }
+
+  @Override
+  public String toString() {
+return "operationType=" + operationType.name() + ", " + operationParams;
+  }
+
+  /**
+   * Value object class, that stores the rollback operation specific parameters
+   * <ul>
+   *   <li>Rollback type: it can be either version based or time based</li>
+   *   <li>Rollback value: it should either a snapshot id or a timestamp in milliseconds</li>
+   * </ul>
+   */
+  public static class RollbackSpec {
+
+public enum RollbackType {
+  VERSION, TIME
+}
+
+private final RollbackType rollbackType;
+private final Long param;
+
+public RollbackSpec(RollbackType rollbackType, Long param) {
+  this.rollbackType = rollbackType;
+  this.param = param;
+}
+
+public RollbackType getRollbackType() {
+  return rollbackType;
+}
+
+public Long getParam() {
+  return param;
+}
+
+@Override
+public String toString() {
+  return "rollbackType=" + rollbackType.name() + ", param=" + param;

Review Comment:
   nit: We might want to use MoreObjects to generate the toString?
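
   For reference, a small self-contained sketch of the Guava `MoreObjects` 
approach this nit refers to; the field names are copied from the diff, the 
rest of the wiring is illustrative:

```java
import com.google.common.base.MoreObjects;

public class RollbackSpecSketch {
  enum RollbackType { VERSION, TIME }

  private final RollbackType rollbackType;
  private final Long param;

  RollbackSpecSketch(RollbackType rollbackType, Long param) {
    this.rollbackType = rollbackType;
    this.param = param;
  }

  @Override
  public String toString() {
    // Renders as "RollbackSpecSketch{rollbackType=VERSION, param=123}"
    return MoreObjects.toStringHelper(this)
        .add("rollbackType", rollbackType)
        .add("param", param)
        .toString();
  }

  public static void main(String[] args) {
    System.out.println(new RollbackSpecSketch(RollbackType.VERSION, 123L));
  }
}
```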





Issue Time Tracking
---

Worklog Id: (was: 774440)
Time Spent: 4h 10m  (was: 4h)

> Implement Iceberg table rollback feature
> 
>
> Key: HIVE-26228
> URL: https://issues.apache.org/jira/browse/HIVE-26228
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> We should allow rolling back an Iceberg table's data to its state at an older 
> table snapshot. 
> Rollback to the last snapshot before a specific timestamp
> {code:java}
> ALTER TABLE ice_t EXECUTE ROLLBACK('2022-05-12 00:00:00')
> {code}
> Rollback to a specific snapshot ID
> {code:java}
>  ALTER TABLE ice_t EXECUTE ROLLBACK(); 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26228) Implement Iceberg table rollback feature

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26228?focusedWorklogId=774439&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774439
 ]

ASF GitHub Bot logged work on HIVE-26228:
-

Author: ASF GitHub Bot
Created on: 25/May/22 09:11
Start Date: 25/May/22 09:11
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3287:
URL: https://github.com/apache/hive/pull/3287#discussion_r881416218


##
ql/src/java/org/apache/hadoop/hive/ql/parse/AlterTableExecuteSpec.java:
##
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.parse;
+
+/**
+ * Execute operation specification. It stores the type of the operation and its parameters.
+ * The following operations are supported:
+ * <ul>
+ *   <li>Rollback</li>
+ * </ul>
+ * @param <T> Value object class to store the operation specific parameters
+ */
+public class AlterTableExecuteSpec<T> {
+
+  public enum ExecuteOperationType {
+    ROLLBACK
+  }
+
+  private final ExecuteOperationType operationType;
+  private final T operationParams;
+
+  public AlterTableExecuteSpec(ExecuteOperationType type, T value) {
+    this.operationType = type;
+    this.operationParams = value;
+  }
+
+  public ExecuteOperationType getOperationType() {
+    return operationType;
+  }
+
+  public T getOperationParams() {
+    return operationParams;
+  }
+
+  @Override
+  public String toString() {
+    return "operationType=" + operationType.name() + ", " + operationParams;

Review Comment:
   nit: We might want to use `MoreObjects` to generate the toString?





Issue Time Tracking
---

Worklog Id: (was: 774439)
Time Spent: 4h  (was: 3h 50m)

> Implement Iceberg table rollback feature
> 
>
> Key: HIVE-26228
> URL: https://issues.apache.org/jira/browse/HIVE-26228
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> We should allow rolling back iceberg table's data to the state at an older 
> table snapshot. 
> Rollback to the last snapshot before a specific timestamp
> {code:java}
> ALTER TABLE ice_t EXECUTE ROLLBACK('2022-05-12 00:00:00')
> {code}
> Rollback to a specific snapshot ID
> {code:java}
>  ALTER TABLE ice_t EXECUTE ROLLBACK(); 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26228) Implement Iceberg table rollback feature

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26228?focusedWorklogId=774441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774441
 ]

ASF GitHub Bot logged work on HIVE-26228:
-

Author: ASF GitHub Bot
Created on: 25/May/22 09:11
Start Date: 25/May/22 09:11
Worklog Time Spent: 10m 
  Work Description: pvary commented on PR #3287:
URL: https://github.com/apache/hive/pull/3287#issuecomment-1136995297

   I like this new approach very much!




Issue Time Tracking
---

Worklog Id: (was: 774441)
Time Spent: 4h 20m  (was: 4h 10m)

> Implement Iceberg table rollback feature
> 
>
> Key: HIVE-26228
> URL: https://issues.apache.org/jira/browse/HIVE-26228
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> We should allow rolling back iceberg table's data to the state at an older 
> table snapshot. 
> Rollback to the last snapshot before a specific timestamp
> {code:java}
> ALTER TABLE ice_t EXECUTE ROLLBACK('2022-05-12 00:00:00')
> {code}
> Rollback to a specific snapshot ID
> {code:java}
>  ALTER TABLE ice_t EXECUTE ROLLBACK(); 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26228) Implement Iceberg table rollback feature

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26228?focusedWorklogId=774437=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774437
 ]

ASF GitHub Bot logged work on HIVE-26228:
-

Author: ASF GitHub Bot
Created on: 25/May/22 09:10
Start Date: 25/May/22 09:10
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3287:
URL: https://github.com/apache/hive/pull/3287#discussion_r881415324


##
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##
@@ -6505,4 +6505,15 @@ public StorageHandlerInfo getStorageHandlerInfo(Table table)
       throw new HiveException(e);
     }
   }
+
+  public void alterTableExecuteOperation(Table table, AlterTableExecuteSpec executeSpec) throws HiveException {
+    try {
+      HiveStorageHandler storageHandler = createStorageHandler(table.getTTable());
+      if (storageHandler.supportsExecuteOperations()) {

Review Comment:
   Shall we throw an exception if the feature is not supported by the table?
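
   A sketch of what that could look like (the throwing branch and the `executeOperation` call are assumptions drawn from this PR's context, not the exact committed code):

{code:java}
// Sketch: fail loudly when the storage handler does not support EXECUTE.
public void alterTableExecuteOperation(Table table, AlterTableExecuteSpec executeSpec) throws HiveException {
  HiveStorageHandler storageHandler;
  try {
    storageHandler = createStorageHandler(table.getTTable());
  } catch (Exception e) {
    throw new HiveException(e);
  }
  if (!storageHandler.supportsExecuteOperations()) {
    // reviewer's suggestion: surface the problem instead of a silent no-op
    throw new HiveException("Execute operation is not supported for table: " + table.getTableName());
  }
  storageHandler.executeOperation(table, executeSpec); // delegate to the handler
}
{code}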





Issue Time Tracking
---

Worklog Id: (was: 774437)
Time Spent: 3h 50m  (was: 3h 40m)

> Implement Iceberg table rollback feature
> 
>
> Key: HIVE-26228
> URL: https://issues.apache.org/jira/browse/HIVE-26228
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> We should allow rolling back iceberg table's data to the state at an older 
> table snapshot. 
> Rollback to the last snapshot before a specific timestamp
> {code:java}
> ALTER TABLE ice_t EXECUTE ROLLBACK('2022-05-12 00:00:00')
> {code}
> Rollback to a specific snapshot ID
> {code:java}
>  ALTER TABLE ice_t EXECUTE ROLLBACK(); 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26228) Implement Iceberg table rollback feature

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26228?focusedWorklogId=774435=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774435
 ]

ASF GitHub Bot logged work on HIVE-26228:
-

Author: ASF GitHub Bot
Created on: 25/May/22 09:08
Start Date: 25/May/22 09:08
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3287:
URL: https://github.com/apache/hive/pull/3287#discussion_r881413044


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/execute/AlterTableExecuteAnalyzer.java:
##
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.ddl.table.execute;
+
+import org.apache.hadoop.hive.common.TableName;
+import org.apache.hadoop.hive.common.type.TimestampTZ;
+import org.apache.hadoop.hive.common.type.TimestampTZUtil;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory.DDLType;
+import org.apache.hadoop.hive.ql.ddl.DDLWork;
+import org.apache.hadoop.hive.ql.ddl.table.AbstractAlterTableAnalyzer;
+import org.apache.hadoop.hive.ql.ddl.table.AlterTableType;
+import org.apache.hadoop.hive.ql.exec.TaskFactory;
+import org.apache.hadoop.hive.ql.hooks.ReadEntity;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.parse.ASTNode;
+import org.apache.hadoop.hive.ql.parse.HiveParser;
+import org.apache.hadoop.hive.ql.parse.AlterTableExecuteSpec;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.session.SessionState;
+
+import java.time.ZoneId;
+import java.util.Map;
+
+/**
+ * Analyzer for ALTER TABLE ... EXECUTE commands.
+ */
+@DDLType(types = HiveParser.TOK_ALTERTABLE_EXECUTE)
+public class AlterTableExecuteAnalyzer extends AbstractAlterTableAnalyzer {
+
+  public AlterTableExecuteAnalyzer(QueryState queryState) throws SemanticException {
+    super(queryState);
+  }
+
+  @Override
+  protected void analyzeCommand(TableName tableName, Map<String, String> partitionSpec, ASTNode command)
+      throws SemanticException {
+    Table table = getTable(tableName);
+    // the first child must be the execute operation type
+    ASTNode executeCommandType = (ASTNode) command.getChild(0);
+    validateAlterTableType(table, AlterTableType.EXECUTE, false);
+    inputs.add(new ReadEntity(table));
+    AlterTableExecuteDesc desc = null;
+    if (HiveParser.KW_ROLLBACK == executeCommandType.getType()) {
+      AlterTableExecuteSpec spec;
+      // the second child must be the rollback parameter
+      ASTNode child = (ASTNode) command.getChild(1);
+
+      if (child.getType() == HiveParser.StringLiteral) {
+        ZoneId timeZone = SessionState.get() == null ? new HiveConf().getLocalTimeZone() : SessionState.get().getConf()
+            .getLocalTimeZone();
+        TimestampTZ time = TimestampTZUtil.parse(PlanUtils.stripQuotes(child.getText()), timeZone);
+        spec = new AlterTableExecuteSpec(AlterTableExecuteSpec.ExecuteOperationType.ROLLBACK,
+            new AlterTableExecuteSpec.RollbackSpec(AlterTableExecuteSpec.RollbackSpec.RollbackType.TIME,

Review Comment:
   nit: maybe static import `AlterTableExecuteSpec.RollbackSpec`?





Issue Time Tracking
---

Worklog Id: (was: 774435)
Time Spent: 3.5h  (was: 3h 20m)

> Implement Iceberg table rollback feature
> 
>
> Key: HIVE-26228
> URL: https://issues.apache.org/jira/browse/HIVE-26228
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> We should allow rolling back iceberg table's data to the state at an older 
> table snapshot. 
> Rollback to the last snapshot before a specific timestamp
> {code:java}
> ALTER TABLE ice_t EXECUTE ROLLBACK('2022-05-12 00:00:00')
> {code}
> Rollback to a 

[jira] [Work logged] (HIVE-26228) Implement Iceberg table rollback feature

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26228?focusedWorklogId=774436=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774436
 ]

ASF GitHub Bot logged work on HIVE-26228:
-

Author: ASF GitHub Bot
Created on: 25/May/22 09:08
Start Date: 25/May/22 09:08
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3287:
URL: https://github.com/apache/hive/pull/3287#discussion_r881413355


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/execute/AlterTableExecuteAnalyzer.java:
##
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.ddl.table.execute;
+
+import org.apache.hadoop.hive.common.TableName;
+import org.apache.hadoop.hive.common.type.TimestampTZ;
+import org.apache.hadoop.hive.common.type.TimestampTZUtil;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory.DDLType;
+import org.apache.hadoop.hive.ql.ddl.DDLWork;
+import org.apache.hadoop.hive.ql.ddl.table.AbstractAlterTableAnalyzer;
+import org.apache.hadoop.hive.ql.ddl.table.AlterTableType;
+import org.apache.hadoop.hive.ql.exec.TaskFactory;
+import org.apache.hadoop.hive.ql.hooks.ReadEntity;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.parse.ASTNode;
+import org.apache.hadoop.hive.ql.parse.HiveParser;
+import org.apache.hadoop.hive.ql.parse.AlterTableExecuteSpec;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.plan.PlanUtils;
+import org.apache.hadoop.hive.ql.session.SessionState;
+
+import java.time.ZoneId;
+import java.util.Map;
+
+/**
+ * Analyzer for ALTER TABLE ... EXECUTE commands.
+ */
+@DDLType(types = HiveParser.TOK_ALTERTABLE_EXECUTE)
+public class AlterTableExecuteAnalyzer extends AbstractAlterTableAnalyzer {
+
+  public AlterTableExecuteAnalyzer(QueryState queryState) throws SemanticException {
+    super(queryState);
+  }
+
+  @Override
+  protected void analyzeCommand(TableName tableName, Map<String, String> partitionSpec, ASTNode command)
+      throws SemanticException {
+    Table table = getTable(tableName);
+    // the first child must be the execute operation type
+    ASTNode executeCommandType = (ASTNode) command.getChild(0);
+    validateAlterTableType(table, AlterTableType.EXECUTE, false);
+    inputs.add(new ReadEntity(table));
+    AlterTableExecuteDesc desc = null;
+    if (HiveParser.KW_ROLLBACK == executeCommandType.getType()) {
+      AlterTableExecuteSpec spec;
+      // the second child must be the rollback parameter
+      ASTNode child = (ASTNode) command.getChild(1);
+
+      if (child.getType() == HiveParser.StringLiteral) {
+        ZoneId timeZone = SessionState.get() == null ? new HiveConf().getLocalTimeZone() : SessionState.get().getConf()
+            .getLocalTimeZone();
+        TimestampTZ time = TimestampTZUtil.parse(PlanUtils.stripQuotes(child.getText()), timeZone);
+        spec = new AlterTableExecuteSpec(AlterTableExecuteSpec.ExecuteOperationType.ROLLBACK,
+            new AlterTableExecuteSpec.RollbackSpec(AlterTableExecuteSpec.RollbackSpec.RollbackType.TIME,

Review Comment:
   Or even `AlterTableExecuteSpec.RollbackSpec.RollbackType`?
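
   Applied together, the two nits would shrink the call site to roughly this (a sketch; `timestampMillis` stands in for the parsed epoch value, since the hunk is cut off here):

{code:java}
// Sketch: the same statement with the suggested static imports applied.
import static org.apache.hadoop.hive.ql.parse.AlterTableExecuteSpec.ExecuteOperationType.ROLLBACK;
import static org.apache.hadoop.hive.ql.parse.AlterTableExecuteSpec.RollbackSpec;
import static org.apache.hadoop.hive.ql.parse.AlterTableExecuteSpec.RollbackSpec.RollbackType.TIME;

// ...inside analyzeCommand:
spec = new AlterTableExecuteSpec(ROLLBACK, new RollbackSpec(TIME, timestampMillis));
{code}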





Issue Time Tracking
---

Worklog Id: (was: 774436)
Time Spent: 3h 40m  (was: 3.5h)

> Implement Iceberg table rollback feature
> 
>
> Key: HIVE-26228
> URL: https://issues.apache.org/jira/browse/HIVE-26228
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> We should allow rolling back iceberg table's data to the state at an older 
> table snapshot. 
> Rollback to the last snapshot before a specific timestamp
> {code:java}
> ALTER TABLE ice_t EXECUTE ROLLBACK('2022-05-12 00:00:00')
> {code}
> Rollback to a 

[jira] [Work logged] (HIVE-26228) Implement Iceberg table rollback feature

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26228?focusedWorklogId=774431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774431
 ]

ASF GitHub Bot logged work on HIVE-26228:
-

Author: ASF GitHub Bot
Created on: 25/May/22 09:06
Start Date: 25/May/22 09:06
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3287:
URL: https://github.com/apache/hive/pull/3287#discussion_r881410490


##
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergTimeTravel.java:
##
@@ -20,31 +20,35 @@
 package org.apache.iceberg.mr.hive;
 
 import java.io.IOException;
-import java.text.SimpleDateFormat;
-import java.util.Date;
 import java.util.List;
 import org.apache.iceberg.AssertHelpers;
 import org.apache.iceberg.HistoryEntry;
 import org.apache.iceberg.Table;
 import org.junit.Assert;
 import org.junit.Test;
 
+import static org.apache.iceberg.mr.hive.HiveIcebergTestUtils.timestampAfterSnapshot;
+
 /**
  * Tests covering the time travel feature, aka reading from a table as of a certain snapshot.
  */
 public class TestHiveIcebergTimeTravel extends HiveIcebergStorageHandlerWithEngineBase {
 
   @Test
   public void testSelectAsOfTimestamp() throws IOException, InterruptedException {
-    Table table = prepareTableWithVersions(2);
+    Table table = testTables.createTableWithVersions(shell, "customers",
+        HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS, 2);
 
     List<Object[]> rows = shell.executeStatement(
-        "SELECT * FROM customers FOR SYSTEM_TIME AS OF '" + timestampAfterSnapshot(table, 0) + "'");
+        "SELECT * FROM customers FOR SYSTEM_TIME AS OF '" +

Review Comment:
   What is the change here?





Issue Time Tracking
---

Worklog Id: (was: 774431)
Time Spent: 3h 10m  (was: 3h)

> Implement Iceberg table rollback feature
> 
>
> Key: HIVE-26228
> URL: https://issues.apache.org/jira/browse/HIVE-26228
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We should allow rolling back iceberg table's data to the state at an older 
> table snapshot. 
> Rollback to the last snapshot before a specific timestamp
> {code:java}
> ALTER TABLE ice_t EXECUTE ROLLBACK('2022-05-12 00:00:00')
> {code}
> Rollback to a specific snapshot ID
> {code:java}
>  ALTER TABLE ice_t EXECUTE ROLLBACK(); 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26228) Implement Iceberg table rollback feature

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26228?focusedWorklogId=774432=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774432
 ]

ASF GitHub Bot logged work on HIVE-26228:
-

Author: ASF GitHub Bot
Created on: 25/May/22 09:06
Start Date: 25/May/22 09:06
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3287:
URL: https://github.com/apache/hive/pull/3287#discussion_r881411074


##
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java:
##
@@ -379,4 +382,23 @@ public static DeleteFile createPositionalDeleteFile(Table table, String deleteFi
     return posWriter.toDeleteFile();
   }
 
+  /**
+   * Get the timestamp string which we can use in the queries. The timestamp will be after the given snapshot
+   * and before the next one
+   * @param table The table which we want to query
+   * @param snapshotPosition The position of the last snapshot we want to see in the query results
+   * @return The timestamp which we can use in the queries
+   */
+  public static String timestampAfterSnapshot(Table table, int snapshotPosition) {

Review Comment:
   Never mind, I see you just refactored this :)





Issue Time Tracking
---

Worklog Id: (was: 774432)
Time Spent: 3h 20m  (was: 3h 10m)

> Implement Iceberg table rollback feature
> 
>
> Key: HIVE-26228
> URL: https://issues.apache.org/jira/browse/HIVE-26228
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> We should allow rolling back iceberg table's data to the state at an older 
> table snapshot. 
> Rollback to the last snapshot before a specific timestamp
> {code:java}
> ALTER TABLE ice_t EXECUTE ROLLBACK('2022-05-12 00:00:00')
> {code}
> Rollback to a specific snapshot ID
> {code:java}
>  ALTER TABLE ice_t EXECUTE ROLLBACK(); 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26228) Implement Iceberg table rollback feature

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26228?focusedWorklogId=774429=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774429
 ]

ASF GitHub Bot logged work on HIVE-26228:
-

Author: ASF GitHub Bot
Created on: 25/May/22 09:03
Start Date: 25/May/22 09:03
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3287:
URL: https://github.com/apache/hive/pull/3287#discussion_r881407923


##
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java:
##
@@ -379,4 +382,23 @@ public static DeleteFile createPositionalDeleteFile(Table table, String deleteFi
     return posWriter.toDeleteFile();
   }
 
+  /**
+   * Get the timestamp string which we can use in the queries. The timestamp will be after the given snapshot
+   * and before the next one
+   * @param table The table which we want to query
+   * @param snapshotPosition The position of the last snapshot we want to see in the query results
+   * @return The timestamp which we can use in the queries
+   */
+  public static String timestampAfterSnapshot(Table table, int snapshotPosition) {

Review Comment:
   Could this be reused for the timetravel tests?
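
   For readers of this thread, a sketch of what such a helper could look like, based only on the javadoc above (the real body is not part of the quoted hunk, so the implementation details are assumptions):

{code:java}
// Sketch: pick a timestamp just after the given snapshot was committed,
// formatted the way the time-travel queries expect it.
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.iceberg.Table;

public static String timestampAfterSnapshot(Table table, int snapshotPosition) {
  // table.history() lists snapshots in commit order; add 1 ms so the
  // returned timestamp falls after the chosen snapshot but before the next one
  long timestampMillis = table.history().get(snapshotPosition).timestampMillis() + 1;
  return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS").format(new Date(timestampMillis));
}
{code}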





Issue Time Tracking
---

Worklog Id: (was: 774429)
Time Spent: 3h  (was: 2h 50m)

> Implement Iceberg table rollback feature
> 
>
> Key: HIVE-26228
> URL: https://issues.apache.org/jira/browse/HIVE-26228
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> We should allow rolling back iceberg table's data to the state at an older 
> table snapshot. 
> Rollback to the last snapshot before a specific timestamp
> {code:java}
> ALTER TABLE ice_t EXECUTE ROLLBACK('2022-05-12 00:00:00')
> {code}
> Rollback to a specific snapshot ID
> {code:java}
>  ALTER TABLE ice_t EXECUTE ROLLBACK(); 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26217) Make CTAS use Direct Insert Semantics

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26217?focusedWorklogId=774417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774417
 ]

ASF GitHub Bot logged work on HIVE-26217:
-

Author: ASF GitHub Bot
Created on: 25/May/22 08:31
Start Date: 25/May/22 08:31
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3281:
URL: https://github.com/apache/hive/pull/3281#discussion_r881374106


##
ql/src/test/queries/clientpositive/ctas_direct.q:
##
@@ -0,0 +1,94 @@
+

Issue Time Tracking
---

Worklog Id: (was: 774417)
Time Spent: 5h  (was: 4h 50m)

> Make CTAS use Direct Insert Semantics
> -
>
> Key: HIVE-26217
> URL: https://issues.apache.org/jira/browse/HIVE-26217
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> CTAS on transactional tables currently does a copy from staging location to 
> table location. This can be avoided by using Direct Insert semantics. Added 
> support for suffixed table locations as well.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26217) Make CTAS use Direct Insert Semantics

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26217?focusedWorklogId=774415=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774415
 ]

ASF GitHub Bot logged work on HIVE-26217:
-

Author: ASF GitHub Bot
Created on: 25/May/22 08:31
Start Date: 25/May/22 08:31
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3281:
URL: https://github.com/apache/hive/pull/3281#discussion_r881373707


##
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##
@@ -8223,9 +8286,27 @@ private void handleLineage(LoadTableDesc ltd, Operator output)
       Path tlocation = null;
       String tName = Utilities.getDbTableName(tableDesc.getDbTableName())[1];
       try {
+        String suffix = "";
+        if (!tableDesc.isExternal()) {
+          boolean useSuffix = HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_ACID_CREATE_TABLE_USE_SUFFIX)

Review Comment:
   Updated.





Issue Time Tracking
---

Worklog Id: (was: 774415)
Time Spent: 4h 50m  (was: 4h 40m)

> Make CTAS use Direct Insert Semantics
> -
>
> Key: HIVE-26217
> URL: https://issues.apache.org/jira/browse/HIVE-26217
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> CTAS on transactional tables currently does a copy from staging location to 
> table location. This can be avoided by using Direct Insert semantics. Added 
> support for suffixed table locations as well.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26217) Make CTAS use Direct Insert Semantics

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26217?focusedWorklogId=774414=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774414
 ]

ASF GitHub Bot logged work on HIVE-26217:
-

Author: ASF GitHub Bot
Created on: 25/May/22 08:30
Start Date: 25/May/22 08:30
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3281:
URL: https://github.com/apache/hive/pull/3281#discussion_r881372627


##
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##
@@ -7592,6 +7594,22 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
 
       destTableIsTransactional = tblProps != null && AcidUtils.isTablePropertyTransactional(tblProps);
       if (destTableIsTransactional) {
+        isNonNativeTable = MetaStoreUtils.isNonNativeTable(tblProps);
+        boolean isCtas = tblDesc != null && tblDesc.isCTAS();
+        isMmTable = isMmCreate = AcidUtils.isInsertOnlyTable(tblProps);
+        if (!isNonNativeTable && !destTableIsTemporary && isCtas) {
+          destTableIsFullAcid = AcidUtils.isFullAcidTable(tblProps);
+          acidOperation = getAcidType(dest);
+          isDirectInsert = isDirectInsert(destTableIsFullAcid, acidOperation);
+          boolean enableSuffixing = conf.getBoolVar(ConfVars.HIVE_ACID_CREATE_TABLE_USE_SUFFIX)

Review Comment:
   Updated.





Issue Time Tracking
---

Worklog Id: (was: 774414)
Time Spent: 4h 40m  (was: 4.5h)

> Make CTAS use Direct Insert Semantics
> -
>
> Key: HIVE-26217
> URL: https://issues.apache.org/jira/browse/HIVE-26217
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> CTAS on transactional tables currently does a copy from staging location to 
> table location. This can be avoided by using Direct Insert semantics. Added 
> support for suffixed table locations as well.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-26228) Implement Iceberg table rollback feature

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26228?focusedWorklogId=774412=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774412
 ]

ASF GitHub Bot logged work on HIVE-26228:
-

Author: ASF GitHub Bot
Created on: 25/May/22 08:27
Start Date: 25/May/22 08:27
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on code in PR #3287:
URL: https://github.com/apache/hive/pull/3287#discussion_r881369177


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java:
##
@@ -184,4 +186,27 @@ public static void updateSpec(Configuration configuration, Table table) {
   public static boolean isBucketed(Table table) {
     return table.spec().fields().stream().anyMatch(f -> f.transform().toString().startsWith("bucket["));
   }
+
+  /**
+   * Roll an iceberg table's data back to a specific snapshot identified either by id or before a given timestamp.
+   * @param configuration a Hadoop configuration
+   * @param table the iceberg table
+   */
+  public static void rollback(Configuration configuration, Table table) {
+    RollbackSpec rollbackSpec = SessionStateUtil.getResource(configuration, hive_metastoreConstants.ROLLBACK_SPEC)

Review Comment:
   I refactored the code to avoid using the SessionState to pass execute operation parameters. From now on, everything is passed through the `AlterTableExecuteSpec`, and the storage handler performs the rollback instead of the metahook.
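
   For reference, the Iceberg side of such a rollback maps onto the `ManageSnapshots` API roughly like this (a sketch assuming the `RollbackSpec` shape quoted earlier in the thread, not the exact committed code):

{code:java}
// Sketch: translate a RollbackSpec into Iceberg's ManageSnapshots calls.
// rollbackSpec and its accessors are assumed from the hunks quoted above.
ManageSnapshots manageSnapshots = table.manageSnapshots();
if (rollbackSpec.getRollbackType() == RollbackType.TIME) {
  // roll back to the last snapshot committed before the given timestamp
  manageSnapshots.rollbackToTime(rollbackSpec.getParam());
} else {
  // roll back to an explicit snapshot id
  manageSnapshots.rollbackTo(rollbackSpec.getParam());
}
manageSnapshots.commit();
{code}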





Issue Time Tracking
---

Worklog Id: (was: 774412)
Time Spent: 2h 50m  (was: 2h 40m)

> Implement Iceberg table rollback feature
> 
>
> Key: HIVE-26228
> URL: https://issues.apache.org/jira/browse/HIVE-26228
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> We should allow rolling back iceberg table's data to the state at an older 
> table snapshot. 
> Rollback to the last snapshot before a specific timestamp
> {code:java}
> ALTER TABLE ice_t EXECUTE ROLLBACK('2022-05-12 00:00:00')
> {code}
> Rollback to a specific snapshot ID
> {code:java}
>  ALTER TABLE ice_t EXECUTE ROLLBACK(); 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2022-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?focusedWorklogId=774386=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774386
 ]

ASF GitHub Bot logged work on HIVE-25335:
-

Author: ASF GitHub Bot
Created on: 25/May/22 07:28
Start Date: 25/May/22 07:28
Worklog Time Spent: 10m 
  Work Description: zabetak commented on PR #3292:
URL: https://github.com/apache/hive/pull/3292#issuecomment-1136889593

   @zhengchenyu I suspect that the error you see on Jenkins has to do with the fact that there are a lot of failures in the tests.
   
   If you run locally with `-Dtest.output.overwrite`, you will not see any failures, because the "reference files" are updated automatically. If you want to see all the failures locally, you must remove this parameter.
   
   Having said that, if you commit all the changes in the reference files, the tests will most likely pass and the Jenkins pipeline may run fine.
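
   (For anyone following along, a typical local qtest invocation looks roughly like this; the driver and q-file names are illustrative:)

{code}
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=some_test.q -Dtest.output.overwrite=true
{code}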




Issue Time Tracking
---

Worklog Id: (was: 774386)
Time Spent: 3h 20m  (was: 3h 10m)

> Unreasonable setting reduce number, when join big size table(but small row 
> count) and small size table
> --
>
> Key: HIVE-25335
> URL: https://issues.apache.org/jira/browse/HIVE-25335
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-25335.001.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> I found a slow application in our cluster: each reducer has to process a huge
> number of bytes, but there are only two reducers.
> When I debugged it, I found the reason. In this SQL, one table is big in size
> (about 30G) but has a small row count (about 3.5M), while the other table is
> small in size (about 100M) but has a larger row count (about 3.6M). So
> JoinStatsRule.process uses only the 100M to estimate the number of reducers,
> even though we actually need to process 30G.
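
To illustrate the arithmetic (the per-reducer threshold below is an assumption for the example, mirroring hive.exec.reducers.bytes.per.reducer, not the reporter's actual configuration):

{code:java}
// Sketch: why estimating reducers from the small table's size goes wrong.
public class ReducerEstimateExample {
  public static void main(String[] args) {
    long bytesPerReducer = 64L << 20;  // 64 MB per reducer, assumed for illustration
    long smallTableBytes = 100L << 20; // ~100 MB: what JoinStatsRule bases the estimate on
    long bigTableBytes   = 30L << 30;  // ~30 GB: what actually has to be processed

    int estimated = (int) Math.ceil((double) smallTableBytes / bytesPerReducer);
    int needed    = (int) Math.ceil((double) bigTableBytes / bytesPerReducer);
    // prints: estimated reducers = 2, reducers actually needed = 480
    System.out.printf("estimated reducers = %d, reducers actually needed = %d%n", estimated, needed);
  }
}
{code}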



--
This message was sent by Atlassian Jira
(v8.20.7#820007)