[jira] [Updated] (HIVE-24114) Fix Repl Load with both staging and data copy on target

2020-09-07 Thread Anishek Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anishek Agarwal updated HIVE-24114:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the patch, [~pkumarsinha], and for the review, [~aasha].

> Fix Repl Load with both staging and data copy on target
> ---
>
> Key: HIVE-24114
> URL: https://issues.apache.org/jira/browse/HIVE-24114
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24114.01.patch, HIVE-24114.02.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24128) transactions cannot recognize bucket files

2020-09-07 Thread richt richt (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

richt richt updated HIVE-24128:
---
Component/s: Query Processor

> transactions cannot recognize bucket files 
> --
>
> Key: HIVE-24128
> URL: https://issues.apache.org/jira/browse/HIVE-24128
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor, Transactions
>Affects Versions: 3.1.1
>Reporter: richt richt
>Priority: Major
>
>  
>  * Error while compiling statement: FAILED: SemanticException [Error 10141]: 
> Bucketed table metadata is not correct. Fix the metadata or don't use 
> bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number 
> of buckets for table dcp partition ods_load_date=20200701 is 2, whereas the 
> number of files is 1
>  * the transactional table manages its files like below:
> {code:java}
> -rw-r--r--   3 hadoop supergroup  1 2020-09-08 10:20 
> /usr/hive/warehouse/test_etl_dwd.db/dcp/ods_load_date=20200701/_orc_acid_version
> drwxr-xr-x   - hadoop supergroup  0 2020-09-07 19:28 
> /usr/hive/warehouse/test_etl_dwd.db/dcp/ods_load_date=20200701/base_021
> -rw-r--r--   3 hadoop supergroup   15401449 2020-09-08 10:20 
> /usr/hive/warehouse/test_etl_dwd.db/dcp/ods_load_date=20200701/bucket_0
> -rw-r--r--   3 hadoop supergroup   15408471 2020-09-08 10:20 
> /usr/hive/warehouse/test_etl_dwd.db/dcp/ods_load_date=20200701/bucket_1
> {code}
>  * Hive puts the bucket files into the base_* directory,
>  * but when I run a merge on the table, Hive checks the number of bucket files:
>  * 
> {code:java}
> public static List<String> getBucketFilePathsOfPartition(
>     Path location, ParseContext pGraphContext) throws SemanticException {
>   List<String> fileNames = new ArrayList<String>();
>   try {
>     FileSystem fs = location.getFileSystem(pGraphContext.getConf());
>     FileStatus[] files = fs.listStatus(new Path(location.toString()),
>         FileUtils.HIDDEN_FILES_PATH_FILTER);
>     if (files != null) {
>       for (FileStatus file : files) {
>         fileNames.add(file.getPath().toString());
>       }
>     }
>   } catch (IOException e) {
>     throw new SemanticException(e);
>   }
>   return fileNames;
> }
> {code}
> it only checks the files directly under the partition directory; it does not look inside the base_* directory.
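
A minimal sketch of the direction the report implies, assuming the Hadoop FileSystem API: list bucket files by also descending into the ACID base_*/delta_* subdirectories. The method name and the isAcidDir() helper below are hypothetical illustrations, not the actual Hive fix.
{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.common.FileUtils;

public class BucketFileListing {

  // Hypothetical variant of getBucketFilePathsOfPartition that also descends
  // into ACID base_*/delta_* subdirectories, where transactional tables keep
  // their bucket_N files.
  public static List<String> getBucketFilePathsIncludingAcidDirs(
      Path location, Configuration conf) throws IOException {
    List<String> fileNames = new ArrayList<>();
    FileSystem fs = location.getFileSystem(conf);
    for (FileStatus status : fs.listStatus(location, FileUtils.HIDDEN_FILES_PATH_FILTER)) {
      if (status.isDirectory() && isAcidDir(status.getPath().getName())) {
        // Collect the bucket files kept inside the base/delta directory.
        for (FileStatus inner : fs.listStatus(status.getPath(), FileUtils.HIDDEN_FILES_PATH_FILTER)) {
          fileNames.add(inner.getPath().toString());
        }
      } else if (status.isFile()) {
        fileNames.add(status.getPath().toString());
      }
    }
    return fileNames;
  }

  // Hypothetical helper: recognize ACID base/delta directory names.
  private static boolean isAcidDir(String name) {
    return name.startsWith("base_") || name.startsWith("delta_") || name.startsWith("delete_delta_");
  }
}
{code}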



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24128) transactions cannot recognize bucket files

2020-09-07 Thread richt richt (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

richt richt updated HIVE-24128:
---
Description: 
 
 * Error while compiling statement: FAILED: SemanticException [Error 10141]: 
Bucketed table metadata is not correct. Fix the metadata or don't use bucketed 
mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets 
for table dcp partition ods_load_date=20200701 is 2, whereas the number of 
files is 1
 * the transactional table manages its files like below:
{code:java}
-rw-r--r--   3 hadoop supergroup  1 2020-09-08 10:20 
/usr/hive/warehouse/test_etl_dwd.db/dcp/ods_load_date=20200701/_orc_acid_version
drwxr-xr-x   - hadoop supergroup  0 2020-09-07 19:28 
/usr/hive/warehouse/test_etl_dwd.db/dcp/ods_load_date=20200701/base_021
-rw-r--r--   3 hadoop supergroup   15401449 2020-09-08 10:20 
/usr/hive/warehouse/test_etl_dwd.db/dcp/ods_load_date=20200701/bucket_0
-rw-r--r--   3 hadoop supergroup   15408471 2020-09-08 10:20 
/usr/hive/warehouse/test_etl_dwd.db/dcp/ods_load_date=20200701/bucket_1
{code}

 * Hive puts the bucket files into the base_* directory,
 * but when I run a merge on the table, Hive checks the number of bucket files:
 * 
{code:java}
public static List<String> getBucketFilePathsOfPartition(
    Path location, ParseContext pGraphContext) throws SemanticException {
  List<String> fileNames = new ArrayList<String>();
  try {
    FileSystem fs = location.getFileSystem(pGraphContext.getConf());
    FileStatus[] files = fs.listStatus(new Path(location.toString()),
        FileUtils.HIDDEN_FILES_PATH_FILTER);
    if (files != null) {
      for (FileStatus file : files) {
        fileNames.add(file.getPath().toString());
      }
    }
  } catch (IOException e) {
    throw new SemanticException(e);
  }
  return fileNames;
}
{code}
it only checks the files directly under the partition directory; it does not look inside the base_* directory.

  was:
 
 * Error while compiling statement: FAILED: SemanticException [Error 10141]: 
Bucketed table metadata is not correct. Fix the metadata or don't use bucketed 
mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets 
for table dcp partition ods_load_date=20200701 is 2, whereas the number of 
files is 1
 * the transactional table manages its files like below:
{code:java}
-rw-r--r--   3 hadoop supergroup  1 2020-09-08 10:20 
/usr/hive/warehouse/test_etl_dwd.db/dcp/ods_load_date=20200701/_orc_acid_version
drwxr-xr-x   - hadoop supergroup  0 2020-09-07 19:28 
/usr/hive/warehouse/test_etl_dwd.db/dcp/ods_load_date=20200701/base_021
-rw-r--r--   3 hadoop supergroup   15401449 2020-09-08 10:20 
/usr/hive/warehouse/test_etl_dwd.db/dcp/ods_load_date=20200701/bucket_0
-rw-r--r--   3 hadoop supergroup   15408471 2020-09-08 10:20 
/usr/hive/warehouse/test_etl_dwd.db/dcp/ods_load_date=20200701/bucket_1
{code}

 * Hive puts the bucket files into the base_* directory,
 * but when I run a merge on the table, Hive checks the number of bucket files:
 * 
{code:java}
public static List<String> getBucketFilePathsOfPartition(
    Path location, ParseContext pGraphContext) throws SemanticException {
  List<String> fileNames = new ArrayList<String>();
  try {
    FileSystem fs = location.getFileSystem(pGraphContext.getConf());
    FileStatus[] files = fs.listStatus(new Path(location.toString()),
        FileUtils.HIDDEN_FILES_PATH_FILTER);
    if (files != null) {
      for (FileStatus file : files) {
        fileNames.add(file.getPath().toString());
      }
    }
  } catch (IOException e) {
    throw new SemanticException(e);
  }
  return fileNames;
}
{code}
it only checks the files directly under the partition directory; it does not look inside the base_* directory.


> transactions cannot recognize bucket files 
> --
>
> Key: HIVE-24128
> URL: https://issues.apache.org/jira/browse/HIVE-24128
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.1.1
>Reporter: richt richt
>Priority: Major
>
>  
>  * Error while compiling statement: FAILED: SemanticException [Error 10141]: 
> Bucketed table metadata is not correct. Fix the metadata or don't use 
> bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number 
> of buckets for table dcp partition ods_load_date=20200701 is 2, whereas the 
> number of files is 1
>  * the transactional table manages its files like below:
> {code:java}
> -rw-r--r--   3 hadoop supergroup  1 2020-09-08 10:20 
> /usr/hive/warehouse/test_etl_dwd.db/dcp/ods_load_date=20200701/_orc_acid_version
> drwxr-xr-x   - hadoop supergroup  0 2020-09-07 19:28 
> /usr/hive/warehouse/test_etl_dwd.db/dcp/ods_load_date=20200701/base_021
> -rw-r--r--   3 hadoop supergroup   15401449 2020-09-08 10:20 
> /usr/hive/warehouse/test_etl_dwd.db/dcp/ods_load_date=20200701/bucket_0
> 

[jira] [Work logged] (HIVE-23772) Relocate calcite-core to prevent NoSuchFieldError

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23772?focusedWorklogId=479815&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479815
 ]

ASF GitHub Bot logged work on HIVE-23772:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 00:46
Start Date: 08/Sep/20 00:46
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1187:
URL: https://github.com/apache/hive/pull/1187#issuecomment-688557522


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479815)
Time Spent: 1h 50m  (was: 1h 40m)

> Relocate calcite-core to prevent NoSuchFieldError
> -
>
> Key: HIVE-23772
> URL: https://issues.apache.org/jira/browse/HIVE-23772
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Exception trace due to conflict with {{calcite-core}}
> {noformat}
> Caused by: java.lang.NoSuchFieldError: operands
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:785)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:509)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[calcite-core-1.21.0.jar:1.21.0]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:239)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:437)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:124)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:112)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1620)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:555)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12456)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:433)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:290)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:184) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:602) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:548) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:542) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23754) LLAP: Add LoggingHandler in ShuffleHandler pipeline for better debuggability

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23754?focusedWorklogId=479816&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479816
 ]

ASF GitHub Bot logged work on HIVE-23754:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 00:46
Start Date: 08/Sep/20 00:46
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1172:
URL: https://github.com/apache/hive/pull/1172#issuecomment-688557531


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479816)
Time Spent: 20m  (was: 10m)

> LLAP: Add LoggingHandler in ShuffleHandler pipeline for better debuggability
> 
>
> Key: HIVE-23754
> URL: https://issues.apache.org/jira/browse/HIVE-23754
> Project: Hive
>  Issue Type: Improvement
> Environment:  
>  
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java#L616]
>  
> For corner-case debugging, it would be helpful to understand when netty 
> processed OPEN/BOUND/CLOSE/RECEIVED/CONNECTED events, along with payload 
> details.
> Adding a "LoggingHandler" to the ChannelPipeline can help with debugging.
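
A minimal sketch of the proposal, assuming Netty 4's io.netty.handler.logging API; the initializer class below is illustrative, and the real ShuffleHandler pipeline setup may differ (including the Netty generation it targets).
{code:java}
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.logging.LogLevel;
import io.netty.handler.logging.LoggingHandler;

public class ShuffleChannelInitializer extends ChannelInitializer<SocketChannel> {
  @Override
  protected void initChannel(SocketChannel ch) {
    // Placing the LoggingHandler first lets it observe every channel event
    // (open/close/read/write/connect) together with payload details.
    ch.pipeline().addFirst("loggingHandler", new LoggingHandler(LogLevel.DEBUG));
    // ... the existing shuffle handlers would be added after this ...
  }
}
{code}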



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23413) Create a new config to skip all locks

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23413?focusedWorklogId=479814&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479814
 ]

ASF GitHub Bot logged work on HIVE-23413:
-

Author: ASF GitHub Bot
Created on: 08/Sep/20 00:46
Start Date: 08/Sep/20 00:46
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1220:
URL: https://github.com/apache/hive/pull/1220#issuecomment-688557508


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479814)
Time Spent: 0.5h  (was: 20m)

> Create a new config to skip all locks
> -
>
> Key: HIVE-23413
> URL: https://issues.apache.org/jira/browse/HIVE-23413
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23413.1.patch, HIVE-23413.2.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> From time to time, a query gets blocked on locks that it should not be.
> To have a quick workaround for this, we should have a config which the user 
> can set in the session to disable acquiring/checking locks, so we can unblock 
> the query immediately and then investigate and fix the root cause later.
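
A hedged sketch of what such a session-level switch could look like in the driver's lock step; the config name below is hypothetical and not necessarily what this patch introduced.
{code:java}
// Illustrative fragment only -- HIVE_TXN_SKIP_LOCKING is a hypothetical
// config name. The idea: a session-level flag that short-circuits lock
// acquisition entirely, as a temporary workaround.
private void acquireLocksIfNeeded(QueryPlan plan, Context ctx) throws LockException {
  if (conf.getBoolVar(HiveConf.ConfVars.HIVE_TXN_SKIP_LOCKING)) {
    LOG.warn("Lock acquisition is disabled for this session; "
        + "use only as a temporary workaround while the root cause is investigated.");
    return;
  }
  txnManager.acquireLocks(plan, ctx, userName);
}
{code}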



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24127) Dump events from default catalog only

2020-09-07 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi reassigned HIVE-24127:
--


> Dump events from default catalog only
> -
>
> Key: HIVE-24127
> URL: https://issues.apache.org/jira/browse/HIVE-24127
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>
> Don't dump events from the spark catalog. In bootstrap we skip spark tables; 
> in incremental load we should likewise skip spark events.
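
A hedged sketch of the event filter this implies; the surrounding loop and dumpEvent(...) call are assumptions for illustration, standing in for the actual incremental-dump code path.
{code:java}
// Sketch: while iterating notification events during incremental dump, skip
// anything outside the default "hive" catalog (e.g. events from a "spark"
// catalog), mirroring how bootstrap skips spark tables.
for (NotificationEvent event : eventsBatch) {
  String catName = event.getCatName();
  if (catName != null && !"hive".equalsIgnoreCase(catName)) {
    continue; // non-default catalog: do not replicate this event
  }
  dumpEvent(event); // hypothetical per-event dump call
}
{code}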



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24122) When CBO is enabled, CAST(STR as Bigint) IS NOT NULL result is wrong

2020-09-07 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191815#comment-17191815
 ] 

Stamatis Zampetakis commented on HIVE-24122:


Hey [~luguangming], can you reproduce this on current master?

> When CBO is enabled, CAST(STR as Bigint) IS NOT NULL result is wrong 
> ---
>
> Key: HIVE-24122
> URL: https://issues.apache.org/jira/browse/HIVE-24122
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Priority: Major
>
> {code:java}
> create  database testdb;
> CREATE TABLE IF NOT EXISTS testdb.z_tab 
> ( 
>     SEARCHWORD    STRING, 
>     COUNT_NUM BIGINT, 
>     WORDS STRING 
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
> STORED AS TEXTFILE;
> insert into table testdb.z_tab 
> values('hivetest',111,'aaa'),('hivetest2',111,'bbb');
> set hive.cbo.enable=true;
> SELECT CAST(searchword as bigint) IS NOT NULL FROM testdb.z_tab;
> SELECT CAST(searchword as bigint) IS NULL FROM testdb.z_tab;
> {code}
> The SQL results for both queries are the same, as follows:
> {noformat}
> +---+
> |  _c0  |
> +---+
> | true  |
> | true  |
> +---+{noformat}
> The result of SELECT CAST(searchword as bigint) IS NOT NULL FROM 
> testdb.z_tab; is wrong: casting a non-numeric string such as 'hivetest' to 
> bigint yields NULL, so IS NOT NULL should return false here.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24095) Load partitions in parallel for external tables in the bootstrap phase

2020-09-07 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24095:
---
Attachment: HIVE-24095.01.patch
Status: Patch Available  (was: Open)

> Load partitions in parallel for external tables in the bootstrap phase
> --
>
> Key: HIVE-24095
> URL: https://issues.apache.org/jira/browse/HIVE-24095
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24095.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is part 1 of the change: it loads partitions in parallel for 
> external tables. Managed tables are tracked as part of 
> https://issues.apache.org/jira/browse/HIVE-24109
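
A hedged sketch of the approach using a plain ExecutorService (java.util.concurrent); loadPartition(...) and threadCount are illustrative stand-ins, not the patch's actual code.
{code:java}
// Sketch: load external-table partitions in parallel rather than one by one.
// loadPartition(...) stands in for the per-partition data copy plus metadata
// registration.
void loadPartitionsInParallel(List<Partition> partitions, int threadCount) throws Exception {
  ExecutorService pool = Executors.newFixedThreadPool(threadCount);
  try {
    List<Future<Void>> futures = new ArrayList<>();
    for (Partition partition : partitions) {
      futures.add(pool.submit(() -> {
        loadPartition(partition);
        return null;
      }));
    }
    for (Future<Void> f : futures) {
      f.get(); // propagate the first failure, if any
    }
  } finally {
    pool.shutdown();
  }
}
{code}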



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479695&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479695
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 16:34
Start Date: 07/Sep/20 16:34
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484516378



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -488,30 +489,40 @@ private void runInternal(String command, boolean 
alreadyCompiled) throws Command
 
   lockAndRespond();
 
+  int retryShapshotCnt = 0;
+  int maxRetrySnapshotCnt = HiveConf.getIntVar(driverContext.getConf(),
+HiveConf.ConfVars.HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT);
+
   try {
-if (!driverTxnHandler.isValidTxnListState()) {
-  LOG.info("Compiling after acquiring locks");
+while (!driverTxnHandler.isValidTxnListState() && ++retryShapshotCnt 
<= maxRetrySnapshotCnt) {
+  LOG.info("Compiling after acquiring locks, attempt #" + 
retryShapshotCnt);
   // Snapshot was outdated when locks were acquired, hence regenerate 
context,
   // txn list and retry
   // TODO: Lock acquisition should be moved before analyze, this is a 
bit hackish.
   // Currently, we acquire a snapshot, we compile the query wrt that 
snapshot,
   // and then, we acquire locks. If snapshot is still valid, we 
continue as usual.
   // But if snapshot is not valid, we recompile the query.
   if (driverContext.isOutdatedTxn()) {
+LOG.info("Snapshot is outdated, re-initiating transaction ...");
 driverContext.getTxnManager().rollbackTxn();
 
 String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
 driverContext.getTxnManager().openTxn(context, userFromUGI, 
driverContext.getTxnType());
 lockAndRespond();
   }
+
   driverContext.setRetrial(true);
   driverContext.getBackupContext().addSubContext(context);
   
driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
   context = driverContext.getBackupContext();
+
   driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY,
 driverContext.getTxnManager().getValidTxns().toString());
+
   if (driverContext.getPlan().hasAcidResourcesInQuery()) {
+compileInternal(context.getCmd(), true);
 driverTxnHandler.recordValidWriteIds();
+driverTxnHandler.setWriteIdForAcidFileSinks();
   }
 
   if (!alreadyCompiled) {

Review comment:
   I think this code should be removed. If alreadyCompiled was false, we 
already compiled once at line 473. This should not matter in the invalid 
snapshot case: the query is already recompiled there if necessary.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479695)
Time Spent: 7h  (was: 6h 50m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not yet committed when the 
> query was compiled, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge-inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data into an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data into the target table 
> of the merge statement; this will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.

[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479692&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479692
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 16:24
Start Date: 07/Sep/20 16:24
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484513473



##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
##
@@ -2315,6 +2386,139 @@ private void 
testConcurrentMergeInsertNoDuplicates(String query, boolean sharedW
 List<String> res = new ArrayList<>();
 driver.getFetchTask().fetch(res);
 Assert.assertEquals("Duplicate records found", 4, res.size());
+dropTable(new String[]{"target", "source"});
+  }
+
+  /**
+   * ValidTxnManager.isValidTxnListState can invalidate a snapshot if a 
relevant write transaction was committed
+   * between a query compilation and lock acquisition. When this happens we 
have to recompile the given query,
+   * otherwise we can miss reading partitions created in between. The following 
three cases test these scenarios.
+   * @throws Exception ex
+   */
+  @Test
+  public void testMergeInsertDynamicPartitioningSequential() throws Exception {
+dropTable(new String[]{"target", "source"});
+conf.setBoolVar(HiveConf.ConfVars.TXN_WRITE_X_LOCK, false);
+
+// Create partition c=1
+driver.run("create table target (a int, b int) partitioned by (c int) 
stored as orc TBLPROPERTIES ('transactional'='true')");
+driver.run("insert into target values (1,1,1), (2,2,1)");
+//Create partition c=2
+driver.run("create table source (a int, b int) partitioned by (c int) 
stored as orc TBLPROPERTIES ('transactional'='true')");
+driver.run("insert into source values (3,3,2), (4,4,2)");
+
+// txn 1 inserts data to an old and a new partition
+driver.run("insert into source values (5,5,2), (6,6,3)");
+
+// txn 2 inserts into the target table into a new partition ( and a 
duplicate considering the source table)
+driver.run("insert into target values (3, 3, 2)");
+
+// txn3 merge
+driver.run("merge into target t using source s on t.a = s.a " +
+  "when not matched then insert values (s.a, s.b, s.c)");
+driver.run("select * from target");
+List<String> res = new ArrayList<>();
+driver.getFetchTask().fetch(res);
+// The merge should see all three partition and not create duplicates
+Assert.assertEquals("Duplicate records found", 6, res.size());
+Assert.assertTrue("Partition 3 was skipped", res.contains("6\t6\t3"));
+dropTable(new String[]{"target", "source"});
+  }
+
+  @Test
+  public void 
testMergeInsertDynamicPartitioningSnapshotInvalidatedWithOldCommit() throws 
Exception {
+// By creating the driver with the factory, we should have a ReExecDriver
+IDriver driver3 = DriverFactory.newDriver(conf);
+Assert.assertTrue("ReExecDriver was expected", driver3 instanceof 
ReExecDriver);

Review comment:
   This Reexec part is not really relevant now; we don't need the ReExecDriver 
for this to work properly.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479692)
Time Spent: 6h 50m  (was: 6h 40m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not yet committed when the 
> query was compiled, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge-inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data into an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data into the target table 
> of the merge statement.

[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479691&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479691
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 16:21
Start Date: 07/Sep/20 16:21
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484512373



##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -1012,8 +1012,9 @@ struct CommitTxnRequest {
 3: optional list<WriteEventInfo> writeEventInfos,
 // Information to update the last repl id of table/partition along with 
commit txn (replication from 2.6 to 3.0)
 4: optional ReplLastIdInfo replLastIdInfo,
+5: optional bool exclWriteEnabled = true,

Review comment:
   Shouldn't this be added as the last parameter, so as not to break backward 
compatibility?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479691)
Time Spent: 6h 40m  (was: 6.5h)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not yet committed when the 
> query was compiled, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge-inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data into an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data into the target table 
> of the merge statement; this will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> A different setup, switching the transaction order:
> 1. Compile transaction 1 that inserts data into an old and a new partition of 
> the source table.
> 2. Compile transaction 2 that inserts data into the target table.
> 3. Compile transaction 3 that merge-inserts data from the source table into 
> the target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3: since it contains 1 and 2 in its snapshot, 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24125) Incorrect transaction snapshot invalidation with unnecessary writeset check for exclusive operations

2020-09-07 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-24125:
--
Description: 
Fixes [HIVE-23725|https://issues.apache.org/jira/browse/HIVE-23725] and 
addresses an issue with concurrent exclusive writes (they shouldn't fail on 
the writeset check).

https://docs.google.com/document/d/1NVfk479_SxVIWPLXYmZkU8MYQE5nhcHbKMrf3bO_qwI

  was:
fixes: 
https://docs.google.com/document/d/1NVfk479_SxVIWPLXYmZkU8MYQE5nhcHbKMrf3bO_qwI


> Incorrect transaction snapshot invalidation with unnecessary writeset check 
> for exclusive operations
> 
>
> Key: HIVE-24125
> URL: https://issues.apache.org/jira/browse/HIVE-24125
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
>
> Fixes [HIVE-23725|https://issues.apache.org/jira/browse/HIVE-23725] and 
> addresses an issue with concurrent exclusive writes (they shouldn't fail on 
> the writeset check).
> https://docs.google.com/document/d/1NVfk479_SxVIWPLXYmZkU8MYQE5nhcHbKMrf3bO_qwI



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24124) NPE occurs when bucket tables with different bucketing_version are joined

2020-09-07 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu updated HIVE-24124:

Description: 
{code:java}
create table z_tab_1(
    task_id  string,    
    data_date  string,  
    accno  string,  
    curr_type  string,  
    ifrs9_pd12_value  double,
    ifrs9_ccf_value  double,
    ifrs9_lgd_value  double
)partitioned by(pt_dt string)
STORED AS ORCFILE
TBLPROPERTIES ('bucketing_version'='1');

alter table z_tab_1 add partition(pt_dt = '2020-7-31');

insert into z_tab_1 partition(pt_dt = '2020-7-31') values
('123','2020-7-31','accno-','curr_type-x', 0.1, 0.2 ,0.3),
('1','2020-1-31','a','1-curr_type-a', 0.1, 0.2 ,0.3),
('2','2020-2-31','b','2-curr_type-b', 0.1, 0.2 ,0.3),
('3','2020-3-31','c','3-curr_type-c', 0.1, 0.2 ,0.3),
('4','2020-4-31','d','4-curr_type-d', 0.1, 0.2 ,0.3),
('5','2020-5-31','e','5-curr_type-e', 0.1, 0.2 ,0.3),
('6','2020-6-31','f','6-curr_type-f', 0.1, 0.2 ,0.3),
('7','2020-7-31','g','7-curr_type-g', 0.1, 0.2 ,0.3),
('8','2020-8-31','h','8-curr_type-h', 0.1, 0.2 ,0.3),
('9','2020-9-31','i','9-curr_type-i', 0.1, 0.2 ,0.3);
drop table if exists z_tab_2;
CREATE TABLE z_tab_2(  
    task_id  string,    
    data_date  string,  
    accno  string,  
    curr_type  string,  
    ifrs9_pd12_value  double,   
    ifrs9_ccf_value  double,    
    ifrs9_lgd_value  double
) 
CLUSTERED BY (TASK_ID, DATA_DATE, ACCNO, CURR_TYPE)  SORTED by (TASK_ID, ACCNO, 
CURR_TYPE) INTO 2000 BUCKETS 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS ORCFILE;

set hive.enforce.bucketing=true;

INSERT OVERWRITE TABLE z_tab_2
SELECT  DCCR.TASK_ID
   ,DCCR.DATA_DATE
   ,DCCR.ACCNO
   ,DCCR.CURR_TYPE
   ,DCCR.IFRS9_PD12_VALUE
   ,DCCR.IFRS9_CCF_VALUE
   ,DCCR.IFRS9_LGD_VALUE 
FROM z_tab_1 DCCR
WHERE pt_dt = '2020-7-31';
{code}
{noformat}
Caused by: java.lang.NullPointerException  
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:1072)
  
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:988)
  
at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:995)  
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:941)  
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928)  
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)  
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:237)  
... 7 more{noformat}

  was:
{code:java}
create table z_tab_1(
    task_id  string,    
    data_date  string,  
    accno  string,  
    curr_type  string,  
    ifrs9_pd12_value  double,
    ifrs9_ccf_value  double,
    ifrs9_lgd_value  double
)partitioned by(pt_dt string)
STORED AS ORCFILE
TBLPROPERTIES ('bucketing_version'='1');

alter table z_tab_1 add partition(pt_dt = '2020-7-31');
insert into z_tab_1 partition(pt_dt = '2020-7-31') 
values('123','2020-7-31','accno-','curr_type-x', 0.1, 0.2 ,0.3),
('1','2020-1-31','a','1-curr_type-a', 0.1, 0.2 ,0.3),
('2','2020-2-31','b','2-curr_type-b', 0.1, 0.2 ,0.3),
('3','2020-3-31','c','3-curr_type-c', 0.1, 0.2 ,0.3),
('4','2020-4-31','d','4-curr_type-d', 0.1, 0.2 ,0.3),
('5','2020-5-31','e','5-curr_type-e', 0.1, 0.2 ,0.3),
('6','2020-6-31','f','6-curr_type-f', 0.1, 0.2 ,0.3),
('7','2020-7-31','g','7-curr_type-g', 0.1, 0.2 ,0.3),
('8','2020-8-31','h','8-curr_type-h', 0.1, 0.2 ,0.3),
('9','2020-9-31','i','9-curr_type-i', 0.1, 0.2 ,0.3);
drop table if exists z_tab_2;
CREATE TABLE z_tab_2(  
    task_id  string,    
    data_date  string,  
    accno  string,  
    curr_type  string,  
    ifrs9_pd12_value  double,   
    ifrs9_ccf_value  double,    
    ifrs9_lgd_value  double
) 
CLUSTERED BY (TASK_ID, DATA_DATE, ACCNO, CURR_TYPE)  SORTED by (TASK_ID, ACCNO, 
CURR_TYPE) INTO 2000 BUCKETS 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS ORCFILE;

set hive.enforce.bucketing=true;

INSERT OVERWRITE TABLE z_tab_2
SELECT  DCCR.TASK_ID
   ,DCCR.DATA_DATE
   ,DCCR.ACCNO
   ,DCCR.CURR_TYPE
   ,DCCR.IFRS9_PD12_VALUE
   ,DCCR.IFRS9_CCF_VALUE
   ,DCCR.IFRS9_LGD_VALUE 
FROM z_tab_1 DCCR
WHERE pt_dt = '2020-7-31';
{code}
{noformat}
Caused by: java.lang.NullPointerException  
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:1072)
  
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:988)
  
at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:995)  
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:941)  
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928)  
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)  
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:237)  
... 7 more{noformat}

[jira] [Updated] (HIVE-24124) NPE occurs when bucket tables with different bucketing_version are joined

2020-09-07 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu updated HIVE-24124:

Description: 
{code:java}
create table z_tab_1(
    task_id  string,    
    data_date  string,  
    accno  string,  
    curr_type  string,  
    ifrs9_pd12_value  double,
    ifrs9_ccf_value  double,
    ifrs9_lgd_value  double
)partitioned by(pt_dt string)
STORED AS ORCFILE
TBLPROPERTIES ('bucketing_version'='1');

alter table z_tab_1 add partition(pt_dt = '2020-7-31');
insert into z_tab_1 partition(pt_dt = '2020-7-31') 
values('123','2020-7-31','accno-','curr_type-x', 0.1, 0.2 ,0.3),
('1','2020-1-31','a','1-curr_type-a', 0.1, 0.2 ,0.3),
('2','2020-2-31','b','2-curr_type-b', 0.1, 0.2 ,0.3),
('3','2020-3-31','c','3-curr_type-c', 0.1, 0.2 ,0.3),
('4','2020-4-31','d','4-curr_type-d', 0.1, 0.2 ,0.3),
('5','2020-5-31','e','5-curr_type-e', 0.1, 0.2 ,0.3),
('6','2020-6-31','f','6-curr_type-f', 0.1, 0.2 ,0.3),
('7','2020-7-31','g','7-curr_type-g', 0.1, 0.2 ,0.3),
('8','2020-8-31','h','8-curr_type-h', 0.1, 0.2 ,0.3),
('9','2020-9-31','i','9-curr_type-i', 0.1, 0.2 ,0.3);
drop table if exists z_tab_2;
CREATE TABLE z_tab_2(  
    task_id  string,    
    data_date  string,  
    accno  string,  
    curr_type  string,  
    ifrs9_pd12_value  double,   
    ifrs9_ccf_value  double,    
    ifrs9_lgd_value  double
) 
CLUSTERED BY (TASK_ID, DATA_DATE, ACCNO, CURR_TYPE)  SORTED by (TASK_ID, ACCNO, 
CURR_TYPE) INTO 2000 BUCKETS 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS ORCFILE;

set hive.enforce.bucketing=true;

INSERT OVERWRITE TABLE z_tab_2
SELECT  DCCR.TASK_ID
   ,DCCR.DATA_DATE
   ,DCCR.ACCNO
   ,DCCR.CURR_TYPE
   ,DCCR.IFRS9_PD12_VALUE
   ,DCCR.IFRS9_CCF_VALUE
   ,DCCR.IFRS9_LGD_VALUE 
FROM z_tab_1 DCCR
WHERE pt_dt = '2020-7-31';
{code}
{noformat}
Caused by: java.lang.NullPointerException  
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:1072)
  
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:988)
  
at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:995)  
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:941)  
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928)  
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)  
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:237)  
... 7 more{noformat}

  was:
{code:java}
create table z_tab_1(
    task_id  string,    
    data_date  string,  
    accno  string,  
    curr_type  string,  
    ifrs9_pd12_value  double,
    ifrs9_ccf_value  double,
    ifrs9_lgd_value  double
)partitioned by(pt_dt string)
STORED AS ORCFILE
TBLPROPERTIES ('bucketing_version'='1');

alter table z_tab_1 add partition(pt_dt = '2020-7-31');
insert into z_tab_1 partition(pt_dt = '2020-7-31') 
values('123','2020-7-31','accno-','curr_type-x', 0.1, 0.2 ,0.3),
('1','2020-1-31','a','1-curr_type-a', 0.1, 0.2 ,0.3),
('2','2020-2-31','b','2-curr_type-b', 0.1, 0.2 ,0.3),
('3','2020-3-31','c','3-curr_type-c', 0.1, 0.2 ,0.3),
('4','2020-4-31','d','4-curr_type-d', 0.1, 0.2 ,0.3),
('5','2020-5-31','e','5-curr_type-e', 0.1, 0.2 ,0.3),
('6','2020-6-31','f','6-curr_type-f', 0.1, 0.2 ,0.3),
('7','2020-7-31','g','7-curr_type-g', 0.1, 0.2 ,0.3),
('8','2020-8-31','h','8-curr_type-h', 0.1, 0.2 ,0.3),
('9','2020-9-31','i','9-curr_type-i', 0.1, 0.2 ,0.3);
drop table if exists z_tab_2;
CREATE TABLE z_tab_2(  
    task_id  string,    
    data_date  string,  
    accno  string,  
    curr_type  string,  
    ifrs9_pd12_value  double,   
    ifrs9_ccf_value  double,    
    ifrs9_lgd_value  double
) 
CLUSTERED BY (TASK_ID, DATA_DATE, ACCNO, CURR_TYPE)  SORTED by (TASK_ID, ACCNO, 
CURR_TYPE) INTO 2000 BUCKETS 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS ORCFILE;

set hive.enforce.bucketing=true;

INSERT OVERWRITE TABLE z_tab_2
SELECT  DCCR.TASK_ID
   ,DCCR.DATA_DATE
   ,DCCR.ACCNO
   ,DCCR.CURR_TYPE
   ,DCCR.IFRS9_PD12_VALUE
   ,DCCR.IFRS9_CCF_VALUE
   ,DCCR.IFRS9_LGD_VALUE 
FROM z_tab_1 DCCR
WHERE pt_dt = '2020-7-31';
{code}
{noformat}
Caused by: java.lang.NullPointerException  
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:1072)  
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:988)  
at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:995)  
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:941)  
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928)  
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)  
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:237)  
... 7 more{noformat}

[jira] [Work logged] (HIVE-24072) HiveAggregateJoinTransposeRule may try to create an invalid transformation

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24072?focusedWorklogId=479655&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479655
 ]

ASF GitHub Bot logged work on HIVE-24072:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 14:23
Start Date: 07/Sep/20 14:23
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1432:
URL: https://github.com/apache/hive/pull/1432#discussion_r484462304



##
File path: ql/src/test/results/clientpositive/llap/groupby_join_pushdown.q.out
##
@@ -644,29 +646,18 @@ STAGE PLANS:
   Statistics: Num rows: 9173 Data size: 82188 Basic stats: 
COMPLETE Column stats: COMPLETE
   Group By Operator
 aggregations: max(_col0)
-keys: _col1 (type: bigint)
-minReductionHashAggr: 0.49994552
+keys: _col1 (type: bigint), _col0 (type: int)

Review comment:
   this q.out change is not there anymore





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479655)
Time Spent: 1h  (was: 50m)

> HiveAggregateJoinTransposeRule may try to create an invalid transformation
> --
>
> Key: HIVE-24072
> URL: https://issues.apache.org/jira/browse/HIVE-24072
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {code}
> java.lang.AssertionError: 
> Cannot add expression of different type to set:
> set type is RecordType(INTEGER NOT NULL o_orderkey, DECIMAL(10, 0) 
> o_totalprice, DATE o_orderdate, INTEGER NOT NULL c_custkey, VARCHAR(25) 
> CHARACTER SET "UTF-16LE" c_name, DOUBLE $f5) NOT NULL
> expression type is RecordType(INTEGER NOT NULL o_orderkey, INTEGER NOT NULL 
> o_custkey, DECIMAL(10, 0) o_totalprice, DATE o_orderdate, INTEGER NOT NULL 
> c_custkey, DOUBLE $f1) NOT NULL
> set is rel#567:HiveAggregate.HIVE.[].any(input=HepRelVertex#490,group={2, 4, 
> 5, 6, 7},agg#0=sum($1))
> expression is HiveProject(o_orderkey=[$2], o_custkey=[$3], o_totalprice=[$4], 
> o_orderdate=[$5], c_custkey=[$6], $f1=[$1])
>   HiveJoin(condition=[=($2, $0)], joinType=[inner], algorithm=[none], 
> cost=[{2284.5 rows, 0.0 cpu, 0.0 io}])
> HiveAggregate(group=[{0}], agg#0=[sum($1)])
>   HiveProject(l_orderkey=[$0], l_quantity=[$4])
> HiveTableScan(table=[[tpch_0_001, lineitem]], table:alias=[l])
> HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[none], 
> cost=[{1.9115E15 rows, 0.0 cpu, 0.0 io}])
>   HiveJoin(condition=[=($4, $1)], joinType=[inner], algorithm=[none], 
> cost=[{1650.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(o_orderkey=[$0], o_custkey=[$1], o_totalprice=[$3], 
> o_orderdate=[$4])
>   HiveTableScan(table=[[tpch_0_001, orders]], table:alias=[orders])
> HiveProject(c_custkey=[$0], c_name=[$1])
>   HiveTableScan(table=[[tpch_0_001, customer]], 
> table:alias=[customer])
>   HiveProject($f0=[$0])
> HiveFilter(condition=[>($1, 3E2)])
>   HiveAggregate(group=[{0}], agg#0=[sum($4)])
> HiveTableScan(table=[[tpch_0_001, lineitem]], 
> table:alias=[lineitem])
>   at 
> org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:383)
>   at 
> org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
>   at 
> org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:236)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveAggregateJoinTransposeRule.onMatch(HiveAggregateJoinTransposeRule.java:300)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24084) Push Aggregates thru joins in case it re-groups previously unique columns

2020-09-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24084:

Summary: Push Aggregates thru joins in case it re-groups previously unique 
columns  (was: Enhance cost model to push down more Aggregates)

> Push Aggregates thru joins in case it re-groups previously unique columns
> -
>
> Key: HIVE-24084
> URL: https://issues.apache.org/jira/browse/HIVE-24084
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24123) Improve cost model for Aggregates

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24123?focusedWorklogId=479650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479650
 ]

ASF GitHub Bot logged work on HIVE-24123:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 14:20
Start Date: 07/Sep/20 14:20
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk opened a new pull request #1475:
URL: https://github.com/apache/hive/pull/1475


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479650)
Remaining Estimate: 0h
Time Spent: 10m

> Improve cost model for Aggregates
> -
>
> Key: HIVE-24123
> URL: https://issues.apache.org/jira/browse/HIVE-24123
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24123) Improve cost model for Aggregates

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24123:
--
Labels: pull-request-available  (was: )

> Improve cost model for Aggregates
> -
>
> Key: HIVE-24123
> URL: https://issues.apache.org/jira/browse/HIVE-24123
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24123) Improve cost model for Aggregates

2020-09-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24123:
---


> Improve cost model for Aggregates
> -
>
> Key: HIVE-24123
> URL: https://issues.apache.org/jira/browse/HIVE-24123
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24122) When CBO is enabled, CAST(STR as Bigint) IS NOT NULL result is wrong

2020-09-07 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu updated HIVE-24122:

Description: 
{code:java}
create  database testdb;
CREATE TABLE IF NOT EXISTS testdb.z_tab 
( 
    SEARCHWORD    STRING, 
    COUNT_NUM BIGINT, 
    WORDS STRING 
) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
STORED AS TEXTFILE;
insert into table testdb.z_tab 
values('hivetest',111,'aaa'),('hivetest2',111,'bbb');

set hive.cbo.enable=true;

SELECT CAST(searchword as bigint) IS NOT NULL FROM testdb.z_tab;
SELECT CAST(searchword as bigint) IS NULL FROM testdb.z_tab;
{code}
The SQL results for both queries are the same, as follows:
{noformat}
+---+
|  _c0  |
+---+
| true  |
| true  |
+---+{noformat}
The result of SELECT CAST(searchword as bigint) IS NOT NULL FROM testdb.z_tab; 
is wrong.

 

  was:
{code:java}
create  database testdb;
CREATE TABLE IF NOT EXISTS testdb.z_tab 
( 
    SEARCHWORD    STRING, 
    COUNT_NUM BIGINT, 
    WORDS STRING 
) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
STORED AS TEXTFILE;
insert into table testdb.z_tab 
values('hivetest',111,'aaa'),('hivetest2',111,'bbb');

set hive.cbo.enable=true;

SELECT CAST(searchword as bigint) IS NOT NULL FROM testdb.z_tab;
SELECT CAST(searchword as bigint) IS NULL FROM testdb.z_tab;
{code}
The SQL results for both queries are the same, as follows:

+---+
|  _c0  |
+---+
| true  |
| true  |
+---+

The result of SELECT CAST(searchword as bigint) IS NOT NULL FROM testdb.z_tab; 
is wrong.


> When CBO is enabled, CAST(STR as Bigint) IS NOT NULL result is wrong 
> ---
>
> Key: HIVE-24122
> URL: https://issues.apache.org/jira/browse/HIVE-24122
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Priority: Major
>
> {code:java}
> create  database testdb;
> CREATE TABLE IF NOT EXISTS testdb.z_tab 
> ( 
>     SEARCHWORD    STRING, 
>     COUNT_NUM BIGINT, 
>     WORDS STRING 
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
> STORED AS TEXTFILE;
> insert into table testdb.z_tab 
> values('hivetest',111,'aaa'),('hivetest2',111,'bbb');
> set hive.cbo.enable=true;
> SELECT CAST(searchword as bigint) IS NOT NULL FROM testdb.z_tab;
> SELECT CAST(searchword as bigint) IS NULL FROM testdb.z_tab;
> {code}
> The SQL results for both queries are the same, as follows:
> {noformat}
> +---+
> |  _c0  |
> +---+
> | true  |
> | true  |
> +---+{noformat}
> The result of SELECT CAST(searchword as bigint) IS NOT NULL FROM 
> testdb.z_tab; is wrong.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24084) Enhance cost model to push down more Aggregates

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24084?focusedWorklogId=479644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479644
 ]

ASF GitHub Bot logged work on HIVE-24084:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 13:42
Start Date: 07/Sep/20 13:42
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1439:
URL: https://github.com/apache/hive/pull/1439#discussion_r484439075



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateJoinTransposeRule.java
##
@@ -290,7 +291,8 @@ public void onMatch(RelOptRuleCall call) {
   RelNode r = relBuilder.build();
   RelOptCost afterCost = mq.getCumulativeCost(r);
   RelOptCost beforeCost = mq.getCumulativeCost(aggregate);
-  if (afterCost.isLt(beforeCost)) {
+  boolean shouldForceTransform = isGroupingUnique(join, 
aggregate.getGroupSet());

Review comment:
   I've added a config: `hive.transpose.aggr.join.unique` to enable/disable 
this feature





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479644)
Time Spent: 2h  (was: 1h 50m)

> Enhance cost model to push down more Aggregates
> ---
>
> Key: HIVE-24084
> URL: https://issues.apache.org/jira/browse/HIVE-24084
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24084) Enhance cost model to push down more Aggregates

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24084?focusedWorklogId=479641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479641
 ]

ASF GitHub Bot logged work on HIVE-24084:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 13:33
Start Date: 07/Sep/20 13:33
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1439:
URL: https://github.com/apache/hive/pull/1439#discussion_r484434865



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java
##
@@ -89,22 +89,23 @@ public RelOptCost getAggregateCost(HiveAggregate aggregate) 
{
 } else {
   final RelMetadataQuery mq = aggregate.getCluster().getMetadataQuery();
   // 1. Sum of input cardinalities
-  final Double rCount = mq.getRowCount(aggregate.getInput());
-  if (rCount == null) {
+  final Double inputRowCount = mq.getRowCount(aggregate.getInput());
+  final Double rowCount = mq.getRowCount(aggregate);
+  if (inputRowCount == null || rowCount == null) {
 return null;
   }
   // 2. CPU cost = sorting cost
-  final double cpuCost = algoUtils.computeSortCPUCost(rCount);
+  final double cpuCost = algoUtils.computeSortCPUCost(rowCount) + 
inputRowCount * algoUtils.getCpuUnitCost();

Review comment:
   sure; I'll open a separate ticket for the cost model changes





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479641)
Time Spent: 1h 50m  (was: 1h 40m)

> Enhance cost model to push down more Aggregates
> ---
>
> Key: HIVE-24084
> URL: https://issues.apache.org/jira/browse/HIVE-24084
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23976) Enable vectorization for multi-col semi join reducers

2020-09-07 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191703#comment-17191703
 ] 

Stamatis Zampetakis commented on HIVE-23976:


Hey [~abstractdog], thanks for taking over this :)

I had a quick look at the PR and noticed that the vectorized hash 
implementation seems to be a binary operator (two inputs, one output), while the 
non-vectorized alternative (GenericUDFMurmurHash) is an n-ary operator. 

Can we make the vectorized implementation n-ary, or should we rather transform 
an expression {{hash(a,b,c,d)}} into something like {{hash(hash(hash(a,b),c),d)}}?

In any case, if we don't address this now, we should create a follow-up JIRA.
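
For illustration, a left-fold rewrite of the kind suggested above could look like this (a sketch only; {{ExprNode}} and {{makeBinaryHash}} are hypothetical stand-ins for Hive's expression types, and note that the folded value generally differs from a true n-ary murmur hash, so both sides would have to agree on the encoding):

{code:java}
import java.util.List;

// Rewrite an n-ary hash(a,b,c,d) into nested binary calls:
// hash(hash(hash(a,b),c),d)
ExprNode foldHash(List<ExprNode> args) {
  ExprNode acc = makeBinaryHash(args.get(0), args.get(1)); // assumes >= 2 args
  for (int i = 2; i < args.size(); i++) {
    acc = makeBinaryHash(acc, args.get(i));
  }
  return acc;
}
{code}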

> Enable vectorization for multi-col semi join reducers
> -
>
> Key: HIVE-23976
> URL: https://issues.apache.org/jira/browse/HIVE-23976
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-21196 introduces multi-column semi-join reducers in the query engine. 
> However, the implementation relies on GenericUDFMurmurHash which is not 
> vectorized thus the respective operators cannot be executed in vectorized 
> mode. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24084) Enhance cost model to push down more Aggregates

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24084?focusedWorklogId=479629&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479629
 ]

ASF GitHub Bot logged work on HIVE-24084:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 13:03
Start Date: 07/Sep/20 13:03
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1439:
URL: https://github.com/apache/hive/pull/1439#discussion_r484419776



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateJoinTransposeRule.java
##
@@ -303,6 +305,90 @@ public void onMatch(RelOptRuleCall call) {
 }
   }
 
+  /**
+   * Determines whether the given grouping is unique.
+   *
+   * Consider a join which might produce non-unique rows; but later the results are aggregated again.
+   * This method determines if there are sufficient columns in the grouping which have been present previously as unique column(s).
+   */
+  private boolean isGroupingUnique(RelNode input, ImmutableBitSet groups) {
+    if (groups.isEmpty()) {
+      return false;
+    }
+    RelMetadataQuery mq = input.getCluster().getMetadataQuery();
+    Set<ImmutableBitSet> uKeys = mq.getUniqueKeys(input);
+    for (ImmutableBitSet u : uKeys) {
+      if (groups.contains(u)) {
+        return true;
+      }
+    }
+    if (input instanceof Join) {
+      Join join = (Join) input;
+      RexBuilder rexBuilder = input.getCluster().getRexBuilder();
+      SimpleConditionInfo cond = new SimpleConditionInfo(join.getCondition(), rexBuilder);
+
+      if (cond.valid) {
+        ImmutableBitSet newGroup = groups.intersect(ImmutableBitSet.fromBitSet(cond.fields));
+        RelNode l = join.getLeft();
+        RelNode r = join.getRight();
+
+        int joinFieldCount = join.getRowType().getFieldCount();
+        int lFieldCount = l.getRowType().getFieldCount();
+
+        ImmutableBitSet groupL = newGroup.get(0, lFieldCount);
+        ImmutableBitSet groupR = newGroup.get(lFieldCount, joinFieldCount).shift(-lFieldCount);
+
+        if (isGroupingUnique(l, groupL)) {
Review comment:
   this method recursively checks whether the above condition is satisfied - 
that's why it needs to call itself
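
To make the split concrete, a small illustration using the Calcite calls from the snippet above (the values are invented):

{code:java}
// org.apache.calcite.util.ImmutableBitSet
// Join output has 7 fields; the left input contributes the first 4.
ImmutableBitSet group = ImmutableBitSet.of(0, 2, 5);
int lFieldCount = 4;
ImmutableBitSet groupL = group.get(0, lFieldCount);                     // {0, 2}
ImmutableBitSet groupR = group.get(lFieldCount, 7).shift(-lFieldCount); // {5} -> {1}
// isGroupingUnique is then applied recursively to (left, groupL) and (right, groupR).
{code}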





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479629)
Time Spent: 1h 40m  (was: 1.5h)

> Enhance cost model to push down more Aggregates
> ---
>
> Key: HIVE-24084
> URL: https://issues.apache.org/jira/browse/HIVE-24084
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479617
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 12:10
Start Date: 07/Sep/20 12:10
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484392701



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionDagSubmitPlugin.java
##
@@ -76,7 +76,7 @@ public void beforeExecute(int executionIndex, boolean explainReOptimization) {
   }
 
   @Override
-  public boolean shouldReExecute(int executionNum, CommandProcessorException ex) {
+  public boolean shouldReExecute(int executionNum) {
     return (executionNum < maxExecutions) && retryPossible;

Review comment:
   changed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479617)
Time Spent: 6.5h  (was: 6h 20m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during a merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data into an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data into the target table 
> of the merge statement; that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> A different setup, switching the transaction order:
> 1. Compile transaction 1 that inserts data into an old and a new partition of 
> the source table.
> 2. Compile transaction 2 that inserts data into the target table.
> 3. Compile transaction 3 that merge inserts data from the source table into the 
> target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3: since it contains 1 and 2 in its snapshot, 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479615
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 12:03
Start Date: 07/Sep/20 12:03
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484389253



##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
##
@@ -2324,142 +2315,6 @@ private void testConcurrentMergeInsertNoDuplicates(String query, boolean sharedW
     List<String> res = new ArrayList<>();
     driver.getFetchTask().fetch(res);
     Assert.assertEquals("Duplicate records found", 4, res.size());
-    dropTable(new String[]{"target", "source"});
-  }
-
-  /**
-   * ValidTxnManager.isValidTxnListState can invalidate a snapshot if a relevant write transaction was committed
-   * between a query compilation and lock acquisition. When this happens we have to recompile the given query,
-   * otherwise we can miss reading partitions created between. The following three cases test these scenarios.
-   * @throws Exception ex
-   */
-  @Test
-  public void testMergeInsertDynamicPartitioningSequential() throws Exception {

Review comment:
   Ack.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479615)
Time Spent: 6h 20m  (was: 6h 10m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during a merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data into an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data into the target table 
> of the merge statement; that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> A different setup, switching the transaction order:
> 1. Compile transaction 1 that inserts data into an old and a new partition of 
> the source table.
> 2. Compile transaction 2 that inserts data into the target table.
> 3. Compile transaction 3 that merge inserts data from the source table into the 
> target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3: since it contains 1 and 2 in its snapshot, 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479614
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 12:01
Start Date: 07/Sep/20 12:01
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484388472



##
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
##
@@ -2324,142 +2315,6 @@ private void testConcurrentMergeInsertNoDuplicates(String query, boolean sharedW
     List<String> res = new ArrayList<>();
     driver.getFetchTask().fetch(res);
     Assert.assertEquals("Duplicate records found", 4, res.size());
-    dropTable(new String[]{"target", "source"});
-  }
-
-  /**
-   * ValidTxnManager.isValidTxnListState can invalidate a snapshot if a relevant write transaction was committed
-   * between a query compilation and lock acquisition. When this happens we have to recompile the given query,
-   * otherwise we can miss reading partitions created between. The following three cases test these scenarios.
-   * @throws Exception ex
-   */
-  @Test
-  public void testMergeInsertDynamicPartitioningSequential() throws Exception {

Review comment:
   These tests should be added back in your next change, when we recompile 
without the reexec driver, to verify the dynamic partitioning use cases





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479614)
Time Spent: 6h 10m  (was: 6h)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during a merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data into an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data into the target table 
> of the merge statement; that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> A different setup, switching the transaction order:
> 1. Compile transaction 1 that inserts data into an old and a new partition of 
> the source table.
> 2. Compile transaction 2 that inserts data into the target table.
> 3. Compile transaction 3 that merge inserts data from the source table into the 
> target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3: since it contains 1 and 2 in its snapshot, 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24072) HiveAggregateJoinTransposeRule may try to create an invalid transformation

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24072?focusedWorklogId=479613&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479613
 ]

ASF GitHub Bot logged work on HIVE-24072:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 11:56
Start Date: 07/Sep/20 11:56
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1432:
URL: https://github.com/apache/hive/pull/1432#discussion_r484385748



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateJoinTransposeRule.java
##
@@ -145,8 +145,7 @@ public void onMatch(RelOptRuleCall call) {
         int fieldCount = joinInput.getRowType().getFieldCount();
         final ImmutableBitSet fieldSet =
             ImmutableBitSet.range(offset, offset + fieldCount);
-        final ImmutableBitSet belowAggregateKeyNotShifted =
-            belowAggregateColumns.intersect(fieldSet);

Review comment:
   right; the column which was causing the trouble was the preceding join's join key.
   
   the issue was caused by:
   * between the previous and the current join there were no projects - so all the join keys of the previous join were present in the input
   * meanwhile the aggregate had references to the `0,2,...` columns - which were unique; so the logic assumed that the `joinInput` could be used as is
   
   however, because column 1 was present in the input but not in the output, the actual join was not in sync with the mapping being created
   
   I think the patch may make more sense than the above reasoning :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479613)
Time Spent: 50m  (was: 40m)

> HiveAggregateJoinTransposeRule may try to create an invalid transformation
> --
>
> Key: HIVE-24072
> URL: https://issues.apache.org/jira/browse/HIVE-24072
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code}
> java.lang.AssertionError: 
> Cannot add expression of different type to set:
> set type is RecordType(INTEGER NOT NULL o_orderkey, DECIMAL(10, 0) 
> o_totalprice, DATE o_orderdate, INTEGER NOT NULL c_custkey, VARCHAR(25) 
> CHARACTER SET "UTF-16LE" c_name, DOUBLE $f5) NOT NULL
> expression type is RecordType(INTEGER NOT NULL o_orderkey, INTEGER NOT NULL 
> o_custkey, DECIMAL(10, 0) o_totalprice, DATE o_orderdate, INTEGER NOT NULL 
> c_custkey, DOUBLE $f1) NOT NULL
> set is rel#567:HiveAggregate.HIVE.[].any(input=HepRelVertex#490,group={2, 4, 
> 5, 6, 7},agg#0=sum($1))
> expression is HiveProject(o_orderkey=[$2], o_custkey=[$3], o_totalprice=[$4], 
> o_orderdate=[$5], c_custkey=[$6], $f1=[$1])
>   HiveJoin(condition=[=($2, $0)], joinType=[inner], algorithm=[none], 
> cost=[{2284.5 rows, 0.0 cpu, 0.0 io}])
> HiveAggregate(group=[{0}], agg#0=[sum($1)])
>   HiveProject(l_orderkey=[$0], l_quantity=[$4])
> HiveTableScan(table=[[tpch_0_001, lineitem]], table:alias=[l])
> HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[none], 
> cost=[{1.9115E15 rows, 0.0 cpu, 0.0 io}])
>   HiveJoin(condition=[=($4, $1)], joinType=[inner], algorithm=[none], 
> cost=[{1650.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(o_orderkey=[$0], o_custkey=[$1], o_totalprice=[$3], 
> o_orderdate=[$4])
>   HiveTableScan(table=[[tpch_0_001, orders]], table:alias=[orders])
> HiveProject(c_custkey=[$0], c_name=[$1])
>   HiveTableScan(table=[[tpch_0_001, customer]], 
> table:alias=[customer])
>   HiveProject($f0=[$0])
> HiveFilter(condition=[>($1, 3E2)])
>   HiveAggregate(group=[{0}], agg#0=[sum($4)])
> HiveTableScan(table=[[tpch_0_001, lineitem]], 
> table:alias=[lineitem])
>   at 
> org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:383)
>   at 
> org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
>   at 
> org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:236)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveAggregateJoinTransposeRule.onMatch(HiveAggregateJoinTransposeRule.java:300)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479607&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479607
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 11:47
Start Date: 07/Sep/20 11:47
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1474:
URL: https://github.com/apache/hive/pull/1474#discussion_r484381500



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionDagSubmitPlugin.java
##
@@ -76,7 +76,7 @@ public void beforeExecute(int executionIndex, boolean explainReOptimization) {
   }
 
   @Override
-  public boolean shouldReExecute(int executionNum, CommandProcessorException ex) {
+  public boolean shouldReExecute(int executionNum) {
     return (executionNum < maxExecutions) && retryPossible;

Review comment:
   please change this to `retryPossible` ( and remove maxExecutions field)
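
Presumably the suggested simplification reduces the method to something like this (a sketch of the review suggestion, not the committed change):

{code:java}
@Override
public boolean shouldReExecute(int executionNum) {
  return retryPossible; // the executionNum < maxExecutions bound is handled elsewhere
}
{code}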





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479607)
Time Spent: 6h  (was: 5h 50m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during a merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data into an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data into the target table 
> of the merge statement; that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> A different setup, switching the transaction order:
> 1. Compile transaction 1 that inserts data into an old and a new partition of 
> the source table.
> 2. Compile transaction 2 that inserts data into the target table.
> 3. Compile transaction 3 that merge inserts data from the source table into the 
> target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3: since it contains 1 and 2 in its snapshot, 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=479606&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479606
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 11:38
Start Date: 07/Sep/20 11:38
Worklog Time Spent: 10m 
  Work Description: deniskuzZ opened a new pull request #1474:
URL: https://github.com/apache/hive/pull/1474


   …l reads (Peter Varga, reviewed by Jesus Camacho Rodriguez, Denys Kuzmenko)"
   
   This reverts commit e2a02f1b43cba657d4d1c16ead091072be5fe834.
   
   
   
   ### What changes were proposed in this pull request?
   
   reverts https://issues.apache.org/jira/browse/HIVE-23725
   
   ### Why are the changes needed?
   
   doesn't completely solve the problem described in the JIRA; it will be replaced 
with another solution
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479606)
Time Spent: 5h 50m  (was: 5h 40m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during a merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data into an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data into the target table 
> of the merge statement; that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> A different setup, switching the transaction order:
> 1. Compile transaction 1 that inserts data into an old and a new partition of 
> the source table.
> 2. Compile transaction 2 that inserts data into the target table.
> 3. Compile transaction 3 that merge inserts data from the source table into the 
> target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3: since it contains 1 and 2 in its snapshot, 
> isValidTxnListState will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24082) Expose information whether AcidUtils.ParsedDelta contains statementId

2020-09-07 Thread Piotr Findeisen (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191652#comment-17191652
 ] 

Piotr Findeisen commented on HIVE-24082:


 

We call these methods today:
{code:java}
AcidUtils.isTransactionalTable
AcidUtils.isFullAcidTable
AcidUtils.getAcidState 
AcidUtils.OrcAcidVersion.getAcidVersionFromMetaFile
AcidUtils.deleteDeltaSubdir
AcidUtils.createBucketFile
{code}
I expect further usage as we advance Presto's support for ORC ACID / 
Transactional tables.

I am aware AcidUtils is not a public interface, so I am aware a breakage may 
occur when we upgrade.
We chose to do this since the ACID handling logic has quite a few nuances that 
are easy to overlook. Copying the logic over to the Presto codebase would probably 
be safer from a code compilation perspective, but could cause trouble as ORC ACID 
evolves.

 
{quote}we were planning to change things around that.
{quote}
 

sure! I am always curious so please CC me whenever you feel like I could be 
interested.

 

> Expose information whether AcidUtils.ParsedDelta contains statementId
> -
>
> Key: HIVE-24082
> URL: https://issues.apache.org/jira/browse/HIVE-24082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Piotr Findeisen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In [Presto|https://prestosql.io] we support reading ORC ACID tables by 
> leveraging AcidUtils rather than duplicate the file name parsing logic in our 
> code.
> To do this fully correctly, we need information whether 
> {{org.apache.hadoop.hive.ql.io.AcidUtils.ParsedDelta}} contains 
> {{statementId}} information or not. 
> Currently, a getter of that property does not allow us to access this 
> information.
> [https://github.com/apache/hive/blob/468907eab36f78df3e14a24005153c9a23d62555/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L804-L806]
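
A purely illustrative shape for such an accessor (the flag name and the way "absent" is encoded are assumptions, not the real AcidUtils.ParsedDelta internals):

{code:java}
// Hypothetical: expose whether the delta directory name carried a statementId
public boolean hasStatementId() {
  return statementIdPresent; // a flag the parser would set when one was parsed
}
{code}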



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=479595&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479595
 ]

ASF GitHub Bot logged work on HIVE-23851:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 11:26
Start Date: 07/Sep/20 11:26
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on pull request #1271:
URL: https://github.com/apache/hive/pull/1271#issuecomment-688263050


   @kgyrtkirk Ping for review request~



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479595)
Time Spent: 2h 40m  (was: 2.5h)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition paths
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In case of msck repair with partition filtering we expect the expression proxy 
> class to be set as PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ), while dropping partitions we serialize the drop partition filter 
> expression as ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ) which is incompatible with the deserialization happening in 
> PartitionExpressionForMetastore ( 
> 
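
A minimal HiveQL sketch of the reproduction steps quoted above (the table name, location, and partition values are hypothetical):

{code:sql}
-- 1. Create an external partitioned table
--    (assumes partition directories such as /tmp/repro_tbl/dt=20200701 exist)
CREATE EXTERNAL TABLE repro_tbl (id INT)
PARTITIONED BY (dt STRING)
LOCATION '/tmp/repro_tbl';

-- 2. Sync all partitions with the metastore
MSCK REPAIR TABLE repro_tbl;

-- 3. Remove one partition path out-of-band, e.g.
--    hdfs dfs -rm -r /tmp/repro_tbl/dt=20200701

-- 4. Repair with partition filtering; dropping the missing partition is
--    where the deserialization failure above surfaces
MSCK REPAIR TABLE repro_tbl DROP PARTITIONS;
{code}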

[jira] [Commented] (HIVE-24082) Expose information whether AcidUtils.ParsedDelta contains statementId

2020-09-07 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191639#comment-17191639
 ] 

Peter Vary commented on HIVE-24082:
---

[~findepi]: Which part of the AcidUtils code are you using? This is not a 
public-facing interface and we were planning to change things around that.

CC: [~pvargacl], [~b.maidics]

> Expose information whether AcidUtils.ParsedDelta contains statementId
> -
>
> Key: HIVE-24082
> URL: https://issues.apache.org/jira/browse/HIVE-24082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Piotr Findeisen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In [Presto|https://prestosql.io] we support reading ORC ACID tables by 
> leveraging AcidUtils rather than duplicate the file name parsing logic in our 
> code.
> To do this fully correctly, we need information whether 
> {{org.apache.hadoop.hive.ql.io.AcidUtils.ParsedDelta}} contains 
> {{statementId}} information or not. 
> Currently, a getter of that property does not allow us to access this 
> information.
> [https://github.com/apache/hive/blob/468907eab36f78df3e14a24005153c9a23d62555/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L804-L806]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24082) Expose information whether AcidUtils.ParsedDelta contains statementId

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24082?focusedWorklogId=479581&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479581
 ]

ASF GitHub Bot logged work on HIVE-24082:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 10:55
Start Date: 07/Sep/20 10:55
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1438:
URL: https://github.com/apache/hive/pull/1438#issuecomment-688247601


   fyi @pvary 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479581)
Time Spent: 50m  (was: 40m)

> Expose information whether AcidUtils.ParsedDelta contains statementId
> -
>
> Key: HIVE-24082
> URL: https://issues.apache.org/jira/browse/HIVE-24082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Piotr Findeisen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In [Presto|https://prestosql.io] we support reading ORC ACID tables by 
> leveraging AcidUtils rather than duplicate the file name parsing logic in our 
> code.
> To do this fully correctly, we need information whether 
> {{org.apache.hadoop.hive.ql.io.AcidUtils.ParsedDelta}} contains 
> {{statementId}} information or not. 
> Currently, a getter of that property does not allow us to access this 
> information.
> [https://github.com/apache/hive/blob/468907eab36f78df3e14a24005153c9a23d62555/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L804-L806]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24104) NPE due to null key columns in ReduceSink after deduplication

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24104?focusedWorklogId=479576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479576
 ]

ASF GitHub Bot logged work on HIVE-24104:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 10:50
Start Date: 07/Sep/20 10:50
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1453:
URL: https://github.com/apache/hive/pull/1453


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479576)
Time Spent: 40m  (was: 0.5h)

> NPE due to null key columns in ReduceSink after deduplication
> -
>
> Key: HIVE-24104
> URL: https://issues.apache.org/jira/browse/HIVE-24104
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In some cases the {{ReduceSinkDeDuplication}} optimization creates ReduceSink 
> operators where the key columns are null. This can lead to an NPE in various 
> places in the code. 
> The following stacktraces show some places where an NPE appears. Note that 
> the stacktraces do not correspond to the same query.
> +NPE during planning+
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.plan.ExprNodeDesc$ExprNodeDescEqualityWrapper.equals(ExprNodeDesc.java:141)
>   at java.util.AbstractList.equals(AbstractList.java:523)
>   at 
> org.apache.hadoop.hive.ql.optimizer.SetReducerParallelism.process(SetReducerParallelism.java:101)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>   at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsDependentOptimizations(TezCompiler.java:492)
>   at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:226)
>   at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:161)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12643)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:173)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:414)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:363)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:357)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:129)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:231)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:740)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:710)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>   at 
> 

[jira] [Resolved] (HIVE-24104) NPE due to null key columns in ReduceSink after deduplication

2020-09-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24104.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you [~zabetak]!

> NPE due to null key columns in ReduceSink after deduplication
> -
>
> Key: HIVE-24104
> URL: https://issues.apache.org/jira/browse/HIVE-24104
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In some cases the {{ReduceSinkDeDuplication}} optimization creates ReduceSink 
> operators where the key columns are null. This can lead to an NPE in various 
> places in the code. 
> The following stacktraces show some places where an NPE appears. Note that 
> the stacktraces do not correspond to the same query.
> +NPE during planning+
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.plan.ExprNodeDesc$ExprNodeDescEqualityWrapper.equals(ExprNodeDesc.java:141)
>   at java.util.AbstractList.equals(AbstractList.java:523)
>   at 
> org.apache.hadoop.hive.ql.optimizer.SetReducerParallelism.process(SetReducerParallelism.java:101)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>   at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsDependentOptimizations(TezCompiler.java:492)
>   at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:226)
>   at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:161)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12643)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:173)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:414)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:363)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:357)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:129)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:231)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:740)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:710)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>   at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> 

[jira] [Work logged] (HIVE-24082) Expose information whether AcidUtils.ParsedDelta contains statementId

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24082?focusedWorklogId=479575&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479575
 ]

ASF GitHub Bot logged work on HIVE-24082:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 10:47
Start Date: 07/Sep/20 10:47
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1438:
URL: https://github.com/apache/hive/pull/1438#issuecomment-688243599


   no need; it's fine to have the patch only in the PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479575)
Time Spent: 40m  (was: 0.5h)

> Expose information whether AcidUtils.ParsedDelta contains statementId
> -
>
> Key: HIVE-24082
> URL: https://issues.apache.org/jira/browse/HIVE-24082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Piotr Findeisen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In [Presto|https://prestosql.io] we support reading ORC ACID tables by 
> leveraging AcidUtils rather than duplicate the file name parsing logic in our 
> code.
> To do this fully correctly, we need information whether 
> {{org.apache.hadoop.hive.ql.io.AcidUtils.ParsedDelta}} contains 
> {{statementId}} information or not. 
> Currently, a getter of that property does not allow us to access this 
> information.
> [https://github.com/apache/hive/blob/468907eab36f78df3e14a24005153c9a23d62555/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L804-L806]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-23938) LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used anymore

2020-09-07 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191620#comment-17191620
 ] 

László Bodor edited comment on HIVE-23938 at 9/7/20, 10:20 AM:
---

[~ashutoshc]: checked, the new args are not backward compatible with JDK8; created 
a compatible version of the 
[bash script|https://github.com/apache/hive/pull/1430/commits/ec6d133c14acb6acba49d9c69e582a36a1aed1f4],
 tested on a cluster with JDK8 and JDK11.
Let me know if the +1 is still valid for the new approach.


was (Author: abstractdog):
[~ashutoshc]: checked, new args are not backward compatible with JDK8, created 
a compatible version of the 
[bashscript|https://github.com/apache/hive/pull/1430/commits/ec6d133c14acb6acba49d9c69e582a36a1aed1f4]
let me know if +1 still valid for the new approach
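
A hedged sketch of what a JDK8/JDK11-compatible flag selection in runLlapDaemon.sh could look like (variable names are assumptions, not the committed patch):

{code}
# Pick GC-logging flags by JDK major version; $JAVA and $LOG_DIR are assumed
# to be set earlier in the script.
JAVA_MAJOR=$("$JAVA" -version 2>&1 | sed -n 's/.*version "\([0-9]*\).*/\1/p' | head -1)
JAVA_MAJOR=${JAVA_MAJOR:-8}   # default to the old flags if parsing fails
if [ "$JAVA_MAJOR" -ge 9 ]; then
  # JDK9+ unified logging replaces the removed -XX GC-log rotation flags
  GC_OPTS="-Xlog:gc*,safepoint:${LOG_DIR}/gc.log:time,uptime:filecount=4,filesize=100M"
else
  GC_OPTS="-XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation \
-XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps \
-Xloggc:${LOG_DIR}/gc.log"
fi
{code}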

> LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used 
> anymore
> 
>
> Key: HIVE-23938
> URL: https://issues.apache.org/jira/browse/HIVE-23938
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: gc_2020-07-27-13.log, gc_2020-07-29-12.jdk8.log, 
> llap_new_gc_options_fail_on_jdk8.log
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
> {code}
> JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
> -XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
> -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
> {code}
> on JDK11 I got something like:
> {code}
> + exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
> -Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
> -XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
> -XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
> -XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 
> -Dhttp.maxConnections=10 -Dasync.profiler.home=/grid/0/async-profiler -server 
> -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
> -XX:+PrintGCDateStamps 
> -Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
>  
> ... 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
> OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was deprecated in 
> version 11.0 and will likely be removed in a future release.
> Unrecognized VM option 'UseGCLogFileRotation'
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> {code}
> These are not valid in JDK11:
> {code}
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles
> -XX:GCLogFileSize
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDateStamps
> {code}
> Instead something like:
> {code}
> -Xlog:gc*,safepoint:gc.log:time,uptime:filecount=4,filesize=100M
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-23938) LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used anymore

2020-09-07 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191620#comment-17191620
 ] 

László Bodor edited comment on HIVE-23938 at 9/7/20, 10:19 AM:
---

[~ashutoshc]: checked, new args are not backward compatible with JDK8, created 
a compatible version of the 
[bashscript|https://github.com/apache/hive/pull/1430/commits/ec6d133c14acb6acba49d9c69e582a36a1aed1f4]
let me know if +1 still valid for the new approach


was (Author: abstractdog):
[~ashutoshc]: checked, new args are not backward compatible with JDK8, created 
a compatible version of the bashscript: 
https://github.com/apache/hive/pull/1430/commits/4b8b3d0673b936f7d26d7b62be786e21490d85e2

> LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used 
> anymore
> 
>
> Key: HIVE-23938
> URL: https://issues.apache.org/jira/browse/HIVE-23938
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: gc_2020-07-27-13.log, gc_2020-07-29-12.jdk8.log, 
> llap_new_gc_options_fail_on_jdk8.log
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
> {code}
> JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
> -XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
> -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
> {code}
> on JDK11 I got something like:
> {code}
> + exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
> -Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
> -XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
> -XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
> -XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 
> -Dhttp.maxConnections=10 -Dasync.profiler.home=/grid/0/async-profiler -server 
> -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
> -XX:+PrintGCDateStamps 
> -Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
>  
> ... 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
> OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was deprecated in 
> version 11.0 and will likely be removed in a future release.
> Unrecognized VM option 'UseGCLogFileRotation'
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> {code}
> These are not valid in JDK11:
> {code}
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles
> -XX:GCLogFileSize
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDateStamps
> {code}
> Instead something like:
> {code}
> -Xlog:gc*,safepoint:gc.log:time,uptime:filecount=4,filesize=100M
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23938) LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used anymore

2020-09-07 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191620#comment-17191620
 ] 

László Bodor commented on HIVE-23938:
-

[~ashutoshc]: checked, new args are not backward compatible with JDK8, created 
a compatible version of the bashscript: 
https://github.com/apache/hive/pull/1430/commits/4b8b3d0673b936f7d26d7b62be786e21490d85e2

> LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used 
> anymore
> 
>
> Key: HIVE-23938
> URL: https://issues.apache.org/jira/browse/HIVE-23938
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: gc_2020-07-27-13.log, gc_2020-07-29-12.jdk8.log, 
> llap_new_gc_options_fail_on_jdk8.log
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
> {code}
> JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
> -XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
> -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
> {code}
> on JDK11 I got something like:
> {code}
> + exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
> -Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
> -XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
> -XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
> -XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 
> -Dhttp.maxConnections=10 -Dasync.profiler.home=/grid/0/async-profiler -server 
> -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
> -XX:+PrintGCDateStamps 
> -Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
>  
> ... 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
> OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was deprecated in 
> version 11.0 and will likely be removed in a future release.
> Unrecognized VM option 'UseGCLogFileRotation'
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> {code}
> These are not valid in JDK11:
> {code}
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles
> -XX:GCLogFileSize
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDateStamps
> {code}
> Instead something like:
> {code}
> -Xlog:gc*,safepoint:gc.log:time,uptime:filecount=4,filesize=100M
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23938) LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used anymore

2020-09-07 Thread László Bodor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23938:

Attachment: llap_new_gc_options_fail_on_jdk8.log

> LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used 
> anymore
> 
>
> Key: HIVE-23938
> URL: https://issues.apache.org/jira/browse/HIVE-23938
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: gc_2020-07-27-13.log, gc_2020-07-29-12.jdk8.log, 
> llap_new_gc_options_fail_on_jdk8.log
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/master/llap-server/bin/runLlapDaemon.sh#L55
> {code}
> JAVA_OPTS_BASE="-server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA 
> -XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation 
> -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps"
> {code}
> on JDK11 I got something like:
> {code}
> + exec /usr/lib/jvm/jre-11-openjdk/bin/java -Dproc_llapdaemon -Xms32000m 
> -Xmx64000m -Dhttp.maxConnections=17 -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA 
> -XX:+AggressiveOpts -XX:MetaspaceSize=1024m 
> -XX:InitiatingHeapOccupancyPercent=80 -XX:MaxGCPauseMillis=200 
> -XX:+PreserveFramePointer -XX:AllocatePrefetchStyle=2 
> -Dhttp.maxConnections=10 -Dasync.profiler.home=/grid/0/async-profiler -server 
> -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M 
> -XX:+PrintGCDateStamps 
> -Xloggc:/grid/2/yarn/container-logs/application_1595375468459_0113/container_e26_1595375468459_0113_01_09/gc_2020-07-27-12.log
>  
> ... 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
> OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was deprecated in 
> version 11.0 and will likely be removed in a future release.
> Unrecognized VM option 'UseGCLogFileRotation'
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> {code}
> These are not valid in JDK11:
> {code}
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles
> -XX:GCLogFileSize
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDateStamps
> {code}
> Instead something like:
> {code}
> -Xlog:gc*,safepoint:gc.log:time,uptime:filecount=4,filesize=100M
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24082) Expose information whether AcidUtils.ParsedDelta contains statementId

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24082?focusedWorklogId=479557&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479557
 ]

ASF GitHub Bot logged work on HIVE-24082:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 09:50
Start Date: 07/Sep/20 09:50
Worklog Time Spent: 10m 
  Work Description: findepi commented on pull request #1438:
URL: https://github.com/apache/hive/pull/1438#issuecomment-688210666


   @kgyrtkirk thanks for the info about tests.
   Should I also attach a patch in the JIRA or is the PR enough?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479557)
Time Spent: 0.5h  (was: 20m)

> Expose information whether AcidUtils.ParsedDelta contains statementId
> -
>
> Key: HIVE-24082
> URL: https://issues.apache.org/jira/browse/HIVE-24082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Piotr Findeisen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In [Presto|https://prestosql.io] we support reading ORC ACID tables by 
> leveraging AcidUtils rather than duplicating the file-name parsing logic in 
> our code.
> To do this fully correctly, we need to know whether 
> {{org.apache.hadoop.hive.ql.io.AcidUtils.ParsedDelta}} contains 
> {{statementId}} information or not.
> Currently, the getter of that property does not allow us to access this 
> information.
> [https://github.com/apache/hive/blob/468907eab36f78df3e14a24005153c9a23d62555/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L804-L806]
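
A minimal sketch of the kind of accessor being requested; the method and 
sentinel names here are assumptions for illustration, not the actual Hive API:

{code:java}
import java.util.OptionalInt;

// Hypothetical shape of the accessor on AcidUtils.ParsedDelta. The real class
// keeps statementId as a primitive int; the sentinel below is an assumption.
class ParsedDeltaSketch {
  private static final int NO_STATEMENT_ID = Integer.MIN_VALUE;
  private final int statementId;

  ParsedDeltaSketch(int statementId) { this.statementId = statementId; }

  // What callers like Presto need: was a statementId parsed from the dir name?
  boolean hasStatementId() { return statementId != NO_STATEMENT_ID; }

  OptionalInt statementIdOpt() {
    return hasStatementId() ? OptionalInt.of(statementId) : OptionalInt.empty();
  }
}
{code}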



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-09-07 Thread László Bodor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23408:

Fix Version/s: 4.0.0

> Hive on Tez :  Kafka storage handler broken in secure environment
> -
>
> Key: HIVE-23408
> URL: https://issues.apache.org/jira/browse/HIVE-23408
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Rajkumar Singh
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> hive.server2.authentication.kerberos.principal is set in the form 
> hive/_HOST@REALM.
> A Tez task can start on a random NM host and expands _HOST to the FQDN of 
> the host where it is running; this leads to an authentication issue.
> For LLAP there is a fallback to the LLAP daemon keytab/principal. Kafka 1.1 
> onwards supports delegation tokens, and we should take advantage of them 
> for Hive on Tez.
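
For context, a hedged sketch of obtaining a Kafka delegation token through the 
AdminClient (the token admin API is available in Kafka 2.1+; the broker 
address and security properties below are illustrative, not from this patch):

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.security.token.delegation.DelegationToken;

public class KafkaTokenSketch {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker:9092");     // illustrative
    props.put("security.protocol", "SASL_PLAINTEXT");  // Kerberos-authenticated client
    props.put("sasl.kerberos.service.name", "kafka");
    try (AdminClient admin = AdminClient.create(props)) {
      // The Kerberos-authenticated client obtains a token once; tasks can then
      // authenticate via SASL/SCRAM with the tokenId/HMAC pair instead of a keytab.
      DelegationToken token = admin.createDelegationToken().delegationToken().get();
      System.out.println("tokenId=" + token.tokenInfo().tokenId());
    }
  }
}
{code}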



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-09-07 Thread László Bodor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-23408.
-
Resolution: Fixed

> Hive on Tez :  Kafka storage handler broken in secure environment
> -
>
> Key: HIVE-23408
> URL: https://issues.apache.org/jira/browse/HIVE-23408
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Rajkumar Singh
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> hive.server2.authentication.kerberos.principal is set in the form 
> hive/_HOST@REALM.
> A Tez task can start on a random NM host and expands _HOST to the FQDN of 
> the host where it is running; this leads to an authentication issue.
> For LLAP there is a fallback to the LLAP daemon keytab/principal. Kafka 1.1 
> onwards supports delegation tokens, and we should take advantage of them 
> for Hive on Tez.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-09-07 Thread László Bodor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191576#comment-17191576
 ] 

László Bodor commented on HIVE-23408:
-

PR merged, thanks [~ashutoshc] for the review!

> Hive on Tez :  Kafka storage handler broken in secure environment
> -
>
> Key: HIVE-23408
> URL: https://issues.apache.org/jira/browse/HIVE-23408
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Rajkumar Singh
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> hive.server2.authentication.kerberos.principal is set in the form 
> hive/_HOST@REALM.
> A Tez task can start on a random NM host and expands _HOST to the FQDN of 
> the host where it is running; this leads to an authentication issue.
> For LLAP there is a fallback to the LLAP daemon keytab/principal. Kafka 1.1 
> onwards supports delegation tokens, and we should take advantage of them 
> for Hive on Tez.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23408?focusedWorklogId=479541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479541
 ]

ASF GitHub Bot logged work on HIVE-23408:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 08:39
Start Date: 07/Sep/20 08:39
Worklog Time Spent: 10m 
  Work Description: abstractdog merged pull request #1379:
URL: https://github.com/apache/hive/pull/1379


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479541)
Time Spent: 1h 20m  (was: 1h 10m)

> Hive on Tez :  Kafka storage handler broken in secure environment
> -
>
> Key: HIVE-23408
> URL: https://issues.apache.org/jira/browse/HIVE-23408
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Rajkumar Singh
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> hive.server2.authentication.kerberos.principal is set in the form 
> hive/_HOST@REALM.
> A Tez task can start on a random NM host and expands _HOST to the FQDN of 
> the host where it is running; this leads to an authentication issue.
> For LLAP there is a fallback to the LLAP daemon keytab/principal. Kafka 1.1 
> onwards supports delegation tokens, and we should take advantage of them 
> for Hive on Tez.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24094) cast type mismatch and use is not null, the results are error if cbo is true

2020-09-07 Thread zhaolong (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191514#comment-17191514
 ] 

zhaolong edited comment on HIVE-24094 at 9/7/20, 7:25 AM:
--

cbo=false

!image-2020-09-07-15-24-59-015.png!

cbo=true

!image-2020-09-07-15-25-18-785.png!

Is this maybe a Calcite problem? I find that Calcite changes the expressions; 
when I delete "basePlan = fieldTrimmer.trim(basePlan);" and "optimizedRelNode 
= planner.findBestExp();", the result is correct.
!image-2020-09-07-15-21-35-566.png!
!image-2020-09-07-15-20-44-201.png!


was (Author: fsilent):
!image-2020-09-07-15-15-22-869.png|width=526,height=222! Is this maybe a 
Calcite problem? I find that Calcite changes the expressions; when I delete 
"basePlan = fieldTrimmer.trim(basePlan);" and "optimizedRelNode = 
planner.findBestExp();", the result is correct.
!image-2020-09-07-15-21-35-566.png!
!image-2020-09-07-15-20-44-201.png!

> cast type mismatch and use is not null, the results are error if cbo is true
> 
>
> Key: HIVE-24094
> URL: https://issues.apache.org/jira/browse/HIVE-24094
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.0
>Reporter: zhaolong
>Priority: Major
> Attachments: image-2020-08-31-10-01-26-250.png, 
> image-2020-08-31-10-02-39-154.png, image-2020-09-04-10-54-43-141.png, 
> image-2020-09-04-10-56-00-764.png, image-2020-09-04-10-56-07-286.png, 
> image-2020-09-04-10-59-36-780.png, image-2020-09-04-11-02-07-917.png, 
> image-2020-09-04-11-02-18-008.png, image-2020-09-07-15-20-44-201.png, 
> image-2020-09-07-15-21-35-566.png, image-2020-09-07-15-24-59-015.png, 
> image-2020-09-07-15-25-18-785.png
>
>
> 1.CREATE TABLE IF NOT EXISTS testa
> ( 
>  SEARCHWORD STRING, 
>  COUNT_NUM BIGINT, 
>  WORDS STRING 
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\27' 
> STORED AS TEXTFILE; 
> 2.insert into testa values('searchword', 1, 'a');
> 3.set hive.cbo.enable=false;
> 4.SELECT 
> CASE 
>  WHEN CAST(searchword as bigint) IS NOT NULL THEN CAST(CAST(searchword as 
> bigint) as String) 
>  ELSE searchword 
> END AS WORDS, 
> searchword FROM testa;
> !image-2020-08-31-10-01-26-250.png!
> 5.set hive.cbo.enable=true;
> 6.SELECT 
> CASE 
>  WHEN CAST(searchword as bigint) IS NOT NULL THEN CAST(CAST(searchword as 
> bigint) as String) 
>  ELSE searchword 
> END AS WORDS, 
> searchword FROM testa;
> !image-2020-08-31-10-02-39-154.png!
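
For clarity, the expected per-row semantics of that CASE expression, sketched 
in plain Java: 'searchword' does not parse as a bigint, so the CAST yields 
NULL and WORDS should fall back to the original string (the cbo=false result 
above):

{code:java}
// Expected semantics of:
//   CASE WHEN CAST(searchword AS BIGINT) IS NOT NULL
//        THEN CAST(CAST(searchword AS BIGINT) AS STRING)
//        ELSE searchword END
static String words(String searchword) {
  Long asLong;
  try {
    asLong = Long.valueOf(searchword);
  } catch (NumberFormatException e) {
    asLong = null;  // Hive's CAST returns NULL for a non-numeric string
  }
  return asLong != null ? String.valueOf(asLong) : searchword;
}
// words("searchword") should return "searchword"
{code}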



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24094) cast type mismatch and use is not null, the results are error if cbo is true

2020-09-07 Thread zhaolong (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191514#comment-17191514
 ] 

zhaolong commented on HIVE-24094:
-

!image-2020-09-07-15-15-22-869.png|width=526,height=222! Is this maybe a 
Calcite problem? I find that Calcite changes the expressions; when I delete 
"basePlan = fieldTrimmer.trim(basePlan);" and "optimizedRelNode = 
planner.findBestExp();", the result is correct.
!image-2020-09-07-15-21-35-566.png!
!image-2020-09-07-15-20-44-201.png!

> cast type mismatch and use is not null, the results are error if cbo is true
> 
>
> Key: HIVE-24094
> URL: https://issues.apache.org/jira/browse/HIVE-24094
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.0
>Reporter: zhaolong
>Priority: Major
> Attachments: image-2020-08-31-10-01-26-250.png, 
> image-2020-08-31-10-02-39-154.png, image-2020-09-04-10-54-43-141.png, 
> image-2020-09-04-10-56-00-764.png, image-2020-09-04-10-56-07-286.png, 
> image-2020-09-04-10-59-36-780.png, image-2020-09-04-11-02-07-917.png, 
> image-2020-09-04-11-02-18-008.png, image-2020-09-07-15-20-44-201.png, 
> image-2020-09-07-15-21-35-566.png
>
>
> 1.CREATE TABLE IF NOT EXISTS testa
> ( 
>  SEARCHWORD STRING, 
>  COUNT_NUM BIGINT, 
>  WORDS STRING 
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\27' 
> STORED AS TEXTFILE; 
> 2.insert into testa values('searchword', 1, 'a');
> 3.set hive.cbo.enable=false;
> 4.SELECT 
> CASE 
>  WHEN CAST(searchword as bigint) IS NOT NULL THEN CAST(CAST(searchword as 
> bigint) as String) 
>  ELSE searchword 
> END AS WORDS, 
> searchword FROM testa;
> !image-2020-08-31-10-01-26-250.png!
> 5.set hive.cbo.enable=true;
> 6.SELECT 
> CASE 
>  WHEN CAST(searchword as bigint) IS NOT NULL THEN CAST(CAST(searchword as 
> bigint) as String) 
>  ELSE searchword 
> END AS WORDS, 
> searchword FROM testa;
> !image-2020-08-31-10-02-39-154.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24094) cast type mismatch and use is not null, the results are error if cbo is true

2020-09-07 Thread zhaolong (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaolong updated HIVE-24094:

Attachment: image-2020-09-07-15-21-35-566.png

> cast type mismatch and use is not null, the results are error if cbo is true
> 
>
> Key: HIVE-24094
> URL: https://issues.apache.org/jira/browse/HIVE-24094
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.0
>Reporter: zhaolong
>Priority: Major
> Attachments: image-2020-08-31-10-01-26-250.png, 
> image-2020-08-31-10-02-39-154.png, image-2020-09-04-10-54-43-141.png, 
> image-2020-09-04-10-56-00-764.png, image-2020-09-04-10-56-07-286.png, 
> image-2020-09-04-10-59-36-780.png, image-2020-09-04-11-02-07-917.png, 
> image-2020-09-04-11-02-18-008.png, image-2020-09-07-15-20-44-201.png, 
> image-2020-09-07-15-21-35-566.png
>
>
> 1.CREATE TABLE IF NOT EXISTS testa
> ( 
>  SEARCHWORD STRING, 
>  COUNT_NUM BIGINT, 
>  WORDS STRING 
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\27' 
> STORED AS TEXTFILE; 
> 2.insert into testa values('searchword', 1, 'a');
> 3.set hive.cbo.enable=false;
> 4.SELECT 
> CASE 
>  WHEN CAST(searchword as bigint) IS NOT NULL THEN CAST(CAST(searchword as 
> bigint) as String) 
>  ELSE searchword 
> END AS WORDS, 
> searchword FROM testa;
> !image-2020-08-31-10-01-26-250.png!
> 5.set hive.cbo.enable=true;
> 6.SELECT 
> CASE 
>  WHEN CAST(searchword as bigint) IS NOT NULL THEN CAST(CAST(searchword as 
> bigint) as String) 
>  ELSE searchword 
> END AS WORDS, 
> searchword FROM testa;
> !image-2020-08-31-10-02-39-154.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24094) cast type mismatch and use is not null, the results are error if cbo is true

2020-09-07 Thread zhaolong (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaolong updated HIVE-24094:

Attachment: image-2020-09-07-15-20-44-201.png

> cast type mismatch and use is not null, the results are error if cbo is true
> 
>
> Key: HIVE-24094
> URL: https://issues.apache.org/jira/browse/HIVE-24094
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.0
>Reporter: zhaolong
>Priority: Major
> Attachments: image-2020-08-31-10-01-26-250.png, 
> image-2020-08-31-10-02-39-154.png, image-2020-09-04-10-54-43-141.png, 
> image-2020-09-04-10-56-00-764.png, image-2020-09-04-10-56-07-286.png, 
> image-2020-09-04-10-59-36-780.png, image-2020-09-04-11-02-07-917.png, 
> image-2020-09-04-11-02-18-008.png, image-2020-09-07-15-20-44-201.png
>
>
> 1.CREATE TABLE IF NOT EXISTS testa
> ( 
>  SEARCHWORD STRING, 
>  COUNT_NUM BIGINT, 
>  WORDS STRING 
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\27' 
> STORED AS TEXTFILE; 
> 2.insert into testa values('searchword', 1, 'a');
> 3.set hive.cbo.enable=false;
> 4.SELECT 
> CASE 
>  WHEN CAST(searchword as bigint) IS NOT NULL THEN CAST(CAST(searchword as 
> bigint) as String) 
>  ELSE searchword 
> END AS WORDS, 
> searchword FROM testa;
> !image-2020-08-31-10-01-26-250.png!
> 5.set hive.cbo.enable=true;
> 6.SELECT 
> CASE 
>  WHEN CAST(searchword as bigint) IS NOT NULL THEN CAST(CAST(searchword as 
> bigint) as String) 
>  ELSE searchword 
> END AS WORDS, 
> searchword FROM testa;
> !image-2020-08-31-10-02-39-154.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24119) fix the issue that the client got wrong jobTrackerUrl when resourcemanager has ha instances

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24119?focusedWorklogId=479488&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479488
 ]

ASF GitHub Bot logged work on HIVE-24119:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 06:11
Start Date: 07/Sep/20 06:11
Worklog Time Spent: 10m 
  Work Description: Neilxzn commented on pull request #1469:
URL: https://github.com/apache/hive/pull/1469#issuecomment-688059289


   @kgyrtkirk Can you help me review it?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479488)
Time Spent: 20m  (was: 10m)

> fix the issue that the client got wrong jobTrackerUrl when resourcemanager 
> has ha instances
> ---
>
> Key: HIVE-24119
> URL: https://issues.apache.org/jira/browse/HIVE-24119
> Project: Hive
>  Issue Type: Improvement
>  Components: Shims
>Affects Versions: 2.1.0
> Environment: ha resourcemanager
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2020-09-04-16-34-28-341.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a cluster configures HA ResourceManagers, the confs
> `yarn.resourcemanager.address.rm1` and `yarn.resourcemanager.address.rm2`
> replace the conf `yarn.resourcemanager.address`.
> Since `yarn.resourcemanager.address` may not be set, the method 
> Hadoop23Shims.getJobLauncherRpcAddress can return a wrong value.
> !image-2020-09-04-16-34-28-341.png!
> Maybe it should return the value of the conf 
> `yarn.resourcemanager.address.rm1` or `yarn.resourcemanager.address.rm2`.
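
A minimal sketch of the HA-aware lookup being suggested, assuming the standard 
YARN configuration keys (the fallback order here is illustrative; the actual 
patch may differ):

{code:java}
import org.apache.hadoop.conf.Configuration;

public final class RmAddressSketch {
  // Resolve an RM address even when only the per-instance HA keys
  // (yarn.resourcemanager.address.rm1/rm2) are set.
  static String jobTrackerAddress(Configuration conf) {
    String addr = conf.get("yarn.resourcemanager.address");
    if (addr != null) {
      return addr;
    }
    if (conf.getBoolean("yarn.resourcemanager.ha.enabled", false)) {
      for (String rmId : conf.getTrimmedStrings("yarn.resourcemanager.ha.rm-ids")) {
        String haAddr = conf.get("yarn.resourcemanager.address." + rmId);
        if (haAddr != null) {
          return haAddr;  // first configured instance; real code should pick the active RM
        }
      }
    }
    return "0.0.0.0:8032";  // YARN's default when nothing is configured
  }
}
{code}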



--
This message was sent by Atlassian Jira
(v8.3.4#803005)