[jira] [Work logged] (HIVE-24165) CBO: Query fails after multiple count distinct rewrite

2020-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24165?focusedWorklogId=503704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503704
 ]

ASF GitHub Bot logged work on HIVE-24165:
-

Author: ASF GitHub Bot
Created on: 22/Oct/20 12:36
Start Date: 22/Oct/20 12:36
Worklog Time Spent: 10m 
  Work Description: loudongfeng opened a new pull request #1597:
URL: https://github.com/apache/hive/pull/1597


   
   
   ### What changes were proposed in this pull request?
   
   Keep Aggregate's groupSet in order during multiple distinct rewrite.
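   
   As a minimal, self-contained illustration of the ordering concern (plain 
   Java, not the rule's real code): downstream rules such as 
   AggregateProjectPullUpConstantsRule assume the i-th output field of the 
   Aggregate corresponds to the i-th smallest group key, so the collected key 
   list has to stay in ascending order before fields are remapped.
   
   {code:java}
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;
   
   public class GroupSetOrderDemo {
     public static void main(String[] args) {
       // e.g. group keys collected as [$f3, $f2] during the rewrite
       List<Integer> groupKeys = new ArrayList<>(List.of(3, 2));
       Collections.sort(groupKeys); // keep ascending order: [2, 3]
       for (int i = 0; i < groupKeys.size(); i++) {
         // output field i of the new Aggregate maps to input field groupKeys.get(i)
         System.out.println("output $f" + i + " <- input field " + groupKeys.get(i));
       }
     }
   }
   {code}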
   
   ### Why are the changes needed?
   
   Fix column mismatch issue between HiveExpandDistinctAggregatesRule and 
AggregateProjectPullUpConstantsRule
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Tested by qtests



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503704)
Remaining Estimate: 0h
Time Spent: 10m

> CBO: Query fails after multiple count distinct rewrite 
> ---
>
> Key: HIVE-24165
> URL: https://issues.apache.org/jira/browse/HIVE-24165
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Major
> Attachments: HIVE-24165.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> One way to reproduce:
>  
> {code:sql}
>  CREATE TABLE test(
>  `device_id` string, 
>  `level` string, 
>  `site_id` string, 
>  `user_id` string, 
>  `first_date` string, 
>  `last_date` string,
>  `dt` string) ;
>  set hive.execution.engine=tez;
>  set hive.optimize.distinct.rewrite=true;
>  set hive.cli.print.header=true;
>  select 
>  dt,
>  site_id,
>  count(DISTINCT t1.device_id) as device_tol_cnt,
>  count(DISTINCT case when t1.first_date='2020-09-15' then t1.device_id else 
> null end) as device_add_cnt 
>  from test t1 where dt='2020-09-15' 
>  group by
>  dt,
>  site_id
>  ;
> {code}
>  
> Error log:  
> {code:java}
> Exception in thread "main" java.lang.AssertionError: Cannot add expression of 
> different type to set:
> set type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE 
> "ISO-8859-1$en_US$primary" $f2, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> COLLATE "ISO-8859-1$en_US$primary" $f3, BIGINT $f2_0, BIGINT $f3_0) NOT NULL
> expression type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> COLLATE "ISO-8859-1$en_US$primary" $f2, BIGINT $f3, BIGINT $f2_0, BIGINT 
> $f3_0) NOT NULL
> set is rel#85:HiveAggregate.HIVE.[](input=HepRelVertex#84,group={2, 
> 3},agg#0=count($0),agg#1=count($1))
> expression is HiveProject#95
>   at 
> org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:411)
>   at 
> org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
>   at 
> org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:234)
>   at 
> org.apache.calcite.rel.rules.AggregateProjectPullUpConstantsRule.onMatch(AggregateProjectPullUpConstantsRule.java:186)
>   at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:317)
>   at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:415)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:280)
>   at 
> org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:211)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:198)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:2273)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:2002)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1709)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1609)
>   at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)

[jira] [Updated] (HIVE-24165) CBO: Query fails after multiple count distinct rewrite

2020-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24165:
--
Labels: pull-request-available  (was: )

> CBO: Query fails after multiple count distinct rewrite 
> ---
>
> Key: HIVE-24165
> URL: https://issues.apache.org/jira/browse/HIVE-24165
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24165.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> One way to reproduce:
>  
> {code:sql}
>  CREATE TABLE test(
>  `device_id` string, 
>  `level` string, 
>  `site_id` string, 
>  `user_id` string, 
>  `first_date` string, 
>  `last_date` string,
>  `dt` string) ;
>  set hive.execution.engine=tez;
>  set hive.optimize.distinct.rewrite=true;
>  set hive.cli.print.header=true;
>  select 
>  dt,
>  site_id,
>  count(DISTINCT t1.device_id) as device_tol_cnt,
>  count(DISTINCT case when t1.first_date='2020-09-15' then t1.device_id else 
> null end) as device_add_cnt 
>  from test t1 where dt='2020-09-15' 
>  group by
>  dt,
>  site_id
>  ;
> {code}
>  
> Error log:  
> {code:java}
> Exception in thread "main" java.lang.AssertionError: Cannot add expression of 
> different type to set:
> set type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE 
> "ISO-8859-1$en_US$primary" $f2, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> COLLATE "ISO-8859-1$en_US$primary" $f3, BIGINT $f2_0, BIGINT $f3_0) NOT NULL
> expression type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> COLLATE "ISO-8859-1$en_US$primary" $f2, BIGINT $f3, BIGINT $f2_0, BIGINT 
> $f3_0) NOT NULL
> set is rel#85:HiveAggregate.HIVE.[](input=HepRelVertex#84,group={2, 
> 3},agg#0=count($0),agg#1=count($1))
> expression is HiveProject#95
>   at 
> org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:411)
>   at 
> org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
>   at 
> org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:234)
>   at 
> org.apache.calcite.rel.rules.AggregateProjectPullUpConstantsRule.onMatch(AggregateProjectPullUpConstantsRule.java:186)
>   at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:317)
>   at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:415)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:280)
>   at 
> org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:211)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:198)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:2273)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:2002)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1709)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1609)
>   at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1414)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1430)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:450)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12164)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
>   

[jira] [Work logged] (HIVE-24109) Load partitions in parallel for managed tables in the bootstrap phase

2020-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24109?focusedWorklogId=503697&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503697
 ]

ASF GitHub Bot logged work on HIVE-24109:
-

Author: ASF GitHub Bot
Created on: 22/Oct/20 12:16
Start Date: 22/Oct/20 12:16
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1529:
URL: https://github.com/apache/hive/pull/1529#discussion_r510111928



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
##
@@ -207,119 +220,88 @@ private void addConsolidatedPartitionDesc() throws 
Exception {
 tableDesc.getTableName(), true, partitions);
 
   //don't need to add ckpt task separately. Added as part of add partition 
task
-  addPartition((toPartitionCount < totalPartitionCount), 
consolidatedPartitionDesc, null);
-  if (partitions.size() > 0) {
-LOG.info("Added {} partitions", partitions.size());
+  addPartition((toPartitionCount < totalPartitionCount), 
consolidatedPartitionDesc);
+  if (!tracker.canAddMoreTasks()) {
+//No need to do processing as no more tasks can be added. Will be 
processed in next run. State is already
+//updated in add partition task
+return;
   }
   currentPartitionCount = toPartitionCount;
 }
   }
 
   private TaskTracker forNewTable() throws Exception {
-if (isMetaDataOp() || 
TableType.EXTERNAL_TABLE.equals(table.getTableType())) {
-  // Place all partitions in single task to reduce load on HMS.
-  addConsolidatedPartitionDesc();
-  return tracker;
-}
-
-Iterator<AlterTableAddPartitionDesc> iterator = 
event.partitionDescriptions(tableDesc).iterator();
-while (iterator.hasNext() && tracker.canAddMoreTasks()) {
-  AlterTableAddPartitionDesc currentPartitionDesc = iterator.next();
-  /*
-   the currentPartitionDesc cannot be inlined as we need the hasNext() to 
be evaluated post the
-   current retrieved lastReplicatedPartition
-  */
-  addPartition(iterator.hasNext(), currentPartitionDesc, null);
-}
+// Place all partitions in single task to reduce load on HMS.
+addConsolidatedPartitionDesc(null);
 return tracker;
   }
 
-  private void addPartition(boolean hasMorePartitions, 
AlterTableAddPartitionDesc addPartitionDesc, Task ptnRootTask)
+  private void addPartition(boolean hasMorePartitions, 
AlterTableAddPartitionDesc addPartitionDesc)
   throws Exception {
-tracker.addTask(tasksForAddPartition(table, addPartitionDesc, 
ptnRootTask));
-if (hasMorePartitions && !tracker.canAddMoreTasks()) {
+boolean processingComplete = addTasksForPartition(table, addPartitionDesc, 
null,
+  PartitionState.Stage.PARTITION);
+//If processing is not complete, means replication state is already 
updated with copy or move tasks which need

Review comment:
   Removed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503697)
Time Spent: 1h 10m  (was: 1h)

> Load partitions in parallel for managed tables in the bootstrap phase
> -
>
> Key: HIVE-24109
> URL: https://issues.apache.org/jira/browse/HIVE-24109
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24109.01.patch, HIVE-24109.02.patch, 
> HIVE-24109.03.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24109) Load partitions in parallel for managed tables in the bootstrap phase

2020-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24109?focusedWorklogId=503699&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503699
 ]

ASF GitHub Bot logged work on HIVE-24109:
-

Author: ASF GitHub Bot
Created on: 22/Oct/20 12:16
Start Date: 22/Oct/20 12:16
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1529:
URL: https://github.com/apache/hive/pull/1529#discussion_r510112281



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
##
@@ -224,54 +220,19 @@ public String getName() {
   }
 
 
-  public static Task getLoadCopyTask(ReplicationSpec replicationSpec, Path 
srcPath, Path dstPath,
-HiveConf conf, boolean isAutoPurge, 
boolean needRecycle,
-boolean readSourceAsFileList) {
-return getLoadCopyTask(replicationSpec, srcPath, dstPath, conf, 
isAutoPurge, needRecycle,
-readSourceAsFileList, false);
-  }
-
   public static Task getLoadCopyTask(ReplicationSpec replicationSpec, Path 
srcPath, Path dstPath,
 HiveConf conf, boolean isAutoPurge, 
boolean needRecycle,
 boolean readSourceAsFileList, String 
dumpDirectory,
 ReplicationMetricCollector 
metricCollector) {
 return getLoadCopyTask(replicationSpec, srcPath, dstPath, conf, 
isAutoPurge, needRecycle,
-readSourceAsFileList, false, dumpDirectory, metricCollector);
+readSourceAsFileList, false, true, dumpDirectory, metricCollector);
   }
 
-  private static Task getLoadCopyTask(ReplicationSpec replicationSpec, Path 
srcPath, Path dstPath,
-HiveConf conf, boolean isAutoPurge, 
boolean needRecycle,
-boolean readSourceAsFileList,
-boolean overWrite) {
-Task copyTask = null;
-LOG.debug("ReplCopyTask:getLoadCopyTask: {}=>{}", srcPath, dstPath);
-if ((replicationSpec != null) && replicationSpec.isInReplicationScope()){
-  ReplCopyWork rcwork = new ReplCopyWork(srcPath, dstPath, false, 
overWrite);
-  rcwork.setReadSrcAsFilesList(readSourceAsFileList);
-  if (replicationSpec.isReplace() && 
(conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION))) {
-rcwork.setDeleteDestIfExist(true);
-rcwork.setAutoPurge(isAutoPurge);
-rcwork.setNeedRecycle(needRecycle);
-  }
-  // For replace case, duplicate check should not be done. The new base 
directory will automatically make the older
-  // data invisible. Doing duplicate check and ignoring copy will cause 
consistency issue if there are multiple
-  // replace events getting replayed in the first incremental load.
-  rcwork.setCheckDuplicateCopy(replicationSpec.needDupCopyCheck() && 
!replicationSpec.isReplace());
-  LOG.debug("ReplCopyTask:\trcwork");
-  String distCpDoAsUser = 
conf.getVar(HiveConf.ConfVars.HIVE_DISTCP_DOAS_USER);
-  rcwork.setDistCpDoAsUser(distCpDoAsUser);
-  copyTask = TaskFactory.get(rcwork, conf);
-} else {
-  LOG.debug("ReplCopyTask:\tcwork");
-  copyTask = TaskFactory.get(new CopyWork(srcPath, dstPath, false), conf);
-}
-return copyTask;
-  }
 
   private static Task getLoadCopyTask(ReplicationSpec replicationSpec, Path 
srcPath, Path dstPath,
  HiveConf conf, boolean isAutoPurge, 
boolean needRecycle,
  boolean readSourceAsFileList,
- boolean overWrite,
+ boolean overWrite, boolean autoPurge,

Review comment:
   renamed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503699)
Time Spent: 1.5h  (was: 1h 20m)

> Load partitions in parallel for managed tables in the bootstrap phase
> -
>
> Key: HIVE-24109
> URL: https://issues.apache.org/jira/browse/HIVE-24109
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24109.01.patch, HIVE-24109.02.patch, 
> HIVE-24109.03.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira

[jira] [Work logged] (HIVE-24109) Load partitions in parallel for managed tables in the bootstrap phase

2020-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24109?focusedWorklogId=503698&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503698
 ]

ASF GitHub Bot logged work on HIVE-24109:
-

Author: ASF GitHub Bot
Created on: 22/Oct/20 12:16
Start Date: 22/Oct/20 12:16
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1529:
URL: https://github.com/apache/hive/pull/1529#discussion_r510112188



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpWork.java
##
@@ -204,10 +209,10 @@ public void setResultValues(List<String> resultValues) {
   replSpec.setInReplicationScope(true);
   EximUtil.DataCopyPath managedTableCopyPath = new 
EximUtil.DataCopyPath(replSpec);
   managedTableCopyPath.loadFromString(managedTblCopyPathIterator.next());
-  Task copyTask = ReplCopyTask.getLoadCopyTask(
+  Task copyTask = ReplCopyTask.getDumpCopyTask(
   managedTableCopyPath.getReplicationSpec(), 
managedTableCopyPath.getSrcPath(),
-  managedTableCopyPath.getTargetPath(), conf, false, 
shouldOverwrite,
-  getCurrentDumpPath().toString(), getMetricCollector());
+  managedTableCopyPath.getTargetPath(), conf, false, 
shouldOverwrite, !isBootstrap,

Review comment:
   renamed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503698)
Time Spent: 1h 20m  (was: 1h 10m)

> Load partitions in parallel for managed tables in the bootstrap phase
> -
>
> Key: HIVE-24109
> URL: https://issues.apache.org/jira/browse/HIVE-24109
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24109.01.patch, HIVE-24109.02.patch, 
> HIVE-24109.03.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24109) Load partitions in parallel for managed tables in the bootstrap phase

2020-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24109?focusedWorklogId=503696&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503696
 ]

ASF GitHub Bot logged work on HIVE-24109:
-

Author: ASF GitHub Bot
Created on: 22/Oct/20 12:15
Start Date: 22/Oct/20 12:15
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1529:
URL: https://github.com/apache/hive/pull/1529#discussion_r509991386



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadTable.java
##
@@ -270,10 +269,9 @@ static TableLocationTuple tableLocation(ImportTableDesc 
tblDesc, Database parent
 Path dataPath = fromURI;
 Path tmpPath = tgtPath;
 
-// if move optimization is enabled, copy the files directly to the target 
path. No need to create the staging dir.
+// if acid tables, copy the files directly to the target path. No need to 
create the staging dir.
 LoadFileType loadFileType;
-if (replicationSpec.isInReplicationScope() &&
-context.hiveConf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION)) {
+if (replicationSpec.isInReplicationScope() && 
AcidUtils.isTransactionalTable(table)) {

Review comment:
   For tables, irrespective of move optimization, a move task is always 
created, and if it is a transactional table, the move task handles the case.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503696)
Time Spent: 1h  (was: 50m)

> Load partitions in parallel for managed tables in the bootstrap phase
> -
>
> Key: HIVE-24109
> URL: https://issues.apache.org/jira/browse/HIVE-24109
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24109.01.patch, HIVE-24109.02.patch, 
> HIVE-24109.03.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?focusedWorklogId=503686&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503686
 ]

ASF GitHub Bot logged work on HIVE-18284:
-

Author: ASF GitHub Bot
Created on: 22/Oct/20 11:59
Start Date: 22/Oct/20 11:59
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on pull request #1400:
URL: https://github.com/apache/hive/pull/1400#issuecomment-714443949


   @kgyrtkirk Ping for review request!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503686)
Time Spent: 2h  (was: 1h 50m)

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0, 2.3.1, 2.3.2, 4.0.0, 3.1.1, 3.1.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> A Null Pointer Exception occurs when inserting data with a 'distribute by' 
> clause. The following query reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove Distribute By or 
> use a Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator will remove the previous fsp, which might be re-used when we 
> use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> 

[jira] [Work logged] (HIVE-24293) Integer overflow in llap collision mask

2020-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24293?focusedWorklogId=503673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503673
 ]

ASF GitHub Bot logged work on HIVE-24293:
-

Author: ASF GitHub Bot
Created on: 22/Oct/20 11:07
Start Date: 22/Oct/20 11:07
Worklog Time Spent: 10m 
  Work Description: szlta merged pull request #1595:
URL: https://github.com/apache/hive/pull/1595


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503673)
Time Spent: 20m  (was: 10m)

> Integer overflow in llap collision mask
> ---
>
> Key: HIVE-24293
> URL: https://issues.apache.org/jira/browse/HIVE-24293
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If multiple threads put the same buffer to the cache, only one succeeds. The 
> other one detects this, and replaces its own buffer. This is marked by a bit 
> mask encoded in a long, where the collided buffers are marked with a 1.
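> 
> To make the mask concrete, a minimal sketch (simplified from this 
> description, not the actual LLAP cache code):
> {code:java}
> import java.util.concurrent.ConcurrentMap;
> 
> public class CollisionMaskDemo {
>   // Bit i of the returned mask is set iff buffers[i] lost the race, i.e.
>   // another thread's copy was already cached under the same key.
>   static long putAll(ConcurrentMap<String, byte[]> cache, String[] keys, byte[][] buffers) {
>     long collisionMask = 0L;
>     for (int i = 0; i < buffers.length; i++) {
>       if (cache.putIfAbsent(keys[i], buffers[i]) != null) {
>         collisionMask |= 1L << i; // shift a long; "1 << i" overflows for i >= 32
>       }
>     }
>     return collisionMask;
>   }
> }
> {code}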



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24109) Load partitions in parallel for managed tables in the bootstrap phase

2020-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24109?focusedWorklogId=503644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503644
 ]

ASF GitHub Bot logged work on HIVE-24109:
-

Author: ASF GitHub Bot
Created on: 22/Oct/20 09:20
Start Date: 22/Oct/20 09:20
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1529:
URL: https://github.com/apache/hive/pull/1529#discussion_r510009825



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
##
@@ -207,119 +220,88 @@ private void addConsolidatedPartitionDesc() throws 
Exception {
 tableDesc.getTableName(), true, partitions);
 
   //don't need to add ckpt task separately. Added as part of add partition 
task
-  addPartition((toPartitionCount < totalPartitionCount), 
consolidatedPartitionDesc, null);
-  if (partitions.size() > 0) {
-LOG.info("Added {} partitions", partitions.size());
+  addPartition((toPartitionCount < totalPartitionCount), 
consolidatedPartitionDesc);
+  if (!tracker.canAddMoreTasks()) {
+//No need to do processing as no more tasks can be added. Will be 
processed in next run. State is already
+//updated in add partition task
+return;
   }
   currentPartitionCount = toPartitionCount;
 }
   }
 
   private TaskTracker forNewTable() throws Exception {
-if (isMetaDataOp() || 
TableType.EXTERNAL_TABLE.equals(table.getTableType())) {
-  // Place all partitions in single task to reduce load on HMS.
-  addConsolidatedPartitionDesc();
-  return tracker;
-}
-
-Iterator<AlterTableAddPartitionDesc> iterator = 
event.partitionDescriptions(tableDesc).iterator();
-while (iterator.hasNext() && tracker.canAddMoreTasks()) {
-  AlterTableAddPartitionDesc currentPartitionDesc = iterator.next();
-  /*
-   the currentPartitionDesc cannot be inlined as we need the hasNext() to 
be evaluated post the
-   current retrieved lastReplicatedPartition
-  */
-  addPartition(iterator.hasNext(), currentPartitionDesc, null);
-}
+// Place all partitions in single task to reduce load on HMS.
+addConsolidatedPartitionDesc(null);
 return tracker;
   }
 
-  private void addPartition(boolean hasMorePartitions, 
AlterTableAddPartitionDesc addPartitionDesc, Task ptnRootTask)
+  private void addPartition(boolean hasMorePartitions, 
AlterTableAddPartitionDesc addPartitionDesc)
   throws Exception {
-tracker.addTask(tasksForAddPartition(table, addPartitionDesc, 
ptnRootTask));
-if (hasMorePartitions && !tracker.canAddMoreTasks()) {
+boolean processingComplete = addTasksForPartition(table, addPartitionDesc, 
null,
+  PartitionState.Stage.PARTITION);
+//If processing is not complete, means replication state is already 
updated with copy or move tasks which need
+//to be processed
+if (processingComplete && hasMorePartitions && !tracker.canAddMoreTasks()) 
{
   ReplicationState currentReplicationState =
   new ReplicationState(new PartitionState(table.getTableName(), 
addPartitionDesc));
   updateReplicationState(currentReplicationState);
 }
   }
 
   /**
-   * returns the root task for adding a partition
+   * returns the root task for adding all partitions in a batch
*/
-  private Task tasksForAddPartition(Table table, AlterTableAddPartitionDesc 
addPartitionDesc, Task ptnRootTask)
+  private boolean addTasksForPartition(Table table, AlterTableAddPartitionDesc 
addPartitionDesc,
+AlterTableAddPartitionDesc.PartitionDesc 
lastPartSpec,
+PartitionState.Stage lastStage)
   throws MetaException, HiveException {
 Task addPartTask = TaskFactory.get(
   new DDLWork(new HashSet<>(), new HashSet<>(), addPartitionDesc,
   true, (new Path(context.dumpDirectory)).getParent().toString(), 
this.metricCollector),
   context.hiveConf
 );
-//checkpointing task already added as part of add batch of partition in 
case for metadata only and external tables
+//checkpointing task already added as part of add batch of partition
 if (isMetaDataOp() || 
TableType.EXTERNAL_TABLE.equals(table.getTableType())) {
-  if (ptnRootTask == null) {
-ptnRootTask = addPartTask;
-  } else {
-ptnRootTask.addDependentTask(addPartTask);
-  }
-  return ptnRootTask;
+  tracker.addTask(addPartTask);
+  return true;
 }
-
-AlterTableAddPartitionDesc.PartitionDesc partSpec = 
addPartitionDesc.getPartitions().get(0);
-Path sourceWarehousePartitionLocation = new Path(partSpec.getLocation());
-Path replicaWarehousePartitionLocation = locationOnReplicaWarehouse(table, 
partSpec);
-partSpec.setLocation(replicaWarehousePartitionLocation.toString());
-LOG.debug("adding 

[jira] [Work logged] (HIVE-24109) Load partitions in parallel for managed tables in the bootstrap phase

2020-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24109?focusedWorklogId=503628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503628
 ]

ASF GitHub Bot logged work on HIVE-24109:
-

Author: ASF GitHub Bot
Created on: 22/Oct/20 08:52
Start Date: 22/Oct/20 08:52
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1529:
URL: https://github.com/apache/hive/pull/1529#discussion_r509991300



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -655,6 +643,11 @@ private static void populateLlapDaemonVarsSet(Set<String> 
llapDaemonVarsSetLocal
   "Provide the maximum number of partitions of a table that will be 
batched together during  \n"
 + "repl load. All the partitions in a batch will make a single 
metastore call to update the metadata. \n"
 + "The data for these partitions will be copied before copying the 
metadata batch. "),
+
REPL_LOAD_PARTITIONS_WITH_DATA_COPY_BATCH_SIZE("hive.repl.load.partitions.with.data.copy.batch.size",
+  1000,

Review comment:
   It will increase the line length

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadTable.java
##
@@ -270,10 +269,9 @@ static TableLocationTuple tableLocation(ImportTableDesc 
tblDesc, Database parent
 Path dataPath = fromURI;
 Path tmpPath = tgtPath;
 
-// if move optimization is enabled, copy the files directly to the target 
path. No need to create the staging dir.
+// if acid tables, copy the files directly to the target path. No need to 
create the staging dir.
 LoadFileType loadFileType;
-if (replicationSpec.isInReplicationScope() &&
-context.hiveConf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION)) {
+if (replicationSpec.isInReplicationScope() && 
AcidUtils.isTransactionalTable(table)) {

Review comment:
   This is only for the ptests which use non acid managed tables.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503628)
Time Spent: 40m  (was: 0.5h)

> Load partitions in parallel for managed tables in the bootstrap phase
> -
>
> Key: HIVE-24109
> URL: https://issues.apache.org/jira/browse/HIVE-24109
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24109.01.patch, HIVE-24109.02.patch, 
> HIVE-24109.03.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24109) Load partitions in parallel for managed tables in the bootstrap phase

2020-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24109?focusedWorklogId=503605&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503605
 ]

ASF GitHub Bot logged work on HIVE-24109:
-

Author: ASF GitHub Bot
Created on: 22/Oct/20 07:49
Start Date: 22/Oct/20 07:49
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1529:
URL: https://github.com/apache/hive/pull/1529#discussion_r509895178



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
##
@@ -224,54 +220,19 @@ public String getName() {
   }
 
 
-  public static Task getLoadCopyTask(ReplicationSpec replicationSpec, Path 
srcPath, Path dstPath,
-HiveConf conf, boolean isAutoPurge, 
boolean needRecycle,
-boolean readSourceAsFileList) {
-return getLoadCopyTask(replicationSpec, srcPath, dstPath, conf, 
isAutoPurge, needRecycle,
-readSourceAsFileList, false);
-  }
-
   public static Task getLoadCopyTask(ReplicationSpec replicationSpec, Path 
srcPath, Path dstPath,
 HiveConf conf, boolean isAutoPurge, 
boolean needRecycle,
 boolean readSourceAsFileList, String 
dumpDirectory,
 ReplicationMetricCollector 
metricCollector) {
 return getLoadCopyTask(replicationSpec, srcPath, dstPath, conf, 
isAutoPurge, needRecycle,
-readSourceAsFileList, false, dumpDirectory, metricCollector);
+readSourceAsFileList, false, true, dumpDirectory, metricCollector);
   }
 
-  private static Task getLoadCopyTask(ReplicationSpec replicationSpec, Path 
srcPath, Path dstPath,
-HiveConf conf, boolean isAutoPurge, 
boolean needRecycle,
-boolean readSourceAsFileList,
-boolean overWrite) {
-Task copyTask = null;
-LOG.debug("ReplCopyTask:getLoadCopyTask: {}=>{}", srcPath, dstPath);
-if ((replicationSpec != null) && replicationSpec.isInReplicationScope()){
-  ReplCopyWork rcwork = new ReplCopyWork(srcPath, dstPath, false, 
overWrite);
-  rcwork.setReadSrcAsFilesList(readSourceAsFileList);
-  if (replicationSpec.isReplace() && 
(conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION))) {
-rcwork.setDeleteDestIfExist(true);
-rcwork.setAutoPurge(isAutoPurge);
-rcwork.setNeedRecycle(needRecycle);
-  }
-  // For replace case, duplicate check should not be done. The new base 
directory will automatically make the older
-  // data invisible. Doing duplicate check and ignoring copy will cause 
consistency issue if there are multiple
-  // replace events getting replayed in the first incremental load.
-  rcwork.setCheckDuplicateCopy(replicationSpec.needDupCopyCheck() && 
!replicationSpec.isReplace());
-  LOG.debug("ReplCopyTask:\trcwork");
-  String distCpDoAsUser = 
conf.getVar(HiveConf.ConfVars.HIVE_DISTCP_DOAS_USER);
-  rcwork.setDistCpDoAsUser(distCpDoAsUser);
-  copyTask = TaskFactory.get(rcwork, conf);
-} else {
-  LOG.debug("ReplCopyTask:\tcwork");
-  copyTask = TaskFactory.get(new CopyWork(srcPath, dstPath, false), conf);
-}
-return copyTask;
-  }
 
   private static Task getLoadCopyTask(ReplicationSpec replicationSpec, Path 
srcPath, Path dstPath,
  HiveConf conf, boolean isAutoPurge, 
boolean needRecycle,
  boolean readSourceAsFileList,
- boolean overWrite,
+ boolean overWrite, boolean autoPurge,

Review comment:
   Why do we need to have both isAutoPurge and autoPurge? Can we simplify 
this part?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpWork.java
##
@@ -204,10 +209,10 @@ public void setResultValues(List<String> resultValues) {
   replSpec.setInReplicationScope(true);
   EximUtil.DataCopyPath managedTableCopyPath = new 
EximUtil.DataCopyPath(replSpec);
   managedTableCopyPath.loadFromString(managedTblCopyPathIterator.next());
-  Task copyTask = ReplCopyTask.getLoadCopyTask(
+  Task copyTask = ReplCopyTask.getDumpCopyTask(
   managedTableCopyPath.getReplicationSpec(), 
managedTableCopyPath.getSrcPath(),
-  managedTableCopyPath.getTargetPath(), conf, false, 
shouldOverwrite,
-  getCurrentDumpPath().toString(), getMetricCollector());
+  managedTableCopyPath.getTargetPath(), conf, false, 
shouldOverwrite, !isBootstrap,

Review comment:
   isBootstrap value doesn't influence isAutoPurge as that will default to 
false when this method is called. Is there any other usage of this 

[jira] [Work logged] (HIVE-24270) Move scratchdir cleanup to background

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24270?focusedWorklogId=503523&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503523
 ]

ASF GitHub Bot logged work on HIVE-24270:
-

Author: ASF GitHub Bot
Created on: 22/Oct/20 04:00
Start Date: 22/Oct/20 04:00
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on pull request #1577:
URL: https://github.com/apache/hive/pull/1577#issuecomment-714206578


   Thanks for revising the patch @mustafaiman . Recent patch LGTM. +1 pending 
tests.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503523)
Time Spent: 1h 20m  (was: 1h 10m)

> Move scratchdir cleanup to background
> -
>
> Key: HIVE-24270
> URL: https://issues.apache.org/jira/browse/HIVE-24270
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In a cloud environment, scratchdir cleaning at the end of the query may take a 
> long time. This causes the client to hang for up to 1 minute even after the 
> results were streamed back. During this time the client just waits for cleanup 
> to finish. Cleanup can take place in the background in HiveServer.
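> 
> A rough sketch of the idea (class and wiring are assumptions, not the 
> HiveServer2 patch): hand the recursive delete to a background executor so 
> the client is released as soon as the results are streamed.
> {code:java}
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> 
> public class ScratchDirCleaner {
>   private final ExecutorService pool = Executors.newSingleThreadExecutor();
> 
>   public void cleanupAsync(FileSystem fs, Path scratchDir) {
>     pool.submit(() -> {
>       try {
>         fs.delete(scratchDir, true); // recursive delete; can be slow on cloud storage
>       } catch (Exception e) {
>         // log and move on; the query has already returned to the client
>       }
>     });
>   }
> }
> {code}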



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24042) Fix typo in MetastoreConf.java

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24042?focusedWorklogId=503503&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503503
 ]

ASF GitHub Bot logged work on HIVE-24042:
-

Author: ASF GitHub Bot
Created on: 22/Oct/20 02:43
Start Date: 22/Oct/20 02:43
Worklog Time Spent: 10m 
  Work Description: yx91490 commented on pull request #1406:
URL: https://github.com/apache/hive/pull/1406#issuecomment-714185296


   ping @gm7y8



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503503)
Time Spent: 0.5h  (was: 20m)

> Fix typo in MetastoreConf.java
> --
>
> Key: HIVE-24042
> URL: https://issues.apache.org/jira/browse/HIVE-24042
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: yx91490
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: HIVE-24042.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Fix typo in MetastoreConf.java: correct word "riven" in package name to 
> "hadoop.hive.metastore".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24292) hive webUI should support keystoretype by config

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24292?focusedWorklogId=503486&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503486
 ]

ASF GitHub Bot logged work on HIVE-24292:
-

Author: ASF GitHub Bot
Created on: 22/Oct/20 02:06
Start Date: 22/Oct/20 02:06
Worklog Time Spent: 10m 
  Work Description: yongzhi commented on a change in pull request #1594:
URL: https://github.com/apache/hive/pull/1594#discussion_r509836264



##
File path: 
service/src/test/org/apache/hive/service/server/TestHS2HttpServerPamConfiguration.java
##
@@ -48,6 +48,7 @@
   private static HiveConf hiveConf = null;
   private static String keyStorePassword = "123456";
   private static String keyFileName = "myKeyStore";
+  private static String keyStoreType = "jks";

Review comment:
   Will fix





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503486)
Time Spent: 0.5h  (was: 20m)

> hive webUI should support keystoretype by config
> 
>
> Key: HIVE-24292
> URL: https://issues.apache.org/jira/browse/HIVE-24292
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We need a property to pass in the keystore type in the webui too.
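> 
> A rough sketch of the shape this takes (the config wiring is an assumption, 
> not the actual patch): the web UI's Jetty SSL setup would pass the configured 
> type through instead of always using the JVM default.
> {code:java}
> import org.eclipse.jetty.util.ssl.SslContextFactory;
> 
> public class WebUiSslDemo {
>   static SslContextFactory.Server sslFactory(String path, String password, String type) {
>     SslContextFactory.Server ssl = new SslContextFactory.Server();
>     ssl.setKeyStorePath(path);          // existing settings
>     ssl.setKeyStorePassword(password);
>     ssl.setKeyStoreType(type);          // the new knob: e.g. "jks" or "pkcs12"
>     return ssl;
>   }
> }
> {code}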



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24294) TezSessionPool sessions can throw AssertionError

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24294:
--
Labels: pull-request-available  (was: )

> TezSessionPool sessions can throw AssertionError
> 
>
> Key: HIVE-24294
> URL: https://issues.apache.org/jira/browse/HIVE-24294
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Whenever default TezSessionPool sessions are reopened for some reason, we are 
> setting dagResources to null before close & setting it back in open.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L498-L503
> If there is an exception in sessionState.close(), we are not restoring the 
> dagResources but moving the session back to TezSessionPool. E.g., exception 
> trace when sessionState.close() failed:
> {code:java}
> 2020-10-15T09:20:28,749 INFO  [HiveServer2-Background-Pool: Thread-25451]: 
> client.TezClient (:()) - Failed to shutdown Tez Session via proxy
> org.apache.tez.dag.api.SessionNotRunning: Application not running, 
> applicationId=application_1602093123456_12345, yarnApplicationState=FINISHED, 
> finalApplicationStatus=SUCCEEDED, 
> trackingUrl=http://localhost:8088/proxy/application_1602093123456_12345/, 
> diagnostics=Session timed out, lastDAGCompletionTime=1602997683786 ms, 
> sessionTimeoutInterval=60 ms
> Session stats:submittedDAGs=2, successfulDAGs=2, failedDAGs=0, killedDAGs=0   
>  at 
> org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:910) 
> at org.apache.tez.client.TezClient.getAMProxy(TezClient.java:1060) 
> at org.apache.tez.client.TezClient.stop(TezClient.java:743) 
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.closeClient(TezSessionState.java:789)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.close(TezSessionState.java:756)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.close(TezSessionPoolSession.java:111)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopenInternal(TezSessionPoolManager.java:496)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopen(TezSessionPoolManager.java:487)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.reopen(TezSessionPoolSession.java:228)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.getNewTezSessionOnError(TezTask.java:531)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:546) 
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:221){code}
> Because of this, all new queries using this corrupted sessions are failing 
> with below exception
> {code:java}
> Caused by: java.lang.AssertionError: Ensure called on an unitialized (or 
> closed) session 41774265-b7da-4d58-84a8-1bedfd597aec
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.ensureLocalResources(TezSessionState.java:685){code}
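> 
> A minimal sketch of the fix idea (stand-in types and names, not the merged 
> patch): restore the saved dagResources in a finally block so an exception 
> from close() cannot leave the pooled session without its resources.
> {code:java}
> import java.util.Map;
> 
> public class ReopenDemo {
>   Map<String, Object> dagResources; // stands in for the session's dag resources
> 
>   void reopen() throws Exception {
>     Map<String, Object> saved = dagResources;
>     dagResources = null; // cleared before close(), as described above
>     try {
>       close();
>     } finally {
>       dagResources = saved; // restored even when close() throws
>     }
>     open();
>   }
> 
>   void close() throws Exception { /* may throw SessionNotRunning etc. */ }
>   void open() throws Exception { }
> }
> {code}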



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24294) TezSessionPool sessions can throw AssertionError

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24294?focusedWorklogId=503473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503473
 ]

ASF GitHub Bot logged work on HIVE-24294:
-

Author: ASF GitHub Bot
Created on: 22/Oct/20 01:39
Start Date: 22/Oct/20 01:39
Worklog Time Spent: 10m 
  Work Description: nareshpr opened a new pull request #1596:
URL: https://github.com/apache/hive/pull/1596


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503473)
Remaining Estimate: 0h
Time Spent: 10m

> TezSessionPool sessions can throw AssertionError
> 
>
> Key: HIVE-24294
> URL: https://issues.apache.org/jira/browse/HIVE-24294
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Whenever default TezSessionPool sessions are reopened for some reason, we are 
> setting dagResources to null before close & setting it back in open.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L498-L503
> If there is an exception in sessionState.close(), we are not restoring the 
> dagResources but moving the session back to TezSessionPool. E.g., exception 
> trace when sessionState.close() failed:
> {code:java}
> 2020-10-15T09:20:28,749 INFO  [HiveServer2-Background-Pool: Thread-25451]: 
> client.TezClient (:()) - Failed to shutdown Tez Session via proxy
> org.apache.tez.dag.api.SessionNotRunning: Application not running, 
> applicationId=application_1602093123456_12345, yarnApplicationState=FINISHED, 
> finalApplicationStatus=SUCCEEDED, 
> trackingUrl=http://localhost:8088/proxy/application_1602093123456_12345/, 
> diagnostics=Session timed out, lastDAGCompletionTime=1602997683786 ms, 
> sessionTimeoutInterval=60 ms
> Session stats:submittedDAGs=2, successfulDAGs=2, failedDAGs=0, killedDAGs=0   
>  at 
> org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:910) 
> at org.apache.tez.client.TezClient.getAMProxy(TezClient.java:1060) 
> at org.apache.tez.client.TezClient.stop(TezClient.java:743) 
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.closeClient(TezSessionState.java:789)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.close(TezSessionState.java:756)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.close(TezSessionPoolSession.java:111)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopenInternal(TezSessionPoolManager.java:496)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopen(TezSessionPoolManager.java:487)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.reopen(TezSessionPoolSession.java:228)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.getNewTezSessionOnError(TezTask.java:531)
>  
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:546) 
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:221){code}
> Because of this, all new queries using this corrupted sessions are failing 
> with below exception
> {code:java}
> Caused by: java.lang.AssertionError: Ensure called on an unitialized (or 
> closed) session 41774265-b7da-4d58-84a8-1bedfd597aec
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.ensureLocalResources(TezSessionState.java:685){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24043) Retain original path info in Warehouse.makeSpecFromName()'s logger

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24043?focusedWorklogId=503465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503465
 ]

ASF GitHub Bot logged work on HIVE-24043:
-

Author: ASF GitHub Bot
Created on: 22/Oct/20 00:58
Start Date: 22/Oct/20 00:58
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1407:
URL: https://github.com/apache/hive/pull/1407


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503465)
Time Spent: 0.5h  (was: 20m)

> Retain original path info in Warehouse.makeSpecFromName()'s logger
> --
>
> Key: HIVE-24043
> URL: https://issues.apache.org/jira/browse/HIVE-24043
> Project: Hive
>  Issue Type: Improvement
>Reporter: yx91490
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24043.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The warn logger in Warehouse.makeSpecFromName() does not retain the original 
> path info, for example:
> {code:java}
> 20/08/07 14:32:28 WARN warehouse: Cannot create partition spec from 
> hdfs://nameservice/; missing keys [dt1]
> {code}
> the log content was expected to be the full hdfs path, but only 
> 'hdfs://nameservice' was logged
>  
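> A hypothetical sketch of the fix (names are illustrative, not the actual 
> Warehouse code): keep a reference to the path that was passed in and log 
> that, rather than whatever prefix remains after the partition walk.
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Map;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> 
> public class SpecFromNameDemo {
>   private static final Logger LOG = LoggerFactory.getLogger(SpecFromNameDemo.class);
> 
>   static Map<String, String> checkSpec(String originalPath, List<String> expectedKeys,
>       Map<String, String> parsed) {
>     if (parsed.size() < expectedKeys.size()) {
>       List<String> missing = new ArrayList<>(expectedKeys);
>       missing.removeAll(parsed.keySet());
>       // log the full original path, not just the filesystem root
>       LOG.warn("Cannot create partition spec from {}; missing keys {}", originalPath, missing);
>     }
>     return parsed;
>   }
> }
> {code}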



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24042) Fix typo in MetastoreConf.java

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24042?focusedWorklogId=503466&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503466
 ]

ASF GitHub Bot logged work on HIVE-24042:
-

Author: ASF GitHub Bot
Created on: 22/Oct/20 00:58
Start Date: 22/Oct/20 00:58
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1406:
URL: https://github.com/apache/hive/pull/1406


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503466)
Time Spent: 20m  (was: 10m)

> Fix typo in MetastoreConf.java
> --
>
> Key: HIVE-24042
> URL: https://issues.apache.org/jira/browse/HIVE-24042
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: yx91490
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: HIVE-24042.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Fix typo in MetastoreConf.java: correct word "riven" in package name to 
> "hadoop.hive.metastore".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24292) hive webUI should support keystoretype by config

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24292?focusedWorklogId=503446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503446
 ]

ASF GitHub Bot logged work on HIVE-24292:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 23:56
Start Date: 21/Oct/20 23:56
Worklog Time Spent: 10m 
  Work Description: risdenk commented on a change in pull request #1594:
URL: https://github.com/apache/hive/pull/1594#discussion_r509800907



##
File path: 
service/src/test/org/apache/hive/service/server/TestHS2HttpServerPamConfiguration.java
##
@@ -48,6 +48,7 @@
   private static HiveConf hiveConf = null;
   private static String keyStorePassword = "123456";
   private static String keyFileName = "myKeyStore";
+  private static String keyStoreType = "jks";

Review comment:
   You might want this to be `KeyStore.getDefaultType()` depending on the 
JDK being used. I think in most cases this should be ok though.
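   
   For reference, `KeyStore.getDefaultType()` returns the JVM's default store 
type, and a hard-coded "jks" no longer matches it on newer JDKs (the default 
switched to "pkcs12" in JDK 9):
   {code:java}
   import java.security.KeyStore;
   
   public class DefaultKeyStoreType {
     public static void main(String[] args) {
       System.out.println(KeyStore.getDefaultType()); // "pkcs12" on JDK 9+
     }
   }
   {code}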





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503446)
Time Spent: 20m  (was: 10m)

> hive webUI should support keystoretype by config
> 
>
> Key: HIVE-24292
> URL: https://issues.apache.org/jira/browse/HIVE-24292
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We need a property to pass in the keystore type in the webui too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24109) Load partitions in parallel for managed tables in the bootstrap phase

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24109?focusedWorklogId=503445&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503445
 ]

ASF GitHub Bot logged work on HIVE-24109:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 23:54
Start Date: 21/Oct/20 23:54
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on pull request #1529:
URL: https://github.com/apache/hive/pull/1529#issuecomment-714015286


   Can you please rebase the patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503445)
Time Spent: 20m  (was: 10m)

> Load partitions in parallel for managed tables in the bootstrap phase
> -
>
> Key: HIVE-24109
> URL: https://issues.apache.org/jira/browse/HIVE-24109
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24109.01.patch, HIVE-24109.02.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24293) Integer overflow in llap collision mask

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24293:
--
Labels: pull-request-available  (was: )

> Integer overflow in llap collision mask
> ---
>
> Key: HIVE-24293
> URL: https://issues.apache.org/jira/browse/HIVE-24293
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If multiple threads put the same buffer into the cache, only one succeeds. The 
> other one detects this and replaces its own buffer. This is marked by a bit 
> mask encoded in a long, where the collided buffers are marked with a 1.
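> A minimal sketch of the overflow, assuming the mask logic shifts an int 
> (variable names are illustrative, not from the LLAP cache code):
> {code:java}
> long collisionMask = 0L;
> int ix = 40;                   // more than 32 buffers in one batch
> collisionMask |= 1 << ix;      // bug: int shift wraps, 1 << 40 == 1 << 8
> collisionMask |= 1L << ix;     // fix: shift a long so bit 40 is actually set
> {code}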



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24293) Integer overflow in llap collision mask

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24293?focusedWorklogId=503411&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503411
 ]

ASF GitHub Bot logged work on HIVE-24293:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 21:45
Start Date: 21/Oct/20 21:45
Worklog Time Spent: 10m 
  Work Description: asinkovits opened a new pull request #1595:
URL: https://github.com/apache/hive/pull/1595


   
   
   
   
   ### What changes were proposed in this pull request?
   bugfix
   
   
   ### Why are the changes needed?
   If multiple threads put the same buffer into the cache, only one succeeds. The 
other one detects this and replaces its own buffer. This is marked by a bit 
mask encoded in a long, where the collided buffers are marked with a 1. Because 
the integer 1 is shifted instead of a long, an overflow can leave some buffers 
unremoved after a collision, and the reference count then decreases below zero, 
which is not a valid state.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Unit test added



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503411)
Remaining Estimate: 0h
Time Spent: 10m

> Integer overflow in llap collision mask
> ---
>
> Key: HIVE-24293
> URL: https://issues.apache.org/jira/browse/HIVE-24293
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If multiple threads put the same buffer into the cache, only one succeeds. The 
> other one detects this and replaces its own buffer. This is marked by a bit 
> mask encoded in a long, where the collided buffers are marked with a 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24275) Configurations to delay the deletion of obsolete files by the Cleaner

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24275?focusedWorklogId=503379&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503379
 ]

ASF GitHub Bot logged work on HIVE-24275:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 20:10
Start Date: 21/Oct/20 20:10
Worklog Time Spent: 10m 
  Work Description: kishendas commented on a change in pull request #1583:
URL: https://github.com/apache/hive/pull/1583#discussion_r509643679



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -244,7 +249,32 @@ private void removeFiles(String location, ValidWriteIdList 
writeIdList, Compacti
 obsoleteDirs.addAll(dir.getAbortedDirectories());
 List<Path> filesToDelete = new ArrayList<>(obsoleteDirs.size());
 StringBuilder extraDebugInfo = new StringBuilder("[");
+boolean delayedCleanupEnabled = 
conf.getBoolVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELAYED_CLEANUP_ENABLED);
+
 for (Path stat : obsoleteDirs) {
+  if (delayedCleanupEnabled) {
+String filename = stat.toString();
+if (filename.startsWith(AcidUtils.BASE_PREFIX)) {
+  long writeId = AcidUtils.ParsedBase.parseBase(stat).getWriteId();
+  if (ci.type == CompactionType.MINOR) {
+LOG.info("Skipping base dir " + stat + " as this cleanup is for 
minor compaction"
++ ", compaction id " + ci.id);
+continue;
+  } else if (writeId > writeIdList.getHighWatermark()) {
+LOG.info("Skipping base dir " + stat + " deletion as WriteId of 
this base dir is"
++ " greater than highWaterMark for compaction id " + ci.id);
+continue;
+  }
+}
+else if (filename.startsWith(AcidUtils.DELTA_PREFIX) || 
filename.startsWith(AcidUtils.DELETE_DELTA_PREFIX)) {
+  AcidUtils.ParsedDelta delta = AcidUtils.parsedDelta(stat, fs);
+  if (delta.getMaxWriteId() > writeIdList.getHighWatermark()) {

Review comment:
   Please add relevant comments in the code, wherever it's not very obvious. 

##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -244,7 +249,32 @@ private void removeFiles(String location, ValidWriteIdList 
writeIdList, Compacti
 obsoleteDirs.addAll(dir.getAbortedDirectories());
 List<Path> filesToDelete = new ArrayList<>(obsoleteDirs.size());
 StringBuilder extraDebugInfo = new StringBuilder("[");
+boolean delayedCleanupEnabled = 
conf.getBoolVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELAYED_CLEANUP_ENABLED);
+
 for (Path stat : obsoleteDirs) {
+  if (delayedCleanupEnabled) {
+String filename = stat.toString();
+if (filename.startsWith(AcidUtils.BASE_PREFIX)) {
+  long writeId = AcidUtils.ParsedBase.parseBase(stat).getWriteId();
+  if (ci.type == CompactionType.MINOR) {
+LOG.info("Skipping base dir " + stat + " as this cleanup is for 
minor compaction"
++ ", compaction id " + ci.id);
+continue;
+  } else if (writeId > writeIdList.getHighWatermark()) {
+LOG.info("Skipping base dir " + stat + " deletion as WriteId of 
this base dir is"
++ " greater than highWaterMark for compaction id " + ci.id);
+continue;
+  }
+}
+else if (filename.startsWith(AcidUtils.DELTA_PREFIX) || 
filename.startsWith(AcidUtils.DELETE_DELTA_PREFIX)) {
+  AcidUtils.ParsedDelta delta = AcidUtils.parsedDelta(stat, fs);

Review comment:
   It would be helpful to extract this logic to a separate method. 

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3058,6 +3058,11 @@ private static void 
populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 
 HIVE_COMPACTOR_CLEANER_RUN_INTERVAL("hive.compactor.cleaner.run.interval", 
"5000ms",
 new TimeValidator(TimeUnit.MILLISECONDS), "Time between runs of the 
cleaner thread"),
+
HIVE_COMPACTOR_DELAYED_CLEANUP_ENABLED("hive.compactor.delayed.cleanup.enabled",
 false,
+"When enabled, cleanup of obsolete files/dirs after compaction can be 
delayed. This delay \n" +
+" can be configured by hive configuration 
hive.compactor.cleaner.retention.time.seconds"),
+
HIVE_COMPACTOR_CLEANER_RETENTION_TIME_SECONDS("hive.compactor.cleaner.retention.time.seconds",
 "300s",

Review comment:
   It might be better to change the name to 
"HIVE_COMPACTOR_CLEANER_RETENTION_TIME", since the value would indicate whether 
it's in seconds or milliseconds. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:

[jira] [Updated] (HIVE-24292) hive webUI should support keystoretype by config

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24292:
--
Labels: pull-request-available  (was: )

> hive webUI should support keystoretype by config
> 
>
> Key: HIVE-24292
> URL: https://issues.apache.org/jira/browse/HIVE-24292
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need a property to pass in the keystore type in the webUI too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24292) hive webUI should support keystoretype by config

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24292?focusedWorklogId=503360&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503360
 ]

ASF GitHub Bot logged work on HIVE-24292:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 19:14
Start Date: 21/Oct/20 19:14
Worklog Time Spent: 10m 
  Work Description: yongzhi opened a new pull request #1594:
URL: https://github.com/apache/hive/pull/1594


   Unit test
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503360)
Remaining Estimate: 0h
Time Spent: 10m

> hive webUI should support keystoretype by config
> 
>
> Key: HIVE-24292
> URL: https://issues.apache.org/jira/browse/HIVE-24292
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need a property to pass in the keystore type in the webUI too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23962) Make bin/hive pick user defined jdbc url

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23962?focusedWorklogId=503309&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503309
 ]

ASF GitHub Bot logged work on HIVE-23962:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 17:13
Start Date: 21/Oct/20 17:13
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on pull request #1591:
URL: https://github.com/apache/hive/pull/1591#issuecomment-713725949


   Thanks Vihang for the review and commit.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503309)
Time Spent: 1h 20m  (was: 1h 10m)

> Make bin/hive pick user defined jdbc url 
> -
>
> Key: HIVE-23962
> URL: https://issues.apache.org/jira/browse/HIVE-23962
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Xiaomeng Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently the hive command triggers bin/hive, which runs "beeline" by default.
> We want to pass an env variable so that users can define which url beeline uses.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23962) Make bin/hive pick user defined jdbc url

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23962?focusedWorklogId=503303&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503303
 ]

ASF GitHub Bot logged work on HIVE-23962:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 16:58
Start Date: 21/Oct/20 16:58
Worklog Time Spent: 10m 
  Work Description: vihangk1 merged pull request #1591:
URL: https://github.com/apache/hive/pull/1591


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503303)
Time Spent: 1h 10m  (was: 1h)

> Make bin/hive pick user defined jdbc url 
> -
>
> Key: HIVE-23962
> URL: https://issues.apache.org/jira/browse/HIVE-23962
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Xiaomeng Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently the hive command triggers bin/hive, which runs "beeline" by default.
> We want to pass an env variable so that users can define which url beeline uses.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24173) notification cleanup interval value changes depending upon replication enabled or not.

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24173?focusedWorklogId=503273&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503273
 ]

ASF GitHub Bot logged work on HIVE-24173:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 16:00
Start Date: 21/Oct/20 16:00
Worklog Time Spent: 10m 
  Work Description: ArkoSharma opened a new pull request #1593:
URL: https://github.com/apache/hive/pull/1593


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503273)
Remaining Estimate: 0h
Time Spent: 10m

> notification cleanup interval value changes depending upon replication 
> enabled or not.
> --
>
> Key: HIVE-24173
> URL: https://issues.apache.org/jira/browse/HIVE-24173
> Project: Hive
>  Issue Type: Improvement
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently we use hive.metastore.event.db.listener.timetolive to determine how 
> long the events are stored in the rdbms backing the hms. We should have another 
> configuration for the same purpose in the context of replication, so that a 
> longer time can be configured there; otherwise we can default to 1 day.
> hive.repl.cm.enabled can be used to identify whether replication is enabled or 
> not. If enabled, use the new configuration property to determine the ttl for 
> events in the rdbms; else use hive.metastore.event.db.listener.timetolive for the ttl.
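> A sketch of the proposed selection logic; REPL_EVENT_DB_LISTENER_TTL is a 
> hypothetical name for the new property, while the other ConfVars exist in MetastoreConf:
> {code:java}
> long ttlSeconds = MetastoreConf.getTimeVar(conf,
>     MetastoreConf.ConfVars.EVENT_DB_LISTENER_TTL, TimeUnit.SECONDS);
> if (MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.REPLCMENABLED)) {
>   // hypothetical replication-specific property with a longer default
>   ttlSeconds = MetastoreConf.getTimeVar(conf,
>       MetastoreConf.ConfVars.REPL_EVENT_DB_LISTENER_TTL, TimeUnit.SECONDS);
> }
> {code}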



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24173) notification cleanup interval value changes depending upon replication enabled or not.

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24173:
--
Labels: pull-request-available  (was: )

> notification cleanup interval value changes depending upon replication 
> enabled or not.
> --
>
> Key: HIVE-24173
> URL: https://issues.apache.org/jira/browse/HIVE-24173
> Project: Hive
>  Issue Type: Improvement
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently we use hive.metastore.event.db.listener.timetolive to determine how 
> long the events are stored in the rdbms backing the hms. We should have another 
> configuration for the same purpose in the context of replication, so that a 
> longer time can be configured there; otherwise we can default to 1 day.
> hive.repl.cm.enabled can be used to identify whether replication is enabled or 
> not. If enabled, use the new configuration property to determine the ttl for 
> events in the rdbms; else use hive.metastore.event.db.listener.timetolive for the ttl.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24275) Configurations to delay the deletion of obsolete files by the Cleaner

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24275?focusedWorklogId=503213&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503213
 ]

ASF GitHub Bot logged work on HIVE-24275:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 14:22
Start Date: 21/Oct/20 14:22
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1583:
URL: https://github.com/apache/hive/pull/1583#discussion_r509330534



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -272,4 +302,22 @@ private void removeFiles(String location, ValidWriteIdList 
writeIdList, Compacti
   fs.delete(dead, true);
 }
   }
+
+  /**
+   * Check if user configured retention time for the cleanup of obsolete 
directories/files for the table
+   * has passed or not
+   *
+   * @param ci CompactionInfo
+   * @return True, if retention time has passed and it is ok to clean, else 
false
+   */
+  public boolean isReadyToCleanWithRetentionPolicy(CompactionInfo ci) {

Review comment:
   This whole check could be added to the WHERE clause of findReadyToClean's 
SQL query. It could use TxnDbUtil.getEpochFn, so there would be no need 
for new fields in CompactionInfo.
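   A sketch of that approach (the CQ_COMMIT_TIME column holding the ready-for-cleaning timestamp and the surrounding variables are assumptions):
{code:java}
// Hypothetical: let the database filter out entries whose retention window
// has not elapsed yet, instead of carrying extra state in CompactionInfo.
String s = "SELECT \"CQ_ID\", \"CQ_DATABASE\", \"CQ_TABLE\" FROM \"COMPACTION_QUEUE\""
    + " WHERE \"CQ_STATE\" = '" + READY_FOR_CLEANING + "' AND ("
    + TxnDbUtil.getEpochFn(dbProduct) + " - \"CQ_COMMIT_TIME\") >= " + retentionMs;
{code}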





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503213)
Time Spent: 1h  (was: 50m)

> Configurations to delay the deletion of obsolete files by the Cleaner
> -
>
> Key: HIVE-24275
> URL: https://issues.apache.org/jira/browse/HIVE-24275
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Whenever compaction happens, the cleaner immediately deletes the older obsolete 
> files. In certain cases it would be beneficial to retain these for a certain 
> period, for example if you are serving the file metadata from a cache and 
> don't want to invalidate the cache during compaction for performance reasons. 
> For this purpose we should introduce a configuration 
> hive.compactor.delayed.cleanup.enabled which, if enabled, will delay the 
> cleanup of obsolete files. There should be a separate configuration 
> CLEANER_RETENTION_TIME to specify the duration for which we should retain 
> these older obsolete files. 
> It might be beneficial to have one more configuration, 
> hive.compactor.aborted.txn.delayed.cleanup.enabled, to decide whether to 
> retain files involved in an aborted transaction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24275) Configurations to delay the deletion of obsolete files by the Cleaner

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24275?focusedWorklogId=503210&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503210
 ]

ASF GitHub Bot logged work on HIVE-24275:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 14:18
Start Date: 21/Oct/20 14:18
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1583:
URL: https://github.com/apache/hive/pull/1583#discussion_r509327437



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -237,10 +238,22 @@ public void markCompacted(CompactionInfo info) throws 
MetaException {
   try {
 dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
 stmt = dbConn.createStatement();
-String s = "UPDATE \"COMPACTION_QUEUE\" SET \"CQ_STATE\" = '" + 
READY_FOR_CLEANING + "', "
+long now = getDbTime(dbConn);
+String s = "UPDATE \"COMPACTION_QUEUE\" SET \"CQ_META_INFO\" = " + now 
+ ", \"CQ_STATE\" = '" + READY_FOR_CLEANING + "', "

Review comment:
   You should use a new field for that, and also can leverage 
TxnDbUtil.getEpochFn





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503210)
Time Spent: 50m  (was: 40m)

> Configurations to delay the deletion of obsolete files by the Cleaner
> -
>
> Key: HIVE-24275
> URL: https://issues.apache.org/jira/browse/HIVE-24275
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Whenever compaction happens, the cleaner immediately deletes the older obsolete 
> files. In certain cases it would be beneficial to retain these for a certain 
> period, for example if you are serving the file metadata from a cache and 
> don't want to invalidate the cache during compaction for performance reasons. 
> For this purpose we should introduce a configuration 
> hive.compactor.delayed.cleanup.enabled which, if enabled, will delay the 
> cleanup of obsolete files. There should be a separate configuration 
> CLEANER_RETENTION_TIME to specify the duration for which we should retain 
> these older obsolete files. 
> It might be beneficial to have one more configuration, 
> hive.compactor.aborted.txn.delayed.cleanup.enabled, to decide whether to 
> retain files involved in an aborted transaction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24275) Configurations to delay the deletion of obsolete files by the Cleaner

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24275?focusedWorklogId=503209&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503209
 ]

ASF GitHub Bot logged work on HIVE-24275:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 14:15
Start Date: 21/Oct/20 14:15
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1583:
URL: https://github.com/apache/hive/pull/1583#discussion_r509324748



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -229,6 +233,7 @@ private static String idWatermark(CompactionInfo ci) {
   private void removeFiles(String location, ValidWriteIdList writeIdList, 
CompactionInfo ci)
   throws IOException, NoSuchObjectException, MetaException {
 Path locPath = new Path(location);
+FileSystem fs = locPath.getFileSystem(conf);

Review comment:
   This fs can be passed to getAcidState.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503209)
Time Spent: 40m  (was: 0.5h)

> Configurations to delay the deletion of obsolete files by the Cleaner
> -
>
> Key: HIVE-24275
> URL: https://issues.apache.org/jira/browse/HIVE-24275
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Whenever compaction happens, the cleaner immediately deletes the older obsolete 
> files. In certain cases it would be beneficial to retain these for a certain 
> period, for example if you are serving the file metadata from a cache and 
> don't want to invalidate the cache during compaction for performance reasons. 
> For this purpose we should introduce a configuration 
> hive.compactor.delayed.cleanup.enabled which, if enabled, will delay the 
> cleanup of obsolete files. There should be a separate configuration 
> CLEANER_RETENTION_TIME to specify the duration for which we should retain 
> these older obsolete files. 
> It might be beneficial to have one more configuration, 
> hive.compactor.aborted.txn.delayed.cleanup.enabled, to decide whether to 
> retain files involved in an aborted transaction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24275) Configurations to delay the deletion of obsolete files by the Cleaner

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24275?focusedWorklogId=503208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503208
 ]

ASF GitHub Bot logged work on HIVE-24275:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 14:12
Start Date: 21/Oct/20 14:12
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1583:
URL: https://github.com/apache/hive/pull/1583#discussion_r509322335



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -244,7 +249,32 @@ private void removeFiles(String location, ValidWriteIdList 
writeIdList, Compacti
 obsoleteDirs.addAll(dir.getAbortedDirectories());
 List<Path> filesToDelete = new ArrayList<>(obsoleteDirs.size());
 StringBuilder extraDebugInfo = new StringBuilder("[");
+boolean delayedCleanupEnabled = 
conf.getBoolVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELAYED_CLEANUP_ENABLED);
+
 for (Path stat : obsoleteDirs) {
+  if (delayedCleanupEnabled) {
+String filename = stat.toString();
+if (filename.startsWith(AcidUtils.BASE_PREFIX)) {
+  long writeId = AcidUtils.ParsedBase.parseBase(stat).getWriteId();
+  if (ci.type == CompactionType.MINOR) {
+LOG.info("Skipping base dir " + stat + " as this cleanup is for 
minor compaction"
++ ", compaction id " + ci.id);
+continue;
+  } else if (writeId > writeIdList.getHighWatermark()) {
+LOG.info("Skipping base dir " + stat + " deletion as WriteId of 
this base dir is"
++ " greater than highWaterMark for compaction id " + ci.id);
+continue;
+  }
+}
+else if (filename.startsWith(AcidUtils.DELTA_PREFIX) || 
filename.startsWith(AcidUtils.DELETE_DELTA_PREFIX)) {
+  AcidUtils.ParsedDelta delta = AcidUtils.parsedDelta(stat, fs);
+  if (delta.getMaxWriteId() > writeIdList.getHighWatermark()) {

Review comment:
   I am not sure about this check. I guess this is here to prepare for the 
case when there were two compactions, and we are doing the cleanup of the first 
one and don't want to clean up the stuff that was compacted by the second one. 
But the cleaner's validWriteId list is capped by the minOpenTxnId, so if 
everything was committed, writeIdList.getHighWatermark() will be 
NEXT_WRITE_ID - 1 and it won't prevent the cleaning of the second compaction's 
output. Maybe you can use the highestWriteId from the CompactionInfo? Not sure.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503208)
Time Spent: 0.5h  (was: 20m)

> Configurations to delay the deletion of obsolete files by the Cleaner
> -
>
> Key: HIVE-24275
> URL: https://issues.apache.org/jira/browse/HIVE-24275
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Whenever compaction happens, the cleaner immediately deletes the older obsolete 
> files. In certain cases it would be beneficial to retain these for a certain 
> period, for example if you are serving the file metadata from a cache and 
> don't want to invalidate the cache during compaction for performance reasons. 
> For this purpose we should introduce a configuration 
> hive.compactor.delayed.cleanup.enabled which, if enabled, will delay the 
> cleanup of obsolete files. There should be a separate configuration 
> CLEANER_RETENTION_TIME to specify the duration for which we should retain 
> these older obsolete files. 
> It might be beneficial to have one more configuration, 
> hive.compactor.aborted.txn.delayed.cleanup.enabled, to decide whether to 
> retain files involved in an aborted transaction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24275) Configurations to delay the deletion of obsolete files by the Cleaner

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24275?focusedWorklogId=503205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503205
 ]

ASF GitHub Bot logged work on HIVE-24275:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 14:04
Start Date: 21/Oct/20 14:04
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1583:
URL: https://github.com/apache/hive/pull/1583#discussion_r509315841



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -244,7 +249,32 @@ private void removeFiles(String location, ValidWriteIdList 
writeIdList, Compacti
 obsoleteDirs.addAll(dir.getAbortedDirectories());
 List<Path> filesToDelete = new ArrayList<>(obsoleteDirs.size());
 StringBuilder extraDebugInfo = new StringBuilder("[");
+boolean delayedCleanupEnabled = 
conf.getBoolVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELAYED_CLEANUP_ENABLED);
+
 for (Path stat : obsoleteDirs) {
+  if (delayedCleanupEnabled) {
+String filename = stat.toString();
+if (filename.startsWith(AcidUtils.BASE_PREFIX)) {
+  long writeId = AcidUtils.ParsedBase.parseBase(stat).getWriteId();
+  if (ci.type == CompactionType.MINOR) {
+LOG.info("Skipping base dir " + stat + " as this cleanup is for 
minor compaction"
++ ", compaction id " + ci.id);
+continue;
+  } else if (writeId > writeIdList.getHighWatermark()) {
+LOG.info("Skipping base dir " + stat + " deletion as WriteId of 
this base dir is"
++ " greater than highWaterMark for compaction id " + ci.id);
+continue;
+  }
+}
+else if (filename.startsWith(AcidUtils.DELTA_PREFIX) || 
filename.startsWith(AcidUtils.DELETE_DELTA_PREFIX)) {
+  AcidUtils.ParsedDelta delta = AcidUtils.parsedDelta(stat, fs);

Review comment:
   There is a ParsedDeltaLight in AcidUtils, which is a cheaper way to parse 
out the maxWriteId. This parsedDelta method issues a FS call to check whether 
the delta is in raw format.
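   A sketch of the cheaper variant, assuming the ParsedDeltaLight.parse factory:
{code:java}
// Parses the writeId range from the directory name alone, so no FileSystem
// round trip is needed to detect whether the delta is in raw format.
AcidUtils.ParsedDeltaLight delta = AcidUtils.ParsedDeltaLight.parse(stat);
if (delta.getMaxWriteId() > writeIdList.getHighWatermark()) {
  continue;
}
{code}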





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503205)
Time Spent: 20m  (was: 10m)

> Configurations to delay the deletion of obsolete files by the Cleaner
> -
>
> Key: HIVE-24275
> URL: https://issues.apache.org/jira/browse/HIVE-24275
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Whenever compaction happens, the cleaner immediately deletes the older obsolete 
> files. In certain cases it would be beneficial to retain these for a certain 
> period, for example if you are serving the file metadata from a cache and 
> don't want to invalidate the cache during compaction for performance reasons. 
> For this purpose we should introduce a configuration 
> hive.compactor.delayed.cleanup.enabled which, if enabled, will delay the 
> cleanup of obsolete files. There should be a separate configuration 
> CLEANER_RETENTION_TIME to specify the duration for which we should retain 
> these older obsolete files. 
> It might be beneficial to have one more configuration, 
> hive.compactor.aborted.txn.delayed.cleanup.enabled, to decide whether to 
> retain files involved in an aborted transaction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24270) Move scratchdir cleanup to background

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24270?focusedWorklogId=503203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503203
 ]

ASF GitHub Bot logged work on HIVE-24270:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 14:00
Start Date: 21/Oct/20 14:00
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on a change in pull request #1577:
URL: https://github.com/apache/hive/pull/1577#discussion_r509312017



##
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java
##
@@ -95,7 +95,7 @@ public static SessionState setUpSessionState(HiveConf conf, 
String user, boolean
 if (sessionState == null) {
   // Note: we assume that workers run on the same threads repeatedly, so 
we can set up
   //   the session here and it will be reused without explicitly 
storing in the worker.
-  sessionState = new SessionState(conf, user);
+  sessionState = new SessionState(conf, user, true);

Review comment:
   Background threads do not need async delete. Many compaction tests 
specifically rely on synchronous behavior. I don't see any benefit in moving 
background operations to the async cleanup model.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503203)
Time Spent: 1h 10m  (was: 1h)

> Move scratchdir cleanup to background
> -
>
> Key: HIVE-24270
> URL: https://issues.apache.org/jira/browse/HIVE-24270
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In a cloud environment, scratchdir cleaning at the end of the query may take a 
> long time. This causes the client to hang for up to 1 minute even after the results 
> were streamed back. During this time the client just waits for the cleanup to finish. 
> The cleanup can take place in the background in HiveServer.
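> A sketch of the idea, assuming a dedicated executor inside HiveServer2 (the PR 
> itself wires this through a SessionState flag; fs and scratchDir come from context):
> {code:java}
> // Hand the scratch dir paths to a background pool so the query thread can
> // return results without waiting for the deletes to finish.
> ExecutorService cleanupPool = Executors.newSingleThreadExecutor();
> cleanupPool.submit(() -> fs.delete(scratchDir, true));
> {code}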



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24284) NPE when parsing druid logs using Hive

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24284?focusedWorklogId=503122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503122
 ]

ASF GitHub Bot logged work on HIVE-24284:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 10:12
Start Date: 21/Oct/20 10:12
Worklog Time Spent: 10m 
  Work Description: maheshk114 merged pull request #1586:
URL: https://github.com/apache/hive/pull/1586


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503122)
Time Spent: 20m  (was: 10m)

> NPE when parsing druid logs using Hive
> --
>
> Key: HIVE-24284
> URL: https://issues.apache.org/jira/browse/HIVE-24284
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As per the current Sys-logger parser, it always expects a valid proc id. But 
> as per RFC3164 and RFC5424, the proc id can be skipped. So hive should handle 
> it by using NILVALUE/an empty string in case the proc id is null.
>  
> {code:java}
> Caused by: java.lang.NullPointerException: null
> at java.lang.String.<init>(String.java:566)
> at 
> org.apache.hadoop.hive.ql.log.syslog.SyslogParser.createEvent(SyslogParser.java:361)
> at 
> org.apache.hadoop.hive.ql.log.syslog.SyslogParser.readEvent(SyslogParser.java:326)
> at 
> org.apache.hadoop.hive.ql.log.syslog.SyslogSerDe.deserialize(SyslogSerDe.java:95)
>  {code}
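> A minimal sketch of the guard the description asks for (field names are 
> assumptions about SyslogParser internals):
> {code:java}
> // RFC 3164/5424 allow the proc id to be absent; fall back to NILVALUE/empty
> // instead of passing null into new String(...).
> String procId = (procIdBytes == null) ? "" : new String(procIdBytes, charset);
> {code}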



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=503053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503053
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 07:51
Start Date: 21/Oct/20 07:51
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #1548:
URL: https://github.com/apache/hive/pull/1548


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503053)
Time Spent: 12h 10m  (was: 12h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and, 
> when addPartitions is called, removing this entry from TXN_COMPONENTS and adding 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]
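> A sketch of the marker idea in SQL terms; the marker value and the exact column 
> set are assumptions, see the PR for the real schema changes:
> {code:java}
> // At openTxn: leave a placeholder row so an abort before addPartitions
> // still shows that data may have been written for this txn.
> stmt.executeUpdate("INSERT INTO TXN_COMPONENTS"
>     + " (TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION)"
>     + " VALUES (" + txnId + ", '" + db + "', '" + table + "', '_MARKER_')");
> // At addPartitions: delete the placeholder and insert one row per partition.
> {code}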



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=503051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503051
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 07:37
Start Date: 21/Oct/20 07:37
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r509054430



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -414,76 +403,30 @@ public void markCleaned(CompactionInfo info) throws 
MetaException {
  * aborted TXN_COMPONENTS above tc_writeid (and consequently about 
aborted txns).
  * See {@link ql.txn.compactor.Cleaner.removeFiles()}
  */
-s = "SELECT DISTINCT \"TXN_ID\" FROM \"TXNS\", \"TXN_COMPONENTS\" 
WHERE \"TXN_ID\" = \"TC_TXNID\" "
-+ "AND \"TXN_STATE\" = " + TxnStatus.ABORTED + " AND 
\"TC_DATABASE\" = ? AND \"TC_TABLE\" = ?";
-if (info.highestWriteId != 0) s += " AND \"TC_WRITEID\" <= ?";
-if (info.partName != null) s += " AND \"TC_PARTITION\" = ?";
-
+s = "DELETE FROM \"TXN_COMPONENTS\" WHERE \"TC_TXNID\" IN (" +

Review comment:
   never mind, LGTM





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503051)
Time Spent: 12h  (was: 11h 50m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and, 
> when addPartitions is called, removing this entry from TXN_COMPONENTS and adding 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24256) REPL LOAD fails because of unquoted column name

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24256?focusedWorklogId=503050&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503050
 ]

ASF GitHub Bot logged work on HIVE-24256:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 07:31
Start Date: 21/Oct/20 07:31
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #1569:
URL: https://github.com/apache/hive/pull/1569


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503050)
Time Spent: 0.5h  (was: 20m)

> REPL LOAD fails because of unquoted column name
> ---
>
> Key: HIVE-24256
> URL: https://issues.apache.org/jira/browse/HIVE-24256
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Viacheslav Avramenko
>Assignee: Viacheslav Avramenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-24256.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There is an unquoted column name, NWI_TABLE, in one of the SQL queries 
> executed during REPL LOAD.
>  This causes the command to fail when Postgres is used for the metastore.
> {code:sql}
> SELECT \"NWI_NEXT\" FROM \"NEXT_WRITE_ID\" WHERE \"NWI_DATABASE\" = ? AND 
> NWI_TABLE = ?
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=503049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503049
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 07:20
Start Date: 21/Oct/20 07:20
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r508809944



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -589,4 +593,9 @@ private void checkInterrupt() throws InterruptedException {
   throw new InterruptedException("Compaction execution is interrupted");
 }
   }
-}
+
+  private static boolean isDynPartAbort(Table t, CompactionInfo ci) {

Review comment:
   Those are actually 2 different methods; the only common part is the check for 
isDynPart. Also, there is no CompactionUtils, only CompactorUtil, which contains 
the thread factory stuff. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503049)
Time Spent: 11h 50m  (was: 11h 40m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and, 
> when addPartitions is called, removing this entry from TXN_COMPONENTS and adding 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24270) Move scratchdir cleanup to background

2020-10-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24270?focusedWorklogId=503046&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503046
 ]

ASF GitHub Bot logged work on HIVE-24270:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 07:00
Start Date: 21/Oct/20 07:00
Worklog Time Spent: 10m 
  Work Description: nareshpr commented on a change in pull request #1577:
URL: https://github.com/apache/hive/pull/1577#discussion_r509033974



##
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java
##
@@ -95,7 +95,7 @@ public static SessionState setUpSessionState(HiveConf conf, 
String user, boolean
 if (sessionState == null) {
   // Note: we assume that workers run on the same threads repeatedly, so 
we can set up
   //   the session here and it will be reused without explicitly 
storing in the worker.
-  sessionState = new SessionState(conf, user);
+  sessionState = new SessionState(conf, user, true);

Review comment:
   Are we targeting specific operations like auto-gather background stats 
threads & compaction? Why are we not providing a config to toggle sync/async 
delete?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 503046)
Time Spent: 1h  (was: 50m)

> Move scratchdir cleanup to background
> -
>
> Key: HIVE-24270
> URL: https://issues.apache.org/jira/browse/HIVE-24270
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In a cloud environment, scratchdir cleaning at the end of the query may take a 
> long time. This causes the client to hang for up to 1 minute even after the results 
> were streamed back. During this time the client just waits for the cleanup to finish. 
> The cleanup can take place in the background in HiveServer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24286) Render date and time with progress of Hive on Tez

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24286?focusedWorklogId=502953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502953
 ]

ASF GitHub Bot logged work on HIVE-24286:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 02:43
Start Date: 21/Oct/20 02:43
Worklog Time Spent: 10m 
  Work Description: okumin commented on a change in pull request #1588:
URL: https://github.com/apache/hive/pull/1588#discussion_r508954919



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/RenderStrategy.java
##
@@ -57,7 +60,8 @@ public void update(DAGStatus status, Map<String, Progress> 
vertexProgressMap) {
   renderProgress(monitor.progressMonitor(status, vertexProgressMap));
   String report = getReport(vertexProgressMap);
   if (showReport(report)) {
-renderReport(report);
+final String time = FORMATTER.format(LocalDateTime.now());

Review comment:
   @dengzhhu653 Thanks for clarifying that! I removed the final.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502953)
Time Spent: 50m  (was: 40m)

> Render date and time with progress of Hive on Tez
> -
>
> Key: HIVE-24286
> URL: https://issues.apache.org/jira/browse/HIVE-24286
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add date/time to each line written by RenderStrategy like MapReduce and Spark.
>  
>  * 
> [https://github.com/apache/hive/blob/31c1658d9884eb4f31b06eaa718dfef8b1d92d22/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java#L350]
>  * 
> [https://github.com/apache/hive/blob/31c1658d9884eb4f31b06eaa718dfef8b1d92d22/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RenderStrategy.java#L64-L67]
>  
> This ticket would add the current time to the head of each line.
>  
> {code:java}
> 2020-10-19 13:32:41,162   Map 1: 0/1  Reducer 2: 0/1  
> 2020-10-19 13:32:44,231   Map 1: 0/1  Reducer 2: 0/1  
> 2020-10-19 13:32:46,813   Map 1: 0(+1)/1  Reducer 2: 0/1  
> 2020-10-19 13:32:49,878   Map 1: 0(+1)/1  Reducer 2: 0/1  
> 2020-10-19 13:32:51,416   Map 1: 1/1  Reducer 2: 0/1  
> 2020-10-19 13:32:51,936   Map 1: 1/1  Reducer 2: 0(+1)/1  
> 2020-10-19 13:32:52,877   Map 1: 1/1  Reducer 2: 1/1  
> {code}
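> For reference, a sketch of producing such a prefix; the pattern is inferred from 
> the sample output above, and renderReport is the existing method in RenderStrategy:
> {code:java}
> private static final DateTimeFormatter FORMATTER =
>     DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
> // ...
> String time = FORMATTER.format(LocalDateTime.now());
> renderReport(time + "\t" + report);
> {code}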
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24286) Render date and time with progress of Hive on Tez

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24286?focusedWorklogId=502948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502948
 ]

ASF GitHub Bot logged work on HIVE-24286:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 02:31
Start Date: 21/Oct/20 02:31
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1588:
URL: https://github.com/apache/hive/pull/1588#discussion_r508951394



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/RenderStrategy.java
##
@@ -57,7 +60,8 @@ public void update(DAGStatus status, Map<String, Progress> 
vertexProgressMap) {
   renderProgress(monitor.progressMonitor(status, vertexProgressMap));
   String report = getReport(vertexProgressMap);
   if (showReport(report)) {
-renderReport(report);
+final String time = FORMATTER.format(LocalDateTime.now());

Review comment:
   Yes, to align with other variables; also it doesn't seem necessary to declare 
it as final, as nothing changes it after the if branch.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502948)
Time Spent: 40m  (was: 0.5h)

> Render date and time with progress of Hive on Tez
> -
>
> Key: HIVE-24286
> URL: https://issues.apache.org/jira/browse/HIVE-24286
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add date/time to each line written by RenderStrategy like MapReduce and Spark.
>  
>  * 
> [https://github.com/apache/hive/blob/31c1658d9884eb4f31b06eaa718dfef8b1d92d22/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java#L350]
>  * 
> [https://github.com/apache/hive/blob/31c1658d9884eb4f31b06eaa718dfef8b1d92d22/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RenderStrategy.java#L64-L67]
>  
> This ticket would add the current time to the head of each line.
>  
> {code:java}
> 2020-10-19 13:32:41,162   Map 1: 0/1  Reducer 2: 0/1  
> 2020-10-19 13:32:44,231   Map 1: 0/1  Reducer 2: 0/1  
> 2020-10-19 13:32:46,813   Map 1: 0(+1)/1  Reducer 2: 0/1  
> 2020-10-19 13:32:49,878   Map 1: 0(+1)/1  Reducer 2: 0/1  
> 2020-10-19 13:32:51,416   Map 1: 1/1  Reducer 2: 0/1  
> 2020-10-19 13:32:51,936   Map 1: 1/1  Reducer 2: 0(+1)/1  
> 2020-10-19 13:32:52,877   Map 1: 1/1  Reducer 2: 1/1  
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24286) Render date and time with progress of Hive on Tez

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24286?focusedWorklogId=502943&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502943
 ]

ASF GitHub Bot logged work on HIVE-24286:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 02:12
Start Date: 21/Oct/20 02:12
Worklog Time Spent: 10m 
  Work Description: okumin commented on a change in pull request #1588:
URL: https://github.com/apache/hive/pull/1588#discussion_r508946873



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/RenderStrategy.java
##
@@ -57,7 +60,8 @@ public void update(DAGStatus status, Map<String, Progress> 
vertexProgressMap) {
   renderProgress(monitor.progressMonitor(status, vertexProgressMap));
   String report = getReport(vertexProgressMap);
   if (showReport(report)) {
-renderReport(report);
+final String time = FORMATTER.format(LocalDateTime.now());

Review comment:
   @dengzhhu653 I don't disagree, but what's your thought? To align with 
other variables?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502943)
Time Spent: 0.5h  (was: 20m)

> Render date and time with progress of Hive on Tez
> -
>
> Key: HIVE-24286
> URL: https://issues.apache.org/jira/browse/HIVE-24286
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add date/time to each line written by RenderStrategy like MapReduce and Spark.
>  
>  * 
> [https://github.com/apache/hive/blob/31c1658d9884eb4f31b06eaa718dfef8b1d92d22/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java#L350]
>  * 
> [https://github.com/apache/hive/blob/31c1658d9884eb4f31b06eaa718dfef8b1d92d22/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RenderStrategy.java#L64-L67]
>  
> This ticket would add the current time to the head of each line.
>  
> {code:java}
> 2020-10-19 13:32:41,162   Map 1: 0/1  Reducer 2: 0/1  
> 2020-10-19 13:32:44,231   Map 1: 0/1  Reducer 2: 0/1  
> 2020-10-19 13:32:46,813   Map 1: 0(+1)/1  Reducer 2: 0/1  
> 2020-10-19 13:32:49,878   Map 1: 0(+1)/1  Reducer 2: 0/1  
> 2020-10-19 13:32:51,416   Map 1: 1/1  Reducer 2: 0/1  
> 2020-10-19 13:32:51,936   Map 1: 1/1  Reducer 2: 0(+1)/1  
> 2020-10-19 13:32:52,877   Map 1: 1/1  Reducer 2: 1/1  
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24286) Render date and time with progress of Hive on Tez

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24286?focusedWorklogId=502940=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502940
 ]

ASF GitHub Bot logged work on HIVE-24286:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 01:59
Start Date: 21/Oct/20 01:59
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1588:
URL: https://github.com/apache/hive/pull/1588#discussion_r508943308



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/RenderStrategy.java
##
@@ -57,7 +60,8 @@ public void update(DAGStatus status, Map 
vertexProgressMap) {
   renderProgress(monitor.progressMonitor(status, vertexProgressMap));
   String report = getReport(vertexProgressMap);
   if (showReport(report)) {
-renderReport(report);
+final String time = FORMATTER.format(LocalDateTime.now());

Review comment:
   Can we remove the final here?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502940)
Time Spent: 20m  (was: 10m)

> Render date and time with progress of Hive on Tez
> -
>
> Key: HIVE-24286
> URL: https://issues.apache.org/jira/browse/HIVE-24286
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add date/time to each line written by RenderStrategy like MapReduce and Spark.
>  
>  * 
> [https://github.com/apache/hive/blob/31c1658d9884eb4f31b06eaa718dfef8b1d92d22/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java#L350]
>  * 
> [https://github.com/apache/hive/blob/31c1658d9884eb4f31b06eaa718dfef8b1d92d22/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RenderStrategy.java#L64-L67]
>  
> This ticket would add the current time to the head of each line.
>  
> {code:java}
> 2020-10-19 13:32:41,162   Map 1: 0/1  Reducer 2: 0/1  
> 2020-10-19 13:32:44,231   Map 1: 0/1  Reducer 2: 0/1  
> 2020-10-19 13:32:46,813   Map 1: 0(+1)/1  Reducer 2: 0/1  
> 2020-10-19 13:32:49,878   Map 1: 0(+1)/1  Reducer 2: 0/1  
> 2020-10-19 13:32:51,416   Map 1: 1/1  Reducer 2: 0/1  
> 2020-10-19 13:32:51,936   Map 1: 1/1  Reducer 2: 0(+1)/1  
> 2020-10-19 13:32:52,877   Map 1: 1/1  Reducer 2: 1/1  
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24037) Parallelize hash table constructions in map joins

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24037?focusedWorklogId=502921=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502921
 ]

ASF GitHub Bot logged work on HIVE-24037:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 00:57
Start Date: 21/Oct/20 00:57
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1401:
URL: https://github.com/apache/hive/pull/1401#issuecomment-713224730


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502921)
Time Spent: 20m  (was: 10m)

> Parallelize hash table constructions in map joins
> -
>
> Key: HIVE-24037
> URL: https://issues.apache.org/jira/browse/HIVE-24037
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Parallelize hash table constructions in map joins



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24053) Pluggable HttpRequestInterceptor for Hive JDBC

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24053?focusedWorklogId=502919=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502919
 ]

ASF GitHub Bot logged work on HIVE-24053:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 00:57
Start Date: 21/Oct/20 00:57
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1417:
URL: https://github.com/apache/hive/pull/1417#issuecomment-713224719


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502919)
Time Spent: 20m  (was: 10m)

> Pluggable HttpRequestInterceptor for Hive JDBC
> --
>
> Key: HIVE-24053
> URL: https://issues.apache.org/jira/browse/HIVE-24053
> Project: Hive
>  Issue Type: New Feature
>  Components: JDBC
>Affects Versions: 3.1.2
>Reporter: Ying Wang
>Assignee: Ying Wang
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Allows the client to pass in the name of a custom HttpRequestInterceptor; the 
> driver instantiates the class and adds it to the HttpClient.
> Example usage: we would like to pass in an HttpRequestInterceptor for OAuth 2.0 
> authentication. The HttpRequestInterceptor will acquire and/or 
> refresh the access token and add it as an authentication header each time 
> HiveConnection sends the HttpRequest.
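> A minimal sketch of such an interceptor, assuming Apache HttpClient 4.x; the 
> class name and token helper below are hypothetical placeholders for the 
> client's own OAuth 2.0 logic:
> {code:java}
> import java.io.IOException;
> import org.apache.http.HttpException;
> import org.apache.http.HttpRequest;
> import org.apache.http.HttpRequestInterceptor;
> import org.apache.http.protocol.HttpContext;
>
> // Hypothetical class; its name would be passed to the JDBC driver by the client.
> public class OAuthTokenInterceptor implements HttpRequestInterceptor {
>   @Override
>   public void process(HttpRequest request, HttpContext context)
>       throws HttpException, IOException {
>     // fetchOrRefreshToken() stands in for the client's token acquisition logic.
>     String token = fetchOrRefreshToken();
>     request.addHeader("Authorization", "Bearer " + token);
>   }
>
>   private String fetchOrRefreshToken() {
>     return "example-token"; // placeholder
>   }
> }
> {code}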



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-15157) Partition Table With timestamp type on S3 storage --> Error in getting fields from serde.Invalid Field null

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-15157?focusedWorklogId=502920=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502920
 ]

ASF GitHub Bot logged work on HIVE-15157:
-

Author: ASF GitHub Bot
Created on: 21/Oct/20 00:57
Start Date: 21/Oct/20 00:57
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #840:
URL: https://github.com/apache/hive/pull/840


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502920)
Time Spent: 1h 10m  (was: 1h)

> Partition Table With timestamp type on S3 storage --> Error in getting fields 
> from serde.Invalid Field null
> ---
>
> Key: HIVE-15157
> URL: https://issues.apache.org/jira/browse/HIVE-15157
> Project: Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 2.1.0
> Environment: JDK 1.8 101 
>Reporter: thauvin damien
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
>  Labels: pull-request-available, timestamp
> Attachments: HIVE-15157.01.patch, HIVE-15157.02.patch, 
> HIVE-15157.03.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Hello 
> I get the error below when I try to perform:
> hive> DESCRIBE formatted table partition (tsbucket='2016-10-28 16%3A00%3A00');
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from 
> serde.Invalid Field null
> Here is the description of the issue.
> - External Hive table with dynamic partitioning enabled on AWS S3 storage.
> - Partitioned table with a timestamp-typed partition column.
> When I perform "show partitions table;" everything is fine:
> hive>  show partitions table;
> OK
> tsbucket=2016-10-01 11%3A00%3A00
> tsbucket=2016-10-28 16%3A00%3A00
> And when I perform "describe FORMATTED table;" everything is fine.
> Is this a bug?
> The stacktrace of hive.log :
> 2016-11-08T10:30:20,868 ERROR [ac3e0d48-22c5-4d04-a788-aeb004ea94f3 
> main([])]: exec.DDLTask (DDLTask.java:failed(574)) - 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error in getting fields 
> from serde.Invalid Field null
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getFieldsFromDeserializer(Hive.java:3414)
> at 
> org.apache.hadoop.hive.ql.exec.DDLTask.describeTable(DDLTask.java:3109)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:408)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: MetaException(message:Invalid Field null)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getFieldsFromDeserializer(MetaStoreUtils.java:1336)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getFieldsFromDeserializer(Hive.java:3409)
> ... 21 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=502838=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502838
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 20/Oct/20 20:14
Start Date: 20/Oct/20 20:14
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r508809944



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -589,4 +593,9 @@ private void checkInterrupt() throws InterruptedException {
   throw new InterruptedException("Compaction execution is interrupted");
 }
   }
-}
+
+  private static boolean isDynPartAbort(Table t, CompactionInfo ci) {

Review comment:
   could be. Do you know if there is some helper class I could move the 
isDynPartAbort method to?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502838)
Time Spent: 11h 40m  (was: 11.5h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]
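> A rough JDBC sketch of the proposed marker protocol; the marker value and the 
> use of TC_PARTITION to carry it are assumptions, the patch defines the real 
> representation:
> {code:java}
> import java.sql.Connection;
> import java.sql.PreparedStatement;
>
> class TxnMarkerSketch {
>   // Hypothetical marker value.
>   static final String MARKER = "__HIVE_OPEN_TXN_MARKER__";
>
>   // At openTxn: record that the txn exists, even before any partition is known.
>   static void onOpenTxn(Connection db, long txnId, String database, String table)
>       throws Exception {
>     try (PreparedStatement ps = db.prepareStatement(
>         "INSERT INTO TXN_COMPONENTS (TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION)"
>             + " VALUES (?, ?, ?, ?)")) {
>       ps.setLong(1, txnId);
>       ps.setString(2, database);
>       ps.setString(3, table);
>       ps.setString(4, MARKER);
>       ps.executeUpdate();
>     }
>   }
>
>   // At addPartitions: replace the marker entry with the real partition entry.
>   static void onAddPartitions(Connection db, long txnId, String partName)
>       throws Exception {
>     try (PreparedStatement ps = db.prepareStatement(
>         "UPDATE TXN_COMPONENTS SET TC_PARTITION = ? WHERE TC_TXNID = ? AND TC_PARTITION = ?")) {
>       ps.setString(1, partName);
>       ps.setLong(2, txnId);
>       ps.setString(3, MARKER);
>       ps.executeUpdate();
>     }
>   }
> }
> {code}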



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=502837=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502837
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 20/Oct/20 20:10
Start Date: 20/Oct/20 20:10
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r508807499



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -414,76 +403,30 @@ public void markCleaned(CompactionInfo info) throws 
MetaException {
  * aborted TXN_COMPONENTS above tc_writeid (and consequently about 
aborted txns).
  * See {@link ql.txn.compactor.Cleaner.removeFiles()}
  */
-s = "SELECT DISTINCT \"TXN_ID\" FROM \"TXNS\", \"TXN_COMPONENTS\" 
WHERE \"TXN_ID\" = \"TC_TXNID\" "
-+ "AND \"TXN_STATE\" = " + TxnStatus.ABORTED + " AND 
\"TC_DATABASE\" = ? AND \"TC_TABLE\" = ?";
-if (info.highestWriteId != 0) s += " AND \"TC_WRITEID\" <= ?";
-if (info.partName != null) s += " AND \"TC_PARTITION\" = ?";
-
+s = "DELETE FROM \"TXN_COMPONENTS\" WHERE \"TC_TXNID\" IN (" +

Review comment:
   @pvary, could you please take a quick look? thanks!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502837)
Time Spent: 11.5h  (was: 11h 20m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=502835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502835
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 20/Oct/20 20:05
Start Date: 20/Oct/20 20:05
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r508805163



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -414,76 +403,30 @@ public void markCleaned(CompactionInfo info) throws 
MetaException {
  * aborted TXN_COMPONENTS above tc_writeid (and consequently about 
aborted txns).
  * See {@link ql.txn.compactor.Cleaner.removeFiles()}
  */
-s = "SELECT DISTINCT \"TXN_ID\" FROM \"TXNS\", \"TXN_COMPONENTS\" 
WHERE \"TXN_ID\" = \"TC_TXNID\" "
-+ "AND \"TXN_STATE\" = " + TxnStatus.ABORTED + " AND 
\"TC_DATABASE\" = ? AND \"TC_TABLE\" = ?";
-if (info.highestWriteId != 0) s += " AND \"TC_WRITEID\" <= ?";
-if (info.partName != null) s += " AND \"TC_PARTITION\" = ?";
-
+s = "DELETE FROM \"TXN_COMPONENTS\" WHERE \"TC_TXNID\" IN (" +

Review comment:
   this is an optimization that does everything in one DB request instead of 
two (select + delete)
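
   For illustration, a minimal JDBC sketch of the fusion being discussed: one 
DELETE whose IN-subquery replaces the separate SELECT. Column names come from 
the surrounding diff; the aborted-state literal and the parameter handling are 
simplified assumptions.

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;

class MarkCleanedSketch {
  static void deleteAbortedComponents(Connection db, String database, String table)
      throws Exception {
    // One round trip: the subquery finds aborted txns, the outer DELETE
    // removes their TXN_COMPONENTS rows, instead of a SELECT followed by a DELETE.
    String sql = "DELETE FROM \"TXN_COMPONENTS\" WHERE \"TC_TXNID\" IN ("
        + "SELECT \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_STATE\" = 'a')"
        + " AND \"TC_DATABASE\" = ? AND \"TC_TABLE\" = ?";
    try (PreparedStatement ps = db.prepareStatement(sql)) {
      ps.setString(1, database);
      ps.setString(2, table);
      ps.executeUpdate();
    }
  }
}
{code}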





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502835)
Time Spent: 11h 20m  (was: 11h 10m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=502834=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502834
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 20/Oct/20 20:04
Start Date: 20/Oct/20 20:04
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r508804039



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -400,11 +389,11 @@ public void markCleaned(CompactionInfo info) throws 
MetaException {
   pStmt.setString(paramCount++, info.partName);
 }
 if(info.highestWriteId != 0) {
-  pStmt.setLong(paramCount++, info.highestWriteId);
+  pStmt.setLong(paramCount, info.highestWriteId);

Review comment:
   redundant post-increment

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -134,9 +132,6 @@ public CompactionTxnHandler() {
 response.add(info);
   }
 }
-
-LOG.debug("Going to rollback");
-dbConn.rollback();

Review comment:
   no idea :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502834)
Time Spent: 11h 10m  (was: 11h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24282) Show columns shouldn't sort output columns unless explicitly mentioned.

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24282?focusedWorklogId=502803=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502803
 ]

ASF GitHub Bot logged work on HIVE-24282:
-

Author: ASF GitHub Bot
Created on: 20/Oct/20 17:40
Start Date: 20/Oct/20 17:40
Worklog Time Spent: 10m 
  Work Description: nareshpr commented on pull request #1584:
URL: https://github.com/apache/hive/pull/1584#issuecomment-713027214


   Thanks for the review & merge @miklosgergely 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502803)
Time Spent: 0.5h  (was: 20m)

> Show columns shouldn't sort output columns unless explicitly mentioned.
> ---
>
> Key: HIVE-24282
> URL: https://issues.apache.org/jira/browse/HIVE-24282
> Project: Hive
>  Issue Type: Improvement
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> CREATE TABLE foo_n7(c INT, a INT, b INT);
> show columns in foo_n7;
> {code:java}
> // current output
> a
> b 
> c
> // expected
> c
> a 
> b{code}
> HIVE-18373 changed the original behaviour to sorted output.
> I suggest providing an optional "sorted" keyword to sort the show columns 
> output, e.g.,
> {code:java}
> show sorted columns in foo_n7;
> a
> b 
> c
> show columns in foo_n7
> c
> a 
> b{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=502802=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502802
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 20/Oct/20 17:25
Start Date: 20/Oct/20 17:25
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r508641433



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -589,4 +593,9 @@ private void checkInterrupt() throws InterruptedException {
   throw new InterruptedException("Compaction execution is interrupted");
 }
   }
-}
+
+  private static boolean isDynPartAbort(Table t, CompactionInfo ci) {

Review comment:
   This can be consolidated with most of isDynPartIngest in CompactionUtils

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -400,11 +389,11 @@ public void markCleaned(CompactionInfo info) throws 
MetaException {
   pStmt.setString(paramCount++, info.partName);
 }
 if(info.highestWriteId != 0) {
-  pStmt.setLong(paramCount++, info.highestWriteId);
+  pStmt.setLong(paramCount, info.highestWriteId);

Review comment:
   Why was this changed?

##
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
##
@@ -2128,24 +2129,601 @@ public void testCleanerForTxnToWriteId() throws 
Exception {
 0, TxnDbUtil.countQueryAgent(hiveConf, "select count(*) from 
TXN_TO_WRITE_ID"));
   }
 
-  private void verifyDirAndResult(int expectedDeltas) throws Exception {
-FileSystem fs = FileSystem.get(hiveConf);
-// Verify the content of subdirs
-FileStatus[] status = fs.listStatus(new Path(TEST_WAREHOUSE_DIR + "/" +
-(Table.MMTBL).toString().toLowerCase()), 
FileUtils.HIDDEN_FILES_PATH_FILTER);
+  @Test
+  public void testMmTableAbortWithCompaction() throws Exception {

Review comment:
   FYI MM tests are usually in TestTxnCommandsForMmTable.java, but I don't 
really care about this

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -134,9 +132,6 @@ public CompactionTxnHandler() {
 response.add(info);
   }
 }
-
-LOG.debug("Going to rollback");
-dbConn.rollback();

Review comment:
   Any ideas about why this was here? Just curious

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -414,76 +403,30 @@ public void markCleaned(CompactionInfo info) throws 
MetaException {
  * aborted TXN_COMPONENTS above tc_writeid (and consequently about 
aborted txns).
  * See {@link ql.txn.compactor.Cleaner.removeFiles()}
  */
-s = "SELECT DISTINCT \"TXN_ID\" FROM \"TXNS\", \"TXN_COMPONENTS\" 
WHERE \"TXN_ID\" = \"TC_TXNID\" "
-+ "AND \"TXN_STATE\" = " + TxnStatus.ABORTED + " AND 
\"TC_DATABASE\" = ? AND \"TC_TABLE\" = ?";
-if (info.highestWriteId != 0) s += " AND \"TC_WRITEID\" <= ?";
-if (info.partName != null) s += " AND \"TC_PARTITION\" = ?";
-
+s = "DELETE FROM \"TXN_COMPONENTS\" WHERE \"TC_TXNID\" IN (" +

Review comment:
   This is just refactoring, right? LGTM, but can you make sure @pvary sees 
this as well?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502802)
Time Spent: 11h  (was: 10h 50m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 11h
>  

[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=502753=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502753
 ]

ASF GitHub Bot logged work on HIVE-24217:
-

Author: ASF GitHub Bot
Created on: 20/Oct/20 15:51
Start Date: 20/Oct/20 15:51
Worklog Time Spent: 10m 
  Work Description: zeroflag commented on a change in pull request #1542:
URL: https://github.com/apache/hive/pull/1542#discussion_r508638143



##
File path: standalone-metastore/metastore-server/src/main/resources/package.jdo
##
@@ -1549,6 +1549,83 @@
 
   
 
+  [new JDO class-mapping XML for stored procedures; the element markup was stripped by the mail archive]

Review comment:
   Discussed this further offline; let's try the string-based approach for 
now and see how it goes. I'll modify the patch.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502753)
Time Spent: 2h 40m  (was: 2.5h)

> HMS storage backend for HPL/SQL stored procedures
> -
>
> Key: HIVE-24217
> URL: https://issues.apache.org/jira/browse/HIVE-24217
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, hpl/sql, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Attachments: HPL_SQL storedproc HMS storage.pdf
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> HPL/SQL procedures are currently stored in text files. The goal of this Jira 
> is to implement a Metastore backend for storing and loading these procedures. 
> This is an incremental step towards having fully capable stored procedures in 
> Hive.
>  
> See the attached design for more information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23962) Make bin/hive pick user defined jdbc url

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23962?focusedWorklogId=502713=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502713
 ]

ASF GitHub Bot logged work on HIVE-23962:
-

Author: ASF GitHub Bot
Created on: 20/Oct/20 14:16
Start Date: 20/Oct/20 14:16
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on pull request #1591:
URL: https://github.com/apache/hive/pull/1591#issuecomment-712885001


   the original fix was authored by Xiaomeng Zhang and I had reviewed it, so I 
will submit this fix



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502713)
Time Spent: 1h  (was: 50m)

> Make bin/hive pick user defined jdbc url 
> -
>
> Key: HIVE-23962
> URL: https://issues.apache.org/jira/browse/HIVE-23962
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Xiaomeng Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently the hive command will trigger bin/hive, which runs "beeline" by default.
> We want to pass an env variable so that the user can define which URL beeline uses.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24291) Compaction Cleaner prematurely cleans up deltas

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24291?focusedWorklogId=502705=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502705
 ]

ASF GitHub Bot logged work on HIVE-24291:
-

Author: ASF GitHub Bot
Created on: 20/Oct/20 13:57
Start Date: 20/Oct/20 13:57
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on pull request #1592:
URL: https://github.com/apache/hive/pull/1592#issuecomment-712871640


   @klcopp  @pvary Could any of you review this?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502705)
Time Spent: 20m  (was: 10m)

> Compaction Cleaner prematurely cleans up deltas
> ---
>
> Key: HIVE-24291
> URL: https://issues.apache.org/jira/browse/HIVE-24291
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Since HIVE-23107 the cleaner can clean up deltas that are still used by 
> running queries.
> Example:
>  * TxnIds 1-5 write to a partition, all commit
>  * Compactor starts with txnId=6
>  * Long running query starts with txnId=7; it sees txnId=6 as open in its 
> snapshot
>  * Compaction commits
>  * Cleaner runs
> Previously the min_history_level table would have prevented the Cleaner from 
> deleting deltas 1-5 while txnId=7 is open, but now they will be deleted and the 
> long-running query may fail if it tries to access the files.
> A solution could be to not run the cleaner while any txn that was opened 
> before the compaction was committed (CQ_NEXT_TXN_ID) is still open.
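> A minimal sketch of the proposed guard, with illustrative names for the two 
> txn ids the cleaner would compare:
> {code:java}
> class CleanerGuardSketch {
>   // The cleaner may only remove obsolete deltas once every transaction that
>   // was opened before the compaction committed has ended; otherwise a query
>   // whose snapshot predates the compaction could still read those deltas.
>   static boolean canClean(long minOpenTxnId, long compactionNextTxnId) {
>     return minOpenTxnId >= compactionNextTxnId;
>   }
> }
> {code}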



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24291) Compaction Cleaner prematurely cleans up deltas

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24291:
--
Labels: pull-request-available  (was: )

> Compaction Cleaner prematurely cleans up deltas
> ---
>
> Key: HIVE-24291
> URL: https://issues.apache.org/jira/browse/HIVE-24291
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since HIVE-23107 the cleaner can clean up deltas that are still used by 
> running queries.
> Example:
>  * TxnIds 1-5 write to a partition, all commit
>  * Compactor starts with txnId=6
>  * Long running query starts with txnId=7; it sees txnId=6 as open in its 
> snapshot
>  * Compaction commits
>  * Cleaner runs
> Previously the min_history_level table would have prevented the Cleaner from 
> deleting deltas 1-5 while txnId=7 is open, but now they will be deleted and the 
> long-running query may fail if it tries to access the files.
> A solution could be to not run the cleaner while any txn that was opened 
> before the compaction was committed (CQ_NEXT_TXN_ID) is still open.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24291) Compaction Cleaner prematurely cleans up deltas

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24291?focusedWorklogId=502687=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502687
 ]

ASF GitHub Bot logged work on HIVE-24291:
-

Author: ASF GitHub Bot
Created on: 20/Oct/20 13:34
Start Date: 20/Oct/20 13:34
Worklog Time Spent: 10m 
  Work Description: pvargacl opened a new pull request #1592:
URL: https://github.com/apache/hive/pull/1592


   
   
   ### What changes were proposed in this pull request?
   Compaction cleaner should wait for all previous txns to commit
   
   ### Why are the changes needed?
   Example buggy scenario in the Jira
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Unit tests
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502687)
Remaining Estimate: 0h
Time Spent: 10m

> Compaction Cleaner prematurely cleans up deltas
> ---
>
> Key: HIVE-24291
> URL: https://issues.apache.org/jira/browse/HIVE-24291
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since HIVE-23107 the cleaner can clean up deltas that are still used by 
> running queries.
> Example:
>  * TxnIds 1-5 write to a partition, all commit
>  * Compactor starts with txnId=6
>  * Long running query starts with txnId=7; it sees txnId=6 as open in its 
> snapshot
>  * Compaction commits
>  * Cleaner runs
> Previously the min_history_level table would have prevented the Cleaner from 
> deleting deltas 1-5 while txnId=7 is open, but now they will be deleted and the 
> long-running query may fail if it tries to access the files.
> A solution could be to not run the cleaner while any txn that was opened 
> before the compaction was committed (CQ_NEXT_TXN_ID) is still open.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=502643=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502643
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 20/Oct/20 11:56
Start Date: 20/Oct/20 11:56
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1553:
URL: https://github.com/apache/hive/pull/1553


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502643)
Time Spent: 3h 40m  (was: 3.5h)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=502642=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502642
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 20/Oct/20 11:56
Start Date: 20/Oct/20 11:56
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r508438776



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
##
@@ -284,6 +304,54 @@ public ParseContext transform(ParseContext pctx) throws 
SemanticException {
 return pctx;
   }
 
+  /** SharedWorkOptimization strategy modes */
+  public enum Mode {
+/**
+ * Merges two identical subtrees.
+ */
+SubtreeMerge,
+/**
+ * Merges a filtered scan into a non-filtered scan.
+ *
+ * In case we are already scanning the whole table - we should not scan it 
twice.
+ */
+RemoveSemijoin,
+/**
+ * Fuses two filtered table scans into a single one.
+ *
+ * Dynamic filter subtree is kept on both sides - but the table is onlt 
scanned once.

Review comment:
   added fix to HIVE-24241





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502642)
Time Spent: 3.5h  (was: 3h 20m)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=502520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502520
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 20/Oct/20 05:39
Start Date: 20/Oct/20 05:39
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r508220345



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
##
@@ -284,6 +304,54 @@ public ParseContext transform(ParseContext pctx) throws 
SemanticException {
 return pctx;
   }
 
+  /** SharedWorkOptimization strategy modes */
+  public enum Mode {
+/**
+ * Merges two identical subtrees.
+ */
+SubtreeMerge,
+/**
+ * Merges a filtered scan into a non-filtered scan.
+ *
+ * In case we are already scanning the whole table - we should not scan it 
twice.
+ */
+RemoveSemijoin,
+/**
+ * Fuses two filtered table scans into a single one.
+ *
+ * Dynamic filter subtree is kept on both sides - but the table is onlt 
scanned once.

Review comment:
   typo. onlt





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502520)
Time Spent: 3h 20m  (was: 3h 10m)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24227) sys.replication_metrics table shows incorrect status for failed policies

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24227?focusedWorklogId=502514=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502514
 ]

ASF GitHub Bot logged work on HIVE-24227:
-

Author: ASF GitHub Bot
Created on: 20/Oct/20 05:23
Start Date: 20/Oct/20 05:23
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1550:
URL: https://github.com/apache/hive/pull/1550#discussion_r507575660



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -1324,5 +1331,18 @@ private String relativeExtInfoPath(String dbName) {
   return File.separator + dbName.toLowerCase() + File.separator + 
FILE_NAME;
 }
   }
-
+  
+  private Path getNonRecoverablePath(Path dumpDir, String dbName) throws 
IOException {

Review comment:
   This needs to be a utility method; it is part of other tests as well.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AckTask.java
##
@@ -45,9 +46,10 @@ public int execute() {
   Path ackPath = work.getAckFilePath();
   Utils.create(ackPath, conf);
   LOG.info("Created ack file : {} ", ackPath);
-} catch (SemanticException e) {
+} catch (Exception e) {
   setException(e);
-  return ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
+  return ReplUtils.handleException(true, e, 
work.getAckFilePath().getParent().getParent().toString(),

Review comment:
   can work.getAckFilePath().getParent().getParent() be null?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502514)
Time Spent: 2h 40m  (was: 2.5h)

> sys.replication_metrics table shows incorrect status for failed policies
> 
>
> Key: HIVE-24227
> URL: https://issues.apache.org/jira/browse/HIVE-24227
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23962) Make bin/hive pick user defined jdbc url

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23962?focusedWorklogId=502452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502452
 ]

ASF GitHub Bot logged work on HIVE-23962:
-

Author: ASF GitHub Bot
Created on: 20/Oct/20 02:00
Start Date: 20/Oct/20 02:00
Worklog Time Spent: 10m 
  Work Description: nrg4878 opened a new pull request #1591:
URL: https://github.com/apache/hive/pull/1591


   Same as PR #1344, initially submitted by Xiaomeng. I am just rebasing the 
patch, as the old PR was auto-closed. Looking for a clean test run.
   
   Add an env variable BEELINE_URL_LEGACY; when this value is not empty,
   run "beeline -c $BEELINE_URL_LEGACY".
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502452)
Time Spent: 50m  (was: 40m)

> Make bin/hive pick user defined jdbc url 
> -
>
> Key: HIVE-23962
> URL: https://issues.apache.org/jira/browse/HIVE-23962
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Xiaomeng Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently the hive command will trigger bin/hive, which runs "beeline" by default.
> We want to pass an env variable so that the user can define which URL beeline uses.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24288) Files created by CompileProcessor have incorrect permissions

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24288:
--
Labels: pull-request-available  (was: )

> Files created by CompileProcessor have incorrect permissions
> 
>
> Key: HIVE-24288
> URL: https://issues.apache.org/jira/browse/HIVE-24288
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The compile processor generates some temporary files as part of processing. These 
> need to be cleaned up on exit from the CLI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24288) Files created by CompileProcessor have incorrect permissions

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24288?focusedWorklogId=502451=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502451
 ]

ASF GitHub Bot logged work on HIVE-24288:
-

Author: ASF GitHub Bot
Created on: 20/Oct/20 01:33
Start Date: 20/Oct/20 01:33
Worklog Time Spent: 10m 
  Work Description: nrg4878 opened a new pull request #1590:
URL: https://github.com/apache/hive/pull/1590


   ### What changes were proposed in this pull request?
   When you run the "compile" query in the Hive CLI, it creates some temp files 
in the java.io.tmpdir directory that need to be cleaned up after the resource is 
added to the session.
   For example:
   compile `import org.apache.hadoop.hive.ql.exec.UDF \;
   public class Pyth extends UDF {
 public double evaluate(double a, double b){
   return Math.sqrt((a*a) + (b*b)) \;
 }
   } ` AS GROOVY NAMED Pyth.groovy;
   
   in /tmp,
   ./0_1603130653872in/Pyth.groovy
   ./0_1603130393407in/Pyth.groovy
   ./0_1603130541093in/Pyth.groovy
   
   ls -l *.jar
   -rw-r--r-- 1 root root 1578 Oct 19 17:59 0_1603130393407.jar
   -rw-r--r-- 1 hive hive 1578 Oct 19 18:02 0_1603130541093.jar
   -rw-r--r-- 1 hive hive 1578 Oct 19 18:04 0_1603130653872.jar
   
   ### Why are the changes needed?
   Cleanup needed
   
   ### Does this PR introduce _any_ user-facing change?
   NO
   
   ### How was this patch tested?
   Manually using Hive CLI.
   After the fix,
   ls -l in /tmp shows no new .groovy files.
   Also, the jar file now has more restrictive permissions for non-owners:
   -rw------- 1 root root 1578 Oct 20 00:54 2_1603155248285.jar
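
   A minimal sketch of the tightened file handling, assuming a POSIX file system 
(the actual CompileProcessor change may differ):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileAttribute;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

class CompileTempFiles {
  static Path createOwnerOnlyJar(String prefix) throws IOException {
    // rw------- : no access for group/others, unlike the default rw-r--r--.
    Set<PosixFilePermission> perms = PosixFilePermissions.fromString("rw-------");
    FileAttribute<Set<PosixFilePermission>> attr =
        PosixFilePermissions.asFileAttribute(perms);
    Path jar = Files.createTempFile(prefix, ".jar", attr);
    // Remove the generated artifact when the CLI JVM exits.
    jar.toFile().deleteOnExit();
    return jar;
  }
}
{code}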



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502451)
Remaining Estimate: 0h
Time Spent: 10m

> Files created by CompileProcessor have incorrect permissions
> 
>
> Key: HIVE-24288
> URL: https://issues.apache.org/jira/browse/HIVE-24288
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The compile processor generates some temporary files as part of processing. These 
> need to be cleaned up on exit from the CLI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23962) Make bin/hive pick user defined jdbc url

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23962?focusedWorklogId=502447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502447
 ]

ASF GitHub Bot logged work on HIVE-23962:
-

Author: ASF GitHub Bot
Created on: 20/Oct/20 00:57
Start Date: 20/Oct/20 00:57
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1344:
URL: https://github.com/apache/hive/pull/1344


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502447)
Time Spent: 40m  (was: 0.5h)

> Make bin/hive pick user defined jdbc url 
> -
>
> Key: HIVE-23962
> URL: https://issues.apache.org/jira/browse/HIVE-23962
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Xiaomeng Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently the hive command will trigger bin/hive, which runs "beeline" by default.
> We want to pass an env variable so that the user can define which URL beeline uses.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24270) Move scratchdir cleanup to background

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24270?focusedWorklogId=502415=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502415
 ]

ASF GitHub Bot logged work on HIVE-24270:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 21:36
Start Date: 19/Oct/20 21:36
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on a change in pull request #1577:
URL: https://github.com/apache/hive/pull/1577#discussion_r508077019



##
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java
##
@@ -61,6 +61,7 @@ public static void runOnDriver(HiveConf conf, String user,
   throw new 
IllegalArgumentException(JavaUtils.txnIdToString(compactorTxnId) +
   " is not valid. Context: " + query);
 }
+sessionState.setSyncCleanup();

Review comment:
   This switch is actually meant to be constant for a specific session. 
I've put it as a setter as it seemed easier than putting it in the constructor 
(many changes to many files). Sync cleanup is used for the compactor and stats 
updater threads. Everything else will be async. I believe the compactor and 
stats updater use their own sessions repeatedly. Unless a regular query re-uses 
the compactor/stats updater's session, there is no need to set it back to async. 
I'll double-check if this is the case.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502415)
Time Spent: 50m  (was: 40m)

> Move scratchdir cleanup to background
> -
>
> Key: HIVE-24270
> URL: https://issues.apache.org/jira/browse/HIVE-24270
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In a cloud environment, scratchdir cleaning at the end of the query may take a
> long time. This causes the client to hang for up to 1 minute even after the
> results were streamed back. During this time the client just waits for cleanup
> to finish. Cleanup can take place in the background in HiveServer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24120) Plugin for external DatabaseProduct in standalone HMS

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24120?focusedWorklogId=502408&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502408
 ]

ASF GitHub Bot logged work on HIVE-24120:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 20:31
Start Date: 19/Oct/20 20:31
Worklog Time Spent: 10m 
  Work Description: gatorblue commented on pull request #1470:
URL: https://github.com/apache/hive/pull/1470#issuecomment-712424419


   Thanks for your help, Vihang!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502408)
Time Spent: 2h 40m  (was: 2.5h)

> Plugin for external DatabaseProduct in standalone HMS
> -
>
> Key: HIVE-24120
> URL: https://issues.apache.org/jira/browse/HIVE-24120
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: Gustavo Arocena
>Assignee: Gustavo Arocena
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Add a pluggable way to support ANSI-compliant databases as backends for the
> standalone HMS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24120) Plugin for external DatabaseProduct in standalone HMS

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24120?focusedWorklogId=502402&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502402
 ]

ASF GitHub Bot logged work on HIVE-24120:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 20:05
Start Date: 19/Oct/20 20:05
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on a change in pull request #1470:
URL: https://github.com/apache/hive/pull/1470#discussion_r504957647



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java
##
@@ -20,71 +20,666 @@
 
 import java.sql.SQLException;
 import java.sql.SQLTransactionRollbackException;
+import java.sql.Timestamp;
+import java.util.ArrayList;
+import java.util.EnumMap;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
 
-/** Database product infered via JDBC. */
-public enum DatabaseProduct {
-  DERBY, MYSQL, POSTGRES, ORACLE, SQLSERVER, OTHER;
+import org.apache.hadoop.conf.Configurable;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf.ConfVars;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
+import com.google.common.base.Preconditions;
+
+/** Database product inferred via JDBC. Encapsulates all SQL logic associated 
with
+ * the database product.
+ * This class is a singleton, which is instantiated the first time
+ * method determineDatabaseProduct is invoked.
+ * Tests that need to create multiple instances can use the reset method
+ * */
+public class DatabaseProduct implements Configurable {
+  static final private Logger LOG = 
LoggerFactory.getLogger(DatabaseProduct.class.getName());
+
+  private static enum DbType {DERBY, MYSQL, POSTGRES, ORACLE, SQLSERVER, 
CUSTOM, UNDEFINED};
+  public DbType dbType;
+  
+  // Singleton instance
+  private static DatabaseProduct theDatabaseProduct;
+
+  Configuration myConf;
+  /**
+   * Protected constructor for singleton class
+   * @param id
+   */
+  protected DatabaseProduct() {}
+
+  public static final String DERBY_NAME = "derby";
+  public static final String SQL_SERVER_NAME = "microsoft sql server";
+  public static final String MYSQL_NAME = "mysql";
+  public static final String POSTGRESQL_NAME = "postgresql";
+  public static final String ORACLE_NAME = "oracle";
+  public static final String UNDEFINED_NAME = "other";
+  
   /**
* Determine the database product type
* @param productName string to defer database connection
* @return database product type
*/
-  public static DatabaseProduct determineDatabaseProduct(String productName) 
throws SQLException {
-if (productName == null) {
-  return OTHER;
+  public static DatabaseProduct determineDatabaseProduct(String productName, 
Configuration c) {
+DbType dbt;
+
+if (theDatabaseProduct != null) {
+  Preconditions.checkState(theDatabaseProduct.dbType == 
getDbType(productName));
+  return theDatabaseProduct;
 }
+
+// This method may be invoked by concurrent connections
+synchronized (DatabaseProduct.class) {
+
+  if (productName == null) {
+productName = UNDEFINED_NAME;
+  }
+
+  dbt = getDbType(productName);
+
+  // Check for null again in case of race condition
+  if (theDatabaseProduct == null) {
+final Configuration conf = c!= null ? c : 
MetastoreConf.newMetastoreConf();
+// Check if we are using an external database product
+boolean isExternal = MetastoreConf.getBoolVar(conf, 
ConfVars.USE_CUSTOM_RDBMS);
+
+if (isExternal) {
+  // The DatabaseProduct will be created by instantiating an external 
class via
+  // reflection. The external class can override any method in the 
current class
+  String className = MetastoreConf.getVar(conf, 
ConfVars.CUSTOM_RDBMS_CLASSNAME);
+  
+  if (className != null) {
+try {
+  theDatabaseProduct = (DatabaseProduct)
+  ReflectionUtils.newInstance(Class.forName(className), conf);
+  
+  LOG.info(String.format("Using custom RDBMS %s. Overriding 
DbType: %s", className, dbt));
+  dbt = DbType.CUSTOM;
+}catch (Exception e) {
+  LOG.warn("Caught exception instantiating custom database 
product. Reverting to " + dbt, e);
+}
+  }
+  else {
+LOG.warn("Unexpected: metastore.use.custom.database.product was 
set, " +
+ "but metastore.custom.database.product.classname was not. 
Reverting to " + dbt);
+  }
+}
+
+if (theDatabaseProduct == null) {
+  theDatabaseProduct = new DatabaseProduct();

Review 
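
   A hedged sketch of how a deployment could opt into the external product
described in the diff above. The two config keys are taken from the patch text;
the product name, class name, and everything else here are made-up examples:

   ```java
   import org.apache.hadoop.conf.Configuration;

   public class CustomDbProductExample {
     public static void main(String[] args) {
       Configuration conf = new Configuration();
       // Keys as named in the patch; values are illustrative only.
       conf.setBoolean("metastore.use.custom.database.product", true);
       conf.set("metastore.custom.database.product.classname",
           "com.example.MyDatabaseProduct"); // hypothetical DatabaseProduct subclass
       // DatabaseProduct.determineDatabaseProduct("mysql", conf) would then
       // instantiate the custom class via reflection, per the patch.
     }
   }
   ```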

[jira] [Work logged] (HIVE-24120) Plugin for external DatabaseProduct in standalone HMS

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24120?focusedWorklogId=502401&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502401
 ]

ASF GitHub Bot logged work on HIVE-24120:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 20:05
Start Date: 19/Oct/20 20:05
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on pull request #1470:
URL: https://github.com/apache/hive/pull/1470#issuecomment-712411363


   +1 Thanks for working on this. Merged into the master.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502401)
Time Spent: 2h 20m  (was: 2h 10m)

> Plugin for external DatabaseProduct in standalone HMS
> -
>
> Key: HIVE-24120
> URL: https://issues.apache.org/jira/browse/HIVE-24120
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: Gustavo Arocena
>Assignee: Gustavo Arocena
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Add a pluggable way to support ANSI-compliant databases as backends for the
> standalone HMS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24120) Plugin for external DatabaseProduct in standalone HMS

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24120?focusedWorklogId=502400&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502400
 ]

ASF GitHub Bot logged work on HIVE-24120:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 20:04
Start Date: 19/Oct/20 20:04
Worklog Time Spent: 10m 
  Work Description: vihangk1 merged pull request #1470:
URL: https://github.com/apache/hive/pull/1470


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502400)
Time Spent: 2h 10m  (was: 2h)

> Plugin for external DatabaseProduct in standalone HMS
> -
>
> Key: HIVE-24120
> URL: https://issues.apache.org/jira/browse/HIVE-24120
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: Gustavo Arocena
>Assignee: Gustavo Arocena
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Add a pluggable way to support ANSI-compliant databases as backends for the
> standalone HMS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24256) REPL LOAD fails because of unquoted column name

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24256?focusedWorklogId=502394&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502394
 ]

ASF GitHub Bot logged work on HIVE-24256:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 19:35
Start Date: 19/Oct/20 19:35
Worklog Time Spent: 10m 
  Work Description: vyaslav commented on pull request #1569:
URL: https://github.com/apache/hive/pull/1569#issuecomment-712396434


   @kgyrtkirk There is 1 approval on reviewboard for this PR 
https://reviews.apache.org/r/72963/
   Is there anything else I can do? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502394)
Time Spent: 20m  (was: 10m)

> REPL LOAD fails because of unquoted column name
> ---
>
> Key: HIVE-24256
> URL: https://issues.apache.org/jira/browse/HIVE-24256
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Viacheslav Avramenko
>Assignee: Viacheslav Avramenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-24256.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There is an unquoted column name, NWI_TABLE, in one of the SQL queries
> executed during REPL LOAD.
>  This causes the command to fail when Postgres is used for the metastore.
> {code:sql}
> SELECT \"NWI_NEXT\" FROM \"NEXT_WRITE_ID\" WHERE \"NWI_DATABASE\" = ? AND 
> NWI_TABLE = ?
> {code}
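> A minimal sketch of the corrected query, quoting NWI_TABLE like the other
> identifiers (written as the Java string literal the metastore code presumably
> builds):
> {code:java}
> String query = "SELECT \"NWI_NEXT\" FROM \"NEXT_WRITE_ID\""
>     + " WHERE \"NWI_DATABASE\" = ? AND \"NWI_TABLE\" = ?";
> {code}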



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24286) Render date and time with progress of Hive on Tez

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24286?focusedWorklogId=502214&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502214
 ]

ASF GitHub Bot logged work on HIVE-24286:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 14:10
Start Date: 19/Oct/20 14:10
Worklog Time Spent: 10m 
  Work Description: okumin opened a new pull request #1588:
URL: https://github.com/apache/hive/pull/1588


   
   
   ### What changes were proposed in this pull request?
   
   Add date and time to progress logs for Hive on Tez like MapReduce and Spark.
   
   ### Why are the changes needed?
   
   For convenience. The date and time will give us more insight when looking
through logs.
   
   ### Does this PR introduce _any_ user-facing change?
   
   This PR only changes logging. We may add a configuration option if this
turns out to be a breaking change that we can't accept.
   
   
   ### How was this patch tested?
   
   Run `beeline --hiveconf hive.server2.in.place.progress=false` and check the
progress logs. I have checked that the following logs show up.
   
   ```
   INFO  : 2020-10-19 13:32:41,162  Map 1: 0/1  Reducer 2: 0/1  
   INFO  : 2020-10-19 13:32:44,231  Map 1: 0/1  Reducer 2: 0/1  
   INFO  : 2020-10-19 13:32:46,813  Map 1: 0(+1)/1  Reducer 2: 0/1  
   INFO  : 2020-10-19 13:32:49,878  Map 1: 0(+1)/1  Reducer 2: 0/1  
   INFO  : 2020-10-19 13:32:51,416  Map 1: 1/1  Reducer 2: 0/1  
   INFO  : 2020-10-19 13:32:51,936  Map 1: 1/1  Reducer 2: 0(+1)/1  
   INFO  : 2020-10-19 13:32:52,877  Map 1: 1/1  Reducer 2: 1/1  
   ```
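
   A minimal sketch of the prefixing shown above; the `SimpleDateFormat` pattern
and the tab separator are assumptions modelled on the sample output, not
necessarily the exact implementation:

   ```java
   import java.text.SimpleDateFormat;
   import java.util.Date;

   public class ProgressLineExample {
     public static void main(String[] args) {
       // Pattern matches the sample lines above, e.g. "2020-10-19 13:32:41,162"
       SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
       String progress = "Map 1: 0/1  Reducer 2: 0/1";
       System.out.println(fmt.format(new Date()) + "\t" + progress);
     }
   }
   ```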
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502214)
Remaining Estimate: 0h
Time Spent: 10m

> Render date and time with progress of Hive on Tez
> -
>
> Key: HIVE-24286
> URL: https://issues.apache.org/jira/browse/HIVE-24286
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add date/time to each line written by RenderStrategy like MapReduce and Spark.
>  
>  * 
> [https://github.com/apache/hive/blob/31c1658d9884eb4f31b06eaa718dfef8b1d92d22/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java#L350]
>  * 
> [https://github.com/apache/hive/blob/31c1658d9884eb4f31b06eaa718dfef8b1d92d22/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RenderStrategy.java#L64-L67]
>  
> This ticket would add the current time to the head of each line.
>  
> {code:java}
> 2020-10-19 13:32:41,162   Map 1: 0/1  Reducer 2: 0/1  
> 2020-10-19 13:32:44,231   Map 1: 0/1  Reducer 2: 0/1  
> 2020-10-19 13:32:46,813   Map 1: 0(+1)/1  Reducer 2: 0/1  
> 2020-10-19 13:32:49,878   Map 1: 0(+1)/1  Reducer 2: 0/1  
> 2020-10-19 13:32:51,416   Map 1: 1/1  Reducer 2: 0/1  
> 2020-10-19 13:32:51,936   Map 1: 1/1  Reducer 2: 0(+1)/1  
> 2020-10-19 13:32:52,877   Map 1: 1/1  Reducer 2: 1/1  
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24286) Render date and time with progress of Hive on Tez

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24286:
--
Labels: pull-request-available  (was: )

> Render date and time with progress of Hive on Tez
> -
>
> Key: HIVE-24286
> URL: https://issues.apache.org/jira/browse/HIVE-24286
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add date/time to each line written by RenderStrategy like MapReduce and Spark.
>  
>  * 
> [https://github.com/apache/hive/blob/31c1658d9884eb4f31b06eaa718dfef8b1d92d22/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java#L350]
>  * 
> [https://github.com/apache/hive/blob/31c1658d9884eb4f31b06eaa718dfef8b1d92d22/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RenderStrategy.java#L64-L67]
>  
> This ticket would add the current time to the head of each line.
>  
> {code:java}
> 2020-10-19 13:32:41,162   Map 1: 0/1  Reducer 2: 0/1  
> 2020-10-19 13:32:44,231   Map 1: 0/1  Reducer 2: 0/1  
> 2020-10-19 13:32:46,813   Map 1: 0(+1)/1  Reducer 2: 0/1  
> 2020-10-19 13:32:49,878   Map 1: 0(+1)/1  Reducer 2: 0/1  
> 2020-10-19 13:32:51,416   Map 1: 1/1  Reducer 2: 0/1  
> 2020-10-19 13:32:51,936   Map 1: 1/1  Reducer 2: 0(+1)/1  
> 2020-10-19 13:32:52,877   Map 1: 1/1  Reducer 2: 1/1  
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24270) Move scratchdir cleanup to background

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24270?focusedWorklogId=502191&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502191
 ]

ASF GitHub Bot logged work on HIVE-24270:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 13:31
Start Date: 19/Oct/20 13:31
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1577:
URL: https://github.com/apache/hive/pull/1577#discussion_r507744113



##
File path: ql/src/java/org/apache/hadoop/hive/ql/PathCleaner.java
##
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.concurrent.BlockingDeque;
+import java.util.concurrent.LinkedBlockingDeque;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+/**
+ * This class is used to asynchronously remove directories after query 
execution
+ */
+public class PathCleaner {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(PathCleaner.class.getName());
+  private static final AsyncDeleteAction END_OF_PROCESS = new 
AsyncDeleteAction(null, null);
+
+  private final BlockingDeque<AsyncDeleteAction> deleteActions = new 
LinkedBlockingDeque<>();
+  private final AtomicBoolean isShutdown = new AtomicBoolean();

Review comment:
   note: all usages of this boolean are negated - instead of using
negative logic (shutdown), a positive name like "run" might make things
easier to read/follow

##
File path: ql/src/java/org/apache/hadoop/hive/ql/Context.java
##
@@ -673,22 +673,27 @@ public void removeScratchDir() {
 if(this.fsResultCacheDirs != null) {
   resultCacheDir = this.fsResultCacheDirs.toUri().getPath();
 }
-for (Map.Entry entry : fsScratchDirs.entrySet()) {
+SessionState sessionState = SessionState.get();
+for (Path p: fsScratchDirs.values()) {
   try {
-Path p = entry.getValue();
 if (p.toUri().getPath().contains(stagingDir) && subDirOf(p, 
fsScratchDirs.values())  ) {
   LOG.debug("Skip deleting stagingDir: " + p);
   FileSystem fs = p.getFileSystem(conf);
   fs.cancelDeleteOnExit(p);
   continue; // staging dir is deleted when deleting the scratch dir
 }
-if(resultCacheDir == null || 
!p.toUri().getPath().contains(resultCacheDir)) {
+if (resultCacheDir == null || 
!p.toUri().getPath().contains(resultCacheDir)) {
   // delete only the paths which aren't result cache dir path
   // because that will be taken care by removeResultCacheDir
-FileSystem fs = p.getFileSystem(conf);
-LOG.debug("Deleting scratch dir: {}",  p);
-fs.delete(p, true);
-fs.cancelDeleteOnExit(p);
+  FileSystem fs = p.getFileSystem(conf);
+  if (sessionState.isSyncContextCleanup()) {

Review comment:
   I think another approach could be to create 2 PathCleaner
implementations - the existing async one and a sync one; that could create a
tighter contract between the usage and the implementations.
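
   A rough sketch of that suggestion, with hypothetical names (this is not the
patch, just the shape of the two-implementation idea):

   ```java
   import java.io.IOException;
   import java.util.concurrent.BlockingQueue;
   import java.util.concurrent.LinkedBlockingQueue;

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   interface ScratchPathCleaner {
     void delete(Path p, Configuration conf) throws IOException;
   }

   class SyncScratchPathCleaner implements ScratchPathCleaner {
     @Override
     public void delete(Path p, Configuration conf) throws IOException {
       FileSystem fs = p.getFileSystem(conf);
       fs.delete(p, true);       // caller blocks until the path is gone
       fs.cancelDeleteOnExit(p);
     }
   }

   class AsyncScratchPathCleaner implements ScratchPathCleaner {
     private final BlockingQueue<Path> queue = new LinkedBlockingQueue<>();

     @Override
     public void delete(Path p, Configuration conf) {
       queue.add(p);             // a background thread drains the queue
     }
   }
   ```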

##
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java
##
@@ -61,6 +61,7 @@ public static void runOnDriver(HiveConf conf, String user,
   throw new 
IllegalArgumentException(JavaUtils.txnIdToString(compactorTxnId) +
   " is not valid. Context: " + query);
 }
+sessionState.setSyncCleanup();

Review comment:
   this is a one-way switch; so there is no way to switch back to the 
earlier behaviour

##
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##
@@ -388,6 +397,19 @@ public boolean getIsQtestLogging() {
 return isQtestLogging;
   }
 
+  public PathCleaner getPathCleaner() {
+if (pathCleaner == null) {
+  pathCleaner = new PathCleaner(getSessionId());
+  pathCleaner.start();
+}
+return 

[jira] [Work logged] (HIVE-24044) Implement listPartitionNames on temporary tables

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24044?focusedWorklogId=502142&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502142
 ]

ASF GitHub Bot logged work on HIVE-24044:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 12:20
Start Date: 19/Oct/20 12:20
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1408:
URL: https://github.com/apache/hive/pull/1408#issuecomment-712118660


   > Pushed to master. Thanks for the patch @dengzhhu653
   
   Thank you very much for the reviews and help, @lcspinter 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502142)
Time Spent: 2.5h  (was: 2h 20m)

> Implement listPartitionNames on temporary tables 
> -
>
> Key: HIVE-24044
> URL: https://issues.apache.org/jira/browse/HIVE-24044
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Temporary tables can have their own partitions, and IMetaStoreClient uses
> {code:java}
> List<String> listPartitionNames(PartitionsByExprRequest request){code}
> to filter or sort the results. This method can be implemented on temporary
> tables.
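> A hypothetical usage sketch mirroring the signature above; the setters assume
> the usual Thrift-generated accessors on PartitionsByExprRequest:
> {code:java}
> PartitionsByExprRequest request = new PartitionsByExprRequest();
> request.setDbName("default");
> request.setTblName("my_temp_table");
> List<String> names = client.listPartitionNames(request);
> {code}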



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24044) Implement listPartitionNames on temporary tables

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24044?focusedWorklogId=502139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502139
 ]

ASF GitHub Bot logged work on HIVE-24044:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 12:18
Start Date: 19/Oct/20 12:18
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on pull request #1408:
URL: https://github.com/apache/hive/pull/1408#issuecomment-712117422


   Pushed to master. Thanks for the patch @dengzhhu653 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502139)
Time Spent: 2h 20m  (was: 2h 10m)

> Implement listPartitionNames on temporary tables 
> -
>
> Key: HIVE-24044
> URL: https://issues.apache.org/jira/browse/HIVE-24044
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Temporary tables can have their own partitions, and IMetaStoreClient uses
> {code:java}
> List<String> listPartitionNames(PartitionsByExprRequest request){code}
> to filter or sort the results. This method can be implemented on temporary
> tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24044) Implement listPartitionNames on temporary tables

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24044?focusedWorklogId=502138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502138
 ]

ASF GitHub Bot logged work on HIVE-24044:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 12:17
Start Date: 19/Oct/20 12:17
Worklog Time Spent: 10m 
  Work Description: lcspinter merged pull request #1408:
URL: https://github.com/apache/hive/pull/1408


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502138)
Time Spent: 2h 10m  (was: 2h)

> Implement listPartitionNames on temporary tables 
> -
>
> Key: HIVE-24044
> URL: https://issues.apache.org/jira/browse/HIVE-24044
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Temporary tables can have their own partitions, and IMetaStoreClient uses
> {code:java}
> List<String> listPartitionNames(PartitionsByExprRequest request){code}
> to filter or sort the results. This method can be implemented on temporary
> tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24044) Implement listPartitionNames on temporary tables

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24044?focusedWorklogId=502137&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502137
 ]

ASF GitHub Bot logged work on HIVE-24044:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 12:16
Start Date: 19/Oct/20 12:16
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1408:
URL: https://github.com/apache/hive/pull/1408#issuecomment-712116594


   > @dengzhhu653 Could you please provide your full name? I want this change 
to be credited to you.
   
   Zhihua Deng



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502137)
Time Spent: 2h  (was: 1h 50m)

> Implement listPartitionNames on temporary tables 
> -
>
> Key: HIVE-24044
> URL: https://issues.apache.org/jira/browse/HIVE-24044
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Temporary tables can have their own partitions, and IMetaStoreClient uses
> {code:java}
> List<String> listPartitionNames(PartitionsByExprRequest request){code}
> to filter or sort the results. This method can be implemented on temporary
> tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24044) Implement listPartitionNames on temporary tables

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24044?focusedWorklogId=502136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502136
 ]

ASF GitHub Bot logged work on HIVE-24044:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 12:16
Start Date: 19/Oct/20 12:16
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 removed a comment on pull request #1408:
URL: https://github.com/apache/hive/pull/1408#issuecomment-712116334


   > @dengzhhu653 Could you please provide your full name? I want this change 
to be credited to you.
   Zhihua Deng



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502136)
Time Spent: 1h 50m  (was: 1h 40m)

> Implement listPartitionNames on temporary tables 
> -
>
> Key: HIVE-24044
> URL: https://issues.apache.org/jira/browse/HIVE-24044
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Temporary tables can have their own partitions, and IMetaStoreClient uses
> {code:java}
> List<String> listPartitionNames(PartitionsByExprRequest request){code}
> to filter or sort the results. This method can be implemented on temporary
> tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24044) Implement listPartitionNames on temporary tables

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24044?focusedWorklogId=502135&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502135
 ]

ASF GitHub Bot logged work on HIVE-24044:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 12:15
Start Date: 19/Oct/20 12:15
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1408:
URL: https://github.com/apache/hive/pull/1408#issuecomment-712116334


   > @dengzhhu653 Could you please provide your full name? I want this change 
to be credited to you.
   Zhihua Deng



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502135)
Time Spent: 1h 40m  (was: 1.5h)

> Implement listPartitionNames on temporary tables 
> -
>
> Key: HIVE-24044
> URL: https://issues.apache.org/jira/browse/HIVE-24044
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Temporary tables can have their own partitions, and IMetaStoreClient uses
> {code:java}
> List<String> listPartitionNames(PartitionsByExprRequest request){code}
> to filter or sort the results. This method can be implemented on temporary
> tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24044) Implement listPartitionNames on temporary tables

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24044?focusedWorklogId=502129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502129
 ]

ASF GitHub Bot logged work on HIVE-24044:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 11:57
Start Date: 19/Oct/20 11:57
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on pull request #1408:
URL: https://github.com/apache/hive/pull/1408#issuecomment-712106681


   @dengzhhu653 Could you please provide your full name? I want this change to 
be credited to you. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502129)
Time Spent: 1.5h  (was: 1h 20m)

> Implement listPartitionNames on temporary tables 
> -
>
> Key: HIVE-24044
> URL: https://issues.apache.org/jira/browse/HIVE-24044
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Temporary tables can have their own partitions, and IMetaStoreClient uses
> {code:java}
> List<String> listPartitionNames(PartitionsByExprRequest request){code}
> to filter or sort the results. This method can be implemented on temporary
> tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24044) Implement listPartitionNames on temporary tables

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24044?focusedWorklogId=502113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502113
 ]

ASF GitHub Bot logged work on HIVE-24044:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 11:32
Start Date: 19/Oct/20 11:32
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1408:
URL: https://github.com/apache/hive/pull/1408#issuecomment-712095260


   @lcspinter Could someone help push this in if there are no objections? Thank
you!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502113)
Time Spent: 1h 20m  (was: 1h 10m)

> Implement listPartitionNames on temporary tables 
> -
>
> Key: HIVE-24044
> URL: https://issues.apache.org/jira/browse/HIVE-24044
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Temporary tables can have their own partitions, and IMetaStoreClient uses
> {code:java}
> List<String> listPartitionNames(PartitionsByExprRequest request){code}
> to filter or sort the results. This method can be implemented on temporary
> tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24263) Create an HMS endpoint to list partition locations

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24263?focusedWorklogId=502097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502097
 ]

ASF GitHub Bot logged work on HIVE-24263:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 10:43
Start Date: 19/Oct/20 10:43
Worklog Time Spent: 10m 
  Work Description: szehonCriteo commented on pull request #1572:
URL: https://github.com/apache/hive/pull/1572#issuecomment-712026766


   Hi @vihangk1 sorry to ping, do you have any thoughts on this new API?  Thanks



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502097)
Time Spent: 20m  (was: 10m)

> Create an HMS endpoint to list partition locations
> --
>
> Key: HIVE-24263
> URL: https://issues.apache.org/jira/browse/HIVE-24263
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24263.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In our company, we have a use-case to quickly get a list of partition
> locations. Currently it is done via listPartitions, which is a very heavy
> operation in terms of memory and performance.
> This JIRA proposes an API: Map<String, String> listPartitionLocations(String
> db, String table, short max) that returns a map of partition names to
> locations.
> For example, we have an integration from the output of a Hive pipeline to
> Spark jobs that consume directly from HDFS. The Spark job scheduler needs to
> know the partition paths that are available for consumption (the partition
> name is not sufficient, as its input is an HDFS path), and so we have to do a
> heavy listPartitions() for this.
> Another use-case is an HDFS data removal tool that does a nightly crawl to
> see if there are associated Hive partitions mapped to a given partition path.
> The nightly crawling job could be much less resource-intensive if we had
> listPartitionLocations().
> As there is already an internal method in the ObjectStore that does this for
> dropPartitions, it is only a matter of exposing this API in
> HiveMetaStoreClient.
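> A hypothetical call site for the proposed API, assuming -1 means "no limit"
> for the max parameter:
> {code:java}
> Map<String, String> locations = client.listPartitionLocations(db, table, (short) -1);
> for (Map.Entry<String, String> e : locations.entrySet()) {
>   // partition name -> location, without materializing full Partition objects
>   System.out.println(e.getKey() + " -> " + e.getValue());
> }
> {code}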



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-19253) HMS ignores tableType property for external tables

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19253?focusedWorklogId=502096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502096
 ]

ASF GitHub Bot logged work on HIVE-19253:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 10:42
Start Date: 19/Oct/20 10:42
Worklog Time Spent: 10m 
  Work Description: szehonCriteo commented on pull request #1537:
URL: https://github.com/apache/hive/pull/1537#issuecomment-712024961


   Hi @vihangk1 would you mind taking another look?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502096)
Time Spent: 40m  (was: 0.5h)

> HMS ignores tableType property for external tables
> --
>
> Key: HIVE-19253
> URL: https://issues.apache.org/jira/browse/HIVE-19253
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0, 3.1.0, 4.0.0
>Reporter: Alex Kolbasov
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: newbie, pull-request-available
> Attachments: HIVE-19253.01.patch, HIVE-19253.02.patch, 
> HIVE-19253.03.patch, HIVE-19253.03.patch, HIVE-19253.04.patch, 
> HIVE-19253.05.patch, HIVE-19253.06.patch, HIVE-19253.07.patch, 
> HIVE-19253.08.patch, HIVE-19253.09.patch, HIVE-19253.10.patch, 
> HIVE-19253.11.patch, HIVE-19253.12.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When someone creates a table using the Thrift API, they may think that setting
> tableType to {{EXTERNAL_TABLE}} creates an external table. And boom - their
> table is gone later because HMS will silently change it to a managed table.
> Here is the offending code:
> {code:java}
>   private MTable convertToMTable(Table tbl) throws InvalidObjectException,
>   MetaException {
> ...
> // If the table has property EXTERNAL set, update table type
> // accordingly
> String tableType = tbl.getTableType();
> boolean isExternal = 
> Boolean.parseBoolean(tbl.getParameters().get("EXTERNAL"));
> if (TableType.MANAGED_TABLE.toString().equals(tableType)) {
>   if (isExternal) {
> tableType = TableType.EXTERNAL_TABLE.toString();
>   }
> }
> if (TableType.EXTERNAL_TABLE.toString().equals(tableType)) {
>   if (!isExternal) { // Here!
> tableType = TableType.MANAGED_TABLE.toString();
>   }
> }
> {code}
> So if the EXTERNAL parameter is not set, the table type is changed to managed
> even if it was external in the first place - which is wrong.
> Moreover, in some places the code looks at the table property to decide the
> table type and in other places it looks at the parameter. HMS should really
> make up its mind which one to use.
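> A hedged sketch of one possible fix, preserving an explicitly requested
> external type instead of silently downgrading it (a suggestion, not the
> committed patch):
> {code:java}
> String tableType = tbl.getTableType();
> boolean isExternal = Boolean.parseBoolean(tbl.getParameters().get("EXTERNAL"));
> if (TableType.MANAGED_TABLE.toString().equals(tableType) && isExternal) {
>   tableType = TableType.EXTERNAL_TABLE.toString();
> }
> // no reverse branch: an explicit EXTERNAL_TABLE request stays external
> // even when the "EXTERNAL" parameter is not set
> {code}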



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=502091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502091
 ]

ASF GitHub Bot logged work on HIVE-24217:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 10:19
Start Date: 19/Oct/20 10:19
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1542:
URL: https://github.com/apache/hive/pull/1542#discussion_r507634498



##
File path: standalone-metastore/metastore-server/src/main/resources/package.jdo
##
@@ -1549,6 +1549,83 @@
 
   
 
+
+
+  
+
+  
+  
+
+  
+  
+
+  
+  
+
+  
+  
+
+  
+  
+
+  
+  
+
+  
+  
+
+  
+  
+
+  
+  
+
+
+
+
+
+  
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+  
+
+  

Review comment:
   > I understand it is only for showing the user, but it feels like an
afterthought, I don't really like it.
   
   Okay; then we don't need to add and store it - we could also use a simple 
UDF to get the signature if we want that.
   
   About `search for the return type`: I think we should come up with a
"working solution first" and go after these things when we need it (maybe
never?).
   I think that `MethodFinder.methodFor` will need a lot more than just a
signature... so it's a little bit too much.
   
   > These things might sound unimportant but I think language design and tool 
development shouldn't be separated. Tool support should be considered from day 
1 when creating a language.
   
   We may leave this thing out completely right now - because we don't need it 
right now.
   I think instead of adding something which may not perfectly align with our
future needs, leaving something out will not lock us in at all.
   
   > but interpreting the AST (method body) as a bottleneck.
   
   that's the problem of the language implementation itself - I think if the
language function is defined in some kind of "text", then we should store it
as "text", to have a less convoluted contract with the embedded language.
   
   > If we only allow defining procedures in terms of the "host language", then 
this is true indeed. My assumption was that we might want to accept procedure 
definitions in terms of the "foreign language".
   
   yes; we should only store those procedures/functions which are usable from
HiveQL - shouldn't we?
   
   > `CREATE FUNCTION func1(a int) RETURNS int LANGUAGE XX BEGIN ... END;`
   
   I think we should simply store the whole definition from `CREATE` to the 
closing `;`.
   Storing anything less could become unparsable.
   
   > > I don't know what feature you are referring
   
   > It's probably that one. I think that illustrates nicely why parsing is not 
always a prerequisite of calling a procedure.
   
   There might be some misunderstanding here about "parsing" - I mean processing
it with the host SQL language.
   
   If the client language has performance issues doing parsing/etc., then that
issue belongs to the language itself.
   In case the language is an on-the-fly interpreted language which also
happens to have a compiled version, then that other version could either be
registered as a separate language (and refer to a binary somehow) or the
language could add some caching/etc. to avoid unnecessary parsing overhead.
   Couldn't we do something like this for hplsql?
   
   In any case: I think that extending the basic implementation with a "generic
blob storage" option to provide additional services for the stored procedures
(which could potentially be used to speed up SQL function execution) should be
a separate feature - and as such, should be discussed (and implemented)
separately.
   (Honestly I think there would be marginal benefit in implementing this - and
it could be dodged by the client language implementation with a few caches/etc.)
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502091)
Time Spent: 2.5h  (was: 2h 20m)

> HMS storage backend for HPL/SQL stored procedures
> -
>
> Key: HIVE-24217
> URL: https://issues.apache.org/jira/browse/HIVE-24217
>

[jira] [Work logged] (HIVE-22641) Columns returned in sorted order when show columns query is run with no search pattern.

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22641?focusedWorklogId=502081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502081
 ]

ASF GitHub Bot logged work on HIVE-22641:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 09:36
Start Date: 19/Oct/20 09:36
Worklog Time Spent: 10m 
  Work Description: klcopp commented on pull request #1585:
URL: https://github.com/apache/hive/pull/1585#issuecomment-711925934


   Can you confirm that the output of
   `SHOW COLUMNS IN table_name LIKE "*a";`
   will be listed in alphabetical order,
   
   but that these two cases will be listed in the order the columns appear in
the table (column-ordered):
   ```
   SHOW COLUMNS IN table_name LIKE "*"
   SHOW COLUMNS IN table_name;
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502081)
Time Spent: 20m  (was: 10m)

> Columns returned in sorted order when show columns query is run with no 
> search pattern.
> ---
>
> Key: HIVE-22641
> URL: https://issues.apache.org/jira/browse/HIVE-22641
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, HiveServer2
>Affects Versions: 3.1.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22641.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Hive 1.2.1 and 2.0, displaying the columns of a table used to return them
> in the same order as they were created. For example:
> {code}
> create table col_order_test(server_name string, task_name string,
> partition_name string, start_time string, end_time string, table_owner
> string, table_name string) stored as orc;
> show columns in col_order_test;
> +-----------------+
> |      field      |
> +-----------------+
> | server_name     |
> | task_name       |
> | partition_name  |
> | start_time      |
> | end_time        |
> | table_owner     |
> | table_name      |
> +-----------------+
> {code}
> In Hive 3 the columns are returned in sorted order for the same query; below
> is the output.
> {code}
> create table col_order_test(server_name string, task_name string,
> partition_name string, start_time string, end_time string, table_owner
> string, table_name string) stored as orc;
> show columns in col_order_test;
> +-----------------+
> |      field      |
> +-----------------+
> | end_time        |
> | partition_name  |
> | server_name     |
> | start_time      |
> | table_name      |
> | table_owner     |
> | task_name       |
> +-----------------+
> {code}
> The above behavior appears to have changed with the introduction of the
> column-search feature in [HIVE-18373
> |https://issues.apache.org/jira/browse/HIVE-18373].
> This behavior change can cause an existing process to fail in some
> environments, for example code that generates an INSERT OVERWRITE from the
> column list, which may result in query failure.
> I would like to ask the community if we can improve [HIVE-18373
> |https://issues.apache.org/jira/browse/HIVE-18373] by returning the columns
> in creation order when the search pattern provided by the user is null.
> Attaching a patch with the change.
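> A minimal sketch of that proposal, sorting only when a search pattern is
> supplied (the variable names are assumptions, not the attached patch):
> {code:java}
> List<FieldSchema> cols = table.getCols();
> if (columnPattern != null) {
>   cols.sort(java.util.Comparator.comparing(FieldSchema::getName));
> }
> return cols; // creation order when no pattern was given
> {code}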



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24282) Show columns shouldn't sort output columns unless explicitly mentioned.

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24282?focusedWorklogId=502056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502056
 ]

ASF GitHub Bot logged work on HIVE-24282:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 07:57
Start Date: 19/Oct/20 07:57
Worklog Time Spent: 10m 
  Work Description: miklosgergely merged pull request #1584:
URL: https://github.com/apache/hive/pull/1584


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502056)
Time Spent: 20m  (was: 10m)

> Show columns shouldn't sort output columns unless explicitly mentioned.
> ---
>
> Key: HIVE-24282
> URL: https://issues.apache.org/jira/browse/HIVE-24282
> Project: Hive
>  Issue Type: Improvement
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> CREATE TABLE foo_n7(c INT, a INT, b INT);
> show columns in foo_n7;
> {code:java}
> // current output
> a
> b 
> c
> // expected
> c
> a 
> b{code}
> HIVE-18373 changed the original behaviour to sorted output.
> Suggesting to provide an optional keyword "sorted" to sort the show columns
> output,
> e.g.,
> {code:java}
> show sorted columns in foo_n7;
> a
> b 
> c
> show columns in foo_n7
> c
> a 
> b{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=502050&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502050
 ]

ASF GitHub Bot logged work on HIVE-24217:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 07:40
Start Date: 19/Oct/20 07:40
Worklog Time Spent: 10m 
  Work Description: zeroflag commented on a change in pull request #1542:
URL: https://github.com/apache/hive/pull/1542#discussion_r507534867



##
File path: standalone-metastore/metastore-server/src/main/resources/package.jdo
##
@@ -1549,6 +1549,83 @@
 
   
 
+
+
+  
+
+  
+  
+
+  
+  
+
+  
+  
+
+  
+  
+
+  
+  
+
+  
+  
+
+  
+  
+
+  
+  
+
+  
+  
+
+
+
+
+
+  
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+  
+
+  

Review comment:
   > That was a suggestion to provide a way to store a human readable..
   
   I understand it is only for showing the user, but it feels like an
afterthought; I don't really like it. It adds redundancy and it only has a
single use. The structured information can be used to generate this output (or
in fact other outputs, like showing the parameters in a table) or to supply 
meta programming information to various tools, like code completion tools or 
method finders.
   
   For example, search for the return type:
   
   ```
   int i = fn.. // find every function that starts with "fn" and returns an int
   ```
   
   Although this is a search on the return type rather than on a parameter, it's
the same problem since the return type is also part of the signature.
   
   If we had higher-order functions (in fact JavaScript has them, and that
might be the 2nd supported language), then:
   
   ```
   filter(fn.., listOfString); // find every function that starts with "fn" and 
takes a single string parameter
   ```
   Method finder:
   
   This might look a bit sci-fi; there are only 2 programming environments
I'm aware of which know how to do this.
   
   ```
   MethodFinder.methodFor('somefile.txt', 'txt'); // find which method returns
the extension part of a filename, by example
   ```
   
   It will return:
   ```
   "FilenameUtils.getExtension()"
   ```
   
   These things might sound unimportant but I think language design and tool 
development shouldn't be separated. Tool support should be considered from day 
1 when creating a language.
   
   
   > If at some point in time the "parsing" will prove to be a bottleneck
   
   I was not talking about parsing as a bottleneck (though it could be) but
about interpreting the AST (method body) as a bottleneck. I can't think of any
imperative language that can be taken seriously that works that way. Perhaps
shell scripts, or early versions of Ruby, which was notoriously slow, so later
they changed it.
   
   > I think we should clarify/separate 2 things
   
   If we only allow defining procedures in terms of the "host language", then 
this is true indeed. My assumption was that we might want to accept procedure 
definitions in terms of the "foreign language". For example function(x,y) {}. 
But ok, let's say this is not allowed. Then you're right, using the alternative 
format seems to be viable if we don't count the other issues.
   
   If we go down that route, what would you do with the other columns which are
also part of the signature, like "LANG", "RET_TYPE", "NAME"?
   
   ```
   CREATE FUNCTION func1(a int) RETURNS int LANGUAGE XX BEGIN ... END;
   ```
   
   > I don't know what feature you are referring
   
   It's probably that one. I think that illustrates nicely why parsing is not 
always a prerequisite of calling a procedure. 
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502050)
Time Spent: 2h 20m  (was: 2h 10m)

> HMS storage backend for HPL/SQL stored procedures
> -
>
> Key: HIVE-24217
> URL: https://issues.apache.org/jira/browse/HIVE-24217
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, hpl/sql, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  

[jira] [Updated] (HIVE-24284) NPE when parsing druid logs using Hive

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24284:
--
Labels: pull-request-available  (was: )

> NPE when parsing druid logs using Hive
> --
>
> Key: HIVE-24284
> URL: https://issues.apache.org/jira/browse/HIVE-24284
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As per the current syslog parser, it always expects a valid proc id. But
> as per RFC 3164 and RFC 5424, the proc id can be skipped, so Hive should
> handle it by using NILVALUE/an empty string in case the proc id is null.
>  
> {code:java}
> Caused by: java.lang.NullPointerException: null
> at java.lang.String.<init>(String.java:566)
> at 
> org.apache.hadoop.hive.ql.log.syslog.SyslogParser.createEvent(SyslogParser.java:361)
> at 
> org.apache.hadoop.hive.ql.log.syslog.SyslogParser.readEvent(SyslogParser.java:326)
> at 
> org.apache.hadoop.hive.ql.log.syslog.SyslogSerDe.deserialize(SyslogSerDe.java:95)
>  {code}
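> A minimal sketch of the guard described above; the variable names are
> assumptions about the parser internals:
> {code:java}
> // RFC 5424 allows the proc id to be NILVALUE ("-") or absent, so fall
> // back to an empty string instead of dereferencing null
> String procId = (procIdBytes == null)
>     ? ""
>     : new String(procIdBytes, java.nio.charset.StandardCharsets.UTF_8);
> {code}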



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24284) NPE when parsing druid logs using Hive

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24284?focusedWorklogId=502031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502031
 ]

ASF GitHub Bot logged work on HIVE-24284:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 06:54
Start Date: 19/Oct/20 06:54
Worklog Time Spent: 10m 
  Work Description: maheshk114 opened a new pull request #1586:
URL: https://github.com/apache/hive/pull/1586


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502031)
Remaining Estimate: 0h
Time Spent: 10m

> NPE when parsing druid logs using Hive
> --
>
> Key: HIVE-24284
> URL: https://issues.apache.org/jira/browse/HIVE-24284
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current syslog parser always expects a valid proc id. But per RFC 3164 
> and RFC 5424, the proc id can be skipped, so Hive should handle this by using 
> NILVALUE/an empty string when the proc id is null.
>  
> {code:java}
> Caused by: java.lang.NullPointerException: null
> at java.lang.String.<init>(String.java:566)
> at 
> org.apache.hadoop.hive.ql.log.syslog.SyslogParser.createEvent(SyslogParser.java:361)
> at 
> org.apache.hadoop.hive.ql.log.syslog.SyslogParser.readEvent(SyslogParser.java:326)
> at 
> org.apache.hadoop.hive.ql.log.syslog.SyslogSerDe.deserialize(SyslogSerDe.java:95)
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23695) [CachedStore] Add check/default constraints in CachedStore

2020-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23695?focusedWorklogId=502009=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502009
 ]

ASF GitHub Bot logged work on HIVE-23695:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 05:43
Start Date: 19/Oct/20 05:43
Worklog Time Spent: 10m 
  Work Description: sankarh merged pull request #1527:
URL: https://github.com/apache/hive/pull/1527


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 502009)
Time Spent: 1h 50m  (was: 1h 40m)

> [CachedStore] Add check/default constraints in CachedStore
> --
>
> Key: HIVE-23695
> URL: https://issues.apache.org/jira/browse/HIVE-23695
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Reporter: Adesh Kumar Rao
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> This is blocked by HIVE-23618 (notification events are not generated for 
> default/unique constraints), hence this was split out of HIVE-22015 as a 
> separate sub-task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23695) [CachedStore] Add check/default constraints in CachedStore

2020-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23695?focusedWorklogId=502007=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502007
 ]

ASF GitHub Bot logged work on HIVE-23695:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 05:39
Start Date: 19/Oct/20 05:39
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #1527:
URL: https://github.com/apache/hive/pull/1527#discussion_r506278466



##
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStoreUpdateUsingEvents.java
##
@@ -314,160 +308,129 @@ public void testConstraintsForUpdateUsingEvents() throws Exception {
     hmsHandler.create_database(db);
     db = rawStore.getDatabase(DEFAULT_CATALOG_NAME, dbName);
 
-    String foreignDbName = "test_table_ops_foreign";
-    Database foreignDb = createTestDb(foreignDbName, dbOwner);
-    hmsHandler.create_database(foreignDb);
-    foreignDb = rawStore.getDatabase(DEFAULT_CATALOG_NAME, foreignDbName);
     // Add a table via rawStore
+    String parentTableName = "ftbl";
     String tblName = "tbl";
     String tblOwner = "user1";
     FieldSchema col1 = new FieldSchema("col1", "int", "integer column");
     FieldSchema col2 = new FieldSchema("col2", "string", "string column");
+    FieldSchema col3 = new FieldSchema("col3", "int", "integer column");
     List<FieldSchema> cols = new ArrayList<>();
     cols.add(col1);
     cols.add(col2);
+    cols.add(col3);
     List<FieldSchema> ptnCols = new ArrayList<>();
+    Table parentTable = createTestTbl(dbName, parentTableName, tblOwner, cols, ptnCols);
     Table tbl = createTestTbl(dbName, tblName, tblOwner, cols, ptnCols);
-    String foreignTblName = "ftbl";
-    Table foreignTbl = createTestTbl(foreignDbName, foreignTblName, tblOwner, cols, ptnCols);
-
-    SQLPrimaryKey key = new SQLPrimaryKey(dbName, tblName, col1.getName(), 1, "pk1",
-        false, false, false);
-    SQLUniqueConstraint uC = new SQLUniqueConstraint(DEFAULT_CATALOG_NAME, dbName, tblName,
-        col1.getName(), 2, "uc1", false, false, false);
-    SQLNotNullConstraint nN = new SQLNotNullConstraint(DEFAULT_CATALOG_NAME, dbName, tblName,
-        col1.getName(), "nn1", false, false, false);
-    SQLForeignKey foreignKey = new SQLForeignKey(key.getTable_db(), key.getTable_name(), key.getColumn_name(),
-        foreignDbName, foreignTblName, key.getColumn_name(), 2, 1, 2,
-        "fk1", key.getPk_name(), false, false, false);
-
-    hmsHandler.create_table_with_constraints(tbl,
-        Arrays.asList(key), null, Arrays.asList(uC), Arrays.asList(nN), null, null);
-    hmsHandler.create_table_with_constraints(foreignTbl, null, Arrays.asList(foreignKey),
-        null, null, null, null);
+
+    // Constraints for parent Table
+    List<SQLPrimaryKey> parentPkBase =
+        Arrays.asList(new SQLPrimaryKey(dbName, parentTableName, col1.getName(), 1, "parentpk1", false, false, false));
+
+    // Constraints for table
+    List<SQLPrimaryKey> pkBase =
+        Arrays.asList(new SQLPrimaryKey(dbName, tblName, col1.getName(), 1, "pk1", false, false, false));
+    List<SQLUniqueConstraint> ucBase = Arrays.asList(
+        new SQLUniqueConstraint(DEFAULT_CATALOG_NAME, dbName, tblName, col1.getName(), 2, "uc1", false, false, false));
+    List<SQLNotNullConstraint> nnBase = Arrays.asList(
+        new SQLNotNullConstraint(DEFAULT_CATALOG_NAME, dbName, tblName, col1.getName(), "nn1", false, false, false));
+    List<SQLDefaultConstraint> dcBase = Arrays.asList(
+        new SQLDefaultConstraint(DEFAULT_CATALOG_NAME, tbl.getDbName(), tbl.getTableName(), col2.getName(), "1", "dc1",
+            false, false, false));
+    List<SQLCheckConstraint> ccBase = Arrays.asList(
+        new SQLCheckConstraint(DEFAULT_CATALOG_NAME, tbl.getDbName(), tbl.getTableName(), col2.getName(), "1", "cc1",
+            false, false, false));
+    List<SQLForeignKey> fkBase = Arrays.asList(
+        new SQLForeignKey(parentPkBase.get(0).getTable_db(), parentPkBase.get(0).getTable_name(),
+            parentPkBase.get(0).getColumn_name(), dbName, tblName, col3.getName(), 2, 1, 2, "fk1",
+            parentPkBase.get(0).getPk_name(), false, false, false));
+
+    // Create table and parent table
+    hmsHandler.create_table_with_constraints(parentTable, parentPkBase, null, null, null, null, null);
+    hmsHandler.create_table_with_constraints(tbl, pkBase, fkBase, ucBase, nnBase, dcBase, ccBase);
 
     tbl = rawStore.getTable(DEFAULT_CATALOG_NAME, dbName, tblName);
-    foreignTbl = rawStore.getTable(DEFAULT_CATALOG_NAME, foreignDbName, foreignTblName);
+    parentTable = rawStore.getTable(DEFAULT_CATALOG_NAME, dbName, parentTableName);
 
     // Read database, table via CachedStore
-    Database dbRead= sharedCache.getDatabaseFromCache(DEFAULT_CATALOG_NAME, dbName);
+    Database dbRead = sharedCache.getDatabaseFromCache(DEFAULT_CATALOG_NAME, dbName);
     Assert.assertEquals(db, dbRead);
+
+    // Read table via CachedStore
 
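For readers skimming the diff above: a small, hypothetical sketch (assuming 
the standalone metastore API classes are on the classpath; table and column 
names are illustrative) of the SQLPrimaryKey constructor and the positional 
argument order that create_table_with_constraints relies on here:

```java
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.hive.metastore.api.SQLPrimaryKey;

public class ConstraintOrderSketch {
  public static void main(String[] args) {
    // Constructor order, as used in the test:
    // (dbName, tableName, columnName, keySeq, pkName, enable, validate, rely)
    List<SQLPrimaryKey> pk = Arrays.asList(
        new SQLPrimaryKey("db1", "tbl", "col1", 1, "pk1", false, false, false));
    System.out.println(pk.get(0).getPk_name()); // pk1

    // create_table_with_constraints(table, primaryKeys, foreignKeys,
    //     uniqueConstraints, notNullConstraints, defaultConstraints,
    //     checkConstraints) -- the positional order the new test relies on.
  }
}
```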

[jira] [Work logged] (HIVE-22934) Hive server interactive log counters to error stream

2020-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22934?focusedWorklogId=501946=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501946
 ]

ASF GitHub Bot logged work on HIVE-22934:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 00:58
Start Date: 19/Oct/20 00:58
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1200:
URL: https://github.com/apache/hive/pull/1200


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 501946)
Time Spent: 40m  (was: 0.5h)

> Hive server interactive log counters to error stream
> 
>
> Key: HIVE-22934
> URL: https://issues.apache.org/jira/browse/HIVE-22934
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22934.01.patch, HIVE-22934.02.patch, 
> HIVE-22934.03.patch, HIVE-22934.04.patch, HIVE-22934.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Hive server is logging the console output to the system error stream.
> This needs to be fixed because:
> First, we do not roll the file.
> Second, writing to such a file is done sequentially and can lead to 
> throttling/poor perf.
> {code}
> -rw-r--r--  1 hive hadoop 9.5G Feb 26 17:22 hive-server2-interactive.err
> {code}
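
Illustrative only (not the actual patch; the counter string is made up): the 
difference between printing counters on System.err, which lands in the flat 
.err file above, and routing them through a logger whose rolling appender 
can rotate the output:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CounterLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(CounterLoggingSketch.class);

  public static void main(String[] args) {
    String counters = "GROUPED_INPUT_SPLITS=42"; // hypothetical counter line
    System.err.println(counters);       // appended to the unrolled .err file
    LOG.info("Counters: {}", counters); // handled by the logging framework,
                                        // which can roll the file
  }
}
{code}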



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23998) Upgrade Guava to 27 for Hive 2.3

2020-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23998?focusedWorklogId=501945=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501945
 ]

ASF GitHub Bot logged work on HIVE-23998:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 00:57
Start Date: 19/Oct/20 00:57
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1394:
URL: https://github.com/apache/hive/pull/1394


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 501945)
Time Spent: 7h 40m  (was: 7.5h)

> Upgrade Guava to 27 for Hive 2.3
> 
>
> Key: HIVE-23998
> URL: https://issues.apache.org/jira/browse/HIVE-23998
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.7
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23998.01.branch-2.3.patch
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Try to upgrade Guava to 27.0-jre for the Hive 2.3 branch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23993) Handle irrecoverable errors

2020-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23993?focusedWorklogId=501944=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501944
 ]

ASF GitHub Bot logged work on HIVE-23993:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 00:57
Start Date: 19/Oct/20 00:57
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1367:
URL: https://github.com/apache/hive/pull/1367


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 501944)
Time Spent: 2.5h  (was: 2h 20m)

> Handle irrecoverable errors
> ---
>
> Key: HIVE-23993
> URL: https://issues.apache.org/jira/browse/HIVE-23993
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, 
> HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, 
> HIVE-23993.06.patch, HIVE-23993.07.patch, HIVE-23993.08.patch, Retry Logic 
> for Replication.pdf
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24051) Hive lineage information exposed in ExecuteWithHookContext

2020-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24051?focusedWorklogId=501943=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501943
 ]

ASF GitHub Bot logged work on HIVE-24051:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 00:57
Start Date: 19/Oct/20 00:57
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1413:
URL: https://github.com/apache/hive/pull/1413#issuecomment-711453788


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 501943)
Time Spent: 20m  (was: 10m)

> Hive lineage information exposed in ExecuteWithHookContext
> --
>
> Key: HIVE-24051
> URL: https://issues.apache.org/jira/browse/HIVE-24051
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24051.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The lineage information is not populated unless certain hooks are enabled.
> However, this is a bit fragile, and there is no way for another hook that we 
> write to get this information. This proposes a flag to enable it instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23930) Upgrade to tez 0.10.0

2020-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23930?focusedWorklogId=501929=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501929
 ]

ASF GitHub Bot logged work on HIVE-23930:
-

Author: ASF GitHub Bot
Created on: 18/Oct/20 20:53
Start Date: 18/Oct/20 20:53
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #1311:
URL: https://github.com/apache/hive/pull/1311#issuecomment-711421451


   green run with released tez 0.10.0 artifacts + the contents of HIVE-23190, 
HIVE-24108
   cc: @ashutoshc
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 501929)
Time Spent: 1h 20m  (was: 1h 10m)

> Upgrade to tez 0.10.0
> -
>
> Key: HIVE-23930
> URL: https://issues.apache.org/jira/browse/HIVE-23930
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Tez 0.10.0 is not yet released, but this ticket is for tracking the effort 
> and the needed Hive changes.
> Currently, Hive depends on Tez 0.9.1.
> Hadoop dependencies:
> Hive/master: *3.1.0*
> Tez/master: *3.1.3*
> Tez/branch-0.9: *2.7.2*
> TODOs: 
> - check why HIVE-23689 broke some unit tests intermittently (the 0.9.2 -> 0.9.3 
> bump), because a 0.10.x upgrade will also contain those Tez changes, which 
> could be related
> - maintain the needed Hive changes (reflecting Tez API changes):
> HIVE-23190: LLAP: modify IndexCache to pass a filesystem object to 
> TezSpillRecord



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23849) Hive skips the creation of ColumnAccessInfo when creating a view

2020-10-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23849?focusedWorklogId=501838=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501838
 ]

ASF GitHub Bot logged work on HIVE-23849:
-

Author: ASF GitHub Bot
Created on: 18/Oct/20 00:58
Start Date: 18/Oct/20 00:58
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1387:
URL: https://github.com/apache/hive/pull/1387


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 501838)
Time Spent: 3h 10m  (was: 3h)

> Hive skips the creation of ColumnAccessInfo when creating a view
> 
>
> Key: HIVE-23849
> URL: https://issues.apache.org/jira/browse/HIVE-23849
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> When creating a view, Hive skips the creation of the ColumnAccessInfo that 
> should be created at [step 8|#L12601]. This causes an authorization error. 
> Currently, this issue is "hidden" when CBO is enabled: since 
> HIVE-14496, CalcitePlanner creates this ColumnAccessInfo at [step 
> 2|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12459],
>  but after turning off CBO, the issue is still there. 
> I think the return statement in [step 
> 5|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12574]
>  is unnecessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

