[jira] [Work logged] (HIVE-23814) Clean up Driver

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457524&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457524
 ]

ASF GitHub Bot logged work on HIVE-23814:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Jul/20 05:09
Start Date: 11/Jul/20 05:09
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #1222:
URL: https://github.com/apache/hive/pull/1222#issuecomment-656991606


   I am not entirely comfortable with my knowledge of this area, but I tried 
to do my best when reviewing.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457524)
Time Spent: 2h 20m  (was: 2h 10m)

> Clean up Driver
> ---
>
> Key: HIVE-23814
> URL: https://issues.apache.org/jira/browse/HIVE-23814
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Driver is now cut down to its minimal size by extracting all of its sub-tasks 
> to separate classes. The rest should be cleaned up by
>  * moving out some smaller parts of the code to sub-task and utility classes 
> wherever it is still possible
>  * cutting large functions into meaningful and manageable parts
>  * re-ordering the functions to follow the order of processing
>  * fixing checkstyle issues
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23814) Clean up Driver

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457523&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457523
 ]

ASF GitHub Bot logged work on HIVE-23814:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Jul/20 05:04
Start Date: 11/Jul/20 05:04
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1222:
URL: https://github.com/apache/hive/pull/1222#discussion_r453156405



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -139,205 +119,215 @@ public Driver(QueryState queryState, QueryInfo 
queryInfo, HiveTxnManager txnMana
 driverTxnHandler = new DriverTxnHandler(this, driverContext, driverState);
   }
 
-  /**
-   * Compile a new query, but potentially reset taskID counter.  Not resetting 
task counter
-   * is useful for generating re-entrant QL queries.
-   * @param command  The HiveQL query to compile
-   * @param resetTaskIds Resets taskID counter if true.
-   * @return 0 for ok
-   */
-  public int compile(String command, boolean resetTaskIds) {
-try {
-  compile(command, resetTaskIds, false);
-  return 0;
-} catch (CommandProcessorException cpr) {
-  return cpr.getErrorCode();
-}
+  @Override
+  public Context getContext() {
+return context;
   }
 
-  // deferClose indicates if the close/destroy should be deferred when the 
process has been
-  // interrupted, it should be set to true if the compile is called within 
another method like
-  // runInternal, which defers the close to the called in that method.
-  @VisibleForTesting
-  public void compile(String command, boolean resetTaskIds, boolean 
deferClose) throws CommandProcessorException {
-preparForCompile(resetTaskIds);
-
-Compiler compiler = new Compiler(context, driverContext, driverState);
-QueryPlan plan = compiler.compile(command, deferClose);
-driverContext.setPlan(plan);
-
-compileFinished(deferClose);
+  @Override
+  public HiveConf getConf() {
+return driverContext.getConf();
   }
 
-  private void compileFinished(boolean deferClose) {
-if (DriverState.getDriverState().isAborted() && !deferClose) {
-  closeInProcess(true);
-}
+  @Override
+  public CommandProcessorResponse run() throws CommandProcessorException {
+return run(null, true);
   }
 
-  private void preparForCompile(boolean resetTaskIds) throws 
CommandProcessorException {
-driverTxnHandler.createTxnManager();
-DriverState.setDriverState(driverState);
-prepareContext();
-setQueryId();

Review comment:
   Won't we miss setting the query id?
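
   For illustration only: the concern seems to be that the old preparation 
path assigned the query id before compiling. A minimal sketch of keeping that 
call in the refactored preparation method (names are taken from the quoted 
diff; the placement is an assumption, not the patch author's change):

{code:java}
private void preparForCompile(boolean resetTaskIds) throws CommandProcessorException {
  driverTxnHandler.createTxnManager();
  DriverState.setDriverState(driverState);
  prepareContext();
  setQueryId(); // kept from the old path so each run still gets a fresh query id

  if (resetTaskIds) {
    TaskFactory.resetId();
  }
}
{code}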





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457523)
Time Spent: 2h 10m  (was: 2h)

> Clean up Driver
> ---
>
> Key: HIVE-23814
> URL: https://issues.apache.org/jira/browse/HIVE-23814
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Driver is now cut down to its minimal size by extracting all of its sub-tasks 
> to separate classes. The rest should be cleaned up by
>  * moving out some smaller parts of the code to sub-task and utility classes 
> wherever it is still possible
>  * cutting large functions into meaningful and manageable parts
>  * re-ordering the functions to follow the order of processing
>  * fixing checkstyle issues
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23814) Clean up Driver

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457522&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457522
 ]

ASF GitHub Bot logged work on HIVE-23814:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Jul/20 05:03
Start Date: 11/Jul/20 05:03
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1222:
URL: https://github.com/apache/hive/pull/1222#discussion_r453156366



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -139,205 +119,215 @@ public Driver(QueryState queryState, QueryInfo 
queryInfo, HiveTxnManager txnMana
 driverTxnHandler = new DriverTxnHandler(this, driverContext, driverState);
   }
 
-  /**
-   * Compile a new query, but potentially reset taskID counter.  Not resetting 
task counter
-   * is useful for generating re-entrant QL queries.
-   * @param command  The HiveQL query to compile
-   * @param resetTaskIds Resets taskID counter if true.
-   * @return 0 for ok
-   */
-  public int compile(String command, boolean resetTaskIds) {
-try {
-  compile(command, resetTaskIds, false);
-  return 0;
-} catch (CommandProcessorException cpr) {
-  return cpr.getErrorCode();
-}
+  @Override
+  public Context getContext() {
+return context;
   }
 
-  // deferClose indicates if the close/destroy should be deferred when the 
process has been
-  // interrupted, it should be set to true if the compile is called within 
another method like
-  // runInternal, which defers the close to the called in that method.
-  @VisibleForTesting
-  public void compile(String command, boolean resetTaskIds, boolean 
deferClose) throws CommandProcessorException {
-preparForCompile(resetTaskIds);
-
-Compiler compiler = new Compiler(context, driverContext, driverState);
-QueryPlan plan = compiler.compile(command, deferClose);
-driverContext.setPlan(plan);
-
-compileFinished(deferClose);
+  @Override
+  public HiveConf getConf() {
+return driverContext.getConf();
   }
 
-  private void compileFinished(boolean deferClose) {
-if (DriverState.getDriverState().isAborted() && !deferClose) {
-  closeInProcess(true);
-}
+  @Override
+  public CommandProcessorResponse run() throws CommandProcessorException {
+return run(null, true);
   }
 
-  private void preparForCompile(boolean resetTaskIds) throws 
CommandProcessorException {
-driverTxnHandler.createTxnManager();
-DriverState.setDriverState(driverState);

Review comment:
   Won't we miss this state setting? 
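
   For illustration only: the removed line set the thread-local driver state 
before compilation. A minimal sketch of keeping it on the direct compile() 
path (names are from the quoted diff; the placement is an assumption, not the 
patch author's change):

{code:java}
@VisibleForTesting
public void compile(String command, boolean resetTaskIds, boolean deferClose) throws CommandProcessorException {
  // assumed fix: set the thread-local state here in case compile() is called
  // directly, without going through runInternal(), which also sets it
  DriverState.setDriverState(driverState);
  preparForCompile(resetTaskIds);

  Compiler compiler = new Compiler(context, driverContext, driverState);
  QueryPlan plan = compiler.compile(command, deferClose);
  driverContext.setPlan(plan);

  compileFinished(deferClose);
}
{code}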





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457522)
Time Spent: 2h  (was: 1h 50m)

> Clean up Driver
> ---
>
> Key: HIVE-23814
> URL: https://issues.apache.org/jira/browse/HIVE-23814
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Driver is now cut down to its minimal size by extracting all of its sub-tasks 
> to separate classes. The rest should be cleaned up by
>  * moving out some smaller parts of the code to sub-task and utility classes 
> wherever it is still possible
>  * cutting large functions into meaningful and manageable parts
>  * re-ordering the functions to follow the order of processing
>  * fixing checkstyle issues
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23814) Clean up Driver

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457519&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457519
 ]

ASF GitHub Bot logged work on HIVE-23814:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Jul/20 04:56
Start Date: 11/Jul/20 04:56
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1222:
URL: https://github.com/apache/hive/pull/1222#discussion_r453155831



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -139,205 +119,215 @@ public Driver(QueryState queryState, QueryInfo 
queryInfo, HiveTxnManager txnMana
 driverTxnHandler = new DriverTxnHandler(this, driverContext, driverState);
   }
 
-  /**
-   * Compile a new query, but potentially reset taskID counter.  Not resetting 
task counter
-   * is useful for generating re-entrant QL queries.
-   * @param command  The HiveQL query to compile
-   * @param resetTaskIds Resets taskID counter if true.
-   * @return 0 for ok
-   */
-  public int compile(String command, boolean resetTaskIds) {
-try {
-  compile(command, resetTaskIds, false);
-  return 0;
-} catch (CommandProcessorException cpr) {
-  return cpr.getErrorCode();
-}
+  @Override
+  public Context getContext() {
+return context;
   }
 
-  // deferClose indicates if the close/destroy should be deferred when the 
process has been
-  // interrupted, it should be set to true if the compile is called within 
another method like
-  // runInternal, which defers the close to the called in that method.
-  @VisibleForTesting
-  public void compile(String command, boolean resetTaskIds, boolean 
deferClose) throws CommandProcessorException {
-preparForCompile(resetTaskIds);
-
-Compiler compiler = new Compiler(context, driverContext, driverState);
-QueryPlan plan = compiler.compile(command, deferClose);
-driverContext.setPlan(plan);
-
-compileFinished(deferClose);
+  @Override
+  public HiveConf getConf() {
+return driverContext.getConf();
   }
 
-  private void compileFinished(boolean deferClose) {
-if (DriverState.getDriverState().isAborted() && !deferClose) {
-  closeInProcess(true);
-}
+  @Override
+  public CommandProcessorResponse run() throws CommandProcessorException {
+return run(null, true);
   }
 
-  private void preparForCompile(boolean resetTaskIds) throws 
CommandProcessorException {
-driverTxnHandler.createTxnManager();
-DriverState.setDriverState(driverState);
-prepareContext();
-setQueryId();
+  @Override
+  public CommandProcessorResponse run(String command) throws 
CommandProcessorException {
+return run(command, false);
+  }
 
-if (resetTaskIds) {
-  TaskFactory.resetId();
+  private CommandProcessorResponse run(String command, boolean 
alreadyCompiled) throws CommandProcessorException {
+try {
+  runInternal(command, alreadyCompiled);
+  return new CommandProcessorResponse(getSchema(), null);
+} catch (CommandProcessorException cpe) {
+  processRunException(cpe);
+  throw cpe;
 }
   }
 
-  private void prepareContext() throws CommandProcessorException {
-if (context != null && context.getExplainAnalyze() != 
AnalyzeState.RUNNING) {
-  // close the existing ctx etc before compiling a new query, but does not 
destroy driver
-  closeInProcess(false);
-}
+  private void runInternal(String command, boolean alreadyCompiled) throws 
CommandProcessorException {
+DriverState.setDriverState(driverState);
+setInitialStateForRun(alreadyCompiled);
 
+// a flag that helps to set the correct driver state in finally block by 
tracking if
+// the method has been returned by an error or not.
+boolean isFinishedWithError = true;
 try {
-  if (context == null) {
-context = new Context(driverContext.getConf());
+  HiveDriverRunHookContext hookContext = new 
HiveDriverRunHookContextImpl(driverContext.getConf(),
+  alreadyCompiled ? context.getCmd() : command);
+  runPreDriverHooks(hookContext);
+
+  if (!alreadyCompiled) {
+compileInternal(command, true);
+  } else {
+
driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
   }
-} catch (IOException e) {
-  throw new CommandProcessorException(e);
-}
 
-context.setHiveTxnManager(driverContext.getTxnManager());
-context.setStatsSource(driverContext.getStatsSource());
-context.setHDFSCleanup(true);
+  // Reset the PerfLogger so that it doesn't retain any previous values.
+  // Any value from compilation phase can be obtained through the map set 
in queryDisplay during compilation.
+  PerfLogger perfLogger = SessionState.getPerfLogger(true);
 
-driverTxnHandler.setContext(context);
-  }
+  // the reason that we set the txn manager for the cxt here 

[jira] [Work logged] (HIVE-23814) Clean up Driver

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457516
 ]

ASF GitHub Bot logged work on HIVE-23814:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Jul/20 04:53
Start Date: 11/Jul/20 04:53
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1222:
URL: https://github.com/apache/hive/pull/1222#discussion_r453155585



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -139,205 +119,215 @@ public Driver(QueryState queryState, QueryInfo 
queryInfo, HiveTxnManager txnMana
 driverTxnHandler = new DriverTxnHandler(this, driverContext, driverState);
   }
 
-  /**
-   * Compile a new query, but potentially reset taskID counter.  Not resetting 
task counter
-   * is useful for generating re-entrant QL queries.
-   * @param command  The HiveQL query to compile
-   * @param resetTaskIds Resets taskID counter if true.
-   * @return 0 for ok
-   */
-  public int compile(String command, boolean resetTaskIds) {
-try {
-  compile(command, resetTaskIds, false);
-  return 0;
-} catch (CommandProcessorException cpr) {
-  return cpr.getErrorCode();
-}
+  @Override
+  public Context getContext() {
+return context;
   }
 
-  // deferClose indicates if the close/destroy should be deferred when the 
process has been
-  // interrupted, it should be set to true if the compile is called within 
another method like
-  // runInternal, which defers the close to the called in that method.
-  @VisibleForTesting
-  public void compile(String command, boolean resetTaskIds, boolean 
deferClose) throws CommandProcessorException {
-preparForCompile(resetTaskIds);
-
-Compiler compiler = new Compiler(context, driverContext, driverState);
-QueryPlan plan = compiler.compile(command, deferClose);
-driverContext.setPlan(plan);
-
-compileFinished(deferClose);
+  @Override
+  public HiveConf getConf() {
+return driverContext.getConf();
   }
 
-  private void compileFinished(boolean deferClose) {
-if (DriverState.getDriverState().isAborted() && !deferClose) {
-  closeInProcess(true);
-}
+  @Override
+  public CommandProcessorResponse run() throws CommandProcessorException {
+return run(null, true);
   }
 
-  private void preparForCompile(boolean resetTaskIds) throws 
CommandProcessorException {
-driverTxnHandler.createTxnManager();
-DriverState.setDriverState(driverState);
-prepareContext();
-setQueryId();
+  @Override
+  public CommandProcessorResponse run(String command) throws 
CommandProcessorException {

Review comment:
   Javadoc might be useful here too, but at least this one is easier to understand :)
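
   For illustration only, a sketch of what such a Javadoc could say (the 
wording is an assumption, not the patch author's text; the method body is 
quoted from the diff):

{code:java}
/**
 * Compiles the given HiveQL command and then executes the resulting plan.
 *
 * @param command the HiveQL command to compile and run
 * @return the response, including the result schema, on success
 * @throws CommandProcessorException if compilation or execution fails
 */
@Override
public CommandProcessorResponse run(String command) throws CommandProcessorException {
  return run(command, false);
}
{code}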





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457516)
Time Spent: 1h 40m  (was: 1.5h)

> Clean up Driver
> ---
>
> Key: HIVE-23814
> URL: https://issues.apache.org/jira/browse/HIVE-23814
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Driver is now cut down to its minimal size by extracting all of its sub-tasks 
> to separate classes. The rest should be cleaned up by
>  * moving out some smaller parts of the code to sub-task and utility classes 
> wherever it is still possible
>  * cutting large functions into meaningful and manageable parts
>  * re-ordering the functions to follow the order of processing
>  * fixing checkstyle issues
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23814) Clean up Driver

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457515
 ]

ASF GitHub Bot logged work on HIVE-23814:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Jul/20 04:52
Start Date: 11/Jul/20 04:52
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1222:
URL: https://github.com/apache/hive/pull/1222#discussion_r453155521



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -139,205 +119,215 @@ public Driver(QueryState queryState, QueryInfo 
queryInfo, HiveTxnManager txnMana
 driverTxnHandler = new DriverTxnHandler(this, driverContext, driverState);
   }
 
-  /**
-   * Compile a new query, but potentially reset taskID counter.  Not resetting 
task counter
-   * is useful for generating re-entrant QL queries.
-   * @param command  The HiveQL query to compile
-   * @param resetTaskIds Resets taskID counter if true.
-   * @return 0 for ok
-   */
-  public int compile(String command, boolean resetTaskIds) {
-try {
-  compile(command, resetTaskIds, false);
-  return 0;
-} catch (CommandProcessorException cpr) {
-  return cpr.getErrorCode();
-}
+  @Override
+  public Context getContext() {
+return context;
   }
 
-  // deferClose indicates if the close/destroy should be deferred when the 
process has been
-  // interrupted, it should be set to true if the compile is called within 
another method like
-  // runInternal, which defers the close to the called in that method.
-  @VisibleForTesting
-  public void compile(String command, boolean resetTaskIds, boolean 
deferClose) throws CommandProcessorException {
-preparForCompile(resetTaskIds);
-
-Compiler compiler = new Compiler(context, driverContext, driverState);
-QueryPlan plan = compiler.compile(command, deferClose);
-driverContext.setPlan(plan);
-
-compileFinished(deferClose);
+  @Override
+  public HiveConf getConf() {
+return driverContext.getConf();
   }
 
-  private void compileFinished(boolean deferClose) {
-if (DriverState.getDriverState().isAborted() && !deferClose) {
-  closeInProcess(true);
-}
+  @Override
+  public CommandProcessorResponse run() throws CommandProcessorException {
+return run(null, true);

Review comment:
   What does this public method do? A Javadoc might be useful.
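
   For illustration only, a sketch of a possible Javadoc for the no-argument 
variant (the wording is an assumption, not the patch author's text; the method 
body is quoted from the diff):

{code:java}
/**
 * Executes a query plan that was already compiled via compile(), without
 * recompiling it.
 *
 * @return the response, including the result schema, on success
 * @throws CommandProcessorException if the precompiled plan was cancelled,
 *         closed, or fails during execution
 */
@Override
public CommandProcessorResponse run() throws CommandProcessorException {
  return run(null, true);
}
{code}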





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457515)
Time Spent: 1.5h  (was: 1h 20m)

> Clean up Driver
> ---
>
> Key: HIVE-23814
> URL: https://issues.apache.org/jira/browse/HIVE-23814
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Driver is now cut down to its minimal size by extracting all of its sub-tasks 
> to separate classes. The rest should be cleaned up by
>  * moving out some smaller parts of the code to sub-task and utility classes 
> wherever it is still possible
>  * cutting large functions into meaningful and manageable parts
>  * re-ordering the functions to follow the order of processing
>  * fixing checkstyle issues
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23814) Clean up Driver

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457514
 ]

ASF GitHub Bot logged work on HIVE-23814:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Jul/20 04:49
Start Date: 11/Jul/20 04:49
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1222:
URL: https://github.com/apache/hive/pull/1222#discussion_r453155258



##
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java
##
@@ -529,6 +529,34 @@ private void addTableFromEntity(Entity entity, Map tables) {
   .collect(Collectors.toList());
   }
 
+  void rollback(CommandProcessorException cpe) throws 
CommandProcessorException {
+try {
+  releaseLocksAndCommitOrRollback(false);
+} catch (LockException e) {
+  LOG.error("rollback() FAILED: " + cpe); //make sure not to loose
+  DriverUtils.handleHiveException(driverContext, e, 12, "Additional info 
in hive.log at \"rollback() FAILED\"");
+}
+  }
+
+  void handleTransactionAfterExecution() throws CommandProcessorException {
+try {
+  if (driverContext.getTxnManager().isImplicitTransactionOpen() ||
+  driverContext.getPlan().getOperation() == HiveOperation.COMMIT) {
+releaseLocksAndCommitOrRollback(true);
+  } else if (driverContext.getPlan().getOperation() == 
HiveOperation.ROLLBACK) {
+releaseLocksAndCommitOrRollback(false);
+  } else if (!driverContext.getTxnManager().isTxnOpen() &&
+  driverContext.getQueryState().getHiveOperation() == 
HiveOperation.REPLLOAD) {
+// repl load during migration, commits the explicit txn and start some 
internal txns. Call
+// releaseLocksAndCommitOrRollback to do the clean up.
+releaseLocksAndCommitOrRollback(false);
+  }
+  // if none of the above is true, then txn (if there is one started) is 
not finished

Review comment:
   How could this happen? Maybe at least a debug-level log would be good.
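
   For illustration only, a minimal sketch of such a log at the end of the 
branch chain quoted above (the else branch and the message text are 
assumptions, not the patch author's change):

{code:java}
} else if (!driverContext.getTxnManager().isTxnOpen() &&
    driverContext.getQueryState().getHiveOperation() == HiveOperation.REPLLOAD) {
  releaseLocksAndCommitOrRollback(false);
} else {
  // assumed addition: make it visible in the logs that an open transaction
  // is deliberately left unfinished at this point
  LOG.debug("Transaction (if any) is left open after execution, operation: "
      + driverContext.getPlan().getOperation());
}
{code}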





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457514)
Time Spent: 1h 20m  (was: 1h 10m)

> Clean up Driver
> ---
>
> Key: HIVE-23814
> URL: https://issues.apache.org/jira/browse/HIVE-23814
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Driver is now cut down to its minimal size by extracting all of its sub-tasks 
> to separate classes. The rest should be cleaned up by
>  * moving out some smaller parts of the code to sub-task and utility classes 
> wherever it is still possible
>  * cutting large functions into meaningful and manageable parts
>  * re-ordering the functions to follow the order of processing
>  * fixing checkstyle issues
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23814) Clean up Driver

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457513
 ]

ASF GitHub Bot logged work on HIVE-23814:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Jul/20 04:40
Start Date: 11/Jul/20 04:40
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1222:
URL: https://github.com/apache/hive/pull/1222#discussion_r453154673



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -410,260 +386,304 @@ private void compileInternal(String command, boolean 
deferClose) throws CommandP
   }
 }
 //Save compile-time PerfLogging for WebUI.
-//Execution-time Perf logs are done by either another thread's PerfLogger
-//or a reset PerfLogger.
+//Execution-time Perf logs are done by either another thread's PerfLogger 
or a reset PerfLogger.
 
driverContext.getQueryDisplay().setPerfLogStarts(QueryDisplay.Phase.COMPILATION,
 perfLogger.getStartTimes());
 
driverContext.getQueryDisplay().setPerfLogEnds(QueryDisplay.Phase.COMPILATION, 
perfLogger.getEndTimes());
   }
 
-  private void runInternal(String command, boolean alreadyCompiled) throws 
CommandProcessorException {
+  /**
+   * Compile a new query, but potentially reset taskID counter.  Not resetting 
task counter
+   * is useful for generating re-entrant QL queries.
+   * @param command  The HiveQL query to compile
+   * @param resetTaskIds Resets taskID counter if true.
+   * @return 0 for ok
+   */
+  public int compile(String command, boolean resetTaskIds) {
+try {
+  compile(command, resetTaskIds, false);
+  return 0;
+} catch (CommandProcessorException cpr) {
+  return cpr.getErrorCode();
+}
+  }
+
+  // deferClose indicates if the close/destroy should be deferred when the 
process has been
+  // interrupted, it should be set to true if the compile is called within 
another method like
+  // runInternal, which defers the close to the called in that method.
+  @VisibleForTesting
+  public void compile(String command, boolean resetTaskIds, boolean 
deferClose) throws CommandProcessorException {
+preparForCompile(resetTaskIds);
+
+Compiler compiler = new Compiler(context, driverContext, driverState);
+QueryPlan plan = compiler.compile(command, deferClose);
+driverContext.setPlan(plan);
+
+compileFinished(deferClose);
+  }
+
+  private void preparForCompile(boolean resetTaskIds) throws 
CommandProcessorException {
+driverTxnHandler.createTxnManager();
 DriverState.setDriverState(driverState);
+prepareContext();
+setQueryId();
 
-driverState.lock();
-try {
-  if (alreadyCompiled) {
-if (driverState.isCompiled()) {
-  driverState.executing();
-} else {
-  String errorMessage = "FAILED: Precompiled query has been cancelled 
or closed.";
-  CONSOLE.printError(errorMessage);
-  throw DriverUtils.createProcessorException(driverContext, 12, 
errorMessage, null, null);
-}
-  } else {
-driverState.compiling();
-  }
-} finally {
-  driverState.unlock();
+if (resetTaskIds) {
+  TaskFactory.resetId();
+}
+  }
+
+  private void prepareContext() throws CommandProcessorException {
+if (context != null && context.getExplainAnalyze() != 
AnalyzeState.RUNNING) {
+  // close the existing ctx etc before compiling a new query, but does not 
destroy driver
+  closeInProcess(false);
 }
 
-// a flag that helps to set the correct driver state in finally block by 
tracking if
-// the method has been returned by an error or not.
-boolean isFinishedWithError = true;
 try {
-  HiveDriverRunHookContext hookContext = new 
HiveDriverRunHookContextImpl(driverContext.getConf(),
-  alreadyCompiled ? context.getCmd() : command);
-  // Get all the driver run hooks and pre-execute them.
-  try {
-driverContext.getHookRunner().runPreDriverHooks(hookContext);
-  } catch (Exception e) {
-String errorMessage = "FAILED: Hive Internal Error: " + 
Utilities.getNameMessage(e);
-CONSOLE.printError(errorMessage + "\n" + 
StringUtils.stringifyException(e));
-throw DriverUtils.createProcessorException(driverContext, 12, 
errorMessage,
-ErrorMsg.findSQLState(e.getMessage()), e);
+  if (context == null) {
+context = new Context(driverContext.getConf());
   }
+} catch (IOException e) {
+  throw new CommandProcessorException(e);
+}
 
-  if (!alreadyCompiled) {
-// compile internal will automatically reset the perf logger
-compileInternal(command, true);
-  } else {
-// Since we're reusing the compiled plan, we need to update its start 
time for current run
-

[jira] [Work logged] (HIVE-23814) Clean up Driver

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457512&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457512
 ]

ASF GitHub Bot logged work on HIVE-23814:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Jul/20 04:34
Start Date: 11/Jul/20 04:34
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1222:
URL: https://github.com/apache/hive/pull/1222#discussion_r453154237



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -139,205 +119,215 @@ public Driver(QueryState queryState, QueryInfo 
queryInfo, HiveTxnManager txnMana
 driverTxnHandler = new DriverTxnHandler(this, driverContext, driverState);
   }
 
-  /**
-   * Compile a new query, but potentially reset taskID counter.  Not resetting 
task counter
-   * is useful for generating re-entrant QL queries.
-   * @param command  The HiveQL query to compile
-   * @param resetTaskIds Resets taskID counter if true.
-   * @return 0 for ok
-   */
-  public int compile(String command, boolean resetTaskIds) {
-try {
-  compile(command, resetTaskIds, false);
-  return 0;
-} catch (CommandProcessorException cpr) {
-  return cpr.getErrorCode();
-}
+  @Override
+  public Context getContext() {
+return context;
   }
 
-  // deferClose indicates if the close/destroy should be deferred when the 
process has been
-  // interrupted, it should be set to true if the compile is called within 
another method like
-  // runInternal, which defers the close to the called in that method.
-  @VisibleForTesting
-  public void compile(String command, boolean resetTaskIds, boolean 
deferClose) throws CommandProcessorException {
-preparForCompile(resetTaskIds);
-
-Compiler compiler = new Compiler(context, driverContext, driverState);
-QueryPlan plan = compiler.compile(command, deferClose);
-driverContext.setPlan(plan);
-
-compileFinished(deferClose);
+  @Override
+  public HiveConf getConf() {
+return driverContext.getConf();
   }
 
-  private void compileFinished(boolean deferClose) {
-if (DriverState.getDriverState().isAborted() && !deferClose) {
-  closeInProcess(true);
-}
+  @Override
+  public CommandProcessorResponse run() throws CommandProcessorException {
+return run(null, true);
   }
 
-  private void preparForCompile(boolean resetTaskIds) throws 
CommandProcessorException {
-driverTxnHandler.createTxnManager();
-DriverState.setDriverState(driverState);
-prepareContext();
-setQueryId();
+  @Override
+  public CommandProcessorResponse run(String command) throws 
CommandProcessorException {
+return run(command, false);
+  }
 
-if (resetTaskIds) {
-  TaskFactory.resetId();
+  private CommandProcessorResponse run(String command, boolean 
alreadyCompiled) throws CommandProcessorException {
+try {
+  runInternal(command, alreadyCompiled);
+  return new CommandProcessorResponse(getSchema(), null);
+} catch (CommandProcessorException cpe) {
+  processRunException(cpe);
+  throw cpe;
 }
   }
 
-  private void prepareContext() throws CommandProcessorException {
-if (context != null && context.getExplainAnalyze() != 
AnalyzeState.RUNNING) {
-  // close the existing ctx etc before compiling a new query, but does not 
destroy driver
-  closeInProcess(false);
-}
+  private void runInternal(String command, boolean alreadyCompiled) throws 
CommandProcessorException {
+DriverState.setDriverState(driverState);
+setInitialStateForRun(alreadyCompiled);
 
+// a flag that helps to set the correct driver state in finally block by 
tracking if
+// the method has been returned by an error or not.
+boolean isFinishedWithError = true;
 try {
-  if (context == null) {
-context = new Context(driverContext.getConf());
+  HiveDriverRunHookContext hookContext = new 
HiveDriverRunHookContextImpl(driverContext.getConf(),
+  alreadyCompiled ? context.getCmd() : command);
+  runPreDriverHooks(hookContext);
+
+  if (!alreadyCompiled) {
+compileInternal(command, true);
+  } else {
+
driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
   }
-} catch (IOException e) {
-  throw new CommandProcessorException(e);
-}
 
-context.setHiveTxnManager(driverContext.getTxnManager());
-context.setStatsSource(driverContext.getStatsSource());
-context.setHDFSCleanup(true);
+  // Reset the PerfLogger so that it doesn't retain any previous values.
+  // Any value from compilation phase can be obtained through the map set 
in queryDisplay during compilation.
+  PerfLogger perfLogger = SessionState.getPerfLogger(true);
 
-driverTxnHandler.setContext(context);
-  }
+  // the reason that we set the txn manager for the cxt here 

[jira] [Work logged] (HIVE-23814) Clean up Driver

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457510&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457510
 ]

ASF GitHub Bot logged work on HIVE-23814:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Jul/20 04:31
Start Date: 11/Jul/20 04:31
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1222:
URL: https://github.com/apache/hive/pull/1222#discussion_r453154061



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -139,205 +119,215 @@ public Driver(QueryState queryState, QueryInfo 
queryInfo, HiveTxnManager txnMana
 driverTxnHandler = new DriverTxnHandler(this, driverContext, driverState);
   }
 
-  /**
-   * Compile a new query, but potentially reset taskID counter.  Not resetting 
task counter
-   * is useful for generating re-entrant QL queries.
-   * @param command  The HiveQL query to compile
-   * @param resetTaskIds Resets taskID counter if true.
-   * @return 0 for ok
-   */
-  public int compile(String command, boolean resetTaskIds) {
-try {
-  compile(command, resetTaskIds, false);
-  return 0;
-} catch (CommandProcessorException cpr) {
-  return cpr.getErrorCode();
-}
+  @Override
+  public Context getContext() {
+return context;
   }
 
-  // deferClose indicates if the close/destroy should be deferred when the 
process has been
-  // interrupted, it should be set to true if the compile is called within 
another method like
-  // runInternal, which defers the close to the called in that method.
-  @VisibleForTesting
-  public void compile(String command, boolean resetTaskIds, boolean 
deferClose) throws CommandProcessorException {
-preparForCompile(resetTaskIds);
-
-Compiler compiler = new Compiler(context, driverContext, driverState);
-QueryPlan plan = compiler.compile(command, deferClose);
-driverContext.setPlan(plan);
-
-compileFinished(deferClose);
+  @Override
+  public HiveConf getConf() {
+return driverContext.getConf();
   }
 
-  private void compileFinished(boolean deferClose) {
-if (DriverState.getDriverState().isAborted() && !deferClose) {
-  closeInProcess(true);
-}
+  @Override
+  public CommandProcessorResponse run() throws CommandProcessorException {
+return run(null, true);
   }
 
-  private void preparForCompile(boolean resetTaskIds) throws 
CommandProcessorException {
-driverTxnHandler.createTxnManager();
-DriverState.setDriverState(driverState);
-prepareContext();
-setQueryId();
+  @Override
+  public CommandProcessorResponse run(String command) throws 
CommandProcessorException {
+return run(command, false);
+  }
 
-if (resetTaskIds) {
-  TaskFactory.resetId();
+  private CommandProcessorResponse run(String command, boolean 
alreadyCompiled) throws CommandProcessorException {
+try {
+  runInternal(command, alreadyCompiled);
+  return new CommandProcessorResponse(getSchema(), null);
+} catch (CommandProcessorException cpe) {
+  processRunException(cpe);
+  throw cpe;
 }
   }
 
-  private void prepareContext() throws CommandProcessorException {
-if (context != null && context.getExplainAnalyze() != 
AnalyzeState.RUNNING) {
-  // close the existing ctx etc before compiling a new query, but does not 
destroy driver
-  closeInProcess(false);
-}
+  private void runInternal(String command, boolean alreadyCompiled) throws 
CommandProcessorException {
+DriverState.setDriverState(driverState);
+setInitialStateForRun(alreadyCompiled);
 
+// a flag that helps to set the correct driver state in finally block by 
tracking if
+// the method has been returned by an error or not.
+boolean isFinishedWithError = true;
 try {
-  if (context == null) {
-context = new Context(driverContext.getConf());
+  HiveDriverRunHookContext hookContext = new 
HiveDriverRunHookContextImpl(driverContext.getConf(),
+  alreadyCompiled ? context.getCmd() : command);
+  runPreDriverHooks(hookContext);
+
+  if (!alreadyCompiled) {
+compileInternal(command, true);
+  } else {
+
driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
   }
-} catch (IOException e) {
-  throw new CommandProcessorException(e);
-}
 
-context.setHiveTxnManager(driverContext.getTxnManager());
-context.setStatsSource(driverContext.getStatsSource());
-context.setHDFSCleanup(true);
+  // Reset the PerfLogger so that it doesn't retain any previous values.
+  // Any value from compilation phase can be obtained through the map set 
in queryDisplay during compilation.
+  PerfLogger perfLogger = SessionState.getPerfLogger(true);
 
-driverTxnHandler.setContext(context);
-  }
+  // the reason that we set the txn manager for the cxt here 

[jira] [Work logged] (HIVE-23814) Clean up Driver

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457509
 ]

ASF GitHub Bot logged work on HIVE-23814:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Jul/20 04:28
Start Date: 11/Jul/20 04:28
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1222:
URL: https://github.com/apache/hive/pull/1222#discussion_r453153797



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -139,205 +119,215 @@ public Driver(QueryState queryState, QueryInfo 
queryInfo, HiveTxnManager txnMana
 driverTxnHandler = new DriverTxnHandler(this, driverContext, driverState);
   }
 
-  /**
-   * Compile a new query, but potentially reset taskID counter.  Not resetting 
task counter
-   * is useful for generating re-entrant QL queries.
-   * @param command  The HiveQL query to compile
-   * @param resetTaskIds Resets taskID counter if true.
-   * @return 0 for ok
-   */
-  public int compile(String command, boolean resetTaskIds) {
-try {
-  compile(command, resetTaskIds, false);
-  return 0;
-} catch (CommandProcessorException cpr) {
-  return cpr.getErrorCode();
-}
+  @Override
+  public Context getContext() {
+return context;
   }
 
-  // deferClose indicates if the close/destroy should be deferred when the 
process has been
-  // interrupted, it should be set to true if the compile is called within 
another method like
-  // runInternal, which defers the close to the called in that method.
-  @VisibleForTesting
-  public void compile(String command, boolean resetTaskIds, boolean 
deferClose) throws CommandProcessorException {
-preparForCompile(resetTaskIds);
-
-Compiler compiler = new Compiler(context, driverContext, driverState);
-QueryPlan plan = compiler.compile(command, deferClose);
-driverContext.setPlan(plan);
-
-compileFinished(deferClose);
+  @Override
+  public HiveConf getConf() {
+return driverContext.getConf();
   }
 
-  private void compileFinished(boolean deferClose) {
-if (DriverState.getDriverState().isAborted() && !deferClose) {
-  closeInProcess(true);
-}
+  @Override
+  public CommandProcessorResponse run() throws CommandProcessorException {
+return run(null, true);
   }
 
-  private void preparForCompile(boolean resetTaskIds) throws 
CommandProcessorException {
-driverTxnHandler.createTxnManager();
-DriverState.setDriverState(driverState);
-prepareContext();
-setQueryId();
+  @Override
+  public CommandProcessorResponse run(String command) throws 
CommandProcessorException {
+return run(command, false);
+  }
 
-if (resetTaskIds) {
-  TaskFactory.resetId();
+  private CommandProcessorResponse run(String command, boolean 
alreadyCompiled) throws CommandProcessorException {
+try {
+  runInternal(command, alreadyCompiled);
+  return new CommandProcessorResponse(getSchema(), null);
+} catch (CommandProcessorException cpe) {
+  processRunException(cpe);
+  throw cpe;
 }
   }
 
-  private void prepareContext() throws CommandProcessorException {
-if (context != null && context.getExplainAnalyze() != 
AnalyzeState.RUNNING) {
-  // close the existing ctx etc before compiling a new query, but does not 
destroy driver
-  closeInProcess(false);
-}
+  private void runInternal(String command, boolean alreadyCompiled) throws 
CommandProcessorException {
+DriverState.setDriverState(driverState);
+setInitialStateForRun(alreadyCompiled);
 
+// a flag that helps to set the correct driver state in finally block by 
tracking if
+// the method has been returned by an error or not.
+boolean isFinishedWithError = true;
 try {
-  if (context == null) {
-context = new Context(driverContext.getConf());
+  HiveDriverRunHookContext hookContext = new 
HiveDriverRunHookContextImpl(driverContext.getConf(),
+  alreadyCompiled ? context.getCmd() : command);
+  runPreDriverHooks(hookContext);
+
+  if (!alreadyCompiled) {
+compileInternal(command, true);
+  } else {
+
driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
   }
-} catch (IOException e) {
-  throw new CommandProcessorException(e);
-}
 
-context.setHiveTxnManager(driverContext.getTxnManager());
-context.setStatsSource(driverContext.getStatsSource());
-context.setHDFSCleanup(true);
+  // Reset the PerfLogger so that it doesn't retain any previous values.
+  // Any value from compilation phase can be obtained through the map set 
in queryDisplay during compilation.
+  PerfLogger perfLogger = SessionState.getPerfLogger(true);
 
-driverTxnHandler.setContext(context);
-  }
+  // the reason that we set the txn manager for the cxt here 

[jira] [Work logged] (HIVE-23814) Clean up Driver

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457508&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457508
 ]

ASF GitHub Bot logged work on HIVE-23814:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Jul/20 04:23
Start Date: 11/Jul/20 04:23
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1222:
URL: https://github.com/apache/hive/pull/1222#discussion_r453153417



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -139,205 +119,215 @@ public Driver(QueryState queryState, QueryInfo 
queryInfo, HiveTxnManager txnMana
 driverTxnHandler = new DriverTxnHandler(this, driverContext, driverState);
   }
 
-  /**
-   * Compile a new query, but potentially reset taskID counter.  Not resetting 
task counter
-   * is useful for generating re-entrant QL queries.
-   * @param command  The HiveQL query to compile
-   * @param resetTaskIds Resets taskID counter if true.
-   * @return 0 for ok
-   */
-  public int compile(String command, boolean resetTaskIds) {
-try {
-  compile(command, resetTaskIds, false);
-  return 0;
-} catch (CommandProcessorException cpr) {
-  return cpr.getErrorCode();
-}
+  @Override
+  public Context getContext() {
+return context;
   }
 
-  // deferClose indicates if the close/destroy should be deferred when the 
process has been
-  // interrupted, it should be set to true if the compile is called within 
another method like
-  // runInternal, which defers the close to the called in that method.
-  @VisibleForTesting
-  public void compile(String command, boolean resetTaskIds, boolean 
deferClose) throws CommandProcessorException {
-preparForCompile(resetTaskIds);
-
-Compiler compiler = new Compiler(context, driverContext, driverState);
-QueryPlan plan = compiler.compile(command, deferClose);
-driverContext.setPlan(plan);
-
-compileFinished(deferClose);
+  @Override
+  public HiveConf getConf() {
+return driverContext.getConf();
   }
 
-  private void compileFinished(boolean deferClose) {
-if (DriverState.getDriverState().isAborted() && !deferClose) {
-  closeInProcess(true);
-}
+  @Override
+  public CommandProcessorResponse run() throws CommandProcessorException {
+return run(null, true);
   }
 
-  private void preparForCompile(boolean resetTaskIds) throws 
CommandProcessorException {
-driverTxnHandler.createTxnManager();
-DriverState.setDriverState(driverState);
-prepareContext();
-setQueryId();
+  @Override
+  public CommandProcessorResponse run(String command) throws 
CommandProcessorException {
+return run(command, false);
+  }
 
-if (resetTaskIds) {
-  TaskFactory.resetId();
+  private CommandProcessorResponse run(String command, boolean 
alreadyCompiled) throws CommandProcessorException {
+try {
+  runInternal(command, alreadyCompiled);
+  return new CommandProcessorResponse(getSchema(), null);
+} catch (CommandProcessorException cpe) {
+  processRunException(cpe);
+  throw cpe;
 }
   }
 
-  private void prepareContext() throws CommandProcessorException {
-if (context != null && context.getExplainAnalyze() != 
AnalyzeState.RUNNING) {
-  // close the existing ctx etc before compiling a new query, but does not 
destroy driver
-  closeInProcess(false);
-}
+  private void runInternal(String command, boolean alreadyCompiled) throws 
CommandProcessorException {
+DriverState.setDriverState(driverState);
+setInitialStateForRun(alreadyCompiled);
 
+// a flag that helps to set the correct driver state in finally block by 
tracking if
+// the method has been returned by an error or not.
+boolean isFinishedWithError = true;
 try {
-  if (context == null) {
-context = new Context(driverContext.getConf());
+  HiveDriverRunHookContext hookContext = new 
HiveDriverRunHookContextImpl(driverContext.getConf(),
+  alreadyCompiled ? context.getCmd() : command);
+  runPreDriverHooks(hookContext);
+
+  if (!alreadyCompiled) {
+compileInternal(command, true);
+  } else {
+
driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
   }
-} catch (IOException e) {
-  throw new CommandProcessorException(e);
-}
 
-context.setHiveTxnManager(driverContext.getTxnManager());
-context.setStatsSource(driverContext.getStatsSource());
-context.setHDFSCleanup(true);
+  // Reset the PerfLogger so that it doesn't retain any previous values.
+  // Any value from compilation phase can be obtained through the map set 
in queryDisplay during compilation.
+  PerfLogger perfLogger = SessionState.getPerfLogger(true);
 
-driverTxnHandler.setContext(context);
-  }
+  // the reason that we set the txn manager for the cxt here 

[jira] [Work logged] (HIVE-23814) Clean up Driver

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=457507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457507
 ]

ASF GitHub Bot logged work on HIVE-23814:
-

Author: ASF GitHub Bot
Created on: 11/Jul/20 04:19
Start Date: 11/Jul/20 04:19
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1222:
URL: https://github.com/apache/hive/pull/1222#discussion_r453153111



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -139,205 +119,215 @@ public Driver(QueryState queryState, QueryInfo 
queryInfo, HiveTxnManager txnMana
 driverTxnHandler = new DriverTxnHandler(this, driverContext, driverState);
   }
 
-  /**
-   * Compile a new query, but potentially reset taskID counter.  Not resetting 
task counter
-   * is useful for generating re-entrant QL queries.
-   * @param command  The HiveQL query to compile
-   * @param resetTaskIds Resets taskID counter if true.
-   * @return 0 for ok
-   */
-  public int compile(String command, boolean resetTaskIds) {
-try {
-  compile(command, resetTaskIds, false);
-  return 0;
-} catch (CommandProcessorException cpr) {
-  return cpr.getErrorCode();
-}
+  @Override
+  public Context getContext() {
+return context;
   }
 
-  // deferClose indicates if the close/destroy should be deferred when the 
process has been
-  // interrupted, it should be set to true if the compile is called within 
another method like
-  // runInternal, which defers the close to the called in that method.
-  @VisibleForTesting
-  public void compile(String command, boolean resetTaskIds, boolean 
deferClose) throws CommandProcessorException {
-preparForCompile(resetTaskIds);
-
-Compiler compiler = new Compiler(context, driverContext, driverState);
-QueryPlan plan = compiler.compile(command, deferClose);
-driverContext.setPlan(plan);
-
-compileFinished(deferClose);
+  @Override
+  public HiveConf getConf() {
+return driverContext.getConf();
   }
 
-  private void compileFinished(boolean deferClose) {
-if (DriverState.getDriverState().isAborted() && !deferClose) {
-  closeInProcess(true);
-}
+  @Override
+  public CommandProcessorResponse run() throws CommandProcessorException {
+return run(null, true);
   }
 
-  private void preparForCompile(boolean resetTaskIds) throws 
CommandProcessorException {
-driverTxnHandler.createTxnManager();
-DriverState.setDriverState(driverState);
-prepareContext();
-setQueryId();
+  @Override
+  public CommandProcessorResponse run(String command) throws 
CommandProcessorException {
+return run(command, false);
+  }
 
-if (resetTaskIds) {
-  TaskFactory.resetId();
+  private CommandProcessorResponse run(String command, boolean 
alreadyCompiled) throws CommandProcessorException {
+try {
+  runInternal(command, alreadyCompiled);
+  return new CommandProcessorResponse(getSchema(), null);
+} catch (CommandProcessorException cpe) {
+  processRunException(cpe);
+  throw cpe;
 }
   }
 
-  private void prepareContext() throws CommandProcessorException {
-if (context != null && context.getExplainAnalyze() != 
AnalyzeState.RUNNING) {
-  // close the existing ctx etc before compiling a new query, but does not 
destroy driver
-  closeInProcess(false);
-}
+  private void runInternal(String command, boolean alreadyCompiled) throws 
CommandProcessorException {
+DriverState.setDriverState(driverState);
+setInitialStateForRun(alreadyCompiled);
 
+// a flag that helps to set the correct driver state in finally block by 
tracking if
+// the method has been returned by an error or not.
+boolean isFinishedWithError = true;
 try {
-  if (context == null) {
-context = new Context(driverContext.getConf());
+  HiveDriverRunHookContext hookContext = new 
HiveDriverRunHookContextImpl(driverContext.getConf(),
+  alreadyCompiled ? context.getCmd() : command);
+  runPreDriverHooks(hookContext);
+
+  if (!alreadyCompiled) {
+compileInternal(command, true);
+  } else {
+
driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
   }
-} catch (IOException e) {
-  throw new CommandProcessorException(e);
-}
 
-context.setHiveTxnManager(driverContext.getTxnManager());
-context.setStatsSource(driverContext.getStatsSource());
-context.setHDFSCleanup(true);
+  // Reset the PerfLogger so that it doesn't retain any previous values.
+  // Any value from compilation phase can be obtained through the map set 
in queryDisplay during compilation.

Review comment:
   Shouldn't this be the first thing in the life cycle, or at minimum in this 
method? Previous code paths might already have started to use the perf logger.
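
   For illustration only, a sketch of the suggested reordering, resetting the 
PerfLogger at the top of runInternal() before the hooks run (the placement is 
an assumption; the names are from the quoted diff):

{code:java}
private void runInternal(String command, boolean alreadyCompiled) throws CommandProcessorException {
  DriverState.setDriverState(driverState);
  setInitialStateForRun(alreadyCompiled);

  // assumed reordering: reset the PerfLogger before hooks or compilation can
  // record values into an instance that would later be thrown away
  PerfLogger perfLogger = SessionState.getPerfLogger(true);

  // ... the hook / compile / execute sequence quoted above stays the same,
  // minus the later SessionState.getPerfLogger(true) call
}
{code}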





[jira] [Resolved] (HIVE-23825) Create a flag to turn off _orc_acid_version file creation

2020-07-10 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-23825.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.

Thanks for the review [~klcopp]!

> Create a flag to turn off _orc_acid_version file creation
> -
>
> Key: HIVE-23825
> URL: https://issues.apache.org/jira/browse/HIVE-23825
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We do not really use the version files, and creating them could be costly.
> We would like to add the possibility to avoid this overhead by not creating 
> them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23825) Create a flag to turn off _orc_acid_version file creation

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23825?focusedWorklogId=457504&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457504
 ]

ASF GitHub Bot logged work on HIVE-23825:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Jul/20 04:13
Start Date: 11/Jul/20 04:13
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #1236:
URL: https://github.com/apache/hive/pull/1236


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457504)
Time Spent: 20m  (was: 10m)

> Create a flag to turn off _orc_acid_version file creation
> -
>
> Key: HIVE-23825
> URL: https://issues.apache.org/jira/browse/HIVE-23825
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We do not really use the version files, and creating them could be costly.
> We would like to add the possibility to avoid this overhead by not creating 
> them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23351) Ranger Replication Scheduling

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23351?focusedWorklogId=457455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457455
 ]

ASF GitHub Bot logged work on HIVE-23351:
-

Author: ASF GitHub Bot
Created on: 11/Jul/20 00:31
Start Date: 11/Jul/20 00:31
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1004:
URL: https://github.com/apache/hive/pull/1004#issuecomment-656949812


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457455)
Time Spent: 3h  (was: 2h 50m)

> Ranger Replication Scheduling
> -
>
> Key: HIVE-23351
> URL: https://issues.apache.org/jira/browse/HIVE-23351
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23351.01.patch, HIVE-23351.02.patch, 
> HIVE-23351.03.patch, HIVE-23351.04.patch, HIVE-23351.05.patch, 
> HIVE-23351.06.patch, HIVE-23351.07.patch, HIVE-23351.08.patch, 
> HIVE-23351.09.patch, HIVE-23351.10.patch, HIVE-23351.10.patch, 
> HIVE-23351.11.patch, HIVE-23351.12.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23339) SBA does not check permissions for DB location specified in Create database query

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23339?focusedWorklogId=457456=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457456
 ]

ASF GitHub Bot logged work on HIVE-23339:
-

Author: ASF GitHub Bot
Created on: 11/Jul/20 00:31
Start Date: 11/Jul/20 00:31
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1011:
URL: https://github.com/apache/hive/pull/1011#issuecomment-656949806


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457456)
Time Spent: 20m  (was: 10m)

> SBA does not check permissions for DB location specified in Create database 
> query
> -
>
> Key: HIVE-23339
> URL: https://issues.apache.org/jira/browse/HIVE-23339
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: Riju Trivedi
>Assignee: Shubham Chaurasia
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HIVE-23339.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> With doAs=true and the StorageBasedAuthorization provider, creating a database 
> with a specific location succeeds even if the user doesn't have access to that 
> path.
>  
> {code:java}
>   hadoop fs -ls -d /tmp/cannot_write
>  drwx------ - hive hadoop 0 2020-04-01 22:53 /tmp/cannot_write
> Create a database under /tmp/cannot_write. We would expect it to fail, but it 
> is actually created successfully with "hive" as the owner:
> rtrivedi@bdp01:~> beeline -e "create database rtrivedi_1 location 
> '/tmp/cannot_write/rtrivedi_1'"
>  INFO : OK
>  No rows affected (0.116 seconds)
> hive@hpchdd2e:~> hadoop fs -ls /tmp/cannot_write
>  Found 1 items
>  drwx------ - hive hadoop 0 2020-04-01 23:05 /tmp/cannot_write/rtrivedi_1
> {code}
>  
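
For illustration, a standalone sketch of the kind of write-permission check SBA would need to run against the requested location before creating the database (it uses Hadoop's FileSystem#access; the class and method names are assumptions, not the actual fix):

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

public class DbLocationWriteCheck {
  /** Throws AccessControlException if the calling user cannot write to the
   *  nearest existing ancestor of the requested database location. */
  static void checkWritable(Configuration conf, String location) throws IOException {
    Path path = new Path(location);
    FileSystem fs = path.getFileSystem(conf);
    // The target usually does not exist yet, so evaluate the permission
    // against the first existing ancestor (e.g. /tmp/cannot_write above).
    Path p = path;
    while (p != null && !fs.exists(p)) {
      p = p.getParent();
    }
    fs.access(p, FsAction.WRITE);
  }
}
{code}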



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22415) Upgrade to Java 11

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=457312=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457312
 ]

ASF GitHub Bot logged work on HIVE-22415:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 19:11
Start Date: 10/Jul/20 19:11
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1241:
URL: https://github.com/apache/hive/pull/1241#discussion_r453030985



##
File path: 
standalone-metastore/metastore-server/src/main/resources/datanucleus-log4j.properties
##
@@ -15,3 +15,5 @@ log4j.category.DataNucleus.ValueGeneration=DEBUG, A1
 
 log4j.category.DataNucleus.Enhancer=INFO, A1
 log4j.category.DataNucleus.SchemaTool=INFO, A1
+
+log4j.category.DataNucleus.Persistence=INFO, A1

Review comment:
   Remove this change.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457312)
Time Spent: 20m  (was: 10m)

> Upgrade to Java 11
> --
>
> Key: HIVE-22415
> URL: https://issues.apache.org/jira/browse/HIVE-22415
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Upgrade Hive to Java JDK 11



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22415) Upgrade to Java 11

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-22415:
--
Labels: pull-request-available  (was: )

> Upgrade to Java 11
> --
>
> Key: HIVE-22415
> URL: https://issues.apache.org/jira/browse/HIVE-22415
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Upgrade Hive to Java JDK 11



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22415) Upgrade to Java 11

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22415?focusedWorklogId=457311=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457311
 ]

ASF GitHub Bot logged work on HIVE-22415:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 19:10
Start Date: 10/Jul/20 19:10
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1241:
URL: https://github.com/apache/hive/pull/1241


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457311)
Remaining Estimate: 0h
Time Spent: 10m

> Upgrade to Java 11
> --
>
> Key: HIVE-22415
> URL: https://issues.apache.org/jira/browse/HIVE-22415
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Upgrade Hive to Java JDK 11



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23700) HiveConf static initialization fails when JAR URI is opaque

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23700?focusedWorklogId=457273=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457273
 ]

ASF GitHub Bot logged work on HIVE-23700:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 17:39
Start Date: 10/Jul/20 17:39
Worklog Time Spent: 10m 
  Work Description: uptycs-anudeep opened a new pull request #1240:
URL: https://github.com/apache/hive/pull/1240


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457273)
Remaining Estimate: 119h 20m  (was: 119.5h)
Time Spent: 40m  (was: 0.5h)

> HiveConf static initialization fails when JAR URI is opaque
> ---
>
> Key: HIVE-23700
> URL: https://issues.apache.org/jira/browse/HIVE-23700
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.7
>Reporter: Francisco Guerrero
>Assignee: Francisco Guerrero
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-23700.1.patch
>
>   Original Estimate: 120h
>  Time Spent: 40m
>  Remaining Estimate: 119h 20m
>
> HiveConf static initialization fails when the jar URI is opaque, for example 
> when it is embedded as a fat jar in a Spring Boot application. The HiveConf 
> static block then fails to initialize and the HiveConf class does not get 
> class-loaded. The opaque URI in my case looks like this: 
> _jar:file:/usr/local/server/some-service-jar.jar!/BOOT-INF/lib/hive-common-2.3.7.jar!/_
> HiveConf#findConfigFile should be able to handle the `IllegalArgumentException` 
> thrown when such a jar `URI` is passed to the `File` constructor.
> To surface this issue, three conditions need to be met:
> 1. hive-site.xml should not be on the classpath
> 2. hive-site.xml should not be on "HIVE_CONF_DIR"
> 3. hive-site.xml should not be on "HIVE_HOME"
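
A minimal sketch of the defensive handling the description suggests (the helper is hypothetical; only the new File(URI) behavior is standard Java):

{code:java}
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

public class OpaqueUriHandling {
  /** Returns a File for the URI, or null when the URI is opaque or not a
   *  file: URI, in which case the caller can fall back to the next config
   *  lookup strategy instead of failing class initialization. */
  static File toFileOrNull(URI uri) {
    try {
      return new File(uri); // throws IllegalArgumentException for opaque URIs
    } catch (IllegalArgumentException e) {
      return null;
    }
  }

  public static void main(String[] args) throws URISyntaxException {
    URI opaque = new URI("jar:file:/usr/local/server/app.jar!/BOOT-INF/lib/x.jar!/");
    System.out.println(toFileOrNull(opaque)); // prints null instead of crashing
  }
}
{code}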



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23753) Make LLAP Secretmanager token path configurable

2020-07-10 Thread Rajkumar Singh (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155636#comment-17155636
 ] 

Rajkumar Singh commented on HIVE-23753:
---

https://github.com/apache/hive/pull/1171

> Make LLAP Secretmanager token path configurable
> ---
>
> Key: HIVE-23753
> URL: https://issues.apache.org/jira/browse/HIVE-23753
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>
> In a very busy LLAP cluster, if for some reason the tokens under the 
> zkdtsm_hive_llap0 zk path are not cleaned, then LLAP daemon startup takes a 
> very long time. This may lead to a service outage if the LLAP daemons are not 
> started and the number of retries while checking the LLAP app status is 
> exceeded. Looking at the jstack of the LLAP daemon, it seems to traverse the 
> zkdtsm_hive_llap0 zk path before starting the secret manager.
> {code:java}
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:502)
>   at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1386)
>   - locked <0x7fef36cdd338> (a org.apache.zookeeper.ClientCnxn$Packet)
>   at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1153)
>   at 
> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:302)
>   at 
> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:291)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at 
> org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:288)
>   at 
> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:279)
>   at 
> org.apache.curator.framework.imps.GetDataBuilderImpl$2.forPath(GetDataBuilderImpl.java:142)
>   at 
> org.apache.curator.framework.imps.GetDataBuilderImpl$2.forPath(GetDataBuilderImpl.java:138)
>   at 
> org.apache.curator.framework.recipes.cache.PathChildrenCache.internalRebuildNode(PathChildrenCache.java:591)
>   at 
> org.apache.curator.framework.recipes.cache.PathChildrenCache.rebuild(PathChildrenCache.java:331)
>   at 
> org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:300)
>   at 
> org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.startThreads(ZKDelegationTokenSecretManager.java:370)
>   at 
> org.apache.hadoop.hive.llap.security.SecretManager.startThreads(SecretManager.java:82)
>   at 
> org.apache.hadoop.hive.llap.security.SecretManager$1.run(SecretManager.java:223)
>   at 
> org.apache.hadoop.hive.llap.security.SecretManager$1.run(SecretManager.java:218)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846)
>   at 
> org.apache.hadoop.hive.llap.security.SecretManager.createSecretManager(SecretManager.java:218)
>   at 
> org.apache.hadoop.hive.llap.security.SecretManager.createSecretManager(SecretManager.java:212)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.<init>(LlapDaemon.java:279)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command

2020-07-10 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155621#comment-17155621
 ] 

Syed Shameerur Rahman commented on HIVE-22957:
--

Tests have passed now!

> Support Partition Filtering In MSCK REPAIR TABLE Command
> 
>
> Key: HIVE-22957
> URL: https://issues.apache.org/jira/browse/HIVE-22957
> Project: Hive
>  Issue Type: Improvement
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> *Design Doc:*
> [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23809) Data loss occurs when using tez engine to join different bucketing_version tables

2020-07-10 Thread ZhangQiDong (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangQiDong updated HIVE-23809:
---
Attachment: HIVE-23809.1.patch

> Data loss occurs when using tez engine to join different bucketing_version 
> tables
> -
>
> Key: HIVE-23809
> URL: https://issues.apache.org/jira/browse/HIVE-23809
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Tez
>Affects Versions: 3.1.0
>Reporter: ZhangQiDong
>Assignee: ZhangQiDong
>Priority: Major
>  Labels: hive, tez
> Attachments: HIVE-23809.1.patch
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> *Test case:*
> create table table_a (a int, b string,c string);
> create table table_b (a int, b string,c string);
> insert into table_a values 
> (11,'a','aa'),(22,'b','bb'),(33,'c','cc'),(44,'d','dd'),(5,'e','ee'),(6,'f','ff'),(7,'g','gg');
> insert into table_b values 
> (11,'a','aa'),(22,'b','bb'),(33,'c','cc'),(44,'d','dd'),(5,'e','ee'),(6,'f','ff'),(7,'g','gg');
> alter table table_a set tblproperties ("bucketing_version"='1');
> alter table table_b set tblproperties ("bucketing_version"='2');
>  *Hivesql:*
>  *set hive.auto.convert.join=false;*
>  *set mapred.reduce.tasks=2;*
>  select ta.a as a_a, tb.b as b_b from table_a ta join table_b tb 
> on(ta.a=tb.a);
> set hive.execution.engine=mr;
> +-----+-----+
> | a_a | b_b |
> +-----+-----+
> | 5   | e   |
> | 6   | f   |
> | 7   | g   |
> | 11  | a   |
> | 22  | b   |
> | 33  | c   |
> | 44  | d   |
> +-----+-----+
> set hive.execution.engine=tez;
> +-----+-----+
> | a_a | b_b |
> +-----+-----+
> | 6   | f   |
> | 5   | e   |
> | 11  | a   |
> | 33  | c   |
> +-----+-----+
>  
>   
>   
>   
>   
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23809) Data loss occurs when using tez engine to join different bucketing_version tables

2020-07-10 Thread ZhangQiDong (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangQiDong updated HIVE-23809:
---
Hadoop Flags: Reviewed
Tags: Tez
Target Version/s: 3.1.0
  Status: Patch Available  (was: Open)

The modification logic is the same as that of the HIVE-22098 patch. However, the 
HIVE-22098 patch only works for Hive on MR; the HIVE-23809 patch solves the 
problem for Hive on Tez. If you want Tez and MR to produce the same result, you 
need to apply both the HIVE-22098 and HIVE-23809 patches.

> Data loss occurs when using tez engine to join different bucketing_version 
> tables
> -
>
> Key: HIVE-23809
> URL: https://issues.apache.org/jira/browse/HIVE-23809
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Tez
>Affects Versions: 3.1.0
>Reporter: ZhangQiDong
>Assignee: ZhangQiDong
>Priority: Major
>  Labels: hive, tez
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> *Test case:*
> create table table_a (a int, b string,c string);
> create table table_b (a int, b string,c string);
> insert into table_a values 
> (11,'a','aa'),(22,'b','bb'),(33,'c','cc'),(44,'d','dd'),(5,'e','ee'),(6,'f','ff'),(7,'g','gg');
> insert into table_b values 
> (11,'a','aa'),(22,'b','bb'),(33,'c','cc'),(44,'d','dd'),(5,'e','ee'),(6,'f','ff'),(7,'g','gg');
> alter table table_a set tblproperties ("bucketing_version"='1');
> alter table table_b set tblproperties ("bucketing_version"='2');
>  *Hivesql:*
>  *set hive.auto.convert.join=false;*
>  *set mapred.reduce.tasks=2;*
>  select ta.a as a_a, tb.b as b_b from table_a ta join table_b tb 
> on(ta.a=tb.a);
> set hive.execution.engine=mr;
> +-----+-----+
> | a_a | b_b |
> +-----+-----+
> | 5   | e   |
> | 6   | f   |
> | 7   | g   |
> | 11  | a   |
> | 22  | b   |
> | 33  | c   |
> | 44  | d   |
> +-----+-----+
> set hive.execution.engine=tez;
> +-----+-----+
> | a_a | b_b |
> +-----+-----+
> | 6   | f   |
> | 5   | e   |
> | 11  | a   |
> | 33  | c   |
> +-----+-----+
>  
>   
>   
>   
>   
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23830) Remove shutdownhook after query is completed

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23830?focusedWorklogId=457211=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457211
 ]

ASF GitHub Bot logged work on HIVE-23830:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 15:54
Start Date: 10/Jul/20 15:54
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on a change in pull request #1235:
URL: https://github.com/apache/hive/pull/1235#discussion_r452929806



##
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java
##
@@ -553,11 +553,13 @@ private void release(boolean releaseLocks) {
 LOG.warn("Exception when releasing locking in destroy: " + 
e.getMessage());
   }
 }
-ShutdownHookManager.removeShutdownHook(shutdownRunner);
+ShutdownHookManager.removeShutdownHook(txnRollbackRunner);
   }
 
   void releaseLocksAndCommitOrRollback(boolean commit) throws LockException {

Review comment:
   I would not rename to `commitAndCleanup` because this method also rolls 
back transaction. I'll rename it to `endTransactionAndCleanup`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457211)
Time Spent: 1h  (was: 50m)

> Remove shutdownhook after query is completed
> 
>
> Key: HIVE-23830
> URL: https://issues.apache.org/jira/browse/HIVE-23830
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Each query registers a shutdown hook to release transactional resources in 
> case the JVM shuts down mid-query. These hooks are not cleaned up until the 
> session is closed, and session lifetime is unbounded, so these hooks are a 
> memory leak. They should be cleaned up as soon as the transaction is completed.
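
A simplified sketch of the intended lifecycle, assuming Hive's ShutdownHookManager utility as used in the diff above (runQuery and rollbackQuietly are hypothetical stand-ins for the driver logic):

{code:java}
import org.apache.hive.common.util.ShutdownHookManager;

public class TxnHookLifecycle {
  void runWithRollbackHook() {
    Runnable txnRollbackRunner = this::rollbackQuietly;
    ShutdownHookManager.addShutdownHook(txnRollbackRunner);
    try {
      runQuery();
    } finally {
      // Deregister as soon as the transaction ends instead of waiting for
      // session close; otherwise every query leaks one registered hook.
      ShutdownHookManager.removeShutdownHook(txnRollbackRunner);
    }
  }

  void runQuery() { /* hypothetical query execution */ }

  void rollbackQuietly() { /* hypothetical transaction rollback */ }
}
{code}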



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23836) Make "cols" dependent so that it cascade deletes

2020-07-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23836:
--
Description: 
{quote}
If you want the deletion of a persistent object to cause the deletion of 
related objects then you need to mark the related fields in the mapping to be 
"dependent".
{quote}

http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields
http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object

The database won't do it:

{code:sql|title=Derby Schema}
ALTER TABLE "APP"."COLUMNS_V2" ADD CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY 
("CD_ID") REFERENCES "APP"."CDS" ("CD_ID") ON DELETE NO ACTION ON UPDATE NO 
ACTION;
{code}

https://github.com/apache/hive/blob/65cf6957cf9432277a096f91b40985237274579f/standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql#L452


  was:
{quote}
If you want the deletion of a persistent object to cause the deletion of 
related objects then you need to mark the related fields in the mapping to be 
"dependent".
{quote}

http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields
http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object



> Make "cols" dependent so that it cascade deletes
> 
>
> Key: HIVE-23836
> URL: https://issues.apache.org/jira/browse/HIVE-23836
> Project: Hive
>  Issue Type: Bug
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {quote}
> If you want the deletion of a persistent object to cause the deletion of 
> related objects then you need to mark the related fields in the mapping to be 
> "dependent".
> {quote}
> http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields
> http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object
> The database won't do it:
> {code:sql|title=Derby Schema}
> ALTER TABLE "APP"."COLUMNS_V2" ADD CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY 
> ("CD_ID") REFERENCES "APP"."CDS" ("CD_ID") ON DELETE NO ACTION ON UPDATE NO 
> ACTION;
> {code}
> https://github.com/apache/hive/blob/65cf6957cf9432277a096f91b40985237274579f/standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql#L452
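
For illustration, the "dependent" marking expressed with javax.jdo annotations (a sketch only: the metastore actually maps these classes through package.jdo XML metadata, and the class shapes here are simplified):

{code:java}
import java.util.List;
import javax.jdo.annotations.Element;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;

@PersistenceCapable
class MColumnDescriptor {
  // dependent = "true" makes DataNucleus cascade-delete the column rows
  // when the descriptor is deleted, since the FK above is ON DELETE NO
  // ACTION and the database will not do it.
  @Persistent
  @Element(dependent = "true")
  List<MFieldSchema> cols;
}

@PersistenceCapable
class MFieldSchema {
  @Persistent String name;
  @Persistent String type;
}
{code}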



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23836) Make "cols" dependent so that it cascade deletes

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23836:
--
Labels: pull-request-available  (was: )

> Make "cols" dependent so that it cascade deletes
> 
>
> Key: HIVE-23836
> URL: https://issues.apache.org/jira/browse/HIVE-23836
> Project: Hive
>  Issue Type: Bug
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {quote}
> If you want the deletion of a persistent object to cause the deletion of 
> related objects then you need to mark the related fields in the mapping to be 
> "dependent".
> {quote}
> http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields
> http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23836) Make "cols" dependent so that it cascade deletes

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23836?focusedWorklogId=457196=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457196
 ]

ASF GitHub Bot logged work on HIVE-23836:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 15:19
Start Date: 10/Jul/20 15:19
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1239:
URL: https://github.com/apache/hive/pull/1239


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457196)
Remaining Estimate: 0h
Time Spent: 10m

> Make "cols" dependent so that it cascade deletes
> 
>
> Key: HIVE-23836
> URL: https://issues.apache.org/jira/browse/HIVE-23836
> Project: Hive
>  Issue Type: Bug
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {quote}
> If you want the deletion of a persistent object to cause the deletion of 
> related objects then you need to mark the related fields in the mapping to be 
> "dependent".
> {quote}
> http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields
> http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23836) Make "cols" dependent so that it cascade deletes

2020-07-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23836:
--
Description: 
{quote}
If you want the deletion of a persistent object to cause the deletion of 
related objects then you need to mark the related fields in the mapping to be 
"dependent".
{quote}

http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields
http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object


> Make "cols" dependent so that it cascade deletes
> 
>
> Key: HIVE-23836
> URL: https://issues.apache.org/jira/browse/HIVE-23836
> Project: Hive
>  Issue Type: Bug
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> {quote}
> If you want the deletion of a persistent object to cause the deletion of 
> related objects then you need to mark the related fields in the mapping to be 
> "dependent".
> {quote}
> http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields
> http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23836) Make "cols" dependent so that it cascade deletes

2020-07-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-23836:
-


> Make "cols" dependent so that it cascade deletes
> 
>
> Key: HIVE-23836
> URL: https://issues.apache.org/jira/browse/HIVE-23836
> Project: Hive
>  Issue Type: Bug
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=457190=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457190
 ]

ASF GitHub Bot logged work on HIVE-23363:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 15:03
Start Date: 10/Jul/20 15:03
Worklog Time Spent: 10m 
  Work Description: belugabehr edited a comment on pull request #1118:
URL: https://github.com/apache/hive/pull/1118#issuecomment-656722071


   
   > Foreign Keys
   > 
   > So we now have given the datastore control over the cascade deletion 
strategy for objects stored in these tables. Please be aware that JDO provides 
Dependent Fields as a way of allowing cascade deletion. The difference here is 
that Dependent Fields is controlled by DataNucleus, whereas foreign key delete 
actions are controlled by the datastore (assuming the datastore supports it 
even)
   ```
   
   http://www.datanucleus.org/products/accessplatform/jdo/mapping.html#fk



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457190)
Time Spent: 2.5h  (was: 2h 20m)

> Upgrade DataNucleus dependency to 5.2
> -
>
> Key: HIVE-23363
> URL: https://issues.apache.org/jira/browse/HIVE-23363
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Zoltan Chovan
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23363.2.patch, HIVE-23363.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been 
> retired:
> [http://www.datanucleus.org/documentation/products.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=457189=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457189
 ]

ASF GitHub Bot logged work on HIVE-23363:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 15:02
Start Date: 10/Jul/20 15:02
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1118:
URL: https://github.com/apache/hive/pull/1118#issuecomment-656722071


   ```
   Foreign Keys
   
   So we now have given the datastore control over the cascade deletion 
strategy for objects stored in these tables. Please be aware that JDO provides 
Dependent Fields as a way of allowing cascade deletion. The difference here is 
that Dependent Fields is controlled by DataNucleus, whereas foreign key delete 
actions are controlled by the datastore (assuming the datastore supports it 
even)
   ```
   
   http://www.datanucleus.org/products/accessplatform/jdo/mapping.html#fk



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457189)
Time Spent: 2h 20m  (was: 2h 10m)

> Upgrade DataNucleus dependency to 5.2
> -
>
> Key: HIVE-23363
> URL: https://issues.apache.org/jira/browse/HIVE-23363
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Zoltan Chovan
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23363.2.patch, HIVE-23363.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been 
> retired:
> [http://www.datanucleus.org/documentation/products.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23509) MapJoin AssertionError: Capacity must be power of 2

2020-07-10 Thread Shashank Pedamallu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155534#comment-17155534
 ] 

Shashank Pedamallu commented on HIVE-23509:
---

Thank you very much for getting this through!

> MapJoin AssertionError: Capacity must be power of 2
> ---
>
> Key: HIVE-23509
> URL: https://issues.apache.org/jira/browse/HIVE-23509
> Project: Hive
>  Issue Type: Bug
> Environment: Hive-2.3.6
>Reporter: Shashank Pedamallu
>Assignee: Shashank Pedamallu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Observed AssertionError failures in a Hive query when the rowCount for the 
> join comes out as (2^x)+(2^(x+1)).
> Following is the stacktrace:
> {noformat}
> [2020-05-11 05:43:12,135] {base_task_runner.py:95} INFO - Subtask: ERROR : 
> Vertex failed, vertexName=Map 4, vertexId=vertex_1588729523139_51702_1_06, 
> diagnostics=[Task failed, taskId=task_1588729523139_51702_1_06_001286, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1588729523139_51702_1_06_001286_0:java.lang.RuntimeException: 
> java.lang.AssertionError: Capacity must be a power of two [2020-05-11 
> 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) 
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> java.security.AccessController.doPrivileged(Native Method) [2020-05-11 
> 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> javax.security.auth.Subject.doAs(Subject.java:422) [2020-05-11 05:43:12,136] 
> {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) 
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266) [2020-05-11 
> 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> java.lang.Thread.run(Thread.java:748) [2020-05-11 05:43:12,137] 
> {base_task_runner.py:95} INFO - Subtask: Caused by: java.lang.AssertionError: 
> Capacity must be a power of two [2020-05-11 05:43:12,137] 
> {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.validateCapacity(BytesBytesMultiHashMap.java:552)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashImpl(BytesBytesMultiHashMap.java:731)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashToTarget(BytesBytesMultiHashMap.java:545)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$HashPartition.getHashMapFromDisk(HybridHashTableContainer.java:183)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.reloadHashTable(MapJoinOperator.java:641)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.continueProcess(MapJoinOperator.java:603)
>  
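
To see why such a row count trips the assertion: BytesBytesMultiHashMap validates that its capacity has exactly one bit set, and 2^x + 2^(x+1) never does. A standalone sketch of the usual rounding to the next power of two (illustrative only; the actual fix may differ):

{code:java}
public class CapacityCheck {
  /** Rounds n up to the next power of two, e.g. 24 (= 8 + 16) becomes 32. */
  static int nextPowerOfTwo(int n) {
    return n <= 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
  }

  public static void main(String[] args) {
    int rowCount = 24; // 2^3 + 2^4, not a power of two
    System.out.println(Integer.bitCount(rowCount) == 1); // false -> would assert
    System.out.println(nextPowerOfTwo(rowCount));        // 32
  }
}
{code}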

[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=457186=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457186
 ]

ASF GitHub Bot logged work on HIVE-23363:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 14:54
Start Date: 10/Jul/20 14:54
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1118:
URL: https://github.com/apache/hive/pull/1118#issuecomment-656718317


   ```
   ALTER TABLE "APP"."COLUMNS_V2" ADD CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY 
("CD_ID") REFERENCES "APP"."CDS" ("CD_ID") ON DELETE NO ACTION ON UPDATE NO 
ACTION;
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457186)
Time Spent: 2h 10m  (was: 2h)

> Upgrade DataNucleus dependency to 5.2
> -
>
> Key: HIVE-23363
> URL: https://issues.apache.org/jira/browse/HIVE-23363
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Zoltan Chovan
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23363.2.patch, HIVE-23363.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been 
> retired:
> [http://www.datanucleus.org/documentation/products.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22412) StatsUtils throw NPE when explain

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22412?focusedWorklogId=457183=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457183
 ]

ASF GitHub Bot logged work on HIVE-22412:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 14:45
Start Date: 10/Jul/20 14:45
Worklog Time Spent: 10m 
  Work Description: StefanXiepj commented on a change in pull request #1209:
URL: https://github.com/apache/hive/pull/1209#discussion_r452889505



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
##
@@ -1336,6 +1341,9 @@ public static long 
getSizeOfPrimitiveTypeArraysFromType(String colType, int leng
*/
   public static long getSizeOfMap(StandardConstantMapObjectInspector scmoi) {
 Map<?, ?> map = scmoi.getWritableConstantValue();
+if (null == map || map.isEmpty()) {
+  return 0L;
+}

Review comment:
   @belugabehr & @kgyrtkirk , I agree entirely with you!  It have been 
updated.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457183)
Time Spent: 3h  (was: 2h 50m)

> StatsUtils throw NPE when explain
> -
>
> Key: HIVE-22412
> URL: https://issues.apache.org/jira/browse/HIVE-22412
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.1, 2.0.0, 3.0.0
>Reporter: xiepengjie
>Assignee: xiepengjie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22412.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The demo is like this:
> {code:java}
> drop table if exists explain_npe_map;
> drop table if exists explain_npe_array;
> drop table if exists explain_npe_struct;
> create table explain_npe_map( c1 map );
> create table explain_npe_array  ( c1 array );
> create table explain_npe_struct ( c1 struct );
> -- error
> set hive.cbo.enable=false;
> explain select c1 from explain_npe_map where c1 is null;
> explain select c1 from explain_npe_array where c1 is null;
> explain select c1 from explain_npe_struct where c1 is null;
> -- correct
> set hive.cbo.enable=true;
> explain select c1 from explain_npe_map where c1 is null;
> explain select c1 from explain_npe_array where c1 is null;
> explain select c1 from explain_npe_struct where c1 is null;{code}
>  
> If the conf 'hive.cbo.enable' is set to false, an NPE will be thrown; 
> otherwise it will not.
> {code:java}
> hive> drop table if exists explain_npe_map;
> OK
> Time taken: 0.063 seconds
> hive> drop table if exists explain_npe_array;
> OK
> Time taken: 0.035 seconds
> hive> drop table if exists explain_npe_struct;
> OK
> Time taken: 0.015 seconds
> hive>
> > create table explain_npe_map( c1 map );
> OK
> Time taken: 0.584 seconds
> hive> create table explain_npe_array  ( c1 array );
> OK
> Time taken: 0.216 seconds
> hive> create table explain_npe_struct ( c1 struct );
> OK
> Time taken: 0.17 seconds
> hive>
> > set hive.cbo.enable=false;
> hive> explain select c1 from explain_npe_map where c1 is null;
> FAILED: NullPointerException null
> hive> explain select c1 from explain_npe_array where c1 is null;
> FAILED: NullPointerException null
> hive> explain select c1 from explain_npe_struct where c1 is null;
> FAILED: RuntimeException Error invoking signature method
> hive>
> > set hive.cbo.enable=true;
> hive> explain select c1 from explain_npe_map where c1 is null;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stageSTAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: explain_npe_map
>   Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column 
> stats: NONE
>   Filter Operator
> predicate: false (type: boolean)
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column 
> stats: NONE
> Select Operator
>   expressions: c1 (type: map)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL 
> Column stats: NONE
>   ListSink
> Time taken: 1.593 seconds, Fetched: 20 row(s)
> hive> explain select c1 from explain_npe_array where c1 is null;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stageSTAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: explain_npe_array
>   Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column 
> stats: NONE

[jira] [Work logged] (HIVE-23793) Review of QueryInfo Class

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23793?focusedWorklogId=457180=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457180
 ]

ASF GitHub Bot logged work on HIVE-23793:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 14:28
Start Date: 10/Jul/20 14:28
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1197:
URL: https://github.com/apache/hive/pull/1197


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457180)
Time Spent: 1h 10m  (was: 1h)

> Review of QueryInfo Class
> -
>
> Key: HIVE-23793
> URL: https://issues.apache.org/jira/browse/HIVE-23793
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23793) Review of QueryInfo Class

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23793?focusedWorklogId=457179=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457179
 ]

ASF GitHub Bot logged work on HIVE-23793:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 14:27
Start Date: 10/Jul/20 14:27
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #1197:
URL: https://github.com/apache/hive/pull/1197


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457179)
Time Spent: 1h  (was: 50m)

> Review of QueryInfo Class
> -
>
> Key: HIVE-23793
> URL: https://issues.apache.org/jira/browse/HIVE-23793
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23638) Fix FindBug issues in hive-common

2020-07-10 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-23638.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

pushed to master. Thank you [~pgaref]!

> Fix FindBug issues in hive-common
> -
>
> Key: HIVE-23638
> URL: https://issues.apache.org/jira/browse/HIVE-23638
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: spotbugsXml.xml
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> mvn -Pspotbugs 
> -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO
>  -pl :hive-common test-compile 
> com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23638) Fix FindBug issues in hive-common

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23638?focusedWorklogId=457160=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457160
 ]

ASF GitHub Bot logged work on HIVE-23638:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 13:13
Start Date: 10/Jul/20 13:13
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1161:
URL: https://github.com/apache/hive/pull/1161


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457160)
Time Spent: 2h 40m  (was: 2.5h)

> Fix FindBug issues in hive-common
> -
>
> Key: HIVE-23638
> URL: https://issues.apache.org/jira/browse/HIVE-23638
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> mvn -Pspotbugs 
> -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO
>  -pl :hive-common test-compile 
> com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23824) LLAP - add API to look up ORC metadata for certain Path

2020-07-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita updated HIVE-23824:
--
Status: Patch Available  (was: Open)

> LLAP - add API to look up ORC metadata for certain Path
> ---
>
> Key: HIVE-23824
> URL: https://issues.apache.org/jira/browse/HIVE-23824
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> LLAP IO supports caching, but currently this is only done via LlapRecordReader 
> / using splits, aka the good old MapReduce way.
> At certain times it would be worth leveraging the caching of files on certain 
> paths that are not necessarily associated with a record reader directly. An 
> example of this could be the caching of ACID delete delta files, as they are 
> currently being read without caching.
> With this patch we'd extend the LLAP API and offer another entry point for 
> retrieving metadata of ORC files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23824) LLAP - add API to look up ORC metadata for certain Path

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23824:
--
Labels: pull-request-available  (was: )

> LLAP - add API to look up ORC metadata for certain Path
> ---
>
> Key: HIVE-23824
> URL: https://issues.apache.org/jira/browse/HIVE-23824
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> LLAP IO supports caching, but currently this is only done via LlapRecordReader 
> / using splits, aka the good old MapReduce way.
> At certain times it would be worth leveraging the caching of files on certain 
> paths that are not necessarily associated with a record reader directly. An 
> example of this could be the caching of ACID delete delta files, as they are 
> currently being read without caching.
> With this patch we'd extend the LLAP API and offer another entry point for 
> retrieving metadata of ORC files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23824) LLAP - add API to look up ORC metadata for certain Path

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23824?focusedWorklogId=457155=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457155
 ]

ASF GitHub Bot logged work on HIVE-23824:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 13:06
Start Date: 10/Jul/20 13:06
Worklog Time Spent: 10m 
  Work Description: szlta opened a new pull request #1238:
URL: https://github.com/apache/hive/pull/1238


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457155)
Remaining Estimate: 0h
Time Spent: 10m

> LLAP - add API to look up ORC metadata for certain Path
> ---
>
> Key: HIVE-23824
> URL: https://issues.apache.org/jira/browse/HIVE-23824
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> LLAP IO supports caching, but currently this is only done via LlapRecordReader 
> / using splits, aka the good old MapReduce way.
> At certain times it would be worth leveraging the caching of files on certain 
> paths that are not necessarily associated with a record reader directly. An 
> example of this could be the caching of ACID delete delta files, as they are 
> currently being read without caching.
> With this patch we'd extend the LLAP API and offer another entry point for 
> retrieving metadata of ORC files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23824) LLAP - add API to look up ORC metadata for certain Path

2020-07-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita updated HIVE-23824:
--
Description: 
LLAP IO supports caching, but currently this is only done via LlapRecordReader /
using splits, aka the good old MapReduce way.

At certain times it would be worth leveraging the caching of files on certain
paths that are not necessarily associated with a record reader directly. An
example of this could be the caching of ACID delete delta files, as they are
currently being read without caching.

With this patch we'd extend the LLAP API and offer another entry point for
retrieving metadata of ORC files.

> LLAP - add API to look up ORC metadata for certain Path
> ---
>
> Key: HIVE-23824
> URL: https://issues.apache.org/jira/browse/HIVE-23824
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>
> LLAP IO supports caching, but currently this is only done via LlapRecordReader 
> / using splits, aka the good old MapReduce way.
> At certain times it would be worth leveraging the caching of files on certain 
> paths that are not necessarily associated with a record reader directly. An 
> example of this could be the caching of ACID delete delta files, as they are 
> currently being read without caching.
> With this patch we'd extend the LLAP API and offer another entry point for 
> retrieving metadata of ORC files.
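
A hypothetical shape for such an entry point (the interface and method names are assumptions for illustration, not the committed API):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.orc.impl.OrcTail;

interface LlapOrcMetadataLookup {
  /**
   * Returns the ORC tail (file metadata) for the given path, serving it from
   * the LLAP cache when present and populating the cache on a miss, without
   * going through a record reader or splits.
   */
  OrcTail getOrcTail(Path path) throws IOException;
}
{code}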



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457139=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457139
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:41
Start Date: 10/Jul/20 12:41
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r452816736



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
##
@@ -330,6 +333,20 @@ a database ( directory )
 return 0;
   }
 
+  private void addLazyDataCopyTask(TaskTracker loadTaskTracker) {

Review comment:
   This is only for external tables. This will happen before the metadata copy, 
as we are doing currently for external tables.

##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
##
@@ -148,6 +148,13 @@ public static synchronized ReplChangeManager 
getInstance(Configuration conf)
 return instance;
   }
 
+  public static synchronized ReplChangeManager getInstance() {

Review comment:
   Needed utility method of ReplChangeManager which earlier used to be 
static.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457139)
Time Spent: 2h 40m  (was: 2.5h)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In the 
> case of a database with a very large number of tables/partitions, such an 
> iterator may cause the HS2 process to go OOM.
> This also introduces a config option to run data copy tasks during the repl 
> load operation.
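
A minimal sketch of the file-backed iteration pattern this describes: entries are appended to a backing file as they are discovered, and the copy tasks later iterate it line by line instead of holding the whole list in HS2 memory (simplified; the actual patch adds a FileListStreamer, shown in a diff later in this digest):

{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Iterator;

/** Reads one entry per line from a backing file, so memory use stays
 *  constant no matter how many tables/partitions the database has. */
class FileBackedIterator implements Iterator<String>, AutoCloseable {
  private final BufferedReader reader;
  private String next;

  FileBackedIterator(String backingFile) throws IOException {
    reader = Files.newBufferedReader(Paths.get(backingFile));
    next = reader.readLine();
  }

  @Override public boolean hasNext() { return next != null; }

  @Override public String next() {
    String current = next;
    try {
      next = reader.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return current;
  }

  @Override public void close() throws IOException { reader.close(); }
}
{code}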



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command

2020-07-10 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155448#comment-17155448
 ] 

Syed Shameerur Rahman commented on HIVE-22957:
--

[~kgyrtkirk] Thank you for the review. I have tried to address all your 
comments and updated the PR. Please take a look?
FYI: The test failures are unrelated and passed on local run.

> Support Partition Filtering In MSCK REPAIR TABLE Command
> 
>
> Key: HIVE-22957
> URL: https://issues.apache.org/jira/browse/HIVE-22957
> Project: Hive
>  Issue Type: Improvement
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> *Design Doc:*
> [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457137=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457137
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:38
Start Date: 10/Jul/20 12:38
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r452816170



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileListStreamer.java
##
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedWriter;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.TimeUnit;
+
+public class FileListStreamer extends Thread implements Closeable {
+  private static final Logger LOG = 
LoggerFactory.getLogger(FileListStreamer.class);
+  private static final long TIMEOUT_IN_SECS = 5L;
+  private volatile boolean stop;
+  private final LinkedBlockingQueue<String> cache;
+  private Path backingFile;
+  private Configuration conf;
+  private BufferedWriter backingFileWriter;
+  private volatile boolean valid = true;
+  private volatile boolean asyncMode = false;
+  private final Object COMPLETION_LOCK = new Object();
+  private volatile boolean completed = false;
+
+
+
+  public FileListStreamer(LinkedBlockingQueue<String> cache, Path backingFile, 
Configuration conf) throws IOException {
+this.cache = cache;
+this.backingFile = backingFile;
+this.conf = conf;
+init();
+  }
+
+  private void init() throws IOException {
+FileSystem fs = FileSystem.get(backingFile.toUri(), conf);
+backingFileWriter = new BufferedWriter(new 
OutputStreamWriter(fs.create(backingFile, !asyncMode)));

Review comment:
   I will get rid of the synchronous mode altogether, as it is not currently 
needed.
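
As a rough sketch of what dropping the synchronous path could look like (an
assumption about the follow-up, not the committed change), init() would no
longer branch on the mode flag:

  // Hypothetical simplification once only the async mode remains:
  // always create the backing file for the streamer thread to write to.
  private void init() throws IOException {
    FileSystem fs = FileSystem.get(backingFile.toUri(), conf);
    backingFileWriter = new BufferedWriter(
        new OutputStreamWriter(fs.create(backingFile, true))); // overwrite flag assumed
    LOG.info("Initialized a file based store to save a list at: {}", backingFile);
  }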





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457137)
Time Spent: 2.5h  (was: 2h 20m)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case 
> of a database with a very large number of tables/partitions, such an iterator 
> may cause the HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during repl load 
> operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457138
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:38
Start Date: 10/Jul/20 12:38
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r452816256



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java
##
@@ -1754,6 +1760,16 @@ public void testForeignKeys() {
 Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col2");
 Assert.assertEquals(cachedKeys.get(0).getCatName(), DEFAULT_CATALOG_NAME);
 
+cachedKeys = sharedCache.listCachedForeignKeys(
+DEFAULT_CATALOG_NAME, tbl.getDbName(), tbl.getTableName(), 
tbl1.getDbName(), tbl1.getTableName());
+
+Assert.assertEquals(cachedKeys.size(), 1);
+Assert.assertEquals(cachedKeys.get(0).getFk_name(), "fk2");
+Assert.assertEquals(cachedKeys.get(0).getFktable_db(), "db");
+Assert.assertEquals(cachedKeys.get(0).getFktable_name(), 
tbl.getTableName());
+Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col1");

Review comment:
   done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457138)
Time Spent: 4.5h  (was: 4h 20m)

> [CachedStore] Cache table constraints in CachedStore
> 
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive will pull all constraints 
> from the tables involved in a query, which results in multiple db reads (including 
> get_primary_keys, get_foreign_keys, get_unique_constraints, etc). The effort 
> to cache this is small as it's just another table component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23618) NotificationLog should also contain events for default/check constraints

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23618?focusedWorklogId=457135&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457135
 ]

ASF GitHub Bot logged work on HIVE-23618:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:37
Start Date: 10/Jul/20 12:37
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on pull request #1237:
URL: https://github.com/apache/hive/pull/1237#issuecomment-656653951


   @maheshk114  @pkumarsinha Can you please take a look at the PR? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457135)
Remaining Estimate: 0h
Time Spent: 10m

> NotificationLog should also contain events for default/check constraints
> 
>
> Key: HIVE-23618
> URL: https://issues.apache.org/jira/browse/HIVE-23618
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This should follow a similar approach to the notNull/Unique constraints. This will 
> also include event replication for these constraints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457136
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:37
Start Date: 10/Jul/20 12:37
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r452816142



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
##
@@ -261,44 +283,57 @@ public int getObjectSize(Class clazz, Object obj) {
 private Map parameters;
 private byte[] sdHash;
 private int otherSize;
-private int tableColStatsCacheSize;
-private int partitionCacheSize;
-private int partitionColStatsCacheSize;
-private int aggrColStatsCacheSize;
+
+// Arrays to hold the size/updated bit of cached objects.
+// These arrays are to be referenced using MemberName enum only.
+private int[] memberObjectsSize = new int[MemberName.values().length];
+private AtomicBoolean[] memberCacheUpdated = new 
AtomicBoolean[MemberName.values().length];
 
 private ReentrantReadWriteLock tableLock = new 
ReentrantReadWriteLock(true);
 // For caching column stats for an unpartitioned table
 // Key is column name and the value is the col stat object
 private Map tableColStatsCache = new 
ConcurrentHashMap();
-private AtomicBoolean isTableColStatsCacheDirty = new AtomicBoolean(false);
 // For caching partition objects
// Key is partition values and the value is a wrapper around the partition 
object
 private Map partitionCache = new 
ConcurrentHashMap();
-private AtomicBoolean isPartitionCacheDirty = new AtomicBoolean(false);
 // For caching column stats for a partitioned table
 // Key is aggregate of partition values, column name and the value is the 
col stat object
 private Map partitionColStatsCache =
 new ConcurrentHashMap();
-private AtomicBoolean isPartitionColStatsCacheDirty = new 
AtomicBoolean(false);
 // For caching aggregate column stats for all and all minus default 
partition
 // Key is column name and the value is a list of 2 col stat objects
 // (all partitions and all but default)
 private Map> aggrColStatsCache =
 new ConcurrentHashMap>();
-private AtomicBoolean isAggrPartitionColStatsCacheDirty = new 
AtomicBoolean(false);
+
+private Map primaryKeyCache = new 
ConcurrentHashMap<>();
+
+private Map foreignKeyCache = new 
ConcurrentHashMap<>();
+
+private Map notNullConstraintCache = new 
ConcurrentHashMap<>();
+
+private Map uniqueConstraintCache = new 
ConcurrentHashMap<>();
 
 TableWrapper(Table t, byte[] sdHash, String location, Map 
parameters) {
   this.t = t;
   this.sdHash = sdHash;
   this.location = location;
   this.parameters = parameters;
-  this.tableColStatsCacheSize = 0;
-  this.partitionCacheSize = 0;
-  this.partitionColStatsCacheSize = 0;
-  this.aggrColStatsCacheSize = 0;
+  for(MemberName mn : MemberName.values()) {
+this.memberObjectsSize[mn.getValue()] = 0;

Review comment:
   On second thought, I think ordinal is better: we freshly load the cache 
entries during HMS startup, so the ordering doesn't matter. Hand-assigned 
values, however, can be a problem if someone passes an incorrect value or 
removes an element without updating the other values.
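
A generic illustration of the trade-off (not the PR code; the member names
are invented): sizing the array by values().length and indexing by ordinal()
stays in range by construction, whereas hand-assigned values can silently
drift when the enum changes:

  // Ordinal-based indexing for per-member bookkeeping.
  enum MemberName { TABLE_COL_STATS, PARTITION_CACHE, PARTITION_COL_STATS, AGGR_COL_STATS }

  class SizeTracker {
    // One slot per enum member; adding a member automatically grows the array.
    private final int[] memberObjectsSize = new int[MemberName.values().length];

    void add(MemberName mn, int delta) {
      memberObjectsSize[mn.ordinal()] += delta; // always a valid index
    }
  }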





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457136)
Time Spent: 4h 20m  (was: 4h 10m)

> [CachedStore] Cache table constraints in CachedStore
> 
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive will pull all constraints 
> from the tables involved in a query, which results in multiple db reads (including 
> get_primary_keys, get_foreign_keys, get_unique_constraints, etc). The effort 
> to cache this is small as it's just another table component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457134&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457134
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:37
Start Date: 10/Jul/20 12:37
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r452815690



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileListStreamer.java
##
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedWriter;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.TimeUnit;
+
+public class FileListStreamer extends Thread implements Closeable {

Review comment:
   FileListStreamer is treated as a specialized worker and hence extends 
Thread. If it had been treated as a job instead, the Runnable route would 
have been fine. 
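
For readers weighing the two options, a generic contrast (an illustration,
not code from the PR):

  // Worker-style: the class IS a specialized thread.
  class StreamerWorker extends Thread {
    @Override public void run() { /* drain the queue, write to the file */ }
  }

  // Job-style: the class is a task that any thread or executor can run.
  class StreamerJob implements Runnable {
    @Override public void run() { /* same work, decoupled from threading */ }
  }
  // new Thread(new StreamerJob()).start();  // or submit to an ExecutorService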





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457134)
Time Spent: 2h 20m  (was: 2h 10m)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case 
> of a database with a very large number of tables/partitions, such an iterator 
> may cause the HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during repl load 
> operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23618) NotificationLog should also contain events for default/check constraints

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23618:
--
Labels: pull-request-available  (was: )

> NotificationLog should also contain events for default/check constraints
> 
>
> Key: HIVE-23618
> URL: https://issues.apache.org/jira/browse/HIVE-23618
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This should follow a similar approach to the notNull/Unique constraints. This will 
> also include event replication for these constraints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457132
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:35
Start Date: 10/Jul/20 12:35
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r452815017



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileListStreamer.java
##
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedWriter;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.TimeUnit;
+
+public class FileListStreamer extends Thread implements Closeable {
+  private static final Logger LOG = 
LoggerFactory.getLogger(FileListStreamer.class);
+  private static final long TIMEOUT_IN_SECS = 5L;
+  private volatile boolean stop;
+  private final LinkedBlockingQueue<String> cache;
+  private Path backingFile;
+  private Configuration conf;
+  private BufferedWriter backingFileWriter;
+  private volatile boolean valid = true;
+  private volatile boolean asyncMode = false;
+  private final Object COMPLETION_LOCK = new Object();
+  private volatile boolean completed = false;
+
+
+
+  public FileListStreamer(LinkedBlockingQueue<String> cache, Path backingFile, 
Configuration conf) throws IOException {
+this.cache = cache;
+this.backingFile = backingFile;
+this.conf = conf;
+init();
+  }
+
+  private void init() throws IOException {
+FileSystem fs = FileSystem.get(backingFile.toUri(), conf);
+backingFileWriter = new BufferedWriter(new 
OutputStreamWriter(fs.create(backingFile, !asyncMode)));
+LOG.info("Initialized a file based store to save a list at: {}, 
asyncMode:{}", backingFile, asyncMode);
+  }
+
+  public boolean isValid() {
+return valid;
+  }
+
+  @Override
+  public void close() throws IOException {
+if (!asyncMode) {
+  closeBackingFile();
+  return;
+}
+stop = true;
+synchronized (COMPLETION_LOCK) {
+  while (!completed && isValid()) {
+try {
+  COMPLETION_LOCK.wait(TimeUnit.SECONDS.toMillis(TIMEOUT_IN_SECS));
+} catch (InterruptedException e) {
+}
+  }
+}
+if (!isValid()) {

Review comment:
   No, it can't be moved above; checking validity only after the wait loop 
ensures the remaining entries in the cache are consumed correctly first.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457132)
Time Spent: 2h 10m  (was: 2h)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case 
> of a database with a very large number of tables/partitions, such an iterator 
> may cause the HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during 

[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457129
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:32
Start Date: 10/Jul/20 12:32
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r452813518



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileListStreamer.java
##
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedWriter;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.TimeUnit;
+
+public class FileListStreamer extends Thread implements Closeable {
+  private static final Logger LOG = 
LoggerFactory.getLogger(FileListStreamer.class);
+  private static final long TIMEOUT_IN_SECS = 5L;
+  private volatile boolean stop;
+  private final LinkedBlockingQueue<String> cache;
+  private Path backingFile;
+  private Configuration conf;
+  private BufferedWriter backingFileWriter;
+  private volatile boolean valid = true;
+  private volatile boolean asyncMode = false;
+  private final Object COMPLETION_LOCK = new Object();
+  private volatile boolean completed = false;
+
+
+
+  public FileListStreamer(LinkedBlockingQueue<String> cache, Path backingFile, 
Configuration conf) throws IOException {
+this.cache = cache;
+this.backingFile = backingFile;
+this.conf = conf;
+init();
+  }
+
+  private void init() throws IOException {
+FileSystem fs = FileSystem.get(backingFile.toUri(), conf);
+backingFileWriter = new BufferedWriter(new 
OutputStreamWriter(fs.create(backingFile, !asyncMode)));
+LOG.info("Initialized a file based store to save a list at: {}, 
asyncMode:{}", backingFile, asyncMode);
+  }
+
+  public boolean isValid() {
+return valid;
+  }
+
+  @Override
+  public void close() throws IOException {
+if (!asyncMode) {
+  closeBackingFile();
+  return;
+}
+stop = true;
+synchronized (COMPLETION_LOCK) {
+  while (!completed && isValid()) {
+try {
+  COMPLETION_LOCK.wait(TimeUnit.SECONDS.toMillis(TIMEOUT_IN_SECS));
+} catch (InterruptedException e) {
+}
+  }
+}
+if (!isValid()) {
+  throw new IOException("File list is not in a valid state:" + 
backingFile);
+}
+LOG.info("Completed close for File List backed by ", backingFile);
+  }
+
+  public synchronized void writeInThread(String nextEntry) throws 
SemanticException {
+try {
+  backingFileWriter.write(nextEntry);
+  backingFileWriter.newLine();
+} catch (IOException e) {
+  throw new SemanticException(e);
+}
+  }
+  @Override
+  public void run() {
+asyncMode = true;
+boolean exThrown = false;
+while (!exThrown && (!stop || !cache.isEmpty())) {
+  try {
+String nextEntry = cache.poll(TIMEOUT_IN_SECS, TimeUnit.SECONDS);
+if (nextEntry != null) {
+  backingFileWriter.write(nextEntry);
+  backingFileWriter.newLine();
+  LOG.debug("Writing entry {} to file list backed by {}", nextEntry, 
backingFile);
+}
+  } catch (Exception iEx) {
+if (!(iEx instanceof InterruptedException)) {
+  // not draining any more. Inform the producer to avoid OOM.
+  valid = false;
+  LOG.error("Exception while saving the list to file " + backingFile, 
iEx);
+  exThrown = true;
+}
+  }
+}
+try{
+  closeBackingFile();
+  completed = true;
+} finally {
+  synchronized (COMPLETION_LOCK) {
+COMPLETION_LOCK.notify();
+  }
+  

[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457128&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457128
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:31
Start Date: 10/Jul/20 12:31
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r452812877



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.concurrent.LinkedBlockingQueue;
+
+
+/**
+ * A file backed list of Strings which is in-memory till the threshold.
+ */
+public class FileList implements Closeable, Iterator<String> {

Review comment:
   Also add concurrency tests - can you please suggest what these should cover?
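
One hedged sketch of such a test (it assumes FileList exposes an add(String)
for writes, plus the READ-mode constructor and Iterator semantics shown
elsewhere in this patch; backingFile and hiveConf are assumed test fixtures):

  @Test
  public void testConcurrentAdds() throws Exception {
    final int numThreads = 8, entriesPerThread = 1000;
    final FileList fileList = new FileList(backingFile, 100, hiveConf, true);
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    for (int t = 0; t < numThreads; t++) {
      final int id = t;
      pool.submit(() -> {
        for (int i = 0; i < entriesPerThread; i++) {
          fileList.add("entry-" + id + "-" + i); // assumed write API
        }
        return null;
      });
    }
    pool.shutdown();
    Assert.assertTrue(pool.awaitTermination(1, TimeUnit.MINUTES));
    fileList.close();

    // Re-open in READ mode and verify no entry was lost or duplicated.
    Set<String> seen = new HashSet<>();
    FileList reader = new FileList(backingFile, hiveConf);
    while (reader.hasNext()) {
      seen.add(reader.next());
    }
    reader.close();
    Assert.assertEquals(numThreads * entriesPerThread, seen.size());
  }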





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457128)
Time Spent: 1h 50m  (was: 1h 40m)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case 
> of a database with a very large number of tables/partitions, such an iterator 
> may cause the HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during repl load 
> operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457127&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457127
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:30
Start Date: 10/Jul/20 12:30
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r452812525



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.concurrent.LinkedBlockingQueue;
+
+
+/**
+ * A file backed list of Strings which is in-memory till the threshold.
+ */
+public class FileList implements Closeable, Iterator<String> {
+  private static final Logger LOG = LoggerFactory.getLogger(FileList.class);
+  private static int fileListStreamerID = 0;
+  private static final String  FILE_LIST_STREAMER_PREFIX = 
"file-list-streamer-";
+
+  private LinkedBlockingQueue<String> cache;
+  private volatile boolean thresholdHit = false;
+  private int thresholdPoint;
+  private float thresholdFactor = 0.9f;
+  private Path backingFile;
+  private FileListStreamer fileListStreamer;
+  private FileListOpMode fileListOpMode;
+  private String nextElement;
+  private boolean noMoreElement;
+  private HiveConf conf;
+  private BufferedReader backingFileReader;
+  private volatile boolean asyncMode;
+
+
+  /**
+   * To be used only for READ mode;
+   */
+  public FileList(Path backingFile, HiveConf conf) {

Review comment:
   It would be risky to operate on the same file in both READ and WRITE mode 
at the same time, hence the modes are there to prevent that.
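
A generic sketch of the kind of guard this implies (field names from the
patch, body assumed):

  // Each public operation validates the mode before touching the file.
  private void validateMode(FileListOpMode expected) {
    if (fileListOpMode != expected) {
      throw new IllegalStateException("FileList for " + backingFile
          + " is in " + fileListOpMode + " mode, expected " + expected);
    }
  }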





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457127)
Time Spent: 1h 40m  (was: 1.5h)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case 
> of a database with a very large number of tables/partitions, such an iterator 
> may cause the HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during repl load 
> operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457126
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:29
Start Date: 10/Jul/20 12:29
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r452812001



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.concurrent.LinkedBlockingQueue;
+
+
+/**
+ * A file backed list of Strings which is in-memory till the threshold.
+ */
+public class FileList implements Closeable, Iterator<String> {
+  private static final Logger LOG = LoggerFactory.getLogger(FileList.class);
+  private static int fileListStreamerID = 0;
+  private static final String  FILE_LIST_STREAMER_PREFIX = 
"file-list-streamer-";
+
+  private LinkedBlockingQueue<String> cache;
+  private volatile boolean thresholdHit = false;
+  private int thresholdPoint;
+  private float thresholdFactor = 0.9f;
+  private Path backingFile;
+  private FileListStreamer fileListStreamer;
+  private FileListOpMode fileListOpMode;
+  private String nextElement;
+  private boolean noMoreElement;
+  private HiveConf conf;
+  private BufferedReader backingFileReader;
+  private volatile boolean asyncMode;
+
+
+  /**
+   * To be used only for READ mode;
+   */
+  public FileList(Path backingFile, HiveConf conf) {
+this.backingFile = backingFile;
+thresholdHit = true;
+fileListOpMode = FileListOpMode.READ;
+this.conf = conf;
+  }
+
+  /**
+   * To be used only for WRITE mode;
+   */
+  public FileList(Path backingFile, int cacheSize, HiveConf conf, boolean 
asyncMode) throws IOException {

Review comment:
   If it is called otherwise, it won't allow the list to be used anyway.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457126)
Time Spent: 1.5h  (was: 1h 20m)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case 
> of a database with a very large number of tables/partitions, such an iterator 
> may cause the HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during repl load 
> operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457124
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:28
Start Date: 10/Jul/20 12:28
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r452811664



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -591,14 +590,25 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData 
dmd, Path cmRoot, Hive
 }
   }
   dumpTableListToDumpLocation(tableList, dumpRoot, dbName, conf);
-  extTableCopyWorks = dirLocationsToCopy(extTableLocations);
 }
-work.setDirCopyIterator(extTableCopyWorks.iterator());
-work.setManagedTableCopyPathIterator(managedTableCopyPaths.iterator());
+setDataCopyIterators(extTableFileList, managedTblList);
 work.getMetricCollector().reportStageEnd(getName(), Status.SUCCESS, 
lastReplId);
 return lastReplId;
   }
 
+  private void setDataCopyIterators(FileList extTableFileList, FileList 
managedTableFileList) throws IOException {
+boolean dataCopyAtLoad = 
conf.getBoolVar(HiveConf.ConfVars.REPL_DATA_COPY_LAZY);
+extTableFileList.close();

Review comment:
   Close makes sure that everything is flushed out and that the list can then 
be used in READ mode.
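
In other words, the assumed lifecycle looks roughly like this (the write
method name and the consumer are illustrative):

  FileList list = new FileList(backingFile, cacheSize, conf, true); // WRITE mode
  list.add(copyEntry);  // assumed API: buffered in memory, spilled past the threshold
  list.close();         // flushes the streamer and the backing file

  FileList reader = new FileList(backingFile, conf); // READ-mode constructor
  while (reader.hasNext()) {
    scheduleCopyTask(reader.next()); // hypothetical consumer
  }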





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457124)
Time Spent: 1h 20m  (was: 1h 10m)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case 
> of a database with a very large number of tables/partitions, such an iterator 
> may cause the HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during repl load 
> operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457122
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:26
Start Date: 10/Jul/20 12:26
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r452810700



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.concurrent.LinkedBlockingQueue;
+
+
+/**
+ * A file backed list of Strings which is in-memory till the threshold.
+ */
+public class FileList implements Closeable, Iterator<String> {
+  private static final Logger LOG = LoggerFactory.getLogger(FileList.class);
+  private static int fileListStreamerID = 0;
+  private static final String  FILE_LIST_STREAMER_PREFIX = 
"file-list-streamer-";
+
+  private LinkedBlockingQueue<String> cache;
+  private volatile boolean thresholdHit = false;
+  private int thresholdPoint;

Review comment:
   thresholdHit is a boolean flag which, once set, is used to take action; 
thresholdPoint is the size beyond which thresholdHit gets set.
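
Roughly, the relationship between the two fields (the exact computation is
an assumption, values illustrative):

  // thresholdPoint is derived once from the cache capacity;
  // thresholdHit flips the first time the in-memory cache crosses it.
  thresholdPoint = (int) (cacheSize * thresholdFactor); // e.g. 90% of capacity
  if (!thresholdHit && cache.size() >= thresholdPoint) {
    thresholdHit = true; // from here on, entries spill to the backing file
  }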





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457122)
Time Spent: 1h 10m  (was: 1h)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case 
> of a database with a very large number of tables/partitions, such an iterator 
> may cause the HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during repl load 
> operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457118
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:24
Start Date: 10/Jul/20 12:24
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r452810053



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -465,9 +463,13 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData 
dmd, Path cmRoot, Hive
 String validTxnList = null;
 long waitUntilTime = 0;
 long bootDumpBeginReplId = -1;
-List managedTableCopyPaths = 
Collections.emptyList();
-List extTableCopyWorks = Collections.emptyList();
+
+int cacheSize = 
conf.getIntVar(HiveConf.ConfVars.REPL_FILE_LIST_CACHE_SIZE);

Review comment:
   The cache is rebuilt, and in the case of a file, it should be overwritten.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457118)
Time Spent: 1h  (was: 50m)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case 
> of a database with a very large number of tables/partitions, such an iterator 
> may cause the HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during repl load 
> operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457115&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457115
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:23
Start Date: 10/Jul/20 12:23
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r452809466



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/FileOperations.java
##
@@ -165,4 +175,92 @@ private void validateSrcPathListExists() throws 
IOException, LoginException {
   throw new FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage()));
 }
   }
+
+  /**
+   * This needs the root data directory to which the data needs to be exported 
to.
+   * The data export here is a list of files either in table/partition that 
are written to the _files
+   * in the exportRootDataDir provided.
+   */
+  private void exportFilesAsList() throws SemanticException, IOException, 
LoginException {
+if (dataPathList.isEmpty()) {
+  return;
+}
+boolean done = false;
+int repeat = 0;
+while (!done) {
+  // This is only called for replication that handles MM tables; no need 
for mmCtx.
+  try (BufferedWriter writer = writer()) {
+for (Path dataPath : dataPathList) {
+  writeFilesList(listFilesInDir(dataPath), writer, 
AcidUtils.getAcidSubDir(dataPath));
+}
+done = true;
+  } catch (IOException e) {
+if (e instanceof FileNotFoundException) {
+  logger.error("exporting data files in dir : " + dataPathList + " to 
" + exportRootDataDir + " failed");
+  throw new 
FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage()));
+}
+repeat++;
+logger.info("writeFilesList failed", e);
+if (repeat >= FileUtils.MAX_IO_ERROR_RETRY) {
+  logger.error("exporting data files in dir : " + dataPathList + " to 
" + exportRootDataDir + " failed");
+  throw new 
IOException(ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getMsg());
+}
+
+int sleepTime = FileUtils.getSleepTime(repeat - 1);
+logger.info(" sleep for {} milliseconds for retry num {} ", sleepTime 
, repeat);
+try {
+  Thread.sleep(sleepTime);
+} catch (InterruptedException timerEx) {
+  logger.info("thread sleep interrupted", timerEx.getMessage());
+}
+
+// in case of io error, reset the file system object
+FileSystem.closeAllForUGI(Utils.getUGI());
+dataFileSystem = dataPathList.get(0).getFileSystem(hiveConf);
+exportFileSystem = exportRootDataDir.getFileSystem(hiveConf);
+Path exportPath = new Path(exportRootDataDir, EximUtil.FILES_NAME);
+if (exportFileSystem.exists(exportPath)) {
+  exportFileSystem.delete(exportPath, true);
+}
+  }
+}
+  }
+
+  private void writeFilesList(FileStatus[] fileStatuses, BufferedWriter 
writer, String encodedSubDirs)
+  throws IOException {
+ReplChangeManager replChangeManager = ReplChangeManager.getInstance();

Review comment:
   Can you please elaborate? I didn't get which parameter you are referring 
to.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457115)
Time Spent: 50m  (was: 40m)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case 
> of a database with a very large number of tables/partitions, such an iterator 
> may cause the HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during repl load 
> operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457113
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:22
Start Date: 10/Jul/20 12:22
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r452808706



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/FileOperations.java
##
@@ -165,4 +175,92 @@ private void validateSrcPathListExists() throws 
IOException, LoginException {
   throw new FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage()));
 }
   }
+
+  /**
+   * This needs the root data directory to which the data needs to be exported 
to.
+   * The data export here is a list of files either in table/partition that 
are written to the _files
+   * in the exportRootDataDir provided.
+   */
+  private void exportFilesAsList() throws SemanticException, IOException, 
LoginException {
+if (dataPathList.isEmpty()) {
+  return;
+}
+boolean done = false;
+int repeat = 0;
+while (!done) {
+  // This is only called for replication that handles MM tables; no need 
for mmCtx.
+  try (BufferedWriter writer = writer()) {
+for (Path dataPath : dataPathList) {
+  writeFilesList(listFilesInDir(dataPath), writer, 
AcidUtils.getAcidSubDir(dataPath));
+}
+done = true;
+  } catch (IOException e) {
+if (e instanceof FileNotFoundException) {
+  logger.error("exporting data files in dir : " + dataPathList + " to 
" + exportRootDataDir + " failed");
+  throw new 
FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage()));
+}
+repeat++;
+logger.info("writeFilesList failed", e);
+if (repeat >= FileUtils.MAX_IO_ERROR_RETRY) {
+  logger.error("exporting data files in dir : " + dataPathList + " to 
" + exportRootDataDir + " failed");
+  throw new 
IOException(ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getMsg());
+}
+
+int sleepTime = FileUtils.getSleepTime(repeat - 1);
+logger.info(" sleep for {} milliseconds for retry num {} ", sleepTime 
, repeat);
+try {
+  Thread.sleep(sleepTime);
+} catch (InterruptedException timerEx) {
+  logger.info("thread sleep interrupted", timerEx.getMessage());
+}
+
+// in case of io error, reset the file system object
+FileSystem.closeAllForUGI(Utils.getUGI());
+dataFileSystem = dataPathList.get(0).getFileSystem(hiveConf);
+exportFileSystem = exportRootDataDir.getFileSystem(hiveConf);
+Path exportPath = new Path(exportRootDataDir, EximUtil.FILES_NAME);
+if (exportFileSystem.exists(exportPath)) {
+  exportFileSystem.delete(exportPath, true);
+}
+  }
+}
+  }
+
+  private void writeFilesList(FileStatus[] fileStatuses, BufferedWriter 
writer, String encodedSubDirs)
+  throws IOException {
+ReplChangeManager replChangeManager = ReplChangeManager.getInstance();
+for (FileStatus fileStatus : fileStatuses) {
+  if (fileStatus.isDirectory()) {
+// Write files inside the sub-directory.
+Path subDir = fileStatus.getPath();
+writeFilesList(listFilesInDir(subDir), writer, 
encodedSubDir(encodedSubDirs, subDir));
+  } else {
+writer.write(encodedUri(replChangeManager, fileStatus, 
encodedSubDirs));
+writer.newLine();
+  }
+}
+  }
+
+  private BufferedWriter writer() throws IOException {
+Path exportToFile = new Path(exportRootDataDir, EximUtil.FILES_NAME);
+logger.debug("exporting data files in dir : " + dataPathList + " to " + 
exportToFile);
+return new BufferedWriter(
+new OutputStreamWriter(exportFileSystem.create(exportToFile))
+);
+  }
+
+  private String encodedSubDir(String encodedParentDirs, Path subDir) {
+if (null == encodedParentDirs) {
+  return subDir.getName();
+} else {
+  return encodedParentDirs + Path.SEPARATOR + subDir.getName();
+}
+  }
+
+  private String encodedUri(ReplChangeManager replChangeManager, FileStatus 
fileStatus, String encodedSubDir)
+  throws IOException {
+Path currentDataFilePath = fileStatus.getPath();
+String checkSum = ReplChangeManager.checksumFor(currentDataFilePath, 
dataFileSystem);

Review comment:
   Which method?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking

[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457109
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:16
Start Date: 10/Jul/20 12:16
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r452806379



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java
##
@@ -1754,6 +1760,16 @@ public void testForeignKeys() {
 Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col2");
 Assert.assertEquals(cachedKeys.get(0).getCatName(), DEFAULT_CATALOG_NAME);
 
+cachedKeys = sharedCache.listCachedForeignKeys(
+DEFAULT_CATALOG_NAME, tbl.getDbName(), tbl.getTableName(), 
tbl1.getDbName(), tbl1.getTableName());
+
+Assert.assertEquals(cachedKeys.size(), 1);
+Assert.assertEquals(cachedKeys.get(0).getFk_name(), "fk2");
+Assert.assertEquals(cachedKeys.get(0).getFktable_db(), "db");
+Assert.assertEquals(cachedKeys.get(0).getFktable_name(), 
tbl.getTableName());
+Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col1");

Review comment:
   Also validate that the parent table key is proper too.
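
For example (getter names assumed from the Thrift-generated SQLForeignKey):

  Assert.assertEquals(cachedKeys.get(0).getPktable_db(), tbl1.getDbName());
  Assert.assertEquals(cachedKeys.get(0).getPktable_name(), tbl1.getTableName());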





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457109)
Time Spent: 4h  (was: 3h 50m)

> [CachedStore] Cache table constraints in CachedStore
> 
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive will pull all constraints 
> from the tables involved in a query, which results in multiple db reads (including 
> get_primary_keys, get_foreign_keys, get_unique_constraints, etc). The effort 
> to cache this is small as it's just another table component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457110=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457110
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:16
Start Date: 10/Jul/20 12:16
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r452806379



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java
##
@@ -1754,6 +1760,16 @@ public void testForeignKeys() {
 Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col2");
 Assert.assertEquals(cachedKeys.get(0).getCatName(), DEFAULT_CATALOG_NAME);
 
+cachedKeys = sharedCache.listCachedForeignKeys(
+DEFAULT_CATALOG_NAME, tbl.getDbName(), tbl.getTableName(), 
tbl1.getDbName(), tbl1.getTableName());
+
+Assert.assertEquals(cachedKeys.size(), 1);
+Assert.assertEquals(cachedKeys.get(0).getFk_name(), "fk2");
+Assert.assertEquals(cachedKeys.get(0).getFktable_db(), "db");
+Assert.assertEquals(cachedKeys.get(0).getFktable_name(), 
tbl.getTableName());
+Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col1");

Review comment:
   Also validate that the parent table name is correct, too.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457110)
Time Spent: 4h 10m  (was: 4h)

> [CachedStore] Cache table constraints in CachedStore
> 
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive will pull all constraints 
> from the tables involved in a query, which results in multiple db reads (including 
> get_primary_keys, get_foreign_keys, get_unique_constraints, etc). The effort 
> to cache this is small as it's just another table component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457108=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457108
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:12
Start Date: 10/Jul/20 12:12
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r452798321



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -591,14 +590,25 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData 
dmd, Path cmRoot, Hive
 }
   }
   dumpTableListToDumpLocation(tableList, dumpRoot, dbName, conf);
-  extTableCopyWorks = dirLocationsToCopy(extTableLocations);
 }
-work.setDirCopyIterator(extTableCopyWorks.iterator());
-work.setManagedTableCopyPathIterator(managedTableCopyPaths.iterator());
+setDataCopyIterators(extTableFileList, managedTblList);
 work.getMetricCollector().reportStageEnd(getName(), Status.SUCCESS, 
lastReplId);
 return lastReplId;
   }
 
+  private void setDataCopyIterators(FileList extTableFileList, FileList 
managedTableFileList) throws IOException {
+boolean dataCopyAtLoad = 
conf.getBoolVar(HiveConf.ConfVars.REPL_DATA_COPY_LAZY);
+extTableFileList.close();

Review comment:
   Is this serving the purpose of a flush? It's not clear why close is called 
before setting the iterator. This needs to be simplified.
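
   For context, a minimal sketch of the close-before-iterate pattern being questioned, assuming the backing writer buffers entries until closed (illustrative names, not the actual FileList code):

{code:java}
// Minimal sketch: close() doubles as the flush that seals the backing file,
// so it must run before anyone iterates over the entries.
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

public class FileBackedList {
  private final Path backingFile;
  private final BufferedWriter writer;

  public FileBackedList(Path backingFile) throws IOException {
    this.backingFile = backingFile;
    this.writer = Files.newBufferedWriter(backingFile);
  }

  public void add(String entry) throws IOException {
    writer.write(entry);
    writer.newLine();
  }

  // Flushes buffered entries and seals the file; callers must invoke this
  // before asking for an iterator, otherwise tail entries may be missing.
  public void close() throws IOException {
    writer.close();
  }

  public Iterator<String> iterator() throws IOException {
    return Files.readAllLines(backingFile).iterator();
  }
}
{code}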

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.concurrent.LinkedBlockingQueue;
+
+
+/**
+ * A file backed list of Strings which is in-memory till the threshold.
+ */
+public class FileList implements Closeable, Iterator {

Review comment:
   Add UTs
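
   A sketch of the kind of unit test being asked for; the FileList constructor and add() signature below are assumptions based on this patch, not the actual API:

{code:java}
// Hypothetical test sketch: the (backing path, in-memory threshold, conf)
// constructor and the add(String) method are assumed from the patch.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.conf.HiveConf;
import org.junit.Assert;
import org.junit.Test;

public class TestFileList {

  @Test
  public void testEntriesSurviveSpillToBackingFile() throws Exception {
    Path backingFile = new Path(System.getProperty("java.io.tmpdir"), "filelist-test");
    // A threshold of 2 should force the third entry out of memory and into
    // the backing file.
    FileList fileList = new FileList(backingFile, 2, new HiveConf());

    fileList.add("entry-1");
    fileList.add("entry-2");
    fileList.add("entry-3");
    fileList.close();

    int count = 0;
    while (fileList.hasNext()) {
      fileList.next();
      count++;
    }
    Assert.assertEquals(3, count);
  }
}
{code}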

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileListStreamer.java
##
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedWriter;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.TimeUnit;
+
+public class FileListStreamer extends Thread implements Closeable {
+  private static final Logger LOG = 
LoggerFactory.getLogger(FileListStreamer.class);
+  private static final long TIMEOUT_IN_SECS = 5L;
+  private volatile boolean stop;
+  private final 

[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457105=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457105
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:01
Start Date: 10/Jul/20 12:01
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r452799879



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
##
@@ -261,44 +283,57 @@ public int getObjectSize(Class clazz, Object obj) {
 private Map parameters;
 private byte[] sdHash;
 private int otherSize;
-private int tableColStatsCacheSize;
-private int partitionCacheSize;
-private int partitionColStatsCacheSize;
-private int aggrColStatsCacheSize;
+
+// Arrays to hold the size/updated bit of cached objects.
+// These arrays are to be referenced using MemberName enum only.
+private int[] memberObjectsSize = new int[MemberName.values().length];
+private AtomicBoolean[] memberCacheUpdated = new 
AtomicBoolean[MemberName.values().length];
 
 private ReentrantReadWriteLock tableLock = new 
ReentrantReadWriteLock(true);
 // For caching column stats for an unpartitioned table
 // Key is column name and the value is the col stat object
 private Map tableColStatsCache = new 
ConcurrentHashMap();
-private AtomicBoolean isTableColStatsCacheDirty = new AtomicBoolean(false);
 // For caching partition objects
 // Ket is partition values and the value is a wrapper around the partition 
object
 private Map partitionCache = new 
ConcurrentHashMap();
-private AtomicBoolean isPartitionCacheDirty = new AtomicBoolean(false);
 // For caching column stats for a partitioned table
 // Key is aggregate of partition values, column name and the value is the 
col stat object
 private Map partitionColStatsCache =
 new ConcurrentHashMap();
-private AtomicBoolean isPartitionColStatsCacheDirty = new 
AtomicBoolean(false);
 // For caching aggregate column stats for all and all minus default 
partition
 // Key is column name and the value is a list of 2 col stat objects
 // (all partitions and all but default)
 private Map> aggrColStatsCache =
 new ConcurrentHashMap>();
-private AtomicBoolean isAggrPartitionColStatsCacheDirty = new 
AtomicBoolean(false);
+
+private Map primaryKeyCache = new 
ConcurrentHashMap<>();
+
+private Map foreignKeyCache = new 
ConcurrentHashMap<>();
+
+private Map notNullConstraintCache = new 
ConcurrentHashMap<>();
+
+private Map uniqueConstraintCache = new 
ConcurrentHashMap<>();
 
 TableWrapper(Table t, byte[] sdHash, String location, Map 
parameters) {
   this.t = t;
   this.sdHash = sdHash;
   this.location = location;
   this.parameters = parameters;
-  this.tableColStatsCacheSize = 0;
-  this.partitionCacheSize = 0;
-  this.partitionColStatsCacheSize = 0;
-  this.aggrColStatsCacheSize = 0;
+  for(MemberName mn : MemberName.values()) {
+this.memberObjectsSize[mn.getValue()] = 0;

Review comment:
   Java treats enums as objects, and array indexes can only be integers, so I have to use mn.getValue() here.
   
   PS: Enum also provides an `ordinal()` method that returns the position of an enum member, but that can cause issues if the order is ever changed. So I decided to go ahead with creating my own getValue() method.
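
   For context, a self-contained sketch of the enum-indexed-array pattern described above; the enum members and explicit values are illustrative only:

{code:java}
// An enum that carries an explicit, stable index (getValue()) used to address
// companion arrays, instead of ordinal(), whose result silently changes if
// the enum members are ever reordered.
public class EnumIndexedSizes {

  enum MemberName {
    PRIMARY_KEY_CACHE(0),
    FOREIGN_KEY_CACHE(1),
    UNIQUE_CONSTRAINT_CACHE(2),
    NOTNULL_CONSTRAINT_CACHE(3);

    private final int value;

    MemberName(int value) {
      this.value = value;
    }

    int getValue() {
      return value;
    }
  }

  // One slot per enum member, always indexed via getValue().
  private final int[] memberObjectsSize = new int[MemberName.values().length];

  void updateSize(MemberName mn, int delta) {
    memberObjectsSize[mn.getValue()] += delta;
  }
}
{code}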





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457105)
Time Spent: 3h 50m  (was: 3h 40m)

> [CachedStore] Cache table constraints in CachedStore
> 
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive will pull all constraints 
> from the tables involved in a query, which results in multiple db reads (including 
> get_primary_keys, get_foreign_keys, get_unique_constraints, etc). The effort 
> to cache this is small as it's just another table component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23237) Display HiveServer2 hostname in the operation logs

2020-07-10 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155396#comment-17155396
 ] 

Zhihua Deng commented on HIVE-23237:


This may be resolved by https://issues.apache.org/jira/browse/HIVE-23722

> Display HiveServer2 hostname in the operation logs
> --
>
> Key: HIVE-23237
> URL: https://issues.apache.org/jira/browse/HIVE-23237
> Project: Hive
>  Issue Type: Improvement
>Reporter: Miklos Szurap
>Priority: Major
>  Labels: supportability
>
> Hive deployments often have an external load-balancer in front of multiple 
> HiveServer2 instances. 
> In such cases the client does not know which HiveServer2 it is connected to. 
> If there are some issues all HiveServer2 logs have to be searched for clues 
> instead of directly going to the right host. It would be great if the HS2 
> hostname was logged to the client logs (for example to beeline's output). 
> We can "work around" by printing out this information with executing a "set 
> hive.server2.thrift.bind.host;" however that requires an explicit 
> modification to every application. 
> Can we print this information in the operation logs and that way streaming it 
> back to the client? 
> Likely some users - customers do not want to expose that, so the behavior 
> should be configurable.
> This could make the issue/error investigation much easier.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-20441) NPE in ExprNodeGenericFuncDesc when hive.allow.udf.load.on.demand is set to true

2020-07-10 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-20441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153437#comment-17153437
 ] 

Zhihua Deng edited comment on HIVE-20441 at 7/10/20, 11:38 AM:
---

...The problem may still be there in the trunk. [~BIGrey], are you still 
working on this? If not, can I take over? Thanks!


was (Author: dengzh):
...The problem may still be there in the trunk. [~BIGrey], are you still 
working on this?

> NPE in ExprNodeGenericFuncDesc  when hive.allow.udf.load.on.demand is set to 
> true
> -
>
> Key: HIVE-20441
> URL: https://issues.apache.org/jira/browse/HIVE-20441
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Major
> Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, 
> HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch
>
>
> When hive.allow.udf.load.on.demand is set to true and hiveserver2 has been 
> started, the new created function from other clients or hiveserver2 will be 
> loaded from the metastore at the first time. 
> When the udf is used in where clause, we got a NPE like:
> {code:java}
> Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) 
> ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP
> SHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO
> T]
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:57)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_77]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:236)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1104)
>  ~[hive-exec-2.
> 3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1359)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.
> 3.4-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> 

[jira] [Assigned] (HIVE-23835) Repl Dump should dump function binaries to staging directory

2020-07-10 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha reassigned HIVE-23835:
---


> Repl Dump should dump function binaries to staging directory
> 
>
> Key: HIVE-23835
> URL: https://issues.apache.org/jira/browse/HIVE-23835
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>
> When a Hive function's binaries are on the source HDFS, repl dump 
> should copy them to the staging location in order to remove the cross-cluster 
> visibility requirement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23760) Upgrading to Kafka 2.5 Clients

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23760?focusedWorklogId=457097=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457097
 ]

ASF GitHub Bot logged work on HIVE-23760:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 11:27
Start Date: 10/Jul/20 11:27
Worklog Time Spent: 10m 
  Work Description: klcopp merged pull request #1216:
URL: https://github.com/apache/hive/pull/1216


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457097)
Time Spent: 2h  (was: 1h 50m)

> Upgrading to Kafka 2.5 Clients
> --
>
> Key: HIVE-23760
> URL: https://issues.apache.org/jira/browse/HIVE-23760
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Andras Katona
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457096=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457096
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 11:11
Start Date: 10/Jul/20 11:11
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r452779486



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
##
@@ -470,6 +484,107 @@ boolean cachePartitions(Iterable parts, 
SharedCache sharedCache, bool
   }
 }
 
+boolean cachePrimaryKeys(List primaryKeys, boolean 
fromPrewarm) {
+  return cacheConstraints(primaryKeys, fromPrewarm, 
MemberName.PRIMARY_KEY_CACHE);
+}
+
+boolean cacheForeignKeys(List foreignKeys, boolean 
fromPrewarm) {
+  return cacheConstraints(foreignKeys, fromPrewarm, 
MemberName.FOREIGN_KEY_CACHE);
+}
+
+boolean cacheUniqueConstraints(List 
uniqueConstraints, boolean fromPrewarm) {
+  return cacheConstraints(uniqueConstraints, fromPrewarm, 
MemberName.UNIQUE_CONSTRAINT_CACHE);
+}
+
+boolean cacheNotNullConstraints(List 
notNullConstraints, boolean fromPrewarm) {
+  return cacheConstraints(notNullConstraints, fromPrewarm, 
MemberName.NOTNULL_CONSTRAINT_CACHE);
+}
+
+// Common method to cache constraints
+private boolean cacheConstraints(List constraintsList,
+ boolean fromPrewarm,
+ MemberName mn) {
+  if (constraintsList == null || constraintsList.isEmpty()) {
+return true;
+  }
+  try {
+tableLock.writeLock().lock();
+final int[] size = {0};

Review comment:
   This is being used inside a lambda function, which requires the captured 
variable to be effectively final. Because of this, I can't use a plain int or 
Integer, so I chose an int array instead.
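
   For context, a standalone sketch of the effectively-final restriction and the one-element-array workaround described above (the names are illustrative):

{code:java}
// A plain `int size = 0; ... size += ...` inside the lambda would not compile:
// "local variables referenced from a lambda expression must be final or
// effectively final". A one-element array sidesteps this because the array
// reference itself stays final while its contents remain mutable.
import java.util.Arrays;
import java.util.List;

public class LambdaCounter {
  public static void main(String[] args) {
    List<String> constraints = Arrays.asList("pk1", "fk1", "uk1");

    final int[] size = {0};
    constraints.forEach(c -> size[0] += c.length());

    System.out.println("accumulated size: " + size[0]);
  }
}
{code}

   An AtomicInteger would also satisfy the compiler; the array is simply the lighter idiom when no cross-thread visibility is needed.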





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457096)
Time Spent: 3h 40m  (was: 3.5h)

> [CachedStore] Cache table constraints in CachedStore
> 
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive will pull all constraints 
> from the tables involved in a query, which results in multiple db reads (including 
> get_primary_keys, get_foreign_keys, get_unique_constraints, etc). The effort 
> to cache this is small as it's just another table component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457095=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457095
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 11:07
Start Date: 10/Jul/20 11:07
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r45293



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2490,26 +2616,99 @@ long getPartsFound() {
 
   @Override public List getPrimaryKeys(String catName, String 
dbName, String tblName)
   throws MetaException {
-// TODO constraintCache
-return rawStore.getPrimaryKeys(catName, dbName, tblName);
+catName = normalizeIdentifier(catName);
+dbName = StringUtils.normalizeIdentifier(dbName);
+tblName = StringUtils.normalizeIdentifier(tblName);
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && 
rawStore.isActiveTransaction())) {
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+
+Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+if (tbl == null) {
+  // The table containing the primary keys is not yet loaded in cache
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+List keys = sharedCache.listCachedPrimaryKeys(catName, 
dbName, tblName);
+if (keys == null || keys.isEmpty()) {

Review comment:
   Created a follow-up jira: 
https://issues.apache.org/jira/browse/HIVE-23834





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457095)
Time Spent: 3.5h  (was: 3h 20m)

> [CachedStore] Cache table constraints in CachedStore
> 
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive will pull all constraints 
> from the tables involved in a query, which results in multiple db reads (including 
> get_primary_keys, get_foreign_keys, get_unique_constraints, etc). The effort 
> to cache this is small as it's just another table component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23834) [CachedStore] Add flag in TableWrapper in CacheStore to check if constraints are set or not

2020-07-10 Thread Adesh Kumar Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adesh Kumar Rao reassigned HIVE-23834:
--


> [CachedStore] Add flag in TableWrapper in CacheStore to check if constraints 
> are set or not
> ---
>
> Key: HIVE-23834
> URL: https://issues.apache.org/jira/browse/HIVE-23834
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457093=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457093
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 11:02
Start Date: 10/Jul/20 11:02
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r452771399



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
##
@@ -514,6 +629,131 @@ public boolean containsPartition(List partVals) {
   return containsPart;
 }
 
+public void removeConstraint(String name) {
+  try {
+tableLock.writeLock().lock();
+Object constraint = null;
+MemberName mn = null;
+Class constraintClass = null;
+name = name.toLowerCase();
+if (this.primaryKeyCache.containsKey(name)) {
+  constraint = this.primaryKeyCache.remove(name);
+  mn = MemberName.PRIMARY_KEY_CACHE;
+  constraintClass = SQLPrimaryKey.class;
+} else if (this.foreignKeyCache.containsKey(name)) {
+  constraint = this.foreignKeyCache.remove(name);
+  mn = MemberName.FOREIGN_KEY_CACHE;
+  constraintClass = SQLForeignKey.class;
+} else if (this.notNullConstraintCache.containsKey(name)) {
+  constraint = this.notNullConstraintCache.remove(name);
+  mn = MemberName.NOTNULL_CONSTRAINT_CACHE;
+  constraintClass = SQLNotNullConstraint.class;
+} else if (this.uniqueConstraintCache.containsKey(name)) {
+  constraint = this.uniqueConstraintCache.remove(name);
+  mn = MemberName.UNIQUE_CONSTRAINT_CACHE;
+  constraintClass = SQLUniqueConstraint.class;
+}
+
+if(constraint == null) {
+  LOG.debug("Constraint: " + name + " does not exist in cache.");
+  return;
+}
+setMemberCacheUpdated(mn, true);
+int size = getObjectSize(constraintClass, constraint);
+updateMemberSize(mn, -1 * size, SizeMode.Delta);
+  } finally {
+tableLock.writeLock().unlock();
+  }
+}
+
+public void refreshPrimaryKeys(List keys) {
+  Map newKeys = new ConcurrentHashMap<>();
+  try {
+tableLock.writeLock().lock();
+int size = 0;
+for (SQLPrimaryKey key : keys) {
+  if (compareAndSetMemberCacheUpdated(MemberName.PRIMARY_KEY_CACHE, 
true, false)) {
+LOG.debug("Skipping primary key cache update for table: " + 
getTable().getTableName()
++ "; the primary keys we have is dirty.");
+return;
+  }
+  newKeys.put(key.getPk_name().toLowerCase(), key);
+  size += getObjectSize(SQLPrimaryKey.class, key);
+}
+primaryKeyCache = newKeys;
+updateMemberSize(MemberName.PRIMARY_KEY_CACHE, size, 
SizeMode.Snapshot);
+LOG.debug("Primary keys refresh in cache was successful.");

Review comment:
   We should add the catalog, db, and table names in the log msg; otherwise it 
is of no use. Same for the other methods too.

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2490,26 +2616,99 @@ long getPartsFound() {
 
   @Override public List getPrimaryKeys(String catName, String 
dbName, String tblName)
   throws MetaException {
-// TODO constraintCache
-return rawStore.getPrimaryKeys(catName, dbName, tblName);
+catName = normalizeIdentifier(catName);
+dbName = StringUtils.normalizeIdentifier(dbName);
+tblName = StringUtils.normalizeIdentifier(tblName);
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && 
rawStore.isActiveTransaction())) {
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+
+Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+if (tbl == null) {
+  // The table containing the primary keys is not yet loaded in cache
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+List keys = sharedCache.listCachedPrimaryKeys(catName, 
dbName, tblName);
+if (keys == null || keys.isEmpty()) {

Review comment:
   Can we have a flag in TableWrapper in the cache to tell whether the 
constraints were set or not? Could be a follow-up jira.
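
   For context, a sketch of the proposed flag (tracked in HIVE-23834); the class and field names are illustrative, not the actual TableWrapper members:

{code:java}
// Distinguishes "constraints were loaded and none exist" (empty list) from
// "constraints were never loaded" (null), so an empty result is not mistaken
// for a cache miss.
import java.util.ArrayList;
import java.util.List;

public class ConstraintCacheEntry<T> {
  private final List<T> constraints = new ArrayList<>();
  private boolean loaded = false;

  // null -> never populated, caller should fall back to the RawStore;
  // empty list -> loaded, and the table simply has no such constraints.
  public synchronized List<T> get() {
    return loaded ? new ArrayList<>(constraints) : null;
  }

  public synchronized void set(List<T> newConstraints) {
    constraints.clear();
    constraints.addAll(newConstraints);
    loaded = true;
  }
}
{code}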

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
##
@@ -470,6 +484,107 @@ boolean cachePartitions(Iterable parts, 
SharedCache sharedCache, bool
   }
 }
 
+boolean cachePrimaryKeys(List primaryKeys, boolean 
fromPrewarm) {
+  return cacheConstraints(primaryKeys, fromPrewarm, 
MemberName.PRIMARY_KEY_CACHE);
+}
+
+boolean cacheForeignKeys(List 

[jira] [Work started] (HIVE-23695) [CachedStore] Add unique/default constraints in CachedStore

2020-07-10 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23695 started by Ashish Sharma.

> [CachedStore] Add unique/default constraints in CachedStore
> ---
>
> Key: HIVE-23695
> URL: https://issues.apache.org/jira/browse/HIVE-23695
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Reporter: Adesh Kumar Rao
>Assignee: Ashish Sharma
>Priority: Major
> Fix For: 4.0.0
>
>
> This is blocked by HIVE-23618 (notification events are not generated for 
> default/unique constraints, hence created a separate sub-task from 
> HIVE-22015).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cancel background

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=457080=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457080
 ]

ASF GitHub Bot logged work on HIVE-23727:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 09:45
Start Date: 10/Jul/20 09:45
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 edited a comment on pull request #1149:
URL: https://github.com/apache/hive/pull/1149#issuecomment-656586562


   @belugabehr @kgyrtkirk could you please review? A small fix on the log 
output, thank you!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457080)
Time Spent: 2h 40m  (was: 2.5h)

> Improve SQLOperation log handling when cancel background
> 
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state != 
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the 
> background task. If true, the state should not be OperationState.CANCELED, so 
> logging under the state == OperationState.CANCELED should never happen.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cancel background

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=457077=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457077
 ]

ASF GitHub Bot logged work on HIVE-23727:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 09:41
Start Date: 10/Jul/20 09:41
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1149:
URL: https://github.com/apache/hive/pull/1149#issuecomment-656586562


   @belugabehr @kgyrtkirk could you please review? A small fix on the log 
output.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457077)
Time Spent: 2.5h  (was: 2h 20m)

> Improve SQLOperation log handling when cancel background
> 
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state != 
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the 
> background task. If true, the state should not be OperationState.CANCELED, so 
> logging under the state == OperationState.CANCELED should never happen.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cancel background

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=457076=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457076
 ]

ASF GitHub Bot logged work on HIVE-23727:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 09:40
Start Date: 10/Jul/20 09:40
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 removed a comment on pull request #1149:
URL: https://github.com/apache/hive/pull/1149#issuecomment-648507858


   @belugabehr can you please take a look at the changes? thanks



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457076)
Time Spent: 2h 20m  (was: 2h 10m)

> Improve SQLOperation log handling when cancel background
> 
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state != 
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the 
> background task. If true, the state should not be OperationState.CANCELED, so 
> logging under the state == OperationState.CANCELED should never happen.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command

2020-07-10 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-22957:
-
Attachment: (was: HIVE-22957.03.patch)

> Support Partition Filtering In MSCK REPAIR TABLE Command
> 
>
> Key: HIVE-22957
> URL: https://issues.apache.org/jira/browse/HIVE-22957
> Project: Hive
>  Issue Type: Improvement
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> *Design Doc:*
> [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command

2020-07-10 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-22957:
-
Attachment: (was: HIVE-22957.02.patch)

> Support Partition Filtering In MSCK REPAIR TABLE Command
> 
>
> Key: HIVE-22957
> URL: https://issues.apache.org/jira/browse/HIVE-22957
> Project: Hive
>  Issue Type: Improvement
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> *Design Doc:*
> [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-07-10 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23737:
-
Attachment: (was: HIVE-23737.01.patch)

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we 
> could re-use that feature in LLAP. 
> There are some added advantages of using Tez's dagDelete feature rather than 
> the current LLAP's dagDelete feature:
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed task attempt shuffle data clean-up. Refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-07-10 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23737:
-
Attachment: (was: HIVE-23737.02.patch)

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we 
> could re-use that feature in LLAP. 
> There are some added advantages of using Tez's dagDelete feature rather than 
> the current LLAP's dagDelete feature:
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed task attempt shuffle data clean-up. Refer to TEZ-3363 
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command

2020-07-10 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-22957:
-
Attachment: (was: HIVE-22957.01.patch)

> Support Partition Filtering In MSCK REPAIR TABLE Command
> 
>
> Key: HIVE-22957
> URL: https://issues.apache.org/jira/browse/HIVE-22957
> Project: Hive
>  Issue Type: Improvement
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> *Design Doc:*
> [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23806) Avoid clearing column stat states in all partition in case schema is extended

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23806?focusedWorklogId=457068=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457068
 ]

ASF GitHub Bot logged work on HIVE-23806:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 09:14
Start Date: 10/Jul/20 09:14
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1215:
URL: https://github.com/apache/hive/pull/1215#discussion_r452724372



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
##
@@ -501,6 +502,28 @@ public static boolean areSameColumns(List 
oldCols, List p, 
List s) {
+if (p == s) {
+  return true;
+}
+if (p.size() > s.size()) {
+  return false;
+}
+Iterator itP = p.iterator();

Review comment:
   @kgyrtkirk Maybe we can use `ListUtils.isEqualList(p, s.subList(0, 
p.size()))`? That way we can avoid most of the code here.
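
   For context, a standalone sketch of the suggested prefix check; isPrefix is a hypothetical helper wrapping the commons-collections call:

{code:java}
// Checks that list p is a prefix of list s, i.e. the schema was only extended.
import java.util.Arrays;
import java.util.List;
import org.apache.commons.collections.ListUtils;

public class PrefixCheck {

  static boolean isPrefix(List<String> p, List<String> s) {
    if (p.size() > s.size()) {
      return false;
    }
    // subList is a view, so no extra copy is made.
    return ListUtils.isEqualList(p, s.subList(0, p.size()));
  }

  public static void main(String[] args) {
    List<String> oldCols = Arrays.asList("a", "b");
    List<String> newCols = Arrays.asList("a", "b", "c");
    System.out.println(isPrefix(oldCols, newCols)); // true: columns were only appended
  }
}
{code}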





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457068)
Time Spent: 0.5h  (was: 20m)

> Avoid clearing column stat states in all partition in case schema is extended
> -
>
> Key: HIVE-23806
> URL: https://issues.apache.org/jira/browse/HIVE-23806
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In case there are many partitions, adding a new column without cascade may 
> take a while - because we want to make sure in schema evolution cases that we 
> don't reuse stats later on by mistake...
> However, this is not necessary in case the schema is merely extended.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23806) Avoid clearing column stat states in all partition in case schema is extended

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23806?focusedWorklogId=457067=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457067
 ]

ASF GitHub Bot logged work on HIVE-23806:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 09:13
Start Date: 10/Jul/20 09:13
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1215:
URL: https://github.com/apache/hive/pull/1215#discussion_r452724372



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
##
@@ -501,6 +502,28 @@ public static boolean areSameColumns(List 
oldCols, List p, 
List s) {
+if (p == s) {
+  return true;
+}
+if (p.size() > s.size()) {
+  return false;
+}
+Iterator itP = p.iterator();

Review comment:
   @kgyrtkirk Maybe we can use `ListUtils.isEqualList(p, p.subList(0, 
p.size()))`? that way we can avoid most of the code here?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457067)
Time Spent: 20m  (was: 10m)

> Avoid clearing column stat states in all partition in case schema is extended
> -
>
> Key: HIVE-23806
> URL: https://issues.apache.org/jira/browse/HIVE-23806
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In case there are many partitions, adding a new column without cascade may 
> take a while - because we want to make sure in schema evolution cases that we 
> don't reuse stats later on by mistake...
> However, this is not necessary in case the schema is merely extended.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23833) wrong explain and result when full join with join

2020-07-10 Thread chuanjie.duan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chuanjie.duan updated HIVE-23833:
-
Description: 
Reproduce:
 # Create three tables, mytest_t1, mytest_t2, mytest_t5
 # hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from 
dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = 
t2.material_code;"
 # hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from 
dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = 
t2.material_code join dw_dev.mytest_t5 t5 on t5.material_code = 
coalesce(t1.material_code,t2.material_code);"
 # Expected output is over 6000 rows, but actually got 685 rows

2 - explain

 Map Reduce
 Map Operator Tree:
 TableScan
 alias: t1
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column 
stats: NONE
 Select Operator
 expressions: material_code (type: string), wh_guid (type: string)
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column 
stats: NONE
 Reduce Output Operator
 key expressions: _col0 (type: string)
 sort order: +
 Map-reduce partition columns: _col0 (type: string)
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column 
stats: NONE
 value expressions: _col1 (type: string)
 TableScan
 alias: t2
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column 
stats: NONE
 Select Operator
 expressions: material_code (type: string), wh_guid (type: string)
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column 
stats: NONE
 Reduce Output Operator
 key expressions: _col0 (type: string)
 sort order: +
 Map-reduce partition columns: _col0 (type: string)
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column 
stats: NONE
 value expressions: _col1 (type: string)
 Reduce Operator Tree:
 Join Operator
 condition map:
 Outer Join 0 to 1
 keys:
 0 _col0 (type: string)
 1 _col0 (type: string)
 outputColumnNames: _col1, _col3
 Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column 
stats: NONE
 Select Operator
 expressions: COALESCE(_col1,_col3) (type: string)
 outputColumnNames: _col0
 Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column 
stats: NONE
 File Output Operator
 compressed: false
 Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column 
stats: NONE
 table:
 input format: org.apache.hadoop.mapred.SequenceFileInputFormat
 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 

3 - explain

STAGE PLANS:
 Stage: Stage-7
 Map Reduce Local Work
 Alias -> Map Local Tables:
 $hdt$_1:t2 
 Fetch Operator
 limit: -1
 $hdt$_2:t5 
 Fetch Operator
 limit: -1
 Alias -> Map Local Operator Tree:
 $hdt$_1:t2 
 TableScan
 alias: t2
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column 
stats: NONE
 Filter Operator
 predicate: material_code is not null (type: boolean)
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column 
stats: NONE
 Select Operator
 expressions: material_code (type: string), wh_guid (type: string)
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column 
stats: NONE
 HashTable Sink Operator
 keys:
 0 _col0 (type: string)
 1 _col0 (type: string)
 $hdt$_2:t5 
 TableScan
 alias: t5
 Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column 
stats: NONE
 Filter Operator
 predicate: material_code is not null (type: boolean)
 Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column 
stats: NONE
 Select Operator
 expressions: material_code (type: string)
 outputColumnNames: _col0
 Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column 
stats: NONE
 HashTable Sink Operator
 keys:
 0 COALESCE(_col0,_col2) (type: string)
 1 _col0 (type: string)

Stage: Stage-5
 Map Reduce
 Map Operator Tree:
 TableScan
 alias: t1
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column 
stats: NONE
 Filter Operator
 predicate: material_code is not null (type: boolean)
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column 
stats: NONE
 Select Operator
 expressions: material_code (type: string), wh_guid (type: string)
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column 
stats: NONE
 Map Join Operator
 condition map:
 Inner Join 0 to 1
 keys:
 0 _col0 (type: string)
 1 _col0 (type: string)
 outputColumnNames: _col0, _col1, _col2, _col3
 Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column 
stats: NONE
 Map Join Operator
 condition map:
 Inner Join 0 to 1
 keys:
 0 

[jira] [Work started] (HIVE-23832) Compaction cleaner fails to clean up deltas when using blocking compaction

2020-07-10 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23832 started by Denys Kuzmenko.
-
> Compaction cleaner fails to clean up deltas when using blocking compaction
> --
>
> Key: HIVE-23832
> URL: https://issues.apache.org/jira/browse/HIVE-23832
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> {code}
> CREATE TABLE default.compcleanup (
>cda_id int,
>cda_run_id varchar(255),
>cda_load_ts timestamp,
>global_party_id string,
>group_id   string)
> COMMENT 'gp_2_gr'
> PARTITIONED BY (
>cda_date   int,
>cda_job_name   varchar(12))
> STORED AS ORC;
> -- cda_date=20200601/cda_job_name=core_base
> INSERT INTO default.compcleanup VALUES 
> (1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
> cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
> {code}
> When using blocking compaction, the Cleaner skips processing due to the presence 
> of an open txn (held by the `ALTER TABLE`) below the Compactor's one.
> {code}
> AcidUtils - getChildState() ignoring([]) 
> pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
> {code}
> AcidUtils.processBaseDir
> {code}
> if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
> validTxnList)) {
>return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23800) Add hooks when HiveServer2 stops due to OutOfMemoryError

2020-07-10 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-23800:
---
Summary: Add hooks when HiveServer2 stops due to OutOfMemoryError  (was: 
Make HiveServer2 oom hook interface)

> Add hooks when HiveServer2 stops due to OutOfMemoryError
> 
>
> Key: HIVE-23800
> URL: https://issues.apache.org/jira/browse/HIVE-23800
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Make the oom hook an interface of HiveServer2, so users can implement the hook to 
> do something before HS2 stops, such as dumping the heap or alerting the 
> devops team.
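
For context, a minimal sketch of such a hook interface; the names below are illustrative, not the signatures actually added by the patch:

{code:java}
import java.util.List;

// A hook invoked once before HiveServer2 stops due to an OutOfMemoryError.
interface OomHook {
  void run(OutOfMemoryError oom);
}

class OomHookRunner {
  private final List<OomHook> hooks;

  OomHookRunner(List<OomHook> hooks) {
    this.hooks = hooks;
  }

  // Called from HS2's OOM handler just before the server stops.
  void onOutOfMemory(OutOfMemoryError oom) {
    for (OomHook hook : hooks) {
      try {
        hook.run(oom);  // e.g. dump the heap or alert the on-call channel
      } catch (Throwable t) {
        // Best effort: a failing hook must not mask the original OOM.
      }
    }
  }
}
{code}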



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-23618) NotificationLog should also contain events for default/check constraints

2020-07-10 Thread Adesh Kumar Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23618 started by Adesh Kumar Rao.
--
> NotificationLog should also contain events for default/check constraints
> 
>
> Key: HIVE-23618
> URL: https://issues.apache.org/jira/browse/HIVE-23618
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Major
>
> This should follow a similar approach to the notNull/unique constraints. This will 
> also include event replication for these constraints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23695) [CachedStore] Add unique/default constraints in CachedStore

2020-07-10 Thread Adesh Kumar Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adesh Kumar Rao reassigned HIVE-23695:
--

Assignee: Ashish Sharma  (was: Adesh Kumar Rao)

> [CachedStore] Add unique/default constraints in CachedStore
> ---
>
> Key: HIVE-23695
> URL: https://issues.apache.org/jira/browse/HIVE-23695
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Reporter: Adesh Kumar Rao
>Assignee: Ashish Sharma
>Priority: Major
> Fix For: 4.0.0
>
>
> This is blocked by HIVE-23618 (notification events are not generated for 
> default/unique constraints, hence created a separate sub-task from 
> HIVE-22015).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23827) Upgrade to datasketches 1.1.0

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23827?focusedWorklogId=457059=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457059
 ]

ASF GitHub Bot logged work on HIVE-23827:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 08:22
Start Date: 10/Jul/20 08:22
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1233:
URL: https://github.com/apache/hive/pull/1233


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457059)
Time Spent: 20m  (was: 10m)

> Upgrade to datasketches 1.1.0
> -
>
> Key: HIVE-23827
> URL: https://issues.apache.org/jira/browse/HIVE-23827
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23827) Upgrade to datasketches 1.1.0

2020-07-10 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-23827.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

pushed to master. Thank you Denys for reviewing the change!

> Upgrade to datasketches 1.1.0
> -
>
> Key: HIVE-23827
> URL: https://issues.apache.org/jira/browse/HIVE-23827
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23509) MapJoin AssertionError: Capacity must be power of 2

2020-07-10 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-23509.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you [~spedamallu] for fixing this!

> MapJoin AssertionError: Capacity must be power of 2
> ---
>
> Key: HIVE-23509
> URL: https://issues.apache.org/jira/browse/HIVE-23509
> Project: Hive
>  Issue Type: Bug
> Environment: Hive-2.3.6
>Reporter: Shashank Pedamallu
>Assignee: Shashank Pedamallu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Observed AssertionError in Hive queries when the row count computed for a 
> join is of the form (2^x)+(2^(x+1)).
> Following is the stacktrace:
> {noformat}
> [2020-05-11 05:43:12,135] {base_task_runner.py:95} INFO - Subtask: ERROR : 
> Vertex failed, vertexName=Map 4, vertexId=vertex_1588729523139_51702_1_06, 
> diagnostics=[Task failed, taskId=task_1588729523139_51702_1_06_001286, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1588729523139_51702_1_06_001286_0:java.lang.RuntimeException: 
> java.lang.AssertionError: Capacity must be a power of two [2020-05-11 
> 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) 
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> java.security.AccessController.doPrivileged(Native Method) [2020-05-11 
> 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> javax.security.auth.Subject.doAs(Subject.java:422) [2020-05-11 05:43:12,136] 
> {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) 
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266) [2020-05-11 
> 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> java.lang.Thread.run(Thread.java:748) [2020-05-11 05:43:12,137] 
> {base_task_runner.py:95} INFO - Subtask: Caused by: java.lang.AssertionError: 
> Capacity must be a power of two [2020-05-11 05:43:12,137] 
> {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.validateCapacity(BytesBytesMultiHashMap.java:552)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashImpl(BytesBytesMultiHashMap.java:731)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashToTarget(BytesBytesMultiHashMap.java:545)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$HashPartition.getHashMapFromDisk(HybridHashTableContainer.java:183)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.reloadHashTable(MapJoinOperator.java:641)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> 
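> For context, (2^x)+(2^(x+1)) = 3*2^x, which is never a power of two, so a
> hash table capacity derived directly from such a row count trips the
> assertion. A minimal sketch of the invariant (hypothetical helper, not
> Hive's actual code):
> {code}
> // Standard power-of-two test used for hash table capacities.
> static boolean isPowerOfTwo(int capacity) {
>   return capacity > 0 && (capacity & (capacity - 1)) == 0;
> }
>
> // e.g. x = 2: 4 + 8 = 12, and isPowerOfTwo(12) == false, so an assertion
> // such as "assert isPowerOfTwo(capacity)" fires for such row counts.
> {code}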

[jira] [Work logged] (HIVE-23509) MapJoin AssertionError: Capacity must be power of 2

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23509?focusedWorklogId=457057=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457057
 ]

ASF GitHub Bot logged work on HIVE-23509:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 08:21
Start Date: 10/Jul/20 08:21
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1026:
URL: https://github.com/apache/hive/pull/1026


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457057)
Time Spent: 20m  (was: 10m)

> MapJoin AssertionError: Capacity must be power of 2
> ---
>
> Key: HIVE-23509
> URL: https://issues.apache.org/jira/browse/HIVE-23509
> Project: Hive
>  Issue Type: Bug
> Environment: Hive-2.3.6
>Reporter: Shashank Pedamallu
>Assignee: Shashank Pedamallu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Observed AssertionError in Hive queries when the row count computed for a 
> join is of the form (2^x)+(2^(x+1)).
> Following is the stacktrace:
> {noformat}
> [2020-05-11 05:43:12,135] {base_task_runner.py:95} INFO - Subtask: ERROR : 
> Vertex failed, vertexName=Map 4, vertexId=vertex_1588729523139_51702_1_06, 
> diagnostics=[Task failed, taskId=task_1588729523139_51702_1_06_001286, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1588729523139_51702_1_06_001286_0:java.lang.RuntimeException: 
> java.lang.AssertionError: Capacity must be a power of two [2020-05-11 
> 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) 
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> java.security.AccessController.doPrivileged(Native Method) [2020-05-11 
> 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> javax.security.auth.Subject.doAs(Subject.java:422) [2020-05-11 05:43:12,136] 
> {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) 
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266) [2020-05-11 
> 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> java.lang.Thread.run(Thread.java:748) [2020-05-11 05:43:12,137] 
> {base_task_runner.py:95} INFO - Subtask: Caused by: java.lang.AssertionError: 
> Capacity must be a power of two [2020-05-11 05:43:12,137] 
> {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.validateCapacity(BytesBytesMultiHashMap.java:552)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashImpl(BytesBytesMultiHashMap.java:731)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> 

[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=457056=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457056
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 08:17
Start Date: 10/Jul/20 08:17
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r451668879



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/FileOperations.java
##
@@ -165,4 +175,92 @@ private void validateSrcPathListExists() throws 
IOException, LoginException {
   throw new FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage()));
 }
   }
+
+  /**
+   * Takes the root data directory to which the data needs to be exported.
+   * The data exported here is a list of files, from either a table or a
+   * partition, that are written into the _files under the provided
+   * exportRootDataDir.
+   */
+  private void exportFilesAsList() throws SemanticException, IOException, 
LoginException {
+if (dataPathList.isEmpty()) {

Review comment:
   This check is duplicated across several methods; consider extracting it.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/InsertHandler.java
##
@@ -76,16 +77,29 @@ public void handle(Context withinContext) throws Exception {
 withinContext.hiveConf);
Iterable<String> files = eventMessage.getFiles();
 
+boolean copyAtLoad = 
withinContext.hiveConf.getBoolVar(HiveConf.ConfVars.REPL_DATA_COPY_LAZY);
+
 /*
   * Insert into/overwrite operation shall operate on one or more 
partitions or even partitions from multiple tables.
   * But, Insert event is generated for each partition to which the data is 
inserted.
   * So, qlPtns list will have only one entry.
  */
 Partition ptn = (null == qlPtns || qlPtns.isEmpty()) ? null : 
qlPtns.get(0);
 if (files != null) {
-  // encoded filename/checksum of files, write into _files
-  for (String file : files) {
-writeFileEntry(qlMdTable, ptn, file, withinContext);
+  if (copyAtLoad) {
+// encoded filename/checksum of files, write into _files
+Path dataPath = null;
+if ((null == qlPtns) || qlPtns.isEmpty()) {
+  dataPath = new Path(withinContext.eventRoot, 
EximUtil.DATA_PATH_NAME);
+} else {
+  dataPath = new Path(withinContext.eventRoot, EximUtil.DATA_PATH_NAME 
+ File.separator

Review comment:
   Use path constructor instead of appending with File separator
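
   A minimal illustration of the suggestion (hypothetical child name "ptn=1";
   eventRoot and EximUtil.DATA_PATH_NAME are from the hunk above):
{code}
// String concatenation with File.separator, as in the current diff:
Path p1 = new Path(withinContext.eventRoot, EximUtil.DATA_PATH_NAME + File.separator + "ptn=1");

// Equivalent composition with the Path(parent, child) constructor:
Path p2 = new Path(new Path(withinContext.eventRoot, EximUtil.DATA_PATH_NAME), "ptn=1");
{code}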

##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
##
@@ -148,6 +148,13 @@ public static synchronized ReplChangeManager 
getInstance(Configuration conf)
 return instance;
   }
 
+  public static synchronized ReplChangeManager getInstance() {

Review comment:
   why do you need this?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/FileOperations.java
##
@@ -165,4 +175,92 @@ private void validateSrcPathListExists() throws 
IOException, LoginException {
   throw new FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage()));
 }
   }
+
+  /**
+   * Takes the root data directory to which the data needs to be exported.
+   * The data exported here is a list of files, from either a table or a
+   * partition, that are written into the _files under the provided
+   * exportRootDataDir.
+   */
+  private void exportFilesAsList() throws SemanticException, IOException, 
LoginException {
+if (dataPathList.isEmpty()) {
+  return;
+}
+boolean done = false;
+int repeat = 0;
+while (!done) {

Review comment:
   Use the existing retry interface instead of the hand-rolled loop.
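
   For illustration, the kind of rewrite the comment suggests (retryHelper is
   hypothetical; the project's own retry interface may differ):
{code}
// Sketch: replace the hand-rolled while(!done)/repeat counter with a shared
// retry utility so back-off and max-attempts policy live in one place.
retryHelper.run(() -> {
  try (BufferedWriter writer = writer()) {
    for (Path dataPath : dataPathList) {
      writeFilesList(listFilesInDir(dataPath), writer, AcidUtils.getAcidSubDir(dataPath));
    }
  }
  return null;
});
{code}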

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/FileOperations.java
##
@@ -165,4 +175,92 @@ private void validateSrcPathListExists() throws 
IOException, LoginException {
   throw new FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage()));
 }
   }
+
+  /**
+   * Takes the root data directory to which the data needs to be exported.
+   * The data exported here is a list of files, from either a table or a
+   * partition, that are written into the _files under the provided
+   * exportRootDataDir.
+   */
+  private void exportFilesAsList() throws SemanticException, IOException, 
LoginException {
+if (dataPathList.isEmpty()) {
+  return;
+}
+boolean done = false;
+int repeat = 0;
+while (!done) {
+  // This is only called for replication that handles MM tables; no need 
for mmCtx.
+  try (BufferedWriter writer = writer()) {
+for (Path dataPath : dataPathList) {
+  writeFilesList(listFilesInDir(dataPath), writer, 
AcidUtils.getAcidSubDir(dataPath));
+}
+done = true;
+  } catch (IOException e) {

[jira] [Work logged] (HIVE-22412) StatsUtils throw NPE when explain

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22412?focusedWorklogId=457055=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457055
 ]

ASF GitHub Bot logged work on HIVE-22412:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 08:14
Start Date: 10/Jul/20 08:14
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1209:
URL: https://github.com/apache/hive/pull/1209#discussion_r452692916



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
##
@@ -1336,6 +1341,9 @@ public static long 
getSizeOfPrimitiveTypeArraysFromType(String colType, int leng
*/
   public static long getSizeOfMap(StandardConstantMapObjectInspector scmoi) {
 Map map = scmoi.getWritableConstantValue();
+if (null == map || map.isEmpty()) {
+  return 0L;
+}

Review comment:
   @StefanXiepj could you update that `if` according to @belugabehr's proposal?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457055)
Time Spent: 2h 50m  (was: 2h 40m)

> StatsUtils throw NPE when explain
> -
>
> Key: HIVE-22412
> URL: https://issues.apache.org/jira/browse/HIVE-22412
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.1, 2.0.0, 3.0.0
>Reporter: xiepengjie
>Assignee: xiepengjie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22412.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The demo like this:
> {code:java}
> drop table if exists explain_npe_map;
> drop table if exists explain_npe_array;
> drop table if exists explain_npe_struct;
> create table explain_npe_map( c1 map );
> create table explain_npe_array  ( c1 array );
> create table explain_npe_struct ( c1 struct );
> -- error
> set hive.cbo.enable=false;
> explain select c1 from explain_npe_map where c1 is null;
> explain select c1 from explain_npe_array where c1 is null;
> explain select c1 from explain_npe_struct where c1 is null;
> -- correct
> set hive.cbo.enable=true;
> explain select c1 from explain_npe_map where c1 is null;
> explain select c1 from explain_npe_array where c1 is null;
> explain select c1 from explain_npe_struct where c1 is null;{code}
>  
> If the conf 'hive.cbo.enable' is set to false, an NPE is thrown; otherwise 
> it is not.
> {code:java}
> hive> drop table if exists explain_npe_map;
> OK
> Time taken: 0.063 seconds
> hive> drop table if exists explain_npe_array;
> OK
> Time taken: 0.035 seconds
> hive> drop table if exists explain_npe_struct;
> OK
> Time taken: 0.015 seconds
> hive>
> > create table explain_npe_map( c1 map );
> OK
> Time taken: 0.584 seconds
> hive> create table explain_npe_array  ( c1 array );
> OK
> Time taken: 0.216 seconds
> hive> create table explain_npe_struct ( c1 struct );
> OK
> Time taken: 0.17 seconds
> hive>
> > set hive.cbo.enable=false;
> hive> explain select c1 from explain_npe_map where c1 is null;
> FAILED: NullPointerException null
> hive> explain select c1 from explain_npe_array where c1 is null;
> FAILED: NullPointerException null
> hive> explain select c1 from explain_npe_struct where c1 is null;
> FAILED: RuntimeException Error invoking signature method
> hive>
> > set hive.cbo.enable=true;
> hive> explain select c1 from explain_npe_map where c1 is null;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stageSTAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: explain_npe_map
>   Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column 
> stats: NONE
>   Filter Operator
> predicate: false (type: boolean)
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column 
> stats: NONE
> Select Operator
>   expressions: c1 (type: map)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL 
> Column stats: NONE
>   ListSinkTime taken: 1.593 seconds, Fetched: 20 row(s)
> hive> explain select c1 from explain_npe_array where c1 is null;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stageSTAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: explain_npe_array
>   Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column 
> stats: 

[jira] [Updated] (HIVE-23833) wrong explain and result when full join with join

2020-07-10 Thread chuanjie.duan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chuanjie.duan updated HIVE-23833:
-
  Component/s: Hive
Affects Version/s: 2.1.1
  Description: 
Reproduce:
 # Create three tables: mytest_t1, mytest_t2, mytest_t5
 # hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from 
dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = 
t2.material_code;"
 # hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from 
dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = 
t2.material_code join dw_dev.mytest_t5 t5 on t5.material_code = 
coalesce(t1.material_code,t2.material_code);"

2 - explain (correct: the full outer join is preserved)

 Map Reduce
 Map Operator Tree:
 TableScan
 alias: t1
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column 
stats: NONE
 Select Operator
 expressions: material_code (type: string), wh_guid (type: string)
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column 
stats: NONE
 Reduce Output Operator
 key expressions: _col0 (type: string)
 sort order: +
 Map-reduce partition columns: _col0 (type: string)
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column 
stats: NONE
 value expressions: _col1 (type: string)
 TableScan
 alias: t2
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column 
stats: NONE
 Select Operator
 expressions: material_code (type: string), wh_guid (type: string)
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column 
stats: NONE
 Reduce Output Operator
 key expressions: _col0 (type: string)
 sort order: +
 Map-reduce partition columns: _col0 (type: string)
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column 
stats: NONE
 value expressions: _col1 (type: string)
 Reduce Operator Tree:
 Join Operator
 condition map:
 Outer Join 0 to 1
 keys:
 0 _col0 (type: string)
 1 _col0 (type: string)
 outputColumnNames: _col1, _col3
 Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column 
stats: NONE
 Select Operator
 expressions: COALESCE(_col1,_col3) (type: string)
 outputColumnNames: _col0
 Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column 
stats: NONE
 File Output Operator
 compressed: false
 Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column 
stats: NONE
 table:
 input format: org.apache.hadoop.mapred.SequenceFileInputFormat
 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 

3 - explain (wrong: both inputs are filtered with material_code is not null and the full outer join is rewritten as inner joins, dropping the non-matching rows a full join must keep)

STAGE PLANS:
 Stage: Stage-7
 Map Reduce Local Work
 Alias -> Map Local Tables:
 $hdt$_1:t2 
 Fetch Operator
 limit: -1
 $hdt$_2:t5 
 Fetch Operator
 limit: -1
 Alias -> Map Local Operator Tree:
 $hdt$_1:t2 
 TableScan
 alias: t2
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column 
stats: NONE
 Filter Operator
 predicate: material_code is not null (type: boolean)
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column 
stats: NONE
 Select Operator
 expressions: material_code (type: string), wh_guid (type: string)
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column 
stats: NONE
 HashTable Sink Operator
 keys:
 0 _col0 (type: string)
 1 _col0 (type: string)
 $hdt$_2:t5 
 TableScan
 alias: t5
 Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column 
stats: NONE
 Filter Operator
 predicate: material_code is not null (type: boolean)
 Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column 
stats: NONE
 Select Operator
 expressions: material_code (type: string)
 outputColumnNames: _col0
 Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column 
stats: NONE
 HashTable Sink Operator
 keys:
 0 COALESCE(_col0,_col2) (type: string)
 1 _col0 (type: string)

Stage: Stage-5
 Map Reduce
 Map Operator Tree:
 TableScan
 alias: t1
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column 
stats: NONE
 Filter Operator
 predicate: material_code is not null (type: boolean)
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column 
stats: NONE
 Select Operator
 expressions: material_code (type: string), wh_guid (type: string)
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column 
stats: NONE
 Map Join Operator
 condition map:
 Inner Join 0 to 1
 keys:
 0 _col0 (type: string)
 1 _col0 (type: string)
 outputColumnNames: _col0, _col1, _col2, _col3
 Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column 
stats: NONE
 Map Join Operator
 condition map:
 Inner Join 0 to 1
 keys:
 0 

[jira] [Updated] (HIVE-23832) Compaction cleaner fails to clean up deltas when using blocking compaction

2020-07-10 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-23832:
--
Description: 
{code}
CREATE TABLE default.compcleanup (
   cda_id int,
   cda_run_id varchar(255),
   cda_load_ts timestamp,
   global_party_id string,
   group_id   string)
COMMENT 'gp_2_gr'
PARTITIONED BY (
   cda_date   int,
   cda_job_name   varchar(12))
STORED AS ORC;
-- cda_date=20200601/cda_job_name=core_base
INSERT INTO default.compcleanup VALUES 
(1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name = 
'core_base';
UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name = 
'core_base';
ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
{code}

When using blocking compaction, the Cleaner skips processing because of the 
open transaction (opened by `ALTER TABLE`) whose txn id is below the Compactor's.

{code}
AcidUtils - getChildState() ignoring([]) 
pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
{code}

AcidUtils.processBaseDir
{code}
if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
validTxnList)) {
   return;
}
{code}

  was:
{code}
CREATE TABLE default.compcleanup (
   cda_id int,
   cda_run_id varchar(255),
   cda_load_ts timestamp,
   global_party_id string,
   group_id   string)
COMMENT 'gp_2_gr'
PARTITIONED BY (
   cda_date   int,
   cda_job_name   varchar(12))
STORED AS ORC;
-- cda_date=20200601/cda_job_name=core_base
INSERT INTO default.compcleanup VALUES 
(1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name = 
'core_base';
UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name = 
'core_base';
ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
{code}

When using blocking compaction, the Cleaner skips processing because of the 
open transaction (opened by `ALTER TABLE`) whose txn id is below the Compactor's.

{code}
AcidUtils - getChildState() ignoring([]) 
pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
{code}

AcidUtils.processBaseDir
{code}
if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
validTxnList)) {
   return;
}
{code}


> Compaction cleaner fails to clean up deltas when using blocking compaction
> --
>
> Key: HIVE-23832
> URL: https://issues.apache.org/jira/browse/HIVE-23832
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> {code}
> CREATE TABLE default.compcleanup (
>cda_id int,
>cda_run_id varchar(255),
>cda_load_ts timestamp,
>global_party_id string,
>group_id   string)
> COMMENT 'gp_2_gr'
> PARTITIONED BY (
>cda_date   int,
>cda_job_name   varchar(12))
> STORED AS ORC;
> -- cda_date=20200601/cda_job_name=core_base
> INSERT INTO default.compcleanup VALUES 
> (1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
> cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
> {code}
> When using blocking compaction, the Cleaner skips processing because of the
> open transaction (opened by `ALTER TABLE`) whose txn id is below the
> Compactor's.
> {code}
> AcidUtils - getChildState() ignoring([]) 
> pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
> {code}
> AcidUtils.processBaseDir
> {code}
> if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
> validTxnList)) {
>return;
> }
> {code}
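> In this scenario the base directory's _v suffix is the Compactor's
> visibility txn id; while the lower ALTER TABLE txn is still open, the
> Cleaner's snapshot does not yet see that txn id as committed, so
> isDirUsable() rejects the base and the obsolete deltas are never removed.
> A minimal sketch of the visibility test (hypothetical helper, not Hive's
> actual code):
> {code}
> // A dir written with visibility txn id v is usable only if v is committed in
> // the reader's snapshot: v <= high-water mark and v is not an open txn.
> static boolean isVisible(long v, long highWaterMark, java.util.Set<Long> openTxnIds) {
>   return v <= highWaterMark && !openTxnIds.contains(v);
> }
> {code}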



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23832) Compaction cleaner fails to clean up deltas when using blocking compaction

2020-07-10 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-23832:
--
Description: 
{code}
CREATE TABLE default.compcleanup (
   cda_id int,
   cda_run_id varchar(255),
   cda_load_ts timestamp,
   global_party_id string,
   group_id   string)
COMMENT 'gp_2_gr'
PARTITIONED BY (
   cda_date   int,
   cda_job_name   varchar(12))
STORED AS ORC;
-- cda_date=20200601/cda_job_name=core_base
INSERT INTO default.compcleanup VALUES 
(1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name = 
'core_base';
UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name = 
'core_base';
ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
{code}

When using blocking compaction, the Cleaner skips processing because of the 
open transaction (opened by `ALTER TABLE`) whose txn id is below the Compactor's.

{code}
AcidUtils - getChildState() ignoring([]) 
pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
{code}

{code}
if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
validTxnList)) {
   return;
}
{code}

  was:
{code}
CREATE TABLE default.compcleanup (
   cda_id int,
   cda_run_id varchar(255),
   cda_load_ts timestamp,
   global_party_id string,
   group_id   string)
COMMENT 'gp_2_gr'
PARTITIONED BY (
   cda_date   int,
   cda_job_name   varchar(12))
STORED AS ORC;
-- cda_date=20200601/cda_job_name=core_base
INSERT INTO default.compcleanup VALUES 
(1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name = 
'core_base';
UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name = 
'core_base';
ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
{code}

When using blocking compaction, the Cleaner skips processing because of the 
open transaction (opened by `ALTER TABLE`) whose txn id is below the Compactor's.

{code}
AcidUtils - getChildState() ignoring([]) 
pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
{code}

{code}
if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
validTxnList)) {
  return;
}
{code}


> Compaction cleaner fails to clean up deltas when using blocking compaction
> --
>
> Key: HIVE-23832
> URL: https://issues.apache.org/jira/browse/HIVE-23832
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> {code}
> CREATE TABLE default.compcleanup (
>cda_id int,
>cda_run_id varchar(255),
>cda_load_ts timestamp,
>global_party_id string,
>group_id   string)
> COMMENT 'gp_2_gr'
> PARTITIONED BY (
>cda_date   int,
>cda_job_name   varchar(12))
> STORED AS ORC;
> -- cda_date=20200601/cda_job_name=core_base
> INSERT INTO default.compcleanup VALUES 
> (1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
> cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
> {code}
> When using blocking compaction, the Cleaner skips processing because of the
> open transaction (opened by `ALTER TABLE`) whose txn id is below the
> Compactor's.
> {code}
> AcidUtils - getChildState() ignoring([]) 
> pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
> {code}
> {code}
> if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
> validTxnList)) {
>return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23832) Compaction cleaner fails to clean up deltas when using blocking compaction

2020-07-10 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-23832:
--
Description: 
{code}
CREATE TABLE default.compcleanup (
   cda_id int,
   cda_run_id varchar(255),
   cda_load_ts timestamp,
   global_party_id string,
   group_id   string)
COMMENT 'gp_2_gr'
PARTITIONED BY (
   cda_date   int,
   cda_job_name   varchar(12))
STORED AS ORC;
-- cda_date=20200601/cda_job_name=core_base
INSERT INTO default.compcleanup VALUES 
(1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name = 
'core_base';
UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name = 
'core_base';
ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
{code}

When using blocking compaction, the Cleaner skips processing because of the 
open transaction (opened by `ALTER TABLE`) whose txn id is below the Compactor's.

{code}
AcidUtils - getChildState() ignoring([]) 
pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
{code}

{code}
if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
validTxnList)) {
  return;
}
{code}

  was:
{code}
CREATE TABLE default.compcleanup (
   cda_id int,
   cda_run_id varchar(255),
   cda_load_ts timestamp,
   global_party_id string,
   group_id   string)
COMMENT 'gp_2_gr'
PARTITIONED BY (
   cda_date   int,
   cda_job_name   varchar(12))
STORED AS ORC;
-- cda_date=20200601/cda_job_name=core_base
INSERT INTO default.compcleanup VALUES 
(1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name = 
'core_base';
UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name = 
'core_base';
ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
{code}

When using blocking compaction, the Cleaner skips processing because of the 
open transaction (opened by `ALTER TABLE`) whose txn id is below the Compactor's.

{code}
AcidUtils - getChildState() ignoring([]) 
pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
{code}


> Compaction cleaner fails to clean up deltas when using blocking compaction
> --
>
> Key: HIVE-23832
> URL: https://issues.apache.org/jira/browse/HIVE-23832
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> {code}
> CREATE TABLE default.compcleanup (
>cda_id int,
>cda_run_id varchar(255),
>cda_load_ts timestamp,
>global_party_id string,
>group_id   string)
> COMMENT 'gp_2_gr'
> PARTITIONED BY (
>cda_date   int,
>cda_job_name   varchar(12))
> STORED AS ORC;
> -- cda_date=20200601/cda_job_name=core_base
> INSERT INTO default.compcleanup VALUES 
> (1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
> cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
> {code}
> When using blocking compaction, the Cleaner skips processing because of the
> open transaction (opened by `ALTER TABLE`) whose txn id is below the
> Compactor's.
> {code}
> AcidUtils - getChildState() ignoring([]) 
> pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
> {code}
> {code}
> if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
> validTxnList)) {
>   return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

