[jira] [Updated] (HIVE-23339) SBA does not check permissions for DB location specified in Create or Alter database query

2020-07-14 Thread Shubham Chaurasia (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chaurasia updated HIVE-23339:
-
Summary: SBA does not check permissions for DB location specified in Create 
or Alter database query  (was: SBA does not check permissions for DB location 
specified in Create database query)

> SBA does not check permissions for DB location specified in Create or Alter 
> database query
> --
>
> Key: HIVE-23339
> URL: https://issues.apache.org/jira/browse/HIVE-23339
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: Riju Trivedi
>Assignee: Shubham Chaurasia
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HIVE-23339.01.patch, HIVE-23339.02.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> With doAs=true and StorageBasedAuthorization provider, create database with 
> specific location succeeds even if user doesn't have access to that path.
>  
> {code:java}
>   hadoop fs -ls -d /tmp/cannot_write
>  drwx-- - hive hadoop 0 2020-04-01 22:53 /tmp/cannot_write
> create a database under /tmp/cannot_write. We would expect it to fail, but is 
> actually created successfully with "hive" as the owner:
> rtrivedi@bdp01:~> beeline -e "create database rtrivedi_1 location 
> '/tmp/cannot_write/rtrivedi_1'"
>  INFO : OK
>  No rows affected (0.116 seconds)
> hive@hpchdd2e:~> hadoop fs -ls /tmp/cannot_write
>  Found 1 items
>  drwx-- - hive hadoop 0 2020-04-01 23:05 /tmp/cannot_write/rtrivedi_1
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23847) Extracting hive-parser module broke exec jar upload in tez

2020-07-14 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits reassigned HIVE-23847:
--

Assignee: Antal Sinkovits

> Extracting hive-parser module broke exec jar upload in tez
> --
>
> Key: HIVE-23847
> URL: https://issues.apache.org/jira/browse/HIVE-23847
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>
> 2020-07-13 16:53:50,551 [INFO] [Dispatcher thread {Central}] 
> |HistoryEventHandler.criticalEvents|: 
> [HISTORY][DAG:dag_1594632473849_0001_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 1, taskAttemptId=attempt_1594632473849_0001_1_00_00_0, 
> creationTime=1594652027059, allocationTime=1594652028460, 
> startTime=1594652029356, finishTime=1594652030546, timeTaken=1190, 
> status=FAILED, taskFailureType=NON_FATAL, errorEnum=FRAMEWORK_ERROR, 
> diagnostics=Error: Error while running task ( failure ) : 
> attempt_1594632473849_0001_1_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:340)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
>   ... 16 more
> Caused by: java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hive/ql/parse/ParseException
>   at java.lang.Class.getDeclaredConstructors0(Native Method)
>   at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
>   at java.lang.Class.getConstructor0(Class.java:3075)
>   at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>   at 
> org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:79)
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDTF(Registry.java:225)
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDTF(Registry.java:217)
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.(FunctionRegistry.java:544)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.isDeterministic(ExprNodeGenericFuncEvaluator.java:154)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.isConsistentWithinQuery(ExprNodeEvaluator.java:117)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.iterate(ExprNodeEvaluatorFactory.java:102)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.toCachedEvals(ExprNodeEvaluatorFactory.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:69)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:359)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:502)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:368)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:506)
>   at 
> 

[jira] [Updated] (HIVE-23339) SBA does not check permissions for DB location specified in Create database query

2020-07-14 Thread Shubham Chaurasia (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chaurasia updated HIVE-23339:
-
Attachment: HIVE-23339.02.patch

> SBA does not check permissions for DB location specified in Create database 
> query
> -
>
> Key: HIVE-23339
> URL: https://issues.apache.org/jira/browse/HIVE-23339
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: Riju Trivedi
>Assignee: Shubham Chaurasia
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HIVE-23339.01.patch, HIVE-23339.02.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> With doAs=true and StorageBasedAuthorization provider, create database with 
> specific location succeeds even if user doesn't have access to that path.
>  
> {code:java}
>   hadoop fs -ls -d /tmp/cannot_write
>  drwx-- - hive hadoop 0 2020-04-01 22:53 /tmp/cannot_write
> create a database under /tmp/cannot_write. We would expect it to fail, but is 
> actually created successfully with "hive" as the owner:
> rtrivedi@bdp01:~> beeline -e "create database rtrivedi_1 location 
> '/tmp/cannot_write/rtrivedi_1'"
>  INFO : OK
>  No rows affected (0.116 seconds)
> hive@hpchdd2e:~> hadoop fs -ls /tmp/cannot_write
>  Found 1 items
>  drwx-- - hive hadoop 0 2020-04-01 23:05 /tmp/cannot_write/rtrivedi_1
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-07-14 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23737:
-
Comment: was deleted

(was: Added unit tests in HIVE-23737.02.patch)

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> LLAP have a dagDelete feature added as part of HIVE-9911, But now that Tez 
> have added support for dagDelete in custom shuffle handler (TEZ-3362) we 
> could re-use that feature in LLAP. 
> There are some added advantages of using Tez's dagDelete feature rather than 
> the current LLAP's dagDelete feature.
> 1) We can easily extend this feature to accommodate the upcoming features 
> such as vertex and failed task attempt shuffle data clean up. Refer TEZ-3363 
> and TEZ-4129
> 2) It will be more easier to maintain this feature by separating it out from 
> the Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-07-14 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155129#comment-17155129
 ] 

Syed Shameerur Rahman edited comment on HIVE-23737 at 7/15/20, 4:49 AM:


[~gopalv] [~prasanth_j] [~rajesh.balamohan] ping for review request


was (Author: srahman):
[~gopalv] [~prasanth_j] ping for review request

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> LLAP have a dagDelete feature added as part of HIVE-9911, But now that Tez 
> have added support for dagDelete in custom shuffle handler (TEZ-3362) we 
> could re-use that feature in LLAP. 
> There are some added advantages of using Tez's dagDelete feature rather than 
> the current LLAP's dagDelete feature.
> 1) We can easily extend this feature to accommodate the upcoming features 
> such as vertex and failed task attempt shuffle data clean up. Refer TEZ-3363 
> and TEZ-4129
> 2) It will be more easier to maintain this feature by separating it out from 
> the Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23839) Use LongAdder instead of AtomicLong

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23839?focusedWorklogId=459053=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459053
 ]

ASF GitHub Bot logged work on HIVE-23839:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 01:40
Start Date: 15/Jul/20 01:40
Worklog Time Spent: 10m 
  Work Description: dai edited a comment on pull request #1246:
URL: https://github.com/apache/hive/pull/1246#issuecomment-657601071


   @belugabehr Thank you for pointing that out.
If we need to use methods like `compareAndSet(a,b)` and `incrementAndGet()` 
(the return value is not ignored), then AtomicLong would be preferable.
   These cases just update a sum for collecting statistics.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459053)
Time Spent: 1h  (was: 50m)

> Use LongAdder instead of AtomicLong
> ---
>
> Key: HIVE-23839
> URL: https://issues.apache.org/jira/browse/HIVE-23839
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dai Wenqing
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> LongAdder performs better than AtomicLong in high concurrent environment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23846) Avoid unnecessary serialization and deserialization of bitvectors

2020-07-14 Thread Yu-Wen Lai (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu-Wen Lai reassigned HIVE-23846:
-


> Avoid unnecessary serialization and deserialization of bitvectors
> -
>
> Key: HIVE-23846
> URL: https://issues.apache.org/jira/browse/HIVE-23846
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>
> In the method *getNdvEstimator* of *ColumnStatsDataInspector*, it 
> will call isSetBitVectors(), in which it serializes the bitvectors again when 
> we already have deserialized bitvectors _ndvEstimator_. For example, we can 
> see this pattern from 
> [LongColumnStatsDataInspector|[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/columnstats/cache/LongColumnStatsDataInspector.java#L106]].
> This method could check if the _ndvEstimator_ is set first so that it won't 
> need to serialize and deserialize back.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23793) Review of QueryInfo Class

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23793?focusedWorklogId=459049=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459049
 ]

ASF GitHub Bot logged work on HIVE-23793:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 01:25
Start Date: 15/Jul/20 01:25
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1197:
URL: https://github.com/apache/hive/pull/1197#discussion_r454735696



##
File path: ql/src/java/org/apache/hadoop/hive/ql/QueryInfo.java
##
@@ -70,36 +80,57 @@ public String getExecutionEngine() {
 return executionEngine;
   }
 
-  public synchronized String getState() {
+  public String getState() {
 return state;
   }
 
+  /**
+   * The time the query began in milliseconds.
+   *
+   * @return The time the query began
+   */
   public long getBeginTime() {
-return beginTime;
+return TimeUnit.NANOSECONDS.toMillis(beginTime);
   }
 
-  public synchronized Long getEndTime() {
-return endTime;
+  /**
+   * Get the end time in milliseconds. Only valid if {@link #isRunning()}
+   * returns false.
+   *
+   * @return Query end time
+   */
+  public long getEndTime() {
+return TimeUnit.NANOSECONDS.toMillis(endTime);
   }
 
-  public synchronized void updateState(String state) {
+  public void updateState(String state) {
 this.state = state;
   }
 
   public String getOperationId() {
 return operationId;
   }
 
-  public synchronized void setEndTime() {
-this.endTime = System.currentTimeMillis();
+  public void setEndTime() {
+this.endTime = System.nanoTime();
   }
 
-  public synchronized void setRuntime(long runtime) {
-this.runtime = runtime;
+  /**
+   * Set the amount of time the query spent actually running in milliseconds.
+   *
+   * @param runtime The amount of time this query spent running
+   */
+  public void setRuntime(long runtime) {
+this.runtime = TimeUnit.MILLISECONDS.toNanos(runtime);

Review comment:
   the ```getRuntime``` returns value in milliseconds,  maybe there is no 
need to transfer the ```runtime``` back and forth?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459049)
Time Spent: 2h  (was: 1h 50m)

> Review of QueryInfo Class
> -
>
> Key: HIVE-23793
> URL: https://issues.apache.org/jira/browse/HIVE-23793
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23800) Add hooks when HiveServer2 stops due to OutOfMemoryError

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23800?focusedWorklogId=459045=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459045
 ]

ASF GitHub Bot logged work on HIVE-23800:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 00:57
Start Date: 15/Jul/20 00:57
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1205:
URL: https://github.com/apache/hive/pull/1205#issuecomment-658484942


   > when I was using some of these hooks earlier: they are too specific to 
some purpose.
   > right now the proposed callback is a {{Runnable}} ... which means you will 
have to create a completely new one in case you want to pass some things to it 
later...
   > 
   > so...I would recommend the following:
   > 
   > * remove the "oom" keyword from the name of the hook/etc - it could be 
diag, debug or something like that
   > * add a callback which has 2 arguments:
   >   
   >   * a callback type (some enum; which should have a value which is 
specific to "oom")
   >   * some callback payload object to which we can add things later on... it 
could be an object using the {{Adaptable}} pattern
   > 
   > all this stuff is to make it a bit more reusable if we need to reuse it 
later on...
   
   ...Sorry,  I don't get much clear about "some callback payload object to 
which we can add things later on.",  is something like this:
   ```
public class HookRuntime {
   private static final HookRuntime runtime = new HookRuntime();
   public static HookRuntime getHookRuntime() {
 return runtime;
   }
   
   private Map hooks = new HashMap<>();
   
   void callback(Enum Type, HookPayloadObject payload) {
   }
   
   HookPayloadObject getHookPayLoad(Enum typs) {
   
 return null;
   }
 }
   
 public interface HookPayloadObject extends Runnable {
   void addHook(Hook hook);
   void setContext(T context);
 }
   ```
   Thanks a lot for your time!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459045)
Time Spent: 2h 10m  (was: 2h)

> Add hooks when HiveServer2 stops due to OutOfMemoryError
> 
>
> Key: HIVE-23800
> URL: https://issues.apache.org/jira/browse/HIVE-23800
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Make oom hook an interface of HiveServer2,  so user can implement the hook to 
> do something before HS2 stops, such as dumping the heap or altering the 
> devops.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23822) Sorted dynamic partition optimization could remove auto stat task

2020-07-14 Thread Vineet Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg resolved HIVE-23822.

Fix Version/s: 4.0.0
   Resolution: Fixed

> Sorted dynamic partition optimization could remove auto stat task
> -
>
> Key: HIVE-23822
> URL: https://issues.apache.org/jira/browse/HIVE-23822
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {{mm_dp}} has reproducer where INSERT query is missing auto stats task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23822) Sorted dynamic partition optimization could remove auto stat task

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23822?focusedWorklogId=459044=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459044
 ]

ASF GitHub Bot logged work on HIVE-23822:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 00:56
Start Date: 15/Jul/20 00:56
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 merged pull request #1231:
URL: https://github.com/apache/hive/pull/1231


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459044)
Time Spent: 1.5h  (was: 1h 20m)

> Sorted dynamic partition optimization could remove auto stat task
> -
>
> Key: HIVE-23822
> URL: https://issues.apache.org/jira/browse/HIVE-23822
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {{mm_dp}} has reproducer where INSERT query is missing auto stats task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23822) Sorted dynamic partition optimization could remove auto stat task

2020-07-14 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157783#comment-17157783
 ] 

Jesus Camacho Rodriguez commented on HIVE-23822:


+1

> Sorted dynamic partition optimization could remove auto stat task
> -
>
> Key: HIVE-23822
> URL: https://issues.apache.org/jira/browse/HIVE-23822
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {{mm_dp}} has reproducer where INSERT query is missing auto stats task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23477) LLAP : mmap allocation interruptions fails to notify other threads

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23477?focusedWorklogId=459035=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459035
 ]

ASF GitHub Bot logged work on HIVE-23477:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 00:32
Start Date: 15/Jul/20 00:32
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1020:
URL: https://github.com/apache/hive/pull/1020#issuecomment-658479205


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459035)
Time Spent: 20m  (was: 10m)

> LLAP : mmap allocation interruptions fails to notify other threads
> --
>
> Key: HIVE-23477
> URL: https://issues.apache.org/jira/browse/HIVE-23477
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23477.1.patch, HIVE-23477.2.patch, 
> HIVE-23477.3.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> BuddyAllocator always uses lazy allocation if mmap is enabled. If query 
> fragment is interrupted at the time of arena allocation, 
> ClosedByInterruptionException is thrown. This exception artificially triggers 
> allocator OutOfMemoryError and fails to notify other threads waiting to 
> allocate arenas. 
> {code:java}
> 2020-05-15 00:03:23.254  WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed 
> trying to allocate memory mapped arena
> java.nio.channels.ClosedByInterruptException
> at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160)
> at 
> org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50)
> at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
> at 
> 

[jira] [Work logged] (HIVE-23822) Sorted dynamic partition optimization could remove auto stat task

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23822?focusedWorklogId=458955=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458955
 ]

ASF GitHub Bot logged work on HIVE-23822:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 21:06
Start Date: 14/Jul/20 21:06
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on a change in pull request 
#1231:
URL: https://github.com/apache/hive/pull/1231#discussion_r454645150



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
##
@@ -409,26 +409,54 @@ private boolean 
removeRSInsertedByEnforceBucketing(FileSinkOperator fsOp) {
   // and grand child
   if (found) {
 Operator rsParent = 
rsToRemove.getParentOperators().get(0);
-Operator rsChild = 
rsToRemove.getChildOperators().get(0);
-Operator rsGrandChild = 
rsChild.getChildOperators().get(0);
-
-if (rsChild instanceof SelectOperator) {
-  // if schema size cannot be matched, then it could be because of 
constant folding
-  // converting partition column expression to constant expression. 
The constant
-  // expression will then get pruned by column pruner since it will 
not reference to
-  // any columns.
-  if (rsParent.getSchema().getSignature().size() !=
-  rsChild.getSchema().getSignature().size()) {
+List> rsChildren = 
rsToRemove.getChildOperators();
+
+Operator rsChildToRemove = null;
+
+for (Operator rsChild : rsChildren) {

Review comment:
   @jcamachor all tests passed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458955)
Time Spent: 1h 20m  (was: 1h 10m)

> Sorted dynamic partition optimization could remove auto stat task
> -
>
> Key: HIVE-23822
> URL: https://issues.apache.org/jira/browse/HIVE-23822
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {{mm_dp}} has reproducer where INSERT query is missing auto stats task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23807) Wrong results with vectorization enabled

2020-07-14 Thread Vineet Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-23807:
---
Status: Patch Available  (was: Open)

Pushed to branch-2, Thanks for taking a look [~jcamachorodriguez]

> Wrong results with vectorization enabled
> 
>
> Key: HIVE-23807
> URL: https://issues.apache.org/jira/browse/HIVE-23807
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.3.0
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: compatibility, pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> *Repro*
> {code:sql}
> CREATE TABLE `test13`(
>   `portfolio_valuation_date` string,
>   `price_cut_off_datetime` string,
>   `portfolio_id_valuation_source` string,
>   `contributor_full_path` string,
>   `position_market_value` double,
>   `mandate_name` string)
> STORED AS ORC;
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   -0.26,  "foo");
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   0.33,   "foo");
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   -0.03,  "foo");
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   0.16,   "foo");
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   0.08,   "foo");
> set hive.fetch.task.conversion=none;
> set hive.explain.user=false;
> set hive.vectorized.execution.enabled=false;
> select Cast(`test13`.`price_cut_off_datetime` AS date) from test13; <-- 
> produces NULL
> set hive.vectorized.execution.enabled=true;
> select Cast(`test13`.`price_cut_off_datetime` AS date) from test13; <-- 
> produces non-null values
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23807) Wrong results with vectorization enabled

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23807?focusedWorklogId=458951=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458951
 ]

ASF GitHub Bot logged work on HIVE-23807:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 21:03
Start Date: 14/Jul/20 21:03
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 merged pull request #1234:
URL: https://github.com/apache/hive/pull/1234


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458951)
Time Spent: 1h  (was: 50m)

> Wrong results with vectorization enabled
> 
>
> Key: HIVE-23807
> URL: https://issues.apache.org/jira/browse/HIVE-23807
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.3.0
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: compatibility, pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> *Repro*
> {code:sql}
> CREATE TABLE `test13`(
>   `portfolio_valuation_date` string,
>   `price_cut_off_datetime` string,
>   `portfolio_id_valuation_source` string,
>   `contributor_full_path` string,
>   `position_market_value` double,
>   `mandate_name` string)
> STORED AS ORC;
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   -0.26,  "foo");
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   0.33,   "foo");
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   -0.03,  "foo");
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   0.16,   "foo");
> INSERT INTO test13 values (
> "2020-01-31", "2020-02-07T03:14:48.007Z", "37",   NULL,   0.08,   "foo");
> set hive.fetch.task.conversion=none;
> set hive.explain.user=false;
> set hive.vectorized.execution.enabled=false;
> select Cast(`test13`.`price_cut_off_datetime` AS date) from test13; <-- 
> produces NULL
> set hive.vectorized.execution.enabled=true;
> select Cast(`test13`.`price_cut_off_datetime` AS date) from test13; <-- 
> produces non-null values
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23765) Use ORC file format by default when creating transactional table

2020-07-14 Thread Xiaomeng Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Zhang resolved HIVE-23765.
---
Resolution: Won't Do

> Use ORC file format by default when creating transactional table
> 
>
> Key: HIVE-23765
> URL: https://issues.apache.org/jira/browse/HIVE-23765
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Xiaomeng Zhang
>Assignee: Xiaomeng Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently we support "transactional" keyword in CREATE TABLE command. But if 
> not add "stored as ORC", the table created is insert_only.
> We want to add a feature that when table is transactional (either using 
> "transanctional" keyword or adding tblproperties ('transactional'='true'), 
> the table created by default will be ORC, and 
> 'transactional_properties'='default'.
>  
> {code:java}
> 0: jdbc:hive2://localhost:1> create transactional table xm_tran(id int);
> +---+++
> |           col_name            |                     data_type               
>        |                      comment                       |
> +---+++
> | id                            | int                                         
>        |                                                    |
> |                               | NULL                                        
>        | NULL                                               |
> | # Detailed Table Information  | NULL                                        
>        | NULL                                               |
> | Database:                     | default                                     
>        | NULL                                               |
> | OwnerType:                    | USER                                        
>        | NULL                                               |
> | Owner:                        | hive                                        
>        | NULL                                               |
> | CreateTime:                   | Thu Jun 18 14:01:45 PDT 2020                
>        | NULL                                               |
> | LastAccessTime:               | UNKNOWN                                     
>        | NULL                                               |
> | Retention:                    | 0                                           
>        | NULL                                               |
> | Location:                     | file:/tmp/warehouse/managed/xm_tran         
>        | NULL                                               |
> | Table Type:                   | MANAGED_TABLE                               
>        | NULL                                               |
> | Table Parameters:             | NULL                                        
>        | NULL                                               |
> |                               | COLUMN_STATS_ACCURATE                       
>        | {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"id\":\"true\"}} |
> |                               | bucketing_version                           
>        | 2                                                  |
> |                               | numFiles                                    
>        | 0                                                  |
> |                               | numRows                                     
>        | 0                                                  |
> |                               | rawDataSize                                 
>        | 0                                                  |
> |                               | totalSize                                   
>        | 0                                                  |
> |                               | transactional                               
>        | true                                               |
> |                               | transactional_properties                    
>        | insert_only                                        |
> |                               | transient_lastDdlTime                       
>        | 1592514105                                         |
> |                               | NULL                                        
>        | NULL                                               |
> | # Storage Information         | NULL                                        
>    

[jira] [Work logged] (HIVE-23765) Use ORC file format by default when creating transactional table

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23765?focusedWorklogId=458943=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458943
 ]

ASF GitHub Bot logged work on HIVE-23765:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 20:57
Start Date: 14/Jul/20 20:57
Worklog Time Spent: 10m 
  Work Description: xiaomengzhang closed pull request #1213:
URL: https://github.com/apache/hive/pull/1213


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458943)
Time Spent: 0.5h  (was: 20m)

> Use ORC file format by default when creating transactional table
> 
>
> Key: HIVE-23765
> URL: https://issues.apache.org/jira/browse/HIVE-23765
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Xiaomeng Zhang
>Assignee: Xiaomeng Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently we support "transactional" keyword in CREATE TABLE command. But if 
> not add "stored as ORC", the table created is insert_only.
> We want to add a feature that when table is transactional (either using 
> "transanctional" keyword or adding tblproperties ('transactional'='true'), 
> the table created by default will be ORC, and 
> 'transactional_properties'='default'.
>  
> {code:java}
> 0: jdbc:hive2://localhost:1> create transactional table xm_tran(id int);
> +---+++
> |           col_name            |                     data_type               
>        |                      comment                       |
> +---+++
> | id                            | int                                         
>        |                                                    |
> |                               | NULL                                        
>        | NULL                                               |
> | # Detailed Table Information  | NULL                                        
>        | NULL                                               |
> | Database:                     | default                                     
>        | NULL                                               |
> | OwnerType:                    | USER                                        
>        | NULL                                               |
> | Owner:                        | hive                                        
>        | NULL                                               |
> | CreateTime:                   | Thu Jun 18 14:01:45 PDT 2020                
>        | NULL                                               |
> | LastAccessTime:               | UNKNOWN                                     
>        | NULL                                               |
> | Retention:                    | 0                                           
>        | NULL                                               |
> | Location:                     | file:/tmp/warehouse/managed/xm_tran         
>        | NULL                                               |
> | Table Type:                   | MANAGED_TABLE                               
>        | NULL                                               |
> | Table Parameters:             | NULL                                        
>        | NULL                                               |
> |                               | COLUMN_STATS_ACCURATE                       
>        | {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"id\":\"true\"}} |
> |                               | bucketing_version                           
>        | 2                                                  |
> |                               | numFiles                                    
>        | 0                                                  |
> |                               | numRows                                     
>        | 0                                                  |
> |                               | rawDataSize                                 
>        | 0                                                  |
> |                               | totalSize                                   
>        

[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=458914=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458914
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 20:17
Start Date: 14/Jul/20 20:17
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1169:
URL: https://github.com/apache/hive/pull/1169#discussion_r454618095



##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * OutputFormat for standard JSON format.
+ * 
+ */ 
+public class JSONOutputFormat extends AbstractOutputFormat {
+  protected final BeeLine beeLine;
+  protected JsonGenerator generator;
+  
+
+  /**
+   * @param beeLine
+   */
+  JSONOutputFormat(BeeLine beeLine){ 
+this.beeLine = beeLine;
+ByteArrayOutputStream buf = new ByteArrayOutputStream();
+try {
+  this.generator = new JsonFactory().createGenerator(buf, 
JsonEncoding.UTF8);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printHeader(Rows.Row header) {
+try {
+  generator.writeStartObject();
+  generator.writeArrayFieldStart("resultset");
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printFooter(Rows.Row header) {
+try {
+  generator.writeEndArray();
+  generator.writeEndObject();
+  generator.flush();
+  String out = ((ByteArrayOutputStream) 
generator.getOutputTarget()).toString("UTF-8");
+  beeLine.output(out);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printRow(Rows rows, Rows.Row header, Rows.Row row) {
+String[] head = header.values;
+String[] vals = row.values;
+
+try {
+  int colCount = rows.rsMeta.getColumnCount();
+  boolean objStartFlag = true;
+
+  for (int i = 0; (i < head.length) && (i < vals.length); i++) {
+if (objStartFlag) {
+  generator.writeStartObject();
+  objStartFlag = false;
+}
+
+generator.writeFieldName(head[i]);
+switch(rows.rsMeta.getColumnType(i+1)) {
+  case Types.TINYINT:
+  case Types.SMALLINT:
+  case Types.INTEGER:
+  case Types.BIGINT:
+  case Types.REAL:
+  case Types.FLOAT:
+  case Types.DOUBLE:
+  case Types.DECIMAL:
+  case Types.NUMERIC:
+  case Types.ROWID:
+generator.writeNumber(vals[i]);
+break;
+  case Types.NULL:
+generator.writeNull();
+break;
+  case Types.BOOLEAN:
+generator.writeBoolean(Boolean.parseBoolean(vals[i]));
+break;
+  default:

Review comment:
   
https://fasterxml.github.io/jackson-core/javadoc/2.8/com/fasterxml/jackson/core/JsonGenerator.html#writeBinary(byte[])





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458914)
Time Spent: 4h 10m  (was: 4h)

> Add JSON Outputformat support
> -
>
> Key: HIVE-20447
> URL: https://issues.apache.org/jira/browse/HIVE-20447
> Project: Hive
>  Issue Type: Task
>  Components: Beeline
> 

[jira] [Work logged] (HIVE-23840) Use LLAP to get orc metadata

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23840?focusedWorklogId=458894=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458894
 ]

ASF GitHub Bot logged work on HIVE-23840:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 19:48
Start Date: 14/Jul/20 19:48
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1251:
URL: https://github.com/apache/hive/pull/1251#discussion_r454602904



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -129,6 +137,16 @@
*/
   private SearchArgument deleteEventSarg = null;
 
+  /**
+   * Cachetag associated with the Split
+   */
+  private final CacheTag cacheTag;
+
+  /**
+   * Skip using Llap IO cache for checking delete_delta files if the 
configuration is not correct
+   */
+  private static boolean skipLlapCache = true;

Review comment:
   That was a mistake. Corrected, and initialized as false





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458894)
Time Spent: 50m  (was: 40m)

> Use LLAP to get orc metadata
> 
>
> Key: HIVE-23840
> URL: https://issues.apache.org/jira/browse/HIVE-23840
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HIVE-23824 added the possibility to access ORC metadata. We can use this to 
> decide which delta files should be read, and which could be omitted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23840) Use LLAP to get orc metadata

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23840?focusedWorklogId=458893=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458893
 ]

ASF GitHub Bot logged work on HIVE-23840:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 19:48
Start Date: 14/Jul/20 19:48
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1251:
URL: https://github.com/apache/hive/pull/1251#discussion_r454602727



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -232,6 +250,17 @@ private VectorizedOrcAcidRowBatchReader(JobConf conf, 
OrcSplit orcSplit, Reporte
 
 this.syntheticProps = orcSplit.getSyntheticAcidProps();
 
+if (LlapHiveUtils.isLlapMode(conf) && LlapProxy.isDaemon()
+&& HiveConf.getBoolVar(conf, ConfVars.LLAP_TRACK_CACHE_USAGE))
+{
+  MapWork mapWork = LlapHiveUtils.findMapWork(conf);

Review comment:
   Good idea, done!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458893)
Time Spent: 40m  (was: 0.5h)

> Use LLAP to get orc metadata
> 
>
> Key: HIVE-23840
> URL: https://issues.apache.org/jira/browse/HIVE-23840
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HIVE-23824 added the possibility to access ORC metadata. We can use this to 
> decide which delta files should be read, and which could be omitted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23840) Use LLAP to get orc metadata

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23840?focusedWorklogId=458895=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458895
 ]

ASF GitHub Bot logged work on HIVE-23840:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 19:48
Start Date: 14/Jul/20 19:48
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1251:
URL: https://github.com/apache/hive/pull/1251#discussion_r454603042



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -1562,20 +1580,31 @@ public int compareTo(CompressedOwid other) {
   try {
 final Path[] deleteDeltaDirs = getDeleteDeltaDirsFromSplit(orcSplit);
 if (deleteDeltaDirs.length > 0) {
+  FileSystem fs = orcSplit.getPath().getFileSystem(conf);
+  AcidOutputFormat.Options orcSplitMinMaxWriteIds =
+  AcidUtils.parseBaseOrDeltaBucketFilename(orcSplit.getPath(), 
conf);
   int totalDeleteEventCount = 0;
   for (Path deleteDeltaDir : deleteDeltaDirs) {
-FileSystem fs = deleteDeltaDir.getFileSystem(conf);
+if (!isQualifiedDeleteDeltaForSplit(orcSplitMinMaxWriteIds, 
deleteDeltaDir)) {
+  continue;
+}
 Path[] deleteDeltaFiles = 
OrcRawRecordMerger.getDeltaFiles(deleteDeltaDir, bucket,
 new OrcRawRecordMerger.Options().isCompacting(false), null);
 for (Path deleteDeltaFile : deleteDeltaFiles) {
   try {
-/**
- * todo: we have OrcSplit.orcTail so we should be able to get 
stats from there
- */
-Reader deleteDeltaReader = 
OrcFile.createReader(deleteDeltaFile, OrcFile.readerOptions(conf));
-if (deleteDeltaReader.getNumberOfRows() <= 0) {
+ReaderData readerData = getOrcTail(deleteDeltaFile, conf, 
cacheTag);
+OrcTail orcTail = readerData.orcTail;
+if (orcTail.getFooter().getNumberOfRows() <= 0) {
   continue; // just a safe check to ensure that we are not 
reading empty delete files.
 }
+OrcRawRecordMerger.KeyInterval deleteKeyInterval = 
findDeleteMinMaxKeys(orcTail, deleteDeltaFile);
+if (!deleteKeyInterval.isIntersects(keyInterval)) {
+  // If there is no intersection between data and delete 
delta, do not read delete file
+  continue;
+}
+// Create the reader if we got the OrcTail from cache

Review comment:
   Added more comment





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458895)
Time Spent: 1h  (was: 50m)

> Use LLAP to get orc metadata
> 
>
> Key: HIVE-23840
> URL: https://issues.apache.org/jira/browse/HIVE-23840
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> HIVE-23824 added the possibility to access ORC metadata. We can use this to 
> decide which delta files should be read, and which could be omitted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23767) Send ValidWriteIDList in request for all the new HMS get_* APIs that are in request/response form

2020-07-14 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved HIVE-23767.

Fix Version/s: 4.0.0
   Resolution: Fixed

> Send ValidWriteIDList in request for all the new HMS get_* APIs that are in 
> request/response form
> -
>
> Key: HIVE-23767
> URL: https://issues.apache.org/jira/browse/HIVE-23767
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
> Fix For: 4.0.0
>
>
> We recently introduced new set of HMS APIs that take ValidWriteIDList in the 
> request, as part of HIVE-22017.
> We should switch to these new APIs, wherever required and start sending 
> ValidWriteIDList in request for all the new HMS get_* APIs that are in 
> request/response form.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23767) Send ValidWriteIDList in request for all the new HMS get_* APIs that are in request/response form

2020-07-14 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157636#comment-17157636
 ] 

Vihang Karajgaonkar commented on HIVE-23767:


Patch was reviewed here https://github.com/apache/hive/pull/1217 and merged in 
master branch. Thanks for your contribution [~kishendas].



> Send ValidWriteIDList in request for all the new HMS get_* APIs that are in 
> request/response form
> -
>
> Key: HIVE-23767
> URL: https://issues.apache.org/jira/browse/HIVE-23767
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> We recently introduced new set of HMS APIs that take ValidWriteIDList in the 
> request, as part of HIVE-22017.
> We should switch to these new APIs, wherever required and start sending 
> ValidWriteIDList in request for all the new HMS get_* APIs that are in 
> request/response form.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=458884=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458884
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 19:25
Start Date: 14/Jul/20 19:25
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1169:
URL: https://github.com/apache/hive/pull/1169#discussion_r454590457



##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * OutputFormat for standard JSON format.
+ * 
+ */ 
+public class JSONOutputFormat extends AbstractOutputFormat {
+  protected final BeeLine beeLine;
+  protected JsonGenerator generator;
+  
+
+  /**
+   * @param beeLine
+   */
+  JSONOutputFormat(BeeLine beeLine){ 
+this.beeLine = beeLine;
+ByteArrayOutputStream buf = new ByteArrayOutputStream();
+try {
+  this.generator = new JsonFactory().createGenerator(buf, 
JsonEncoding.UTF8);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printHeader(Rows.Row header) {
+try {
+  generator.writeStartObject();
+  generator.writeArrayFieldStart("resultset");
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printFooter(Rows.Row header) {
+try {
+  generator.writeEndArray();
+  generator.writeEndObject();
+  generator.flush();
+  String out = ((ByteArrayOutputStream) 
generator.getOutputTarget()).toString("UTF-8");
+  beeLine.output(out);
+} catch(IOException e) {

Review comment:
   Please clear the `ByteArrayInputStream` here otherwise it will carry 
around all the buffered data for a while.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458884)
Time Spent: 4h  (was: 3h 50m)

> Add JSON Outputformat support
> -
>
> Key: HIVE-20447
> URL: https://issues.apache.org/jira/browse/HIVE-20447
> Project: Hive
>  Issue Type: Task
>  Components: Beeline
>Reporter: Max Efremov
>Assignee: Hunter Logan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20447.01.patch
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> This function is present in SQLLine. We need add it to beeline too.
> https://github.com/julianhyde/sqlline/pull/84



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=458883=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458883
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 19:24
Start Date: 14/Jul/20 19:24
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1169:
URL: https://github.com/apache/hive/pull/1169#discussion_r454589888



##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * OutputFormat for standard JSON format.
+ * 
+ */ 
+public class JSONOutputFormat extends AbstractOutputFormat {
+  protected final BeeLine beeLine;
+  protected JsonGenerator generator;
+  
+
+  /**
+   * @param beeLine
+   */
+  JSONOutputFormat(BeeLine beeLine){ 
+this.beeLine = beeLine;
+ByteArrayOutputStream buf = new ByteArrayOutputStream();
+try {
+  this.generator = new JsonFactory().createGenerator(buf, 
JsonEncoding.UTF8);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printHeader(Rows.Row header) {
+try {
+  generator.writeStartObject();
+  generator.writeArrayFieldStart("resultset");
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printFooter(Rows.Row header) {
+try {
+  generator.writeEndArray();
+  generator.writeEndObject();
+  generator.flush();
+  String out = ((ByteArrayOutputStream) 
generator.getOutputTarget()).toString("UTF-8");
+  beeLine.output(out);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printRow(Rows rows, Rows.Row header, Rows.Row row) {
+String[] head = header.values;
+String[] vals = row.values;
+
+try {
+  int colCount = rows.rsMeta.getColumnCount();
+  boolean objStartFlag = true;
+
+  for (int i = 0; (i < head.length) && (i < vals.length); i++) {
+if (objStartFlag) {
+  generator.writeStartObject();
+  objStartFlag = false;
+}
+
+generator.writeFieldName(head[i]);
+switch(rows.rsMeta.getColumnType(i+1)) {
+  case Types.TINYINT:
+  case Types.SMALLINT:
+  case Types.INTEGER:
+  case Types.BIGINT:
+  case Types.REAL:
+  case Types.FLOAT:
+  case Types.DOUBLE:
+  case Types.DECIMAL:
+  case Types.NUMERIC:
+  case Types.ROWID:
+generator.writeNumber(vals[i]);
+break;
+  case Types.NULL:
+generator.writeNull();
+break;
+  case Types.BOOLEAN:
+generator.writeBoolean(Boolean.parseBoolean(vals[i]));
+break;
+  default:

Review comment:
   Ya, this is not a particularly helpful format.  The default in JSON is 
Base64 for binary data, so you should use that and not rely on this default





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458883)
Time Spent: 3h 50m  (was: 3h 40m)

> Add JSON Outputformat support
> -
>
> Key: HIVE-20447
> URL: https://issues.apache.org/jira/browse/HIVE-20447
> Project: Hive
>  Issue Type: Task
>

[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=458882=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458882
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 19:23
Start Date: 14/Jul/20 19:23
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1169:
URL: https://github.com/apache/hive/pull/1169#discussion_r454589304



##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * OutputFormat for standard JSON format.
+ * 
+ */ 
+public class JSONOutputFormat extends AbstractOutputFormat {
+  protected final BeeLine beeLine;
+  protected JsonGenerator generator;
+  
+
+  /**
+   * @param beeLine
+   */
+  JSONOutputFormat(BeeLine beeLine){ 
+this.beeLine = beeLine;
+ByteArrayOutputStream buf = new ByteArrayOutputStream();
+try {
+  this.generator = new JsonFactory().createGenerator(buf, 
JsonEncoding.UTF8);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printHeader(Rows.Row header) {
+try {
+  generator.writeStartObject();
+  generator.writeArrayFieldStart("resultset");
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printFooter(Rows.Row header) {
+try {
+  generator.writeEndArray();
+  generator.writeEndObject();
+  generator.flush();
+  String out = ((ByteArrayOutputStream) 
generator.getOutputTarget()).toString("UTF-8");

Review comment:
   OK.  I see what you're saying.  The JDK folks have not gotten around to 
adding the method to accept a discrete `Charset` yet.  Please do use the 
`.name()` call.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458882)
Time Spent: 3h 40m  (was: 3.5h)

> Add JSON Outputformat support
> -
>
> Key: HIVE-20447
> URL: https://issues.apache.org/jira/browse/HIVE-20447
> Project: Hive
>  Issue Type: Task
>  Components: Beeline
>Reporter: Max Efremov
>Assignee: Hunter Logan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20447.01.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This function is present in SQLLine. We need add it to beeline too.
> https://github.com/julianhyde/sqlline/pull/84



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=458881=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458881
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 19:20
Start Date: 14/Jul/20 19:20
Worklog Time Spent: 10m 
  Work Description: HunterL commented on a change in pull request #1169:
URL: https://github.com/apache/hive/pull/1169#discussion_r454588109



##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * OutputFormat for standard JSON format.
+ * 
+ */ 
+public class JSONOutputFormat extends AbstractOutputFormat {
+  protected final BeeLine beeLine;
+  protected JsonGenerator generator;
+  
+
+  /**
+   * @param beeLine
+   */
+  JSONOutputFormat(BeeLine beeLine){ 
+this.beeLine = beeLine;
+ByteArrayOutputStream buf = new ByteArrayOutputStream();
+try {
+  this.generator = new JsonFactory().createGenerator(buf, 
JsonEncoding.UTF8);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printHeader(Rows.Row header) {
+try {
+  generator.writeStartObject();
+  generator.writeArrayFieldStart("resultset");
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printFooter(Rows.Row header) {
+try {
+  generator.writeEndArray();
+  generator.writeEndObject();
+  generator.flush();
+  String out = ((ByteArrayOutputStream) 
generator.getOutputTarget()).toString("UTF-8");
+  beeLine.output(out);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printRow(Rows rows, Rows.Row header, Rows.Row row) {
+String[] head = header.values;
+String[] vals = row.values;
+
+try {
+  int colCount = rows.rsMeta.getColumnCount();
+  boolean objStartFlag = true;
+
+  for (int i = 0; (i < head.length) && (i < vals.length); i++) {
+if (objStartFlag) {
+  generator.writeStartObject();
+  objStartFlag = false;
+}
+
+generator.writeFieldName(head[i]);
+switch(rows.rsMeta.getColumnType(i+1)) {
+  case Types.TINYINT:
+  case Types.SMALLINT:
+  case Types.INTEGER:
+  case Types.BIGINT:
+  case Types.REAL:
+  case Types.FLOAT:
+  case Types.DOUBLE:
+  case Types.DECIMAL:
+  case Types.NUMERIC:
+  case Types.ROWID:
+generator.writeNumber(vals[i]);
+break;
+  case Types.NULL:
+generator.writeNull();
+break;
+  case Types.BOOLEAN:
+generator.writeBoolean(Boolean.parseBoolean(vals[i]));
+break;
+  default:

Review comment:
   Thinking this through more `Arrays.toString(byte[] a)` will convert it 
to some string that looks like `[16, 34, 67]`, meaning the resulting JSON will 
be `{..."binary_field":"[16,34,67]"...}`.
   
   `JSONOutputFormat` does not have access to the underlying resultset, just 
the values that have already been `toString()`ed. Is the correct behavior then 
to convert back to `byte[]` and then `Base64.encode()`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458881)
Time 

[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=458879=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458879
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 19:07
Start Date: 14/Jul/20 19:07
Worklog Time Spent: 10m 
  Work Description: HunterL commented on a change in pull request #1169:
URL: https://github.com/apache/hive/pull/1169#discussion_r454580408



##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * OutputFormat for standard JSON format.
+ * 
+ */ 
+public class JSONOutputFormat extends AbstractOutputFormat {
+  protected final BeeLine beeLine;
+  protected JsonGenerator generator;
+  
+
+  /**
+   * @param beeLine
+   */
+  JSONOutputFormat(BeeLine beeLine){ 
+this.beeLine = beeLine;
+ByteArrayOutputStream buf = new ByteArrayOutputStream();
+try {
+  this.generator = new JsonFactory().createGenerator(buf, 
JsonEncoding.UTF8);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printHeader(Rows.Row header) {
+try {
+  generator.writeStartObject();
+  generator.writeArrayFieldStart("resultset");
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printFooter(Rows.Row header) {
+try {
+  generator.writeEndArray();
+  generator.writeEndObject();
+  generator.flush();
+  String out = ((ByteArrayOutputStream) 
generator.getOutputTarget()).toString("UTF-8");

Review comment:
   For clarities sake I'm changing this line to `String out = 
((ByteArrayOutputStream) 
generator.getOutputTarget()).toString(StandardCharsets.UTF_8.name());`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458879)
Time Spent: 3h 20m  (was: 3h 10m)

> Add JSON Outputformat support
> -
>
> Key: HIVE-20447
> URL: https://issues.apache.org/jira/browse/HIVE-20447
> Project: Hive
>  Issue Type: Task
>  Components: Beeline
>Reporter: Max Efremov
>Assignee: Hunter Logan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20447.01.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This function is present in SQLLine. We need add it to beeline too.
> https://github.com/julianhyde/sqlline/pull/84



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=458878=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458878
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 19:06
Start Date: 14/Jul/20 19:06
Worklog Time Spent: 10m 
  Work Description: HunterL commented on a change in pull request #1169:
URL: https://github.com/apache/hive/pull/1169#discussion_r454580133



##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * OutputFormat for standard JSON format.
+ * 
+ */ 
+public class JSONOutputFormat extends AbstractOutputFormat {
+  protected final BeeLine beeLine;
+  protected JsonGenerator generator;
+  
+
+  /**
+   * @param beeLine
+   */
+  JSONOutputFormat(BeeLine beeLine){ 
+this.beeLine = beeLine;
+ByteArrayOutputStream buf = new ByteArrayOutputStream();
+try {
+  this.generator = new JsonFactory().createGenerator(buf, 
JsonEncoding.UTF8);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printHeader(Rows.Row header) {
+try {
+  generator.writeStartObject();
+  generator.writeArrayFieldStart("resultset");
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printFooter(Rows.Row header) {
+try {
+  generator.writeEndArray();
+  generator.writeEndObject();
+  generator.flush();
+  String out = ((ByteArrayOutputStream) 
generator.getOutputTarget()).toString("UTF-8");

Review comment:
   The Javadocs for `ByteArrayOutputStream.toString(String charsetName)`[1] 
point to [this charset 
page](https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html).
   Which says that "UTF-8" as the correct canonical name for the 
StandardCharset. I _think_ my implementation is correct for all JVM's?
   
   [1] 
https://docs.oracle.com/javase/7/docs/api/java/io/ByteArrayOutputStream.html





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458878)
Time Spent: 3h 10m  (was: 3h)

> Add JSON Outputformat support
> -
>
> Key: HIVE-20447
> URL: https://issues.apache.org/jira/browse/HIVE-20447
> Project: Hive
>  Issue Type: Task
>  Components: Beeline
>Reporter: Max Efremov
>Assignee: Hunter Logan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20447.01.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This function is present in SQLLine. We need add it to beeline too.
> https://github.com/julianhyde/sqlline/pull/84



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=458863=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458863
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 18:19
Start Date: 14/Jul/20 18:19
Worklog Time Spent: 10m 
  Work Description: HunterL commented on a change in pull request #1169:
URL: https://github.com/apache/hive/pull/1169#discussion_r454553580



##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * OutputFormat for standard JSON format.
+ * 
+ */ 
+public class JSONOutputFormat extends AbstractOutputFormat {
+  protected final BeeLine beeLine;
+  protected JsonGenerator generator;
+  
+
+  /**
+   * @param beeLine
+   */
+  JSONOutputFormat(BeeLine beeLine){ 
+this.beeLine = beeLine;
+ByteArrayOutputStream buf = new ByteArrayOutputStream();
+try {
+  this.generator = new JsonFactory().createGenerator(buf, 
JsonEncoding.UTF8);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printHeader(Rows.Row header) {
+try {
+  generator.writeStartObject();
+  generator.writeArrayFieldStart("resultset");
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printFooter(Rows.Row header) {
+try {
+  generator.writeEndArray();
+  generator.writeEndObject();
+  generator.flush();
+  String out = ((ByteArrayOutputStream) 
generator.getOutputTarget()).toString("UTF-8");
+  beeLine.output(out);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printRow(Rows rows, Rows.Row header, Rows.Row row) {
+String[] head = header.values;
+String[] vals = row.values;
+
+try {
+  int colCount = rows.rsMeta.getColumnCount();
+  boolean objStartFlag = true;
+
+  for (int i = 0; (i < head.length) && (i < vals.length); i++) {
+if (objStartFlag) {
+  generator.writeStartObject();
+  objStartFlag = false;
+}
+
+generator.writeFieldName(head[i]);
+switch(rows.rsMeta.getColumnType(i+1)) {
+  case Types.TINYINT:
+  case Types.SMALLINT:
+  case Types.INTEGER:
+  case Types.BIGINT:
+  case Types.REAL:
+  case Types.FLOAT:
+  case Types.DOUBLE:
+  case Types.DECIMAL:
+  case Types.NUMERIC:
+  case Types.ROWID:
+generator.writeNumber(vals[i]);
+break;
+  case Types.NULL:
+generator.writeNull();
+break;
+  case Types.BOOLEAN:
+generator.writeBoolean(Boolean.parseBoolean(vals[i]));
+break;
+  default:

Review comment:
   So `Rows.java` already does this conversion 
   ```
   else if (o instanceof byte[]) { 
 value = convertBinaryArray ? new String((byte[])o) : 
Arrays.toString((byte[])o);
   }
   ```
   I'm not 100% sure what the expected behavior here in the output format would 
be then? Currently it falls into the default case and prints the already 
converted `byte[]`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458863)
Time Spent: 3h  (was: 2h 50m)

> Add JSON Outputformat 

[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=458855=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458855
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 18:15
Start Date: 14/Jul/20 18:15
Worklog Time Spent: 10m 
  Work Description: HunterL commented on a change in pull request #1169:
URL: https://github.com/apache/hive/pull/1169#discussion_r454551180



##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * OutputFormat for standard JSON format.
+ * 
+ */ 
+public class JSONOutputFormat extends AbstractOutputFormat {
+  protected final BeeLine beeLine;
+  protected JsonGenerator generator;
+  
+
+  /**
+   * @param beeLine
+   */
+  JSONOutputFormat(BeeLine beeLine){ 
+this.beeLine = beeLine;
+ByteArrayOutputStream buf = new ByteArrayOutputStream();
+try {
+  this.generator = new JsonFactory().createGenerator(buf, 
JsonEncoding.UTF8);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printHeader(Rows.Row header) {
+try {
+  generator.writeStartObject();
+  generator.writeArrayFieldStart("resultset");
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printFooter(Rows.Row header) {
+try {
+  generator.writeEndArray();
+  generator.writeEndObject();
+  generator.flush();
+  String out = ((ByteArrayOutputStream) 
generator.getOutputTarget()).toString("UTF-8");
+  beeLine.output(out);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printRow(Rows rows, Rows.Row header, Rows.Row row) {
+String[] head = header.values;
+String[] vals = row.values;
+
+try {
+  int colCount = rows.rsMeta.getColumnCount();
+  boolean objStartFlag = true;
+
+  for (int i = 0; (i < head.length) && (i < vals.length); i++) {
+if (objStartFlag) {
+  generator.writeStartObject();
+  objStartFlag = false;

Review comment:
   Yup overthought that one, fixed  





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458855)
Time Spent: 2h 50m  (was: 2h 40m)

> Add JSON Outputformat support
> -
>
> Key: HIVE-20447
> URL: https://issues.apache.org/jira/browse/HIVE-20447
> Project: Hive
>  Issue Type: Task
>  Components: Beeline
>Reporter: Max Efremov
>Assignee: Hunter Logan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20447.01.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> This function is present in SQLLine. We need add it to beeline too.
> https://github.com/julianhyde/sqlline/pull/84



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=458841=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458841
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 18:00
Start Date: 14/Jul/20 18:00
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1169:
URL: https://github.com/apache/hive/pull/1169#discussion_r454541905



##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * OutputFormat for standard JSON format.
+ * 
+ */ 
+public class JSONOutputFormat extends AbstractOutputFormat {
+  protected final BeeLine beeLine;
+  protected JsonGenerator generator;
+  
+
+  /**
+   * @param beeLine
+   */
+  JSONOutputFormat(BeeLine beeLine){ 
+this.beeLine = beeLine;
+ByteArrayOutputStream buf = new ByteArrayOutputStream();
+try {
+  this.generator = new JsonFactory().createGenerator(buf, 
JsonEncoding.UTF8);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printHeader(Rows.Row header) {
+try {
+  generator.writeStartObject();
+  generator.writeArrayFieldStart("resultset");
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printFooter(Rows.Row header) {
+try {
+  generator.writeEndArray();
+  generator.writeEndObject();
+  generator.flush();
+  String out = ((ByteArrayOutputStream) 
generator.getOutputTarget()).toString("UTF-8");
+  beeLine.output(out);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printRow(Rows rows, Rows.Row header, Rows.Row row) {
+String[] head = header.values;
+String[] vals = row.values;
+
+try {
+  int colCount = rows.rsMeta.getColumnCount();
+  boolean objStartFlag = true;
+
+  for (int i = 0; (i < head.length) && (i < vals.length); i++) {
+if (objStartFlag) {
+  generator.writeStartObject();
+  objStartFlag = false;
+}
+
+generator.writeFieldName(head[i]);
+switch(rows.rsMeta.getColumnType(i+1)) {
+  case Types.TINYINT:
+  case Types.SMALLINT:
+  case Types.INTEGER:
+  case Types.BIGINT:
+  case Types.REAL:
+  case Types.FLOAT:
+  case Types.DOUBLE:
+  case Types.DECIMAL:
+  case Types.NUMERIC:
+  case Types.ROWID:
+generator.writeNumber(vals[i]);
+break;
+  case Types.NULL:
+generator.writeNull();
+break;
+  case Types.BOOLEAN:
+generator.writeBoolean(Boolean.parseBoolean(vals[i]));
+break;
+  default:

Review comment:
   Need a type for BINARY data.  Make it spit it out as Base-64 to be 
text-friendly.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458841)
Time Spent: 2h 40m  (was: 2.5h)

> Add JSON Outputformat support
> -
>
> Key: HIVE-20447
> URL: https://issues.apache.org/jira/browse/HIVE-20447
> Project: Hive
>  Issue Type: Task
>  Components: Beeline
>Reporter: Max Efremov
> 

[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=458840=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458840
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 17:59
Start Date: 14/Jul/20 17:59
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1169:
URL: https://github.com/apache/hive/pull/1169#discussion_r454541362



##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * OutputFormat for standard JSON format.
+ * 
+ */ 
+public class JSONOutputFormat extends AbstractOutputFormat {
+  protected final BeeLine beeLine;
+  protected JsonGenerator generator;
+  
+
+  /**
+   * @param beeLine
+   */
+  JSONOutputFormat(BeeLine beeLine){ 
+this.beeLine = beeLine;
+ByteArrayOutputStream buf = new ByteArrayOutputStream();
+try {
+  this.generator = new JsonFactory().createGenerator(buf, 
JsonEncoding.UTF8);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printHeader(Rows.Row header) {
+try {
+  generator.writeStartObject();
+  generator.writeArrayFieldStart("resultset");
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printFooter(Rows.Row header) {
+try {
+  generator.writeEndArray();
+  generator.writeEndObject();
+  generator.flush();
+  String out = ((ByteArrayOutputStream) 
generator.getOutputTarget()).toString("UTF-8");
+  beeLine.output(out);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printRow(Rows rows, Rows.Row header, Rows.Row row) {
+String[] head = header.values;
+String[] vals = row.values;
+
+try {
+  int colCount = rows.rsMeta.getColumnCount();
+  boolean objStartFlag = true;
+
+  for (int i = 0; (i < head.length) && (i < vals.length); i++) {
+if (objStartFlag) {
+  generator.writeStartObject();
+  objStartFlag = false;

Review comment:
   This looks a bit suspect to me.  Should be able to just put one of these 
before the loop and the corresponding `writeEndObject` after the loop.
   
   ```
   final int printCount = Math.min(colCount, Math.min(head.length, 
vals.length));
   generator.writeStartObject();
   for (int i = 0; i < printCount; i++) {
   ...
   }
   generator.writeEndObject();
   
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458840)
Time Spent: 2.5h  (was: 2h 20m)

> Add JSON Outputformat support
> -
>
> Key: HIVE-20447
> URL: https://issues.apache.org/jira/browse/HIVE-20447
> Project: Hive
>  Issue Type: Task
>  Components: Beeline
>Reporter: Max Efremov
>Assignee: Hunter Logan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20447.01.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This function is present in SQLLine. We need add it to beeline too.
> https://github.com/julianhyde/sqlline/pull/84



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=458833=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458833
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 17:55
Start Date: 14/Jul/20 17:55
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1169:
URL: https://github.com/apache/hive/pull/1169#discussion_r454539338



##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * OutputFormat for standard JSON format.
+ * 
+ */ 
+public class JSONOutputFormat extends AbstractOutputFormat {
+  protected final BeeLine beeLine;
+  protected JsonGenerator generator;
+  
+
+  /**
+   * @param beeLine
+   */
+  JSONOutputFormat(BeeLine beeLine){ 
+this.beeLine = beeLine;
+ByteArrayOutputStream buf = new ByteArrayOutputStream();
+try {
+  this.generator = new JsonFactory().createGenerator(buf, 
JsonEncoding.UTF8);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printHeader(Rows.Row header) {
+try {
+  generator.writeStartObject();
+  generator.writeArrayFieldStart("resultset");
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printFooter(Rows.Row header) {
+try {
+  generator.writeEndArray();
+  generator.writeEndObject();
+  generator.flush();
+  String out = ((ByteArrayOutputStream) 
generator.getOutputTarget()).toString("UTF-8");

Review comment:
   Check out `StandardCharsets.UTF_8`.  It's not a requirement that all 
JVMs support UTF-8.
   
   
https://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458833)
Time Spent: 2h 20m  (was: 2h 10m)

> Add JSON Outputformat support
> -
>
> Key: HIVE-20447
> URL: https://issues.apache.org/jira/browse/HIVE-20447
> Project: Hive
>  Issue Type: Task
>  Components: Beeline
>Reporter: Max Efremov
>Assignee: Hunter Logan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20447.01.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This function is present in SQLLine. We need add it to beeline too.
> https://github.com/julianhyde/sqlline/pull/84



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23822) Sorted dynamic partition optimization could remove auto stat task

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23822?focusedWorklogId=458798=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458798
 ]

ASF GitHub Bot logged work on HIVE-23822:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 17:11
Start Date: 14/Jul/20 17:11
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on a change in pull request 
#1231:
URL: https://github.com/apache/hive/pull/1231#discussion_r454511428



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
##
@@ -409,26 +409,54 @@ private boolean 
removeRSInsertedByEnforceBucketing(FileSinkOperator fsOp) {
   // and grand child
   if (found) {
 Operator rsParent = 
rsToRemove.getParentOperators().get(0);
-Operator rsChild = 
rsToRemove.getChildOperators().get(0);
-Operator rsGrandChild = 
rsChild.getChildOperators().get(0);
-
-if (rsChild instanceof SelectOperator) {
-  // if schema size cannot be matched, then it could be because of 
constant folding
-  // converting partition column expression to constant expression. 
The constant
-  // expression will then get pruned by column pruner since it will 
not reference to
-  // any columns.
-  if (rsParent.getSchema().getSignature().size() !=
-  rsChild.getSchema().getSignature().size()) {
+List> rsChildren = 
rsToRemove.getChildOperators();
+
+Operator rsChildToRemove = null;
+
+for (Operator rsChild : rsChildren) {

Review comment:
   @jcamachor I have addressed in latest commit.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458798)
Time Spent: 1h 10m  (was: 1h)

> Sorted dynamic partition optimization could remove auto stat task
> -
>
> Key: HIVE-23822
> URL: https://issues.apache.org/jira/browse/HIVE-23822
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {{mm_dp}} has reproducer where INSERT query is missing auto stats task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23822) Sorted dynamic partition optimization could remove auto stat task

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23822?focusedWorklogId=458797=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458797
 ]

ASF GitHub Bot logged work on HIVE-23822:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 17:10
Start Date: 14/Jul/20 17:10
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on a change in pull request 
#1231:
URL: https://github.com/apache/hive/pull/1231#discussion_r454511428



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
##
@@ -409,26 +409,54 @@ private boolean 
removeRSInsertedByEnforceBucketing(FileSinkOperator fsOp) {
   // and grand child
   if (found) {
 Operator rsParent = 
rsToRemove.getParentOperators().get(0);
-Operator rsChild = 
rsToRemove.getChildOperators().get(0);
-Operator rsGrandChild = 
rsChild.getChildOperators().get(0);
-
-if (rsChild instanceof SelectOperator) {
-  // if schema size cannot be matched, then it could be because of 
constant folding
-  // converting partition column expression to constant expression. 
The constant
-  // expression will then get pruned by column pruner since it will 
not reference to
-  // any columns.
-  if (rsParent.getSchema().getSignature().size() !=
-  rsChild.getSchema().getSignature().size()) {
+List> rsChildren = 
rsToRemove.getChildOperators();
+
+Operator rsChildToRemove = null;
+
+for (Operator rsChild : rsChildren) {

Review comment:
   @jcamachor I have addressed in latest comment.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458797)
Time Spent: 1h  (was: 50m)

> Sorted dynamic partition optimization could remove auto stat task
> -
>
> Key: HIVE-23822
> URL: https://issues.apache.org/jira/browse/HIVE-23822
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {{mm_dp}} has reproducer where INSERT query is missing auto stats task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23822) Sorted dynamic partition optimization could remove auto stat task

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23822?focusedWorklogId=458743=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458743
 ]

ASF GitHub Bot logged work on HIVE-23822:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 15:58
Start Date: 14/Jul/20 15:58
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1231:
URL: https://github.com/apache/hive/pull/1231#discussion_r454464296



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
##
@@ -409,26 +409,54 @@ private boolean 
removeRSInsertedByEnforceBucketing(FileSinkOperator fsOp) {
   // and grand child
   if (found) {
 Operator rsParent = 
rsToRemove.getParentOperators().get(0);
-Operator rsChild = 
rsToRemove.getChildOperators().get(0);
-Operator rsGrandChild = 
rsChild.getChildOperators().get(0);
-
-if (rsChild instanceof SelectOperator) {
-  // if schema size cannot be matched, then it could be because of 
constant folding
-  // converting partition column expression to constant expression. 
The constant
-  // expression will then get pruned by column pruner since it will 
not reference to
-  // any columns.
-  if (rsParent.getSchema().getSignature().size() !=
-  rsChild.getSchema().getSignature().size()) {
+List> rsChildren = 
rsToRemove.getChildOperators();
+
+Operator rsChildToRemove = null;
+
+for (Operator rsChild : rsChildren) {

Review comment:
   Yes, there should not be a RS with multiple children, we can simplify 
that code. You can even add an assert to the new code to make sure.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458743)
Time Spent: 50m  (was: 40m)

> Sorted dynamic partition optimization could remove auto stat task
> -
>
> Key: HIVE-23822
> URL: https://issues.apache.org/jira/browse/HIVE-23822
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {{mm_dp}} has reproducer where INSERT query is missing auto stats task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23822) Sorted dynamic partition optimization could remove auto stat task

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23822?focusedWorklogId=458740=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458740
 ]

ASF GitHub Bot logged work on HIVE-23822:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 15:55
Start Date: 14/Jul/20 15:55
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on a change in pull request 
#1231:
URL: https://github.com/apache/hive/pull/1231#discussion_r454462383



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
##
@@ -409,26 +409,54 @@ private boolean 
removeRSInsertedByEnforceBucketing(FileSinkOperator fsOp) {
   // and grand child
   if (found) {
 Operator rsParent = 
rsToRemove.getParentOperators().get(0);
-Operator rsChild = 
rsToRemove.getChildOperators().get(0);
-Operator rsGrandChild = 
rsChild.getChildOperators().get(0);
-
-if (rsChild instanceof SelectOperator) {
-  // if schema size cannot be matched, then it could be because of 
constant folding
-  // converting partition column expression to constant expression. 
The constant
-  // expression will then get pruned by column pruner since it will 
not reference to
-  // any columns.
-  if (rsParent.getSchema().getSignature().size() !=
-  rsChild.getSchema().getSignature().size()) {
+List> rsChildren = 
rsToRemove.getChildOperators();
+
+Operator rsChildToRemove = null;
+
+for (Operator rsChild : rsChildren) {

Review comment:
   I assumed that this could be possibility and therefore accounted for it, 
but if this assumption is wrong I update the code.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458740)
Time Spent: 40m  (was: 0.5h)

> Sorted dynamic partition optimization could remove auto stat task
> -
>
> Key: HIVE-23822
> URL: https://issues.apache.org/jira/browse/HIVE-23822
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {{mm_dp}} has reproducer where INSERT query is missing auto stats task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22344) I can't run hive in command line

2020-07-14 Thread xiepengjie (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157449#comment-17157449
 ] 

xiepengjie commented on HIVE-22344:
---

Because the versions of guava in Hadoop 3.1.2 is different from hive's.

> I can't run hive in command line
> 
>
> Key: HIVE-22344
> URL: https://issues.apache.org/jira/browse/HIVE-22344
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 3.1.2
> Environment: hive: 3.1.2
> hadoop 3.2.1
>  
>Reporter: Smith Cruise
>Priority: Blocker
>
> I can't run hive in command. It tell me :
> {code:java}
> [hadoop@master lib]$ hive
> which: no hbase in 
> (/home/hadoop/apache-hive-3.1.2-bin/bin:{{pwd}}/bin:/home/hadoop/.local/bin:/home/hadoop/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/hadoop/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/hadoop/hadoop3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Exception in thread "main" java.lang.NoSuchMethodError: 
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:536)
> at org.apache.hadoop.mapred.JobConf.setJarByClass(JobConf.java:554)
> at org.apache.hadoop.mapred.JobConf.(JobConf.java:448)
> at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:5141)
> at org.apache.hadoop.hive.conf.HiveConf.(HiveConf.java:5099)
> at 
> org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:97)
> at 
> org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:81)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:699)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
> {code}
> I don't know what's wrong about it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-22668) ClassNotFoundException:HiveHBaseTableInputFormat when tez include reduce operation

2020-07-14 Thread xiepengjie (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157443#comment-17157443
 ] 

xiepengjie edited comment on HIVE-22668 at 7/14/20, 3:03 PM:
-

Your jar are stored locally, you should upload it to remote space like hdfs, 
then, you can add jar like this:

```

add jar hdfs:///usr/hdp/hive-hbase-handler-3.1.0.3.1.4.0-315.jar

```

Beeline and jar are on your local machine, but the thrift server of hiveserver2 
on the remote machine, hs2 can not find the path 
'/usr/hdp/3.1.4.0-315/hive/lib/...'

 

If it does't work, maybe you should add the jar in HIVE_AUX_JARS_PATH and 
restart hiveserver2:

```

export HIVE_AUX_JARS_PATH='/hive/aux/jar/path/a.jar,/hive/aux/jar/path/b.jar'

```


was (Author: xiepengjie):
Your jar are stored locally, you should upload it to remote space like hdfs, 
then, you can add jar like this:

```

add jar hdfs:///usr/hdp/hive-hbase-handler-3.1.0.3.1.4.0-315.jar

```

Beeline and jar are on your local machine, but the thrift server of hiveserver2 
on the remote machine, hs2 can not find the path 
'/usr/hdp/3.1.4.0-315/hive/lib/...'

 

If it does't work, maybe you should add the add in HIVE_AUX_JARS_PATH and 
restart hiveserver2:

```

export HIVE_AUX_JARS_PATH='/hive/aux/jar/path/a.jar,/hive/aux/jar/path/b.jar'

```

> ClassNotFoundException:HiveHBaseTableInputFormat when tez include reduce 
> operation
> --
>
> Key: HIVE-22668
> URL: https://issues.apache.org/jira/browse/HIVE-22668
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, Hive
>Affects Versions: 3.1.0
>Reporter: Michael
>Priority: Blocker
>
> When I use beeline to execute script which the operation is insert data from 
> hive to hbase.
> If the operation include reduce step, this exception will appearance.
> I try to add jar in beeline like this:
> {code:java}
> ADD JAR /usr/hdp/3.1.4.0-315/hive/lib/hive-hbase-handler-3.1.0.3.1.4.0-315.jar
> ADD JAR /usr/hdp/3.1.4.0-315/hive/lib/guava-28.0-jre.jar
> ADD JAR /usr/hdp/3.1.4.0-315/hive/lib/zookeeper-3.4.6.3.1.4.0-315.jar{code}
> but this problem always exist. 
> {code:java}
> Serialization trace:
> inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
> tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
> conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator)
> reducer (org.apache.hadoop.hive.ql.plan.ReduceWork)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:185)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:326)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:314)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:759)
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObjectOrNull(SerializationUtilities.java:203)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:132)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
> at 
> 

[jira] [Commented] (HIVE-22668) ClassNotFoundException:HiveHBaseTableInputFormat when tez include reduce operation

2020-07-14 Thread xiepengjie (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157443#comment-17157443
 ] 

xiepengjie commented on HIVE-22668:
---

Your jar are stored locally, you should upload it to remote space like hdfs, 
then, you can add jar like this:

```

add jar hdfs:///usr/hdp/hive-hbase-handler-3.1.0.3.1.4.0-315.jar

```

Beeline and jar are on your local machine, but the thrift server of hiveserver2 
on the remote machine, hs2 can not find the path 
'/usr/hdp/3.1.4.0-315/hive/lib/...'

 

If it does't work, maybe you should add the add in HIVE_AUX_JARS_PATH and 
restart hiveserver2:

```

export HIVE_AUX_JARS_PATH='/hive/aux/jar/path/a.jar,/hive/aux/jar/path/b.jar'

```

> ClassNotFoundException:HiveHBaseTableInputFormat when tez include reduce 
> operation
> --
>
> Key: HIVE-22668
> URL: https://issues.apache.org/jira/browse/HIVE-22668
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, Hive
>Affects Versions: 3.1.0
>Reporter: Michael
>Priority: Blocker
>
> When I use beeline to execute script which the operation is insert data from 
> hive to hbase.
> If the operation include reduce step, this exception will appearance.
> I try to add jar in beeline like this:
> {code:java}
> ADD JAR /usr/hdp/3.1.4.0-315/hive/lib/hive-hbase-handler-3.1.0.3.1.4.0-315.jar
> ADD JAR /usr/hdp/3.1.4.0-315/hive/lib/guava-28.0-jre.jar
> ADD JAR /usr/hdp/3.1.4.0-315/hive/lib/zookeeper-3.4.6.3.1.4.0-315.jar{code}
> but this problem always exist. 
> {code:java}
> Serialization trace:
> inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
> tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
> conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator)
> reducer (org.apache.hadoop.hive.ql.plan.ReduceWork)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:185)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:326)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:314)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:759)
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObjectOrNull(SerializationUtilities.java:203)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:132)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:180)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
> at 
> 

[jira] [Work logged] (HIVE-23673) Maven Standard Directories for accumulo-handler

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23673?focusedWorklogId=458689=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458689
 ]

ASF GitHub Bot logged work on HIVE-23673:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 14:54
Start Date: 14/Jul/20 14:54
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1088:
URL: https://github.com/apache/hive/pull/1088#issuecomment-658227935


   @nrg4878 Can you please take a look at this?  Trivial change but lots of 
files affected since it's just moving them into a new directory.  Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458689)
Time Spent: 1h 40m  (was: 1.5h)

> Maven Standard Directories for accumulo-handler
> ---
>
> Key: HIVE-23673
> URL: https://issues.apache.org/jira/browse/HIVE-23673
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=458686=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458686
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 14:51
Start Date: 14/Jul/20 14:51
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #1087:
URL: https://github.com/apache/hive/pull/1087


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458686)
Time Spent: 11h 50m  (was: 11h 40m)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables too well. It can 
> find and add new partitions the same way as for non-transactional tables, but 
> since the writeId differences are not handled, the data can not read back 
> from the new partitions.
> We could handle some usecases when the writeIds in the HMS and the underlying 
> data are not conflicting. If the HMS does not contains allocated writes for 
> the table we can seed the table with the writeIds read from the directory 
> structrure.
> Real life use cases could be:
>  * Copy data files from one cluster to another with different HMS, create the 
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23840) Use LLAP to get orc metadata

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23840?focusedWorklogId=458681=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458681
 ]

ASF GitHub Bot logged work on HIVE-23840:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 14:38
Start Date: 14/Jul/20 14:38
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #1251:
URL: https://github.com/apache/hive/pull/1251#discussion_r454393621



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -129,6 +137,16 @@
*/
   private SearchArgument deleteEventSarg = null;
 
+  /**
+   * Cachetag associated with the Split
+   */
+  private final CacheTag cacheTag;
+
+  /**
+   * Skip using Llap IO cache for checking delete_delta files if the 
configuration is not correct
+   */
+  private static boolean skipLlapCache = true;

Review comment:
   Initialized to true on purpose for now? If not, I don't see it getting 
set to false.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -1562,20 +1580,31 @@ public int compareTo(CompressedOwid other) {
   try {
 final Path[] deleteDeltaDirs = getDeleteDeltaDirsFromSplit(orcSplit);
 if (deleteDeltaDirs.length > 0) {
+  FileSystem fs = orcSplit.getPath().getFileSystem(conf);
+  AcidOutputFormat.Options orcSplitMinMaxWriteIds =
+  AcidUtils.parseBaseOrDeltaBucketFilename(orcSplit.getPath(), 
conf);
   int totalDeleteEventCount = 0;
   for (Path deleteDeltaDir : deleteDeltaDirs) {
-FileSystem fs = deleteDeltaDir.getFileSystem(conf);
+if (!isQualifiedDeleteDeltaForSplit(orcSplitMinMaxWriteIds, 
deleteDeltaDir)) {
+  continue;
+}
 Path[] deleteDeltaFiles = 
OrcRawRecordMerger.getDeltaFiles(deleteDeltaDir, bucket,
 new OrcRawRecordMerger.Options().isCompacting(false), null);
 for (Path deleteDeltaFile : deleteDeltaFiles) {
   try {
-/**
- * todo: we have OrcSplit.orcTail so we should be able to get 
stats from there
- */
-Reader deleteDeltaReader = 
OrcFile.createReader(deleteDeltaFile, OrcFile.readerOptions(conf));
-if (deleteDeltaReader.getNumberOfRows() <= 0) {
+ReaderData readerData = getOrcTail(deleteDeltaFile, conf, 
cacheTag);
+OrcTail orcTail = readerData.orcTail;
+if (orcTail.getFooter().getNumberOfRows() <= 0) {
   continue; // just a safe check to ensure that we are not 
reading empty delete files.
 }
+OrcRawRecordMerger.KeyInterval deleteKeyInterval = 
findDeleteMinMaxKeys(orcTail, deleteDeltaFile);
+if (!deleteKeyInterval.isIntersects(keyInterval)) {
+  // If there is no intersection between data and delete 
delta, do not read delete file
+  continue;
+}
+// Create the reader if we got the OrcTail from cache

Review comment:
   nit: comment could be more verbose, like: Reader can be reused if it was 
created before: only for non-LLAP cache cases, otherwise we need to create it 
here





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458681)
Time Spent: 0.5h  (was: 20m)

> Use LLAP to get orc metadata
> 
>
> Key: HIVE-23840
> URL: https://issues.apache.org/jira/browse/HIVE-23840
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HIVE-23824 added the possibility to access ORC metadata. We can use this to 
> decide which delta files should be read, and which could be omitted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-16490) Hive should not use private HDFS APIs for encryption

2020-07-14 Thread Uma Maheswara Rao G (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-16490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157145#comment-17157145
 ] 

Uma Maheswara Rao G edited comment on HIVE-16490 at 7/14/20, 2:32 PM:
--

Any update on this?

{code}
public HdfsEncryptionShim(URI uri, Configuration conf) throws IOException {
  DistributedFileSystem dfs = (DistributedFileSystem)FileSystem.get(uri, 
conf);

  this.conf = conf;
  this.keyProvider = dfs.getClient().getKeyProvider();
  this.hdfsAdmin = new HdfsAdmin(uri, conf);
}
{code}
Looks like Hadoop23Shims.java is still using DFSClient api directly. HDFS-11687 
added getKeyProvider in HdfsAdmin class. Please use it and can void 
DistributedFileSystem casting as well.
Casting and DFS checks already present in HdfsAdmin constructor, so let's not 
do here. 
Thanks


was (Author: umamaheswararao):
Any update on this?

{code}
public HdfsEncryptionShim(URI uri, Configuration conf) throws IOException {
  DistributedFileSystem dfs = (DistributedFileSystem)FileSystem.get(uri, 
conf);

  this.conf = conf;
  this.keyProvider = dfs.getClient().getKeyProvider();
  this.hdfsAdmin = new HdfsAdmin(uri, conf);
}
{code}
Looks like Hadoop23Shims.java is still using DFSClient api directly. HDFS-11687 
added getKeyProvider in HdfsAdmin class. Please use it and can void 
DistributedFileSystem casting as well.
Casting and DFS checks already present in HdfsAdmin constructor, so we don't to 
do here. 
Thanks

> Hive should not use private HDFS APIs for encryption
> 
>
> Key: HIVE-16490
> URL: https://issues.apache.org/jira/browse/HIVE-16490
> Project: Hive
>  Issue Type: Improvement
>  Components: Encryption
>Affects Versions: 2.2.0
>Reporter: Andrew Wang
>Assignee: Naveen Gangam
>Priority: Critical
>
> When compiling against bleeding edge versions of Hive and Hadoop, we 
> discovered that HIVE-16047 references a private HDFS API, DFSClient, to get 
> at various encryption related information. The private API was recently 
> changed by HADOOP-14104, which broke Hive compilation.
> It'd be better to instead use publicly supported APIs. HDFS-11687 has been 
> filed to add whatever encryption APIs are needed by Hive. This JIRA is to 
> move Hive over to these new APIs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=458668=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458668
 ]

ASF GitHub Bot logged work on HIVE-23363:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 14:28
Start Date: 14/Jul/20 14:28
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #1118:
URL: https://github.com/apache/hive/pull/1118


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458668)
Time Spent: 2h 40m  (was: 2.5h)

> Upgrade DataNucleus dependency to 5.2
> -
>
> Key: HIVE-23363
> URL: https://issues.apache.org/jira/browse/HIVE-23363
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Zoltan Chovan
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23363.2.patch, HIVE-23363.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been 
> retired:
> [http://www.datanucleus.org/documentation/products.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=458669=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458669
 ]

ASF GitHub Bot logged work on HIVE-23363:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 14:28
Start Date: 14/Jul/20 14:28
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1118:
URL: https://github.com/apache/hive/pull/1118#issuecomment-658213248


   Complete and merged through manual patch process 
   
   https://issues.apache.org/jira/browse/HIVE-23363



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458669)
Time Spent: 2h 50m  (was: 2h 40m)

> Upgrade DataNucleus dependency to 5.2
> -
>
> Key: HIVE-23363
> URL: https://issues.apache.org/jira/browse/HIVE-23363
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Zoltan Chovan
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23363.2.patch, HIVE-23363.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been 
> retired:
> [http://www.datanucleus.org/documentation/products.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23793) Review of QueryInfo Class

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23793?focusedWorklogId=458656=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458656
 ]

ASF GitHub Bot logged work on HIVE-23793:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 14:20
Start Date: 14/Jul/20 14:20
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1197:
URL: https://github.com/apache/hive/pull/1197#discussion_r454391934



##
File path: ql/src/java/org/apache/hadoop/hive/ql/QueryInfo.java
##
@@ -17,48 +17,58 @@
  */
 package org.apache.hadoop.hive.ql;
 
+import java.util.concurrent.TimeUnit;
+
 /**
- * The class is synchronized, as WebUI may access information about a running 
query.
+ * Provide WebUI information about a running query. Class is thread safe so 
that
+ * multiple browser sessions can access the data simultaneously.
  */
 public class QueryInfo {
 
   private final String userName;
   private final String executionEngine;
-  private final long beginTime;
   private final String operationId;
-  private Long runtime;  // tracks only running portion of the query.
 
-  private Long endTime;
   private String state;
   private QueryDisplay queryDisplay;
 
+  /*
+   * Times are stored internally with nanosecond precision.
+   */
+  private final long beginTime;
+  private long runtime;

Review comment:
   Thanks for the review.
   
   `synchronized` and `volatile` are not interchangeable.  This class doesn't 
need synchronization because all of its actions are so trivial, there is no 
issues with them happening concurrently: just assigning or reading a variable 
will not cause an issue with multiple threads.  The classic `volatile` case is 
when a thread is spinning, waiting on some variable to change values.  Without 
`volatile` the thread may cache the value and never stop spinning.  There is no 
such use case here, it is for informational purposes only.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458656)
Time Spent: 1h 50m  (was: 1h 40m)

> Review of QueryInfo Class
> -
>
> Key: HIVE-23793
> URL: https://issues.apache.org/jira/browse/HIVE-23793
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23840) Use LLAP to get orc metadata

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23840?focusedWorklogId=458652=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458652
 ]

ASF GitHub Bot logged work on HIVE-23840:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 14:18
Start Date: 14/Jul/20 14:18
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #1251:
URL: https://github.com/apache/hive/pull/1251#discussion_r454390429



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -232,6 +250,17 @@ private VectorizedOrcAcidRowBatchReader(JobConf conf, 
OrcSplit orcSplit, Reporte
 
 this.syntheticProps = orcSplit.getSyntheticAcidProps();
 
+if (LlapHiveUtils.isLlapMode(conf) && LlapProxy.isDaemon()
+&& HiveConf.getBoolVar(conf, ConfVars.LLAP_TRACK_CACHE_USAGE))
+{
+  MapWork mapWork = LlapHiveUtils.findMapWork(conf);

Review comment:
   We could spare the deserialization of MapWork from JobConf here, if we 
pass the MapWork instance already present in LlapRecordReader to 
VectorizedOrcAcidRowBatchReader ctor. (Downside is that in turn we would need 
to adjust the other ctor's of VectorizedOrcAcidRowBatchReader too)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458652)
Time Spent: 20m  (was: 10m)

> Use LLAP to get orc metadata
> 
>
> Key: HIVE-23840
> URL: https://issues.apache.org/jira/browse/HIVE-23840
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-23824 added the possibility to access ORC metadata. We can use this to 
> decide which delta files should be read, and which could be omitted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23793) Review of QueryInfo Class

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23793?focusedWorklogId=458649=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458649
 ]

ASF GitHub Bot logged work on HIVE-23793:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 14:17
Start Date: 14/Jul/20 14:17
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1197:
URL: https://github.com/apache/hive/pull/1197


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458649)
Time Spent: 1h 40m  (was: 1.5h)

> Review of QueryInfo Class
> -
>
> Key: HIVE-23793
> URL: https://issues.apache.org/jira/browse/HIVE-23793
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23793) Review of QueryInfo Class

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23793?focusedWorklogId=458647=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458647
 ]

ASF GitHub Bot logged work on HIVE-23793:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 14:16
Start Date: 14/Jul/20 14:16
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #1197:
URL: https://github.com/apache/hive/pull/1197


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458647)
Time Spent: 1.5h  (was: 1h 20m)

> Review of QueryInfo Class
> -
>
> Key: HIVE-23793
> URL: https://issues.apache.org/jira/browse/HIVE-23793
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23832) Compaction cleaner fails to clean up deltas when using blocking compaction

2020-07-14 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-23832.
---
Resolution: Fixed

> Compaction cleaner fails to clean up deltas when using blocking compaction
> --
>
> Key: HIVE-23832
> URL: https://issues.apache.org/jira/browse/HIVE-23832
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE default.compcleanup (
>cda_id int,
>cda_run_id varchar(255),
>cda_load_tstimestamp,
>global_party_idstring,
>group_id   string)
> COMMENT 'gp_2_gr'
> PARTITIONED BY (
>cda_date   int,
>cda_job_name   varchar(12))
> STORED AS ORC;
> -- cda_date=20200601/cda_job_name=core_base
> INSERT INTO default.compcleanup VALUES 
> (1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
> cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
> {code}
> When using blocking compaction Cleaner skips processing due to the presence 
> of open txn (by `ALTER TABLE`) below Compactor's one.
> {code}
> AcidUtils - getChildState() ignoring([]) 
> pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
> {code}
> AcidUtils.processBaseDir
> {code}
> if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
> validTxnList)) {
>return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23832) Compaction cleaner fails to clean up deltas when using blocking compaction

2020-07-14 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157369#comment-17157369
 ] 

Denys Kuzmenko commented on HIVE-23832:
---

Pushed to master.
Thanks you [~lpinter] for review!

> Compaction cleaner fails to clean up deltas when using blocking compaction
> --
>
> Key: HIVE-23832
> URL: https://issues.apache.org/jira/browse/HIVE-23832
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE default.compcleanup (
>cda_id int,
>cda_run_id varchar(255),
>cda_load_tstimestamp,
>global_party_idstring,
>group_id   string)
> COMMENT 'gp_2_gr'
> PARTITIONED BY (
>cda_date   int,
>cda_job_name   varchar(12))
> STORED AS ORC;
> -- cda_date=20200601/cda_job_name=core_base
> INSERT INTO default.compcleanup VALUES 
> (1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
> cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
> {code}
> When using blocking compaction Cleaner skips processing due to the presence 
> of open txn (by `ALTER TABLE`) below Compactor's one.
> {code}
> AcidUtils - getChildState() ignoring([]) 
> pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
> {code}
> AcidUtils.processBaseDir
> {code}
> if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
> validTxnList)) {
>return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23843) Improve key evictions in VectorGroupByOperator

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23843?focusedWorklogId=458606=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458606
 ]

ASF GitHub Bot logged work on HIVE-23843:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 13:25
Start Date: 14/Jul/20 13:25
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r454349711



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##
@@ -358,6 +360,34 @@ public void close(boolean aborted) throws HiveException {
  */
 private long numRowsCompareHashAggr;
 
+/**
+ * To track current memory usage.
+ */
+private long currMemUsed;
+
+/**
+ * Whether to make use of LRUCache for map aggr buffers or not.
+ */
+private boolean lruCache;
+
+class LRUCache extends LinkedHashMap {
+
+  @Override
+  protected boolean removeEldestEntry(Map.Entry eldest) {
+if (currMemUsed > maxHashTblMemory || size() > maxHtEntries || 
gcCanary.get() == null) {

Review comment:
   this method seems to have been polluted by the "isFull" logic - which is 
unexpected with this method name
   the "isFull" should be moved outside - and remove should only called when 
the condition is met

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4065,6 +4065,9 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 
HIVE_VECTORIZATION_GROUPBY_MAXENTRIES("hive.vectorized.groupby.maxentries", 
100,
 "Max number of entries in the vector group by aggregation hashtables. 
\n" +
 "Exceeding this will trigger a flush irrelevant of memory pressure 
condition."),
+HIVE_VECTORIZATION_GROUPBY_ENABLE_LRU_FOR_AGGR(

Review comment:
   instead of introducing a boolean toggle; add a mode switch 
(default/lru/etc)

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##
@@ -420,35 +460,56 @@ public void doProcessBatch(VectorizedRowBatch batch, 
boolean isFirstGroupingSet,
   //Flush if memory limits were reached
   // We keep flushing until the memory is under threshold
   int preFlushEntriesCount = numEntriesHashTable;
-  while (shouldFlush(batch)) {
-flush(false);
 
-if(gcCanary.get() == null) {
-  gcCanaryFlushes++;
-  gcCanary = new SoftReference(new Object());
-}
+  if (!lruCache) {
+while (shouldFlush(batch)) {
+  flush(false);
+
+  if(gcCanary.get() == null) {
+gcCanaryFlushes++;
+gcCanary = new SoftReference(new Object());
+  }
 
-//Validate that some progress is being made
-if (!(numEntriesHashTable < preFlushEntriesCount)) {
-  if (LOG.isDebugEnabled()) {
-LOG.debug(String.format("Flush did not progress: %d entries 
before, %d entries after",
-preFlushEntriesCount,
-numEntriesHashTable));
+  //Validate that some progress is being made
+  if (!(numEntriesHashTable < preFlushEntriesCount)) {
+if (LOG.isDebugEnabled()) {
+  LOG.debug(String.format("Flush did not progress: %d entries 
before, %d entries after",
+  preFlushEntriesCount,
+  numEntriesHashTable));
+}
+break;
   }
-  break;
+  preFlushEntriesCount = numEntriesHashTable;
 }
-preFlushEntriesCount = numEntriesHashTable;
+  } else {
+checkAndFlushLRU(batch);
   }
 
   if (sumBatchSize == 0 && 0 != batch.size) {
 // Sample the first batch processed for variable sizes.
 updateAvgVariableSize(batch);
+currMemUsed = numEntriesHashTable * (fixedHashEntrySize + 
avgVariableSize);

Review comment:
   this is strange...there is a `currMemUsed` field an there is also a 
`currMemUsed` local variable in `shouldFlush` - they might cause things to me 
more interesting :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458606)
Time Spent: 50m  (was: 40m)

> Improve key evictions in VectorGroupByOperator
> --
>
> Key: HIVE-23843
> URL: https://issues.apache.org/jira/browse/HIVE-23843
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh 

[jira] [Work logged] (HIVE-23843) Improve key evictions in VectorGroupByOperator

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23843?focusedWorklogId=458601=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458601
 ]

ASF GitHub Bot logged work on HIVE-23843:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 13:06
Start Date: 14/Jul/20 13:06
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r454341433



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##
@@ -358,6 +360,34 @@ public void close(boolean aborted) throws HiveException {
  */
 private long numRowsCompareHashAggr;
 
+/**
+ * To track current memory usage.
+ */
+private long currMemUsed;
+
+/**
+ * Whether to make use of LRUCache for map aggr buffers or not.
+ */
+private boolean lruCache;
+
+class LRUCache extends LinkedHashMap {

Review comment:
   this doesn't look like an `LRU` cache to me...it removes the eldest 
entry; in case of `lru` you should move the accessed entry to the head of the 
list
   
   I think guava has some lru cache implementation - might worth a look





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458601)
Time Spent: 40m  (was: 0.5h)

> Improve key evictions in VectorGroupByOperator
> --
>
> Key: HIVE-23843
> URL: https://issues.apache.org/jira/browse/HIVE-23843
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Keys in {{mapKeysAggregationBuffers}} are evicted in random order. Tasks also 
> get into GC issues when multiple keys are involved in groupbys. It would be 
> good to provide an option to have LRU based eviction for 
> mapKeysAggregationBuffers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-12331) Remove hive.enforce.bucketing & hive.enforce.sorting configs

2020-07-14 Thread weitianpei (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-12331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157272#comment-17157272
 ] 

weitianpei commented on HIVE-12331:
---

why the new version removed the parameter hive.enforce.sorting & 
hive.enforce.bucketing ? Would it have a bad influence on inserting data to a 
table. 

> Remove hive.enforce.bucketing & hive.enforce.sorting configs
> 
>
> Key: HIVE-12331
> URL: https://issues.apache.org/jira/browse/HIVE-12331
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HIVE-12331.1.patch, HIVE-12331.patch
>
>
> If table is created as bucketed and/or sorted and this config is set to 
> false, you will insert data in wrong buckets and/or sort order and then if 
> you use these tables subsequently in BMJ or SMBJ you will get wrong results.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23832) Compaction cleaner fails to clean up deltas when using blocking compaction

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23832?focusedWorklogId=458547=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458547
 ]

ASF GitHub Bot logged work on HIVE-23832:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 10:12
Start Date: 14/Jul/20 10:12
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #1243:
URL: https://github.com/apache/hive/pull/1243


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458547)
Time Spent: 1h 10m  (was: 1h)

> Compaction cleaner fails to clean up deltas when using blocking compaction
> --
>
> Key: HIVE-23832
> URL: https://issues.apache.org/jira/browse/HIVE-23832
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE default.compcleanup (
>cda_id int,
>cda_run_id varchar(255),
>cda_load_tstimestamp,
>global_party_idstring,
>group_id   string)
> COMMENT 'gp_2_gr'
> PARTITIONED BY (
>cda_date   int,
>cda_job_name   varchar(12))
> STORED AS ORC;
> -- cda_date=20200601/cda_job_name=core_base
> INSERT INTO default.compcleanup VALUES 
> (1,'cda_run_id',NULL,'global_party_id','group_id',20200601,'core_base');
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> UPDATE default.compcleanup SET cda_id = 2 WHERE cda_id = 1;
> SELECT * FROM default.compcleanup where cda_date = 20200601  and cda_job_name 
> = 'core_base';
> ALTER TABLE default.compcleanup PARTITION (cda_date=20200601, 
> cda_job_name='core_base') COMPACT 'MAJOR' AND WAIT;
> {code}
> When using blocking compaction Cleaner skips processing due to the presence 
> of open txn (by `ALTER TABLE`) below Compactor's one.
> {code}
> AcidUtils - getChildState() ignoring([]) 
> pfile:/Users/denyskuzmenko/data/cdh/hive/warehouse/compcleanup5/cda_date=110601/cda_job_name=core_base/base_002_v035
> {code}
> AcidUtils.processBaseDir
> {code}
> if (!isDirUsable(baseDir, parsedBase.getVisibilityTxnId(), aborted, 
> validTxnList)) {
>return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23840) Use LLAP to get orc metadata

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23840:
--
Labels: pull-request-available  (was: )

> Use LLAP to get orc metadata
> 
>
> Key: HIVE-23840
> URL: https://issues.apache.org/jira/browse/HIVE-23840
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-23824 added the possibility to access ORC metadata. We can use this to 
> decide which delta files should be read, and which could be omitted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23840) Use LLAP to get orc metadata

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23840?focusedWorklogId=458543=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458543
 ]

ASF GitHub Bot logged work on HIVE-23840:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 09:48
Start Date: 14/Jul/20 09:48
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request #1251:
URL: https://github.com/apache/hive/pull/1251


   Started to use new LLAP getOrcTailFromCache
   Refactored stuff to use the tail instead of the reader related things
   Added some unit tests for the new smaller components



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458543)
Remaining Estimate: 0h
Time Spent: 10m

> Use LLAP to get orc metadata
> 
>
> Key: HIVE-23840
> URL: https://issues.apache.org/jira/browse/HIVE-23840
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-23824 added the possibility to access ORC metadata. We can use this to 
> decide which delta files should be read, and which could be omitted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23844) use fastparquet generate parquet format file, imported into hive, query error

2020-07-14 Thread Henry Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Lu updated HIVE-23844:

Summary: use fastparquet generate parquet format file,  imported into hive, 
 query error  (was: use fastparquet generate parquet format file,  import to 
hive,  query error)

> use fastparquet generate parquet format file,  imported into hive,  query 
> error
> ---
>
> Key: HIVE-23844
> URL: https://issues.apache.org/jira/browse/HIVE-23844
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.1
> Environment: OS:  CentOS Linux release 7.6.1810
> JDK: 1.8.0_181
> hive:  hive-common-2.1.1-cdh6.2.0
> fastparquet: 0.4.0
>Reporter: Henry Lu
>Priority: Major
>  Labels: fastparquet, hive, python
> Attachments: 1912076_20200330_000334(102_4).parquet
>
>
> i use fastparquet generate parquet format file(Please check the attachment 
> !),   message as follows:
> message schema
> {
> optional double timestamps;
>  optional int32 ESC_BrakePressure (UINT_8);
>  optional int32 ESC_BrakePressureValid (UINT_8);
>  optional int32 ESC_EBDWork (UINT_8);
>  optional int32 ESC_ABSWorkLable (UINT_8);
>  optional int32 ESC_EBDAlarm (UINT_8);
>  optional int32 ESC_VehSpdValidFlag (UINT_8);
>  optional int32 ESC_ABSAlarmSignal (UINT_8);
>  optional float ESC_VehSpd;
>  optional float ESC_FrontLeftWHeelSpd;
>  optional int32 ESC_FLWHeelSpdFaultSignal (UINT_8);
>  optional float ESC_FrontRightWHeelSpd;
>  optional int32 ESC_FRWHeelSpdFaultSignal (UINT_8);
>  optional float ESC_RearLeftWheelSpd;
>  optional int32 ESC_RLWHeelSpdFaultSignal (UINT_8);
>  optional float ESC_RearRightWheelSpd;
>  optional int32 ESC_RRWHeelSpdFaultSignal (UINT_8);
>  optional int32 ESC_Longitudinal_Acceleration_flag (UINT_8);
>  optional float ESC_Longitudinal_Acceleration;
>  optional int32 ESC_ESCOFF (UINT_8);
>  optional int32 ESC_ESCWorkStatus (UINT_8);
>  optional int32 ESC_ESCAlarmSig (UINT_8);
>  optional int32 ESC_TCSCFActive (UINT_8);
>  optional int32 ESC_ReqIncreaseTorqueFlag (UINT_8);
>  optional int32 ESC_ReqDecreaseTorqueFlag (UINT_8);
>  optional int32 ESC_ReqIncreaseTorque (UINT_8);
>  optional int32 ESC_ReqDecreaseTorque (UINT_8);
>  optional int32 ESC_ESCValidity (UINT_8);
>  optional int32 ESC_RollingCount_ESC3 (UINT_8);
>  optional int32 ESC_CICkSum_ESC3 (UINT_8);   
> }
> creator: fastparquet-python version 1.0.0 (build 111)
>  extra: pandas = {"column_indexes": [
> {"field_name": null, "metadata": null, "name": null, "numpy_type": "object", 
> "pandas_type": "mixed-integer"}
> ], "columns": [
> {"field_name": "timestamps", "metadata": null, "name": "timestamps", 
> "numpy_type": "float64", "pandas_type": "float64"}
> , 
> {"field_name": "ESC_BrakePressure", "metadata": null, "name": 
> "ESC_BrakePressure", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_BrakePressureValid", "metadata": null, "name": 
> "ESC_BrakePressureValid", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_EBDWork", "metadata": null, "name": "ESC_EBDWork", 
> "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_ABSWorkLable", "metadata": null, "name": 
> "ESC_ABSWorkLable", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_EBDAlarm", "metadata": null, "name": "ESC_EBDAlarm", 
> "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_VehSpdValidFlag", "metadata": null, "name": 
> "ESC_VehSpdValidFlag", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_ABSAlarmSignal", "metadata": null, "name": 
> "ESC_ABSAlarmSignal", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_VehSpd", "metadata": null, "name": "ESC_VehSpd", 
> "numpy_type": "float32", "pandas_type": "float32"}
> , 
> {"field_name": "ESC_FrontLeftWHeelSpd", "metadata": null, "name": 
> "ESC_FrontLeftWHeelSpd", "numpy_type": "float32", "pandas_type": "float32"}
> , 
> {"field_name": "ESC_FLWHeelSpdFaultSignal", "metadata": null, "name": 
> "ESC_FLWHeelSpdFaultSignal", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_FrontRightWHeelSpd", "metadata": null, "name": 
> "ESC_FrontRightWHeelSpd", "numpy_type": "float32", "pandas_type": "float32"}
> , 
> {"field_name": "ESC_FRWHeelSpdFaultSignal", "metadata": null, "name": 
> "ESC_FRWHeelSpdFaultSignal", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_RearLeftWheelSpd", "metadata": null, "name": 
> "ESC_RearLeftWheelSpd", "numpy_type": "float32", "pandas_type": "float32"}
> , 
> {"field_name": "ESC_RLWHeelSpdFaultSignal", "metadata": null, "name": 
> "ESC_RLWHeelSpdFaultSignal", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": 

[jira] [Updated] (HIVE-23844) use fastparquet generate parquet format file, import to hive, query error

2020-07-14 Thread Henry Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Lu updated HIVE-23844:

Summary: use fastparquet generate parquet format file,  import to hive,  
query error  (was: use fastparquet generate parquet format file,  import hive,  
query error)

> use fastparquet generate parquet format file,  import to hive,  query error
> ---
>
> Key: HIVE-23844
> URL: https://issues.apache.org/jira/browse/HIVE-23844
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.1
> Environment: OS:  CentOS Linux release 7.6.1810
> JDK: 1.8.0_181
> hive:  hive-common-2.1.1-cdh6.2.0
> fastparquet: 0.4.0
>Reporter: Henry Lu
>Priority: Major
>  Labels: fastparquet, hive, python
> Attachments: 1912076_20200330_000334(102_4).parquet
>
>
> i use fastparquet generate parquet format file(Please check the attachment 
> !),   message as follows:
> message schema
> {
> optional double timestamps;
>  optional int32 ESC_BrakePressure (UINT_8);
>  optional int32 ESC_BrakePressureValid (UINT_8);
>  optional int32 ESC_EBDWork (UINT_8);
>  optional int32 ESC_ABSWorkLable (UINT_8);
>  optional int32 ESC_EBDAlarm (UINT_8);
>  optional int32 ESC_VehSpdValidFlag (UINT_8);
>  optional int32 ESC_ABSAlarmSignal (UINT_8);
>  optional float ESC_VehSpd;
>  optional float ESC_FrontLeftWHeelSpd;
>  optional int32 ESC_FLWHeelSpdFaultSignal (UINT_8);
>  optional float ESC_FrontRightWHeelSpd;
>  optional int32 ESC_FRWHeelSpdFaultSignal (UINT_8);
>  optional float ESC_RearLeftWheelSpd;
>  optional int32 ESC_RLWHeelSpdFaultSignal (UINT_8);
>  optional float ESC_RearRightWheelSpd;
>  optional int32 ESC_RRWHeelSpdFaultSignal (UINT_8);
>  optional int32 ESC_Longitudinal_Acceleration_flag (UINT_8);
>  optional float ESC_Longitudinal_Acceleration;
>  optional int32 ESC_ESCOFF (UINT_8);
>  optional int32 ESC_ESCWorkStatus (UINT_8);
>  optional int32 ESC_ESCAlarmSig (UINT_8);
>  optional int32 ESC_TCSCFActive (UINT_8);
>  optional int32 ESC_ReqIncreaseTorqueFlag (UINT_8);
>  optional int32 ESC_ReqDecreaseTorqueFlag (UINT_8);
>  optional int32 ESC_ReqIncreaseTorque (UINT_8);
>  optional int32 ESC_ReqDecreaseTorque (UINT_8);
>  optional int32 ESC_ESCValidity (UINT_8);
>  optional int32 ESC_RollingCount_ESC3 (UINT_8);
>  optional int32 ESC_CICkSum_ESC3 (UINT_8);   
> }
> creator: fastparquet-python version 1.0.0 (build 111)
>  extra: pandas = {"column_indexes": [
> {"field_name": null, "metadata": null, "name": null, "numpy_type": "object", 
> "pandas_type": "mixed-integer"}
> ], "columns": [
> {"field_name": "timestamps", "metadata": null, "name": "timestamps", 
> "numpy_type": "float64", "pandas_type": "float64"}
> , 
> {"field_name": "ESC_BrakePressure", "metadata": null, "name": 
> "ESC_BrakePressure", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_BrakePressureValid", "metadata": null, "name": 
> "ESC_BrakePressureValid", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_EBDWork", "metadata": null, "name": "ESC_EBDWork", 
> "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_ABSWorkLable", "metadata": null, "name": 
> "ESC_ABSWorkLable", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_EBDAlarm", "metadata": null, "name": "ESC_EBDAlarm", 
> "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_VehSpdValidFlag", "metadata": null, "name": 
> "ESC_VehSpdValidFlag", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_ABSAlarmSignal", "metadata": null, "name": 
> "ESC_ABSAlarmSignal", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_VehSpd", "metadata": null, "name": "ESC_VehSpd", 
> "numpy_type": "float32", "pandas_type": "float32"}
> , 
> {"field_name": "ESC_FrontLeftWHeelSpd", "metadata": null, "name": 
> "ESC_FrontLeftWHeelSpd", "numpy_type": "float32", "pandas_type": "float32"}
> , 
> {"field_name": "ESC_FLWHeelSpdFaultSignal", "metadata": null, "name": 
> "ESC_FLWHeelSpdFaultSignal", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_FrontRightWHeelSpd", "metadata": null, "name": 
> "ESC_FrontRightWHeelSpd", "numpy_type": "float32", "pandas_type": "float32"}
> , 
> {"field_name": "ESC_FRWHeelSpdFaultSignal", "metadata": null, "name": 
> "ESC_FRWHeelSpdFaultSignal", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_RearLeftWheelSpd", "metadata": null, "name": 
> "ESC_RearLeftWheelSpd", "numpy_type": "float32", "pandas_type": "float32"}
> , 
> {"field_name": "ESC_RLWHeelSpdFaultSignal", "metadata": null, "name": 
> "ESC_RLWHeelSpdFaultSignal", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_RearRightWheelSpd", 

[jira] [Updated] (HIVE-23844) use fastparquet generate parquet format file, import hive, query error

2020-07-14 Thread Henry Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Lu updated HIVE-23844:

Labels: fastparquet hive python  (was: )

> use fastparquet generate parquet format file,  import hive,  query error
> 
>
> Key: HIVE-23844
> URL: https://issues.apache.org/jira/browse/HIVE-23844
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.1
> Environment: OS:  CentOS Linux release 7.6.1810
> JDK: 1.8.0_181
> hive:  hive-common-2.1.1-cdh6.2.0
> fastparquet: 0.4.0
>Reporter: Henry Lu
>Priority: Major
>  Labels: fastparquet, hive, python
> Attachments: 1912076_20200330_000334(102_4).parquet
>
>
> i use fastparquet generate parquet format file(Please check the attachment 
> !),   message as follows:
> message schema
> {
> optional double timestamps;
>  optional int32 ESC_BrakePressure (UINT_8);
>  optional int32 ESC_BrakePressureValid (UINT_8);
>  optional int32 ESC_EBDWork (UINT_8);
>  optional int32 ESC_ABSWorkLable (UINT_8);
>  optional int32 ESC_EBDAlarm (UINT_8);
>  optional int32 ESC_VehSpdValidFlag (UINT_8);
>  optional int32 ESC_ABSAlarmSignal (UINT_8);
>  optional float ESC_VehSpd;
>  optional float ESC_FrontLeftWHeelSpd;
>  optional int32 ESC_FLWHeelSpdFaultSignal (UINT_8);
>  optional float ESC_FrontRightWHeelSpd;
>  optional int32 ESC_FRWHeelSpdFaultSignal (UINT_8);
>  optional float ESC_RearLeftWheelSpd;
>  optional int32 ESC_RLWHeelSpdFaultSignal (UINT_8);
>  optional float ESC_RearRightWheelSpd;
>  optional int32 ESC_RRWHeelSpdFaultSignal (UINT_8);
>  optional int32 ESC_Longitudinal_Acceleration_flag (UINT_8);
>  optional float ESC_Longitudinal_Acceleration;
>  optional int32 ESC_ESCOFF (UINT_8);
>  optional int32 ESC_ESCWorkStatus (UINT_8);
>  optional int32 ESC_ESCAlarmSig (UINT_8);
>  optional int32 ESC_TCSCFActive (UINT_8);
>  optional int32 ESC_ReqIncreaseTorqueFlag (UINT_8);
>  optional int32 ESC_ReqDecreaseTorqueFlag (UINT_8);
>  optional int32 ESC_ReqIncreaseTorque (UINT_8);
>  optional int32 ESC_ReqDecreaseTorque (UINT_8);
>  optional int32 ESC_ESCValidity (UINT_8);
>  optional int32 ESC_RollingCount_ESC3 (UINT_8);
>  optional int32 ESC_CICkSum_ESC3 (UINT_8);   
> }
> creator: fastparquet-python version 1.0.0 (build 111)
>  extra: pandas = {"column_indexes": [
> {"field_name": null, "metadata": null, "name": null, "numpy_type": "object", 
> "pandas_type": "mixed-integer"}
> ], "columns": [
> {"field_name": "timestamps", "metadata": null, "name": "timestamps", 
> "numpy_type": "float64", "pandas_type": "float64"}
> , 
> {"field_name": "ESC_BrakePressure", "metadata": null, "name": 
> "ESC_BrakePressure", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_BrakePressureValid", "metadata": null, "name": 
> "ESC_BrakePressureValid", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_EBDWork", "metadata": null, "name": "ESC_EBDWork", 
> "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_ABSWorkLable", "metadata": null, "name": 
> "ESC_ABSWorkLable", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_EBDAlarm", "metadata": null, "name": "ESC_EBDAlarm", 
> "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_VehSpdValidFlag", "metadata": null, "name": 
> "ESC_VehSpdValidFlag", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_ABSAlarmSignal", "metadata": null, "name": 
> "ESC_ABSAlarmSignal", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_VehSpd", "metadata": null, "name": "ESC_VehSpd", 
> "numpy_type": "float32", "pandas_type": "float32"}
> , 
> {"field_name": "ESC_FrontLeftWHeelSpd", "metadata": null, "name": 
> "ESC_FrontLeftWHeelSpd", "numpy_type": "float32", "pandas_type": "float32"}
> , 
> {"field_name": "ESC_FLWHeelSpdFaultSignal", "metadata": null, "name": 
> "ESC_FLWHeelSpdFaultSignal", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_FrontRightWHeelSpd", "metadata": null, "name": 
> "ESC_FrontRightWHeelSpd", "numpy_type": "float32", "pandas_type": "float32"}
> , 
> {"field_name": "ESC_FRWHeelSpdFaultSignal", "metadata": null, "name": 
> "ESC_FRWHeelSpdFaultSignal", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_RearLeftWheelSpd", "metadata": null, "name": 
> "ESC_RearLeftWheelSpd", "numpy_type": "float32", "pandas_type": "float32"}
> , 
> {"field_name": "ESC_RLWHeelSpdFaultSignal", "metadata": null, "name": 
> "ESC_RLWHeelSpdFaultSignal", "numpy_type": "uint8", "pandas_type": "uint8"}
> , 
> {"field_name": "ESC_RearRightWheelSpd", "metadata": null, "name": 
> "ESC_RearRightWheelSpd", "numpy_type": "float32", "pandas_type": "float32"}
> , 
> {"field_name": 

[jira] [Updated] (HIVE-23844) use fastparquet generate parquet format file, import hive, query error

2020-07-14 Thread Henry Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Lu updated HIVE-23844:

Description: 
i use fastparquet generate parquet format file(Please check the attachment !),  
 message as follows:

message schema

{

optional double timestamps;
 optional int32 ESC_BrakePressure (UINT_8);
 optional int32 ESC_BrakePressureValid (UINT_8);
 optional int32 ESC_EBDWork (UINT_8);
 optional int32 ESC_ABSWorkLable (UINT_8);
 optional int32 ESC_EBDAlarm (UINT_8);
 optional int32 ESC_VehSpdValidFlag (UINT_8);
 optional int32 ESC_ABSAlarmSignal (UINT_8);
 optional float ESC_VehSpd;
 optional float ESC_FrontLeftWHeelSpd;
 optional int32 ESC_FLWHeelSpdFaultSignal (UINT_8);
 optional float ESC_FrontRightWHeelSpd;
 optional int32 ESC_FRWHeelSpdFaultSignal (UINT_8);
 optional float ESC_RearLeftWheelSpd;
 optional int32 ESC_RLWHeelSpdFaultSignal (UINT_8);
 optional float ESC_RearRightWheelSpd;
 optional int32 ESC_RRWHeelSpdFaultSignal (UINT_8);
 optional int32 ESC_Longitudinal_Acceleration_flag (UINT_8);
 optional float ESC_Longitudinal_Acceleration;
 optional int32 ESC_ESCOFF (UINT_8);
 optional int32 ESC_ESCWorkStatus (UINT_8);
 optional int32 ESC_ESCAlarmSig (UINT_8);
 optional int32 ESC_TCSCFActive (UINT_8);
 optional int32 ESC_ReqIncreaseTorqueFlag (UINT_8);
 optional int32 ESC_ReqDecreaseTorqueFlag (UINT_8);
 optional int32 ESC_ReqIncreaseTorque (UINT_8);
 optional int32 ESC_ReqDecreaseTorque (UINT_8);
 optional int32 ESC_ESCValidity (UINT_8);
 optional int32 ESC_RollingCount_ESC3 (UINT_8);
 optional int32 ESC_CICkSum_ESC3 (UINT_8);   

}

creator: fastparquet-python version 1.0.0 (build 111)
 extra: pandas = {"column_indexes": [

{"field_name": null, "metadata": null, "name": null, "numpy_type": "object", 
"pandas_type": "mixed-integer"}

], "columns": [

{"field_name": "timestamps", "metadata": null, "name": "timestamps", 
"numpy_type": "float64", "pandas_type": "float64"}

, 

{"field_name": "ESC_BrakePressure", "metadata": null, "name": 
"ESC_BrakePressure", "numpy_type": "uint8", "pandas_type": "uint8"}

, 

{"field_name": "ESC_BrakePressureValid", "metadata": null, "name": 
"ESC_BrakePressureValid", "numpy_type": "uint8", "pandas_type": "uint8"}

, 

{"field_name": "ESC_EBDWork", "metadata": null, "name": "ESC_EBDWork", 
"numpy_type": "uint8", "pandas_type": "uint8"}

, 

{"field_name": "ESC_ABSWorkLable", "metadata": null, "name": 
"ESC_ABSWorkLable", "numpy_type": "uint8", "pandas_type": "uint8"}

, 

{"field_name": "ESC_EBDAlarm", "metadata": null, "name": "ESC_EBDAlarm", 
"numpy_type": "uint8", "pandas_type": "uint8"}

, 

{"field_name": "ESC_VehSpdValidFlag", "metadata": null, "name": 
"ESC_VehSpdValidFlag", "numpy_type": "uint8", "pandas_type": "uint8"}

, 

{"field_name": "ESC_ABSAlarmSignal", "metadata": null, "name": 
"ESC_ABSAlarmSignal", "numpy_type": "uint8", "pandas_type": "uint8"}

, 

{"field_name": "ESC_VehSpd", "metadata": null, "name": "ESC_VehSpd", 
"numpy_type": "float32", "pandas_type": "float32"}

, 

{"field_name": "ESC_FrontLeftWHeelSpd", "metadata": null, "name": 
"ESC_FrontLeftWHeelSpd", "numpy_type": "float32", "pandas_type": "float32"}

, 

{"field_name": "ESC_FLWHeelSpdFaultSignal", "metadata": null, "name": 
"ESC_FLWHeelSpdFaultSignal", "numpy_type": "uint8", "pandas_type": "uint8"}

, 

{"field_name": "ESC_FrontRightWHeelSpd", "metadata": null, "name": 
"ESC_FrontRightWHeelSpd", "numpy_type": "float32", "pandas_type": "float32"}

, 

{"field_name": "ESC_FRWHeelSpdFaultSignal", "metadata": null, "name": 
"ESC_FRWHeelSpdFaultSignal", "numpy_type": "uint8", "pandas_type": "uint8"}

, 

{"field_name": "ESC_RearLeftWheelSpd", "metadata": null, "name": 
"ESC_RearLeftWheelSpd", "numpy_type": "float32", "pandas_type": "float32"}

, 

{"field_name": "ESC_RLWHeelSpdFaultSignal", "metadata": null, "name": 
"ESC_RLWHeelSpdFaultSignal", "numpy_type": "uint8", "pandas_type": "uint8"}

, 

{"field_name": "ESC_RearRightWheelSpd", "metadata": null, "name": 
"ESC_RearRightWheelSpd", "numpy_type": "float32", "pandas_type": "float32"}

, 

{"field_name": "ESC_RRWHeelSpdFaultSignal", "metadata": null, "name": 
"ESC_RRWHeelSpdFaultSignal", "numpy_type": "uint8", "pandas_type": "uint8"}

, 

{"field_name": "ESC_Longitudinal_Acceleration_flag", "metadata": null, "name": 
"ESC_Longitudinal_Acceleration_flag", "numpy_type": "uint8", "pandas_type": 
"uint8"}

, 

{"field_name": "ESC_Longitudinal_Acceleration", "metadata": null, "name": 
"ESC_Longitudinal_Acceleration", "numpy_type": "float32", "pandas_type": 
"float32"}

, 

{"field_name": "ESC_ESCOFF", "metadata": null, "name": "ESC_ESCOFF", 
"numpy_type": "uint8", "pandas_type": "uint8"}

, 

{"field_name": "ESC_ESCWorkStatus", "metadata": null, "name": 
"ESC_ESCWorkStatus", "numpy_type": "uint8", "pandas_type": "uint8"}

, 

{"field_name": "ESC_ESCAlarmSig", "metadata": null, "name": "ESC_ESCAlarmSig", 
"numpy_type": "uint8", "pandas_type": 

[jira] [Updated] (HIVE-23844) use fastparquet generate parquet format file, import hive, query error

2020-07-14 Thread Henry Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Lu updated HIVE-23844:

Description: 
i use fastparquet generate parquet format file(Please check the attachment !),  
 message as follows:

message schema

{

      optional double timestamps;
 optional int32 ESC_BrakePressure (UINT_8);
 optional int32 ESC_BrakePressureValid (UINT_8);
 optional int32 ESC_EBDWork (UINT_8);
 optional int32 ESC_ABSWorkLable (UINT_8);
 optional int32 ESC_EBDAlarm (UINT_8);
 optional int32 ESC_VehSpdValidFlag (UINT_8);
 optional int32 ESC_ABSAlarmSignal (UINT_8);
 optional float ESC_VehSpd;
 optional float ESC_FrontLeftWHeelSpd;
 optional int32 ESC_FLWHeelSpdFaultSignal (UINT_8);
 optional float ESC_FrontRightWHeelSpd;
 optional int32 ESC_FRWHeelSpdFaultSignal (UINT_8);
 optional float ESC_RearLeftWheelSpd;
 optional int32 ESC_RLWHeelSpdFaultSignal (UINT_8);
 optional float ESC_RearRightWheelSpd;
 optional int32 ESC_RRWHeelSpdFaultSignal (UINT_8);
 optional int32 ESC_Longitudinal_Acceleration_flag (UINT_8);
 optional float ESC_Longitudinal_Acceleration;
 optional int32 ESC_ESCOFF (UINT_8);
 optional int32 ESC_ESCWorkStatus (UINT_8);
 optional int32 ESC_ESCAlarmSig (UINT_8);
 optional int32 ESC_TCSCFActive (UINT_8);
 optional int32 ESC_ReqIncreaseTorqueFlag (UINT_8);
 optional int32 ESC_ReqDecreaseTorqueFlag (UINT_8);
 optional int32 ESC_ReqIncreaseTorque (UINT_8);
 optional int32 ESC_ReqDecreaseTorque (UINT_8);
 optional int32 ESC_ESCValidity (UINT_8);
 optional int32 ESC_RollingCount_ESC3 (UINT_8);
 optional int32 ESC_CICkSum_ESC3 (UINT_8);

}

creator: fastparquet-python version 1.0.0 (build 111)
 extra: pandas = {"column_indexes": [

{"field_name": null, "metadata": null, "name": null, "numpy_type": "object", 
"pandas_type": "mixed-integer"}

], "columns": [\\{"field_name": "timestamps", "metadata": null, "name": 
"timestamps", "numpy_type": "float64", "pandas_type": "float64"}, 
\\{"field_name": "ESC_BrakePressure", "metadata": null, "name": 
"ESC_BrakePressure", "numpy_type": "uint8", "pandas_type": "uint8"}, 
\\{"field_name": "ESC_BrakePressureValid", "metadata": null, "name": 
"ESC_BrakePressureValid", "numpy_type": "uint8", "pandas_type": "uint8"}, 
\\{"field_name": "ESC_EBDWork", "metadata": null, "name": "ESC_EBDWork", 
"numpy_type": "uint8", "pandas_type": "uint8"}, \\{"field_name": 
"ESC_ABSWorkLable", "metadata": null, "name": "ESC_ABSWorkLable", "numpy_type": 
"uint8", "pandas_type": "uint8"}, \\{"field_name": "ESC_EBDAlarm", "metadata": 
null, "name": "ESC_EBDAlarm", "numpy_type": "uint8", "pandas_type": "uint8"}, 
\\{"field_name": "ESC_VehSpdValidFlag", "metadata": null, "name": 
"ESC_VehSpdValidFlag", "numpy_type": "uint8", "pandas_type": "uint8"}, 
\\{"field_name": "ESC_ABSAlarmSignal", "metadata": null, "name": 
"ESC_ABSAlarmSignal", "numpy_type": "uint8", "pandas_type": "uint8"}, 
\\{"field_name": "ESC_VehSpd", "metadata": null, "name": "ESC_VehSpd", 
"numpy_type": "float32", "pandas_type": "float32"}, \\{"field_name": 
"ESC_FrontLeftWHeelSpd", "metadata": null, "name": "ESC_FrontLeftWHeelSpd", 
"numpy_type": "float32", "pandas_type": "float32"}, \\{"field_name": 
"ESC_FLWHeelSpdFaultSignal", "metadata": null, "name": 
"ESC_FLWHeelSpdFaultSignal", "numpy_type": "uint8", "pandas_type": "uint8"}, 
\\{"field_name": "ESC_FrontRightWHeelSpd", "metadata": null, "name": 
"ESC_FrontRightWHeelSpd", "numpy_type": "float32", "pandas_type": "float32"}, 
\\{"field_name": "ESC_FRWHeelSpdFaultSignal", "metadata": null, "name": 
"ESC_FRWHeelSpdFaultSignal", "numpy_type": "uint8", "pandas_type": "uint8"}, 
\\{"field_name": "ESC_RearLeftWheelSpd", "metadata": null, "name": 
"ESC_RearLeftWheelSpd", "numpy_type": "float32", "pandas_type": "float32"}, 
\\{"field_name": "ESC_RLWHeelSpdFaultSignal", "metadata": null, "name": 
"ESC_RLWHeelSpdFaultSignal", "numpy_type": "uint8", "pandas_type": "uint8"}, 
\\{"field_name": "ESC_RearRightWheelSpd", "metadata": null, "name": 
"ESC_RearRightWheelSpd", "numpy_type": "float32", "pandas_type": "float32"}, 
\\{"field_name": "ESC_RRWHeelSpdFaultSignal", "metadata": null, "name": 
"ESC_RRWHeelSpdFaultSignal", "numpy_type": "uint8", "pandas_type": "uint8"}, 
\\{"field_name": "ESC_Longitudinal_Acceleration_flag", "metadata": null, 
"name": "ESC_Longitudinal_Acceleration_flag", "numpy_type": "uint8", 
"pandas_type": "uint8"}, \\{"field_name": "ESC_Longitudinal_Acceleration", 
"metadata": null, "name": "ESC_Longitudinal_Acceleration", "numpy_type": 
"float32", "pandas_type": "float32"}, \\{"field_name": "ESC_ESCOFF", 
"metadata": null, "name": "ESC_ESCOFF", "numpy_type": "uint8", "pandas_type": 
"uint8"}, \\{"field_name": "ESC_ESCWorkStatus", "metadata": null, "name": 
"ESC_ESCWorkStatus", "numpy_type": "uint8", "pandas_type": "uint8"}, 
\\{"field_name": "ESC_ESCAlarmSig", "metadata": null, "name": 
"ESC_ESCAlarmSig", "numpy_type": "uint8", "pandas_type": "uint8"}, 

[jira] [Work logged] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23737?focusedWorklogId=458524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458524
 ]

ASF GitHub Bot logged work on HIVE-23737:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 08:56
Start Date: 14/Jul/20 08:56
Worklog Time Spent: 10m 
  Work Description: shameersss1 edited a comment on pull request #1195:
URL: https://github.com/apache/hive/pull/1195#issuecomment-657450269


   @t3rmin4t0r @prasanthj @b-slim   Could you please take a review the PR?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458524)
Time Spent: 40m  (was: 0.5h)

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> LLAP have a dagDelete feature added as part of HIVE-9911, But now that Tez 
> have added support for dagDelete in custom shuffle handler (TEZ-3362) we 
> could re-use that feature in LLAP. 
> There are some added advantages of using Tez's dagDelete feature rather than 
> the current LLAP's dagDelete feature.
> 1) We can easily extend this feature to accommodate the upcoming features 
> such as vertex and failed task attempt shuffle data clean up. Refer TEZ-3363 
> and TEZ-4129
> 2) It will be more easier to maintain this feature by separating it out from 
> the Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23824) LLAP - add API to look up ORC metadata for certain Path

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23824?focusedWorklogId=458523=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458523
 ]

ASF GitHub Bot logged work on HIVE-23824:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 08:55
Start Date: 14/Jul/20 08:55
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #1238:
URL: https://github.com/apache/hive/pull/1238


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458523)
Time Spent: 1h  (was: 50m)

> LLAP - add API to look up ORC metadata for certain Path
> ---
>
> Key: HIVE-23824
> URL: https://issues.apache.org/jira/browse/HIVE-23824
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> LLAP IO supports caching but currently this is only done via LlapRecordReader 
> / using splits, aka good old mapreduce way.
> At certain times it would worth to leverage the caching of files on certain 
> paths, that are not necessarily associated with a record reader directly. An 
> example of this could be the caching of ACID delete delta files, as they are 
> currently being read without caching.
> With this patch we'd extend the LLAP API and offer another entry point for 
> retrieving metadata of ORC files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23824) LLAP - add API to look up ORC metadata for certain Path

2020-07-14 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-23824:
--
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master.

Thanks for the patch [~szita]!

> LLAP - add API to look up ORC metadata for certain Path
> ---
>
> Key: HIVE-23824
> URL: https://issues.apache.org/jira/browse/HIVE-23824
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> LLAP IO supports caching but currently this is only done via LlapRecordReader 
> / using splits, aka good old mapreduce way.
> At certain times it would worth to leverage the caching of files on certain 
> paths, that are not necessarily associated with a record reader directly. An 
> example of this could be the caching of ACID delete delta files, as they are 
> currently being read without caching.
> With this patch we'd extend the LLAP API and offer another entry point for 
> retrieving metadata of ORC files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23818) Use String Switch-Case Statement in StatUtils

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23818?focusedWorklogId=458478=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458478
 ]

ASF GitHub Bot logged work on HIVE-23818:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 06:50
Start Date: 14/Jul/20 06:50
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1229:
URL: https://github.com/apache/hive/pull/1229#issuecomment-658003455


   tests show that this change causes a slight change in datasize estimation



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458478)
Time Spent: 1h 10m  (was: 1h)

> Use String Switch-Case Statement in StatUtils
> -
>
> Key: HIVE-23818
> URL: https://issues.apache.org/jira/browse/HIVE-23818
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> switch-case statements with Java is now available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23800) Add hooks when HiveServer2 stops due to OutOfMemoryError

2020-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23800?focusedWorklogId=458476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458476
 ]

ASF GitHub Bot logged work on HIVE-23800:
-

Author: ASF GitHub Bot
Created on: 14/Jul/20 06:47
Start Date: 14/Jul/20 06:47
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1205:
URL: https://github.com/apache/hive/pull/1205#issuecomment-658002425


   when I was using some of these hooks earlier: they are too specific to some 
purpose.
   right now the proposed callback is a {{Runnable}} ... which means you will 
have to create a completely new one in case you want to pass some things to it 
later...
   
   so...I would recommend the following:
   * remove the "oom" keyword from the name of the hook/etc - it could be diag, 
debug or something like that
   * add a callback which has 2 arguments:
 * a callback type (some enum; which should have a value which is specific 
to "oom")
 * some callback payload object to which we can add things later on... it 
could be an object using the {{Adaptable}} pattern
   
   all this stuff is to make it a bit more reusable if we need to reuse it 
later on...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 458476)
Time Spent: 2h  (was: 1h 50m)

> Add hooks when HiveServer2 stops due to OutOfMemoryError
> 
>
> Key: HIVE-23800
> URL: https://issues.apache.org/jira/browse/HIVE-23800
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Make oom hook an interface of HiveServer2,  so user can implement the hook to 
> do something before HS2 stops, such as dumping the heap or altering the 
> devops.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)