[jira] [Updated] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-10-25 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20682:

Status: Open  (was: Patch Available)

> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch
>
>
> *Problem description:*
> The master thread initializes the *sessionHive* object in the *HiveSessionImpl* 
> class when a new session is opened for a client connection, and by default all 
> queries from that connection share the same sessionHive object. 
> If the master thread executes a *synchronous* query, it closes the 
> sessionHive object (referenced via the thread-local hiveDb) when 
> {{Hive.isCompatible}} returns false, and it sets a new Hive object in the 
> thread-local hiveDb but does not update the sessionHive object in the session. 
> In contrast, *asynchronous* query execution via async threads never closes the 
> sessionHive object; it simply creates a new one if needed and sets it as the 
> thread-local hiveDb.
> So the problem can occur when an *asynchronous* query executed by an async 
> thread refers to the sessionHive object while the master thread receives a 
> *synchronous* query that closes that same sessionHive object. 
> Also, each query execution overwrites the thread-local hiveDb with the 
> sessionHive object, which can leak a metastore connection if the previous 
> synchronous query execution re-created the Hive object.
> *Possible Fix:*
> The *sessionHive* object can be shared by multiple threads, so query execution 
> threads should not be allowed to close it when they re-create the Hive object 
> due to changes in Hive configurations. The Hive objects created by query 
> execution threads, however, should be closed when those threads exit.
> So, it is proposed to add an *isAllowClose* flag (default: *true*) to the Hive 
> object, set it to *false* for *sessionHive*, and forcefully close sessionHive 
> only when the session is closed or released.
> cc [~pvary]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-10-25 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-20682:

Description: 
*Problem description:*

The master thread initializes the *sessionHive* object in the *HiveSessionImpl* 
class when a new session is opened for a client connection, and by default all 
queries from that connection share the same sessionHive object. 

If the master thread executes a *synchronous* query, it closes the sessionHive 
object (referenced via the thread-local hiveDb) when {{Hive.isCompatible}} 
returns false, and it sets a new Hive object in the thread-local hiveDb but 
does not update the sessionHive object in the session. In contrast, 
*asynchronous* query execution via async threads never closes the sessionHive 
object; it simply creates a new one if needed and sets it as the thread-local 
hiveDb.

So the problem can occur when an *asynchronous* query executed by an async 
thread refers to the sessionHive object while the master thread receives a 
*synchronous* query that closes that same sessionHive object. 

Also, each query execution overwrites the thread-local hiveDb with the 
sessionHive object, which can leak a metastore connection if the previous 
synchronous query execution re-created the Hive object.

*Possible Fix:*

The *sessionHive* object can be shared by multiple threads, so query execution 
threads should not be allowed to close it when they re-create the Hive object 
due to changes in Hive configurations. The Hive objects created by query 
execution threads, however, should be closed when those threads exit.

So, it is proposed to add an *isAllowClose* flag (default: *true*) to the Hive 
object, set it to *false* for *sessionHive*, and forcefully close sessionHive 
only when the session is closed or released.

cc [~pvary]
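The proposed guard can be sketched in isolation. The class and method names below are illustrative assumptions, not Hive's actual API; the point is only that close() becomes a no-op on the shared instance while a force path remains for session teardown:

```java
// Minimal sketch of the proposed isAllowClose guard: a shared resource
// refuses close() unless explicitly allowed, so per-query threads cannot
// tear down the session-wide instance. Names are illustrative only.
public class GuardedResource implements AutoCloseable {
    private volatile boolean allowClose = true;   // proposed default
    private volatile boolean closed = false;

    public void setAllowClose(boolean allow) { this.allowClose = allow; }

    @Override
    public void close() {              // called by query execution threads
        if (allowClose) {
            closed = true;             // stand-in for releasing the connection
        }
        // otherwise: ignore; only forceClose() may tear this down
    }

    public void forceClose() {         // called when the session is closed
        closed = true;
    }

    public boolean isClosed() { return closed; }
}
```

In this sketch, the session would call setAllowClose(false) on the shared instance right after creating it, so any close() issued from a query thread is silently ignored.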

  was:
*Problem description:*

The master thread initializes the *sessionHive* object in the *HiveSessionImpl* 
class when a new session is opened for a client connection, and by default all 
queries from that connection share the same sessionHive object. 

If the master thread executes a *synchronous* query, it closes the sessionHive 
object (referenced via the thread-local hiveDb) when {{Hive.isCompatible}} 
returns false, and it sets a new Hive object in the thread-local hiveDb but 
does not update the sessionHive object in the session. In contrast, 
*asynchronous* query execution via async threads never closes the sessionHive 
object; it simply creates a new one if needed and sets it as the thread-local 
hiveDb.

So the problem can occur when an *asynchronous* query executed by an async 
thread refers to the sessionHive object while the master thread receives a 
*synchronous* query that closes that same sessionHive object. 

Also, each query execution overwrites the thread-local hiveDb with the 
sessionHive object, which can leak a metastore connection if the previous 
synchronous query execution re-created the Hive object.

*Possible Fix:*

We shall maintain an atomic reference counter in the Hive object. We increment 
the counter when a thread sets the object in its thread-local hiveDb and 
decrement it when the thread releases it. Only when the counter drops to 0 
should we close the connection.

There are a couple of cases in which the thread-local hiveDb is released:
 * When synchronous query execution in the master thread re-creates the Hive 
object due to a config change. We also need to update the sessionHive object in 
the current session when we release it from the master thread's thread-local 
hiveDb.
 * When an async thread exits after completing execution or due to an exception.

If the session is being closed, the reference counter is guaranteed to be down 
to 0, as we forcefully close all async operations, and so we can close the 
connection there.

cc [~pvary]
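The reference-counting scheme described above can be sketched with an AtomicInteger. This is a hypothetical, self-contained illustration of the idea, not the actual patch:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the proposed reference counting: acquire() when a thread sets
// the object as its thread-local hiveDb, release() when it lets go; the
// underlying metastore connection is closed only when the count hits zero.
public class RefCountedConnection {
    private final AtomicInteger refCount = new AtomicInteger(0);
    private volatile boolean connectionClosed = false;

    public void acquire() {
        refCount.incrementAndGet();
    }

    public void release() {
        // decrementAndGet is atomic, so exactly one releasing thread sees 0
        if (refCount.decrementAndGet() == 0) {
            connectionClosed = true;   // stand-in for closing the connection
        }
    }

    public boolean isConnectionClosed() { return connectionClosed; }
}
```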


> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch
>
>
> *Problem description:*
> The master thread initializes the *sessionHive* object in the *HiveSessionImpl* 
> class when a new session is opened for a client connection, and by default all 
> queries from that connection share the same sessionHive object. 
> If the master thread executes a *synchronous* query, it closes the 
> sessionHive object (referenced via the thread-local hiveDb) when 
> {{Hive.isCompatible}} returns false and sets a new Hive object in the 
> thread-local hiveDb 

[jira] [Assigned] (HIVE-19351) Vectorization: Followup on why operator numbers are unstable in User EXPLAIN for explainuser_1.q / spark_explainuser_1

2018-10-25 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-19351:
---

Assignee: (was: Matt McCline)

> Vectorization: Followup on why operator numbers are unstable in User EXPLAIN 
> for explainuser_1.q / spark_explainuser_1
> --
>
> Key: HIVE-19351
> URL: https://issues.apache.org/jira/browse/HIVE-19351
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Priority: Critical
>
> Why were the operator numbers unstable for:
> TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
> TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] 
> when vectorization was enabled?





[jira] [Assigned] (HIVE-19353) Vectorization: ConstantVectorExpression --> RuntimeException: Unexpected column vector type LIST

2018-10-25 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-19353:
---

Assignee: (was: Matt McCline)

> Vectorization: ConstantVectorExpression  --> RuntimeException: Unexpected 
> column vector type LIST
> -
>
> Key: HIVE-19353
> URL: https://issues.apache.org/jira/browse/HIVE-19353
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Priority: Critical
> Attachments: HIVE-19353.01.patch, HIVE-19353.02.patch
>
>
> Found by enabling vectorization for 
> org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData
> {noformat}
> Caused by: java.lang.RuntimeException: Unexpected column vector type LIST
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.ConstantVectorExpression.evaluate(ConstantVectorExpression.java:237)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:955) 
> ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928) 
> ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:984)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:722) 
> ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:193) 
> ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]{noformat}





[jira] [Assigned] (HIVE-18807) Fix broken tests caused by HIVE-18493

2018-10-25 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-18807:
---

Assignee: (was: Matt McCline)

> Fix broken tests caused by HIVE-18493
> -
>
> Key: HIVE-18807
> URL: https://issues.apache.org/jira/browse/HIVE-18807
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Priority: Major
> Fix For: 4.0.0
>
>
> Broken tests =
> org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB
> org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress
> org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel





[jira] [Assigned] (HIVE-19088) Vectorization: Turning on vectorization in input_lazyserde.q causes ClassCastException DoubleWritable to StandardUnion

2018-10-25 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-19088:
---

Assignee: (was: Matt McCline)

> Vectorization: Turning on vectorization in input_lazyserde.q causes 
> ClassCastException DoubleWritable to StandardUnion
> --
>
> Key: HIVE-19088
> URL: https://issues.apache.org/jira/browse/HIVE-19088
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Priority: Critical
>
> {noformat}
> 2018-03-31T21:19:48,252 ERROR [LocalJobRunner Map Task Executor #0] 
> mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:967)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:154)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.io.DoubleWritable cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.StandardUnionObjectInspector$StandardUnion
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:608)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:581)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:581)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:581)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:350)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRow(VectorAssignRow.java:998)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:956)
>   ... 11 more{noformat}





[jira] [Assigned] (HIVE-14893) vectorized execution may convert LongCV to smaller types incorrectly

2018-10-25 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-14893:
---

Assignee: (was: Matt McCline)

> vectorized execution may convert LongCV to smaller types incorrectly
> 
>
> Key: HIVE-14893
> URL: https://issues.apache.org/jira/browse/HIVE-14893
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Critical
>
> See the vectorized results in the decimal_11 test added in HIVE-14863. 
> We cast decimal to various int types; the cast is specialized for each type 
> on the non-vectorized side. On the vectorized side, it is only specialized for 
> LongColumnVector, so all the decimals get converted to longs. The 
> LongColumnVector gets converted to a proper type in some other mysterious 
> place later, and tiny/small/regular ints become truncated at that point.
> Logically, I am not sure whether every vectorized expression should be aware 
> of the underlying type of the LongColumnVector (that seems implausible - I am 
> not sure the type information is even available, and if it is, it doesn't look 
> like it's used in other places), or whether the long-to-smaller-type automatic 
> conversion should be fixed to produce nulls on overflow.
> However, it seems like a good idea to do the latter in any case, as a 
> catch-all for all the vectorized expressions that might treat LongCV as 
> representing longs at all times.
> Update - I see tens of places in the code that do something like this: 
> {noformat}(int) ((LongColumnVector) 
> batch.cols[projectionColumnNum]).vector[adjustedIndex]{noformat}
> Also for other types. These might all be problematic.
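The narrowing behavior described above can be demonstrated in isolation: the plain cast quoted from the code wraps silently, while a range check could yield NULL on overflow. This is a hypothetical helper for illustration, not Hive code:

```java
// Plain (int) cast of a long silently truncates, as in the quoted
// batch.cols[...] pattern; a range check lets callers emit NULL on
// overflow instead. Self-contained illustration only.
public class NarrowingDemo {
    static int truncatingCast(long v) {
        return (int) v;                  // wraps silently on overflow
    }

    static Integer checkedCast(long v) {
        if (v < Integer.MIN_VALUE || v > Integer.MAX_VALUE) {
            return null;                 // overflow -> NULL semantics
        }
        return (int) v;
    }
}
```

For example, 2^32 fits in a long but truncates to 0 under the plain cast, which is exactly the kind of silent wrong answer the issue describes.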





[jira] [Assigned] (HIVE-17895) Vectorization: Wrong results for schema_evol_text_vec_table.q (LLAP)

2018-10-25 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-17895:
---

Assignee: (was: Matt McCline)

> Vectorization: Wrong results for schema_evol_text_vec_table.q (LLAP)
> 
>
> Key: HIVE-17895
> URL: https://issues.apache.org/jira/browse/HIVE-17895
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Priority: Critical
>
> NonVec: 103   NULL   0.0    NULL   original
> Vec:    103   NULL   NULL   NULL   original





[jira] [Updated] (HIVE-20370) Vectorization: Add Native Vector MapJoin hash table optimization for Left/Right Outer Joins when there are no Small Table values

2018-10-25 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-20370:

Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

> Vectorization: Add Native Vector MapJoin hash table optimization for 
> Left/Right Outer Joins when there are no Small Table values
> 
>
> Key: HIVE-20370
> URL: https://issues.apache.org/jira/browse/HIVE-20370
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-20370.01.patch, HIVE-20370.02.patch, 
> HIVE-20370.03.patch, HIVE-20370.04.patch
>
>
> Similar to Native Vector MapJoin's InnerBigOnly optimization that uses an 
> efficient Hash Multi-Set with a counter instead of a Hash Map with an empty 
> value, do the same for Outer joins.
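The data-structure idea can be pictured with a plain map from key to match count, instead of from key to an empty value object per key. This is a sketch of the concept only; the native vectorized hash table implementation differs:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the multi-set idea: for joins that never read small-table
// values, keeping a per-key match count is cheaper than storing a map
// entry whose value is an empty placeholder object.
public class CountingMultiSet<K> {
    private final Map<K, Integer> counts = new HashMap<>();

    public void add(K key) {
        counts.merge(key, 1, Integer::sum);   // increment, starting at 1
    }

    public int count(K key) {
        return counts.getOrDefault(key, 0);
    }

    public boolean contains(K key) {
        return counts.containsKey(key);
    }
}
```

During the probe phase, the join only needs contains()/count() per big-table key, which is all an outer join with no small-table values requires.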





[jira] [Updated] (HIVE-20806) Add ASF license for files added in HIVE-20679

2018-10-25 Thread anishek (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-20806:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to master.

> Add ASF license for files added in HIVE-20679
> -
>
> Key: HIVE-20806
> URL: https://issues.apache.org/jira/browse/HIVE-20806
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: anishek
>Priority: Trivial
> Fix For: 4.0.0
>
> Attachments: HIVE-20806.1.patch
>
>
> HIVE-20679 added a couple of new files, Deserialzer/Serialzer, that need the 
> ASF license header.





[jira] [Updated] (HIVE-20806) Add ASF license for files added in HIVE-20679

2018-10-25 Thread anishek (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-20806:
---
Status: Patch Available  (was: Open)

> Add ASF license for files added in HIVE-20679
> -
>
> Key: HIVE-20806
> URL: https://issues.apache.org/jira/browse/HIVE-20806
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: anishek
>Priority: Trivial
> Fix For: 4.0.0
>
> Attachments: HIVE-20806.1.patch
>
>
> HIVE-20679 added a couple of new files, Deserialzer/Serialzer, that need the 
> ASF license header.





[jira] [Updated] (HIVE-20815) JdbcRecordReader.next shall not eat exception

2018-10-25 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-20815:
--
Status: Patch Available  (was: Open)

> JdbcRecordReader.next shall not eat exception
> -
>
> Key: HIVE-20815
> URL: https://issues.apache.org/jira/browse/HIVE-20815
> Project: Hive
>  Issue Type: Bug
>  Components: StorageHandler
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-20815.1.patch
>
>






[jira] [Updated] (HIVE-20815) JdbcRecordReader.next shall not eat exception

2018-10-25 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-20815:
--
Attachment: HIVE-20815.1.patch

> JdbcRecordReader.next shall not eat exception
> -
>
> Key: HIVE-20815
> URL: https://issues.apache.org/jira/browse/HIVE-20815
> Project: Hive
>  Issue Type: Bug
>  Components: StorageHandler
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-20815.1.patch
>
>






[jira] [Assigned] (HIVE-20815) JdbcRecordReader.next shall not eat exception

2018-10-25 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned HIVE-20815:
-


> JdbcRecordReader.next shall not eat exception
> -
>
> Key: HIVE-20815
> URL: https://issues.apache.org/jira/browse/HIVE-20815
> Project: Hive
>  Issue Type: Bug
>  Components: StorageHandler
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
>






[jira] [Comment Edited] (HIVE-20814) New create function between HiveServer2 is not synchronized

2018-10-25 Thread leozhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664586#comment-16664586
 ] 

leozhang edited comment on HIVE-20814 at 10/26/18 3:54 AM:
---

use  _reload function_ command


was (Author: leozhang):
use reload functions

>  New create function between HiveServer2 is not synchronized
> 
>
> Key: HIVE-20814
> URL: https://issues.apache.org/jira/browse/HIVE-20814
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 1.1.0
>Reporter: leozhang
>Priority: Major
> Attachments: image-2018-10-26-10-23-48-101.png, 
> image-2018-10-26-10-24-16-669.png, image-2018-10-26-10-24-54-904.png, 
> image-2018-10-26-10-26-00-591.png, image-2018-10-26-10-27-32-291.png
>
>
> I am using CDH open source version 5.15.0, where the Hive version is 1.1.0.
> I have 3 Metastore services and 3 HiveServer2 services in the cluster; 
> Metastore and HiveServer2 do not currently have HA turned on. The cluster 
> runs the Sentry service and configures the Hive Auxiliary JARs Directory 
> attribute.
> I am having a problem: when I use Beeline to connect to HiveServer2 on 
> node 1 and create a function, the creation succeeds and the function can be 
> queried normally. But if I then connect to HiveServer2 on node 2 via Beeline, 
> I cannot see the function just created on node 1 through _show functions_. I 
> have to restart HiveServer2 or run the _reload function_ command on node 2 to 
> see the function created on node 1.
> Is there a way to make a function created on one node visible on every node? 
> I don't know the cause of or solution to this problem; I hope I can get help. 
> Thank you!
>  
> Example:
> Connecting to dn1.test.com:1 to create function _zzytest_trans_; with _show 
> functions_ I can see it:
> !image-2018-10-26-10-24-16-669.png!
> !image-2018-10-26-10-24-54-904.png!
>  
> At this point I used Beeline to connect to dn4.test.com:1 and ran the 
> show functions command; I could not find the zzytest_trans function. I have 
> to restart HiveServer2 or run the _reload function_ command on the 
> dn4.test.com node to see the zzytest_trans function.
> !image-2018-10-26-10-26-00-591.png!
> !image-2018-10-26-10-27-32-291.png!





[jira] [Commented] (HIVE-20796) jdbc URL can contain sensitive information that should not be logged

2018-10-25 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664600#comment-16664600
 ] 

Hive QA commented on HIVE-20796:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12945596/HIVE-20796.04.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/14643/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14643/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14643/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL 
https://issues.apache.org/jira/secure/attachment/12945596/HIVE-20796.04.patch 
was found in seen patch url's cache and a test was probably run already on it. 
Aborting...
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12945596 - PreCommit-HIVE-Build

> jdbc URL can contain sensitive information that should not be logged
> 
>
> Key: HIVE-20796
> URL: https://issues.apache.org/jira/browse/HIVE-20796
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Laszlo Pinter
>Assignee: Laszlo Pinter
>Priority: Major
> Attachments: HIVE-20796.01.patch, HIVE-20796.02.patch, 
> HIVE-20796.03.patch, HIVE-20796.04.patch
>
>
> It is possible to put passwords in the JDBC connection URL, and some JDBC 
> drivers (e.g. Derby, MySQL) reportedly use that. This information is 
> considered sensitive and should be masked out when logging the connection 
> URL.
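One way such masking could work is a regex replace over the URL before logging. This is illustrative only; the parameter name and mask token are assumptions, not necessarily what the patch does:

```java
import java.util.regex.Pattern;

// Masks password=... key/value pairs in a JDBC URL before it is logged.
// Handles both ?a=b&c=d query-style and ;a=b;c=d property-style URLs.
// The parameter name and "***" mask token are assumptions for illustration.
public class JdbcUrlMasker {
    private static final Pattern PASSWORD =
        Pattern.compile("(?i)(password=)[^;&]*");  // value ends at ; or &

    public static String mask(String url) {
        return PASSWORD.matcher(url).replaceAll("$1***");
    }
}
```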





[jira] [Reopened] (HIVE-20814) New create function between HiveServer2 is not synchronized

2018-10-25 Thread leozhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leozhang reopened HIVE-20814:
-
  Assignee: (was: leozhang)

I want to know if there are other solutions.

>  New create function between HiveServer2 is not synchronized
> 
>
> Key: HIVE-20814
> URL: https://issues.apache.org/jira/browse/HIVE-20814
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 1.1.0
>Reporter: leozhang
>Priority: Major
> Attachments: image-2018-10-26-10-23-48-101.png, 
> image-2018-10-26-10-24-16-669.png, image-2018-10-26-10-24-54-904.png, 
> image-2018-10-26-10-26-00-591.png, image-2018-10-26-10-27-32-291.png
>
>
> I am using CDH open source version 5.15.0, where the Hive version is 1.1.0.
> I have 3 Metastore services and 3 HiveServer2 services in the cluster; 
> Metastore and HiveServer2 do not currently have HA turned on. The cluster 
> runs the Sentry service and configures the Hive Auxiliary JARs Directory 
> attribute.
> I am having a problem: when I use Beeline to connect to HiveServer2 on 
> node 1 and create a function, the creation succeeds and the function can be 
> queried normally. But if I then connect to HiveServer2 on node 2 via Beeline, 
> I cannot see the function just created on node 1 through _show functions_. I 
> have to restart HiveServer2 or run the _reload function_ command on node 2 to 
> see the function created on node 1.
> Is there a way to make a function created on one node visible on every node? 
> I don't know the cause of or solution to this problem; I hope I can get help. 
> Thank you!
>  
> Example:
> Connecting to dn1.test.com:1 to create function _zzytest_trans_; with _show 
> functions_ I can see it:
> !image-2018-10-26-10-24-16-669.png!
> !image-2018-10-26-10-24-54-904.png!
>  
> At this point I used Beeline to connect to dn4.test.com:1 and ran the 
> show functions command; I could not find the zzytest_trans function. I have 
> to restart HiveServer2 or run the _reload function_ command on the 
> dn4.test.com node to see the zzytest_trans function.
> !image-2018-10-26-10-26-00-591.png!
> !image-2018-10-26-10-27-32-291.png!





[jira] [Updated] (HIVE-20814) New create function between HiveServer2 is not synchronized

2018-10-25 Thread leozhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leozhang updated HIVE-20814:

Description: 
I am using CDH open source version 5.15.0, where the Hive version is 1.1.0.

I have 3 Metastore services and 3 HiveServer2 services in the cluster; 
Metastore and HiveServer2 do not currently have HA turned on. The cluster runs 
the Sentry service and configures the Hive Auxiliary JARs Directory attribute.

I am having a problem: when I use Beeline to connect to HiveServer2 on node 1 
and create a function, the creation succeeds and the function can be queried 
normally. But if I then connect to HiveServer2 on node 2 via Beeline, I cannot 
see the function just created on node 1 through _show functions_. I have to 
restart HiveServer2 or run the _reload function_ command on node 2 to see the 
function created on node 1.

Is there a way to make a function created on one node visible on every node? I 
don't know the cause of or solution to this problem; I hope I can get help. 
Thank you!

Example:

Connecting to dn1.test.com:1 to create function _zzytest_trans_; with _show 
functions_ I can see it:

!image-2018-10-26-10-24-16-669.png!

!image-2018-10-26-10-24-54-904.png!

At this point I used Beeline to connect to dn4.test.com:1 and ran the show 
functions command; I could not find the zzytest_trans function. I have to 
restart HiveServer2 or run the _reload function_ command on the dn4.test.com 
node to see the zzytest_trans function.

!image-2018-10-26-10-26-00-591.png!

!image-2018-10-26-10-27-32-291.png!

  was:
I am using CDH open source version 5.15.0, where the Hive version is 1.1.0.

I have 3 Metastore services and 3 HiveServer2 services in the cluster; 
Metastore and HiveServer2 do not currently have HA turned on. The cluster runs 
the Sentry service and configures the Hive Auxiliary JARs Directory attribute.

I am having a problem: when I use Beeline to connect to HiveServer2 on node 1 
and create a function, the creation succeeds and the function can be queried 
normally. But if I then connect to HiveServer2 on node 2 via Beeline, I cannot 
see the function just created on node 1 through _show functions_. I have to 
restart HiveServer2 on node 2 to see the function created on node 1.

I don't know the cause of or solution to this problem; I hope I can get help. 
Thank you!

Example:

Connecting to dn1.test.com:1 to create function _zzytest_trans_; with _show 
functions_ I can see it:

!image-2018-10-26-10-24-16-669.png!

!image-2018-10-26-10-24-54-904.png!

At this point I used Beeline to connect to dn4.test.com:1 and ran the show 
functions command; I could not find the zzytest_trans function. I have to 
restart HiveServer2 on the dn4.test.com node to see the zzytest_trans function.

!image-2018-10-26-10-26-00-591.png!

!image-2018-10-26-10-27-32-291.png!


>  New create function between HiveServer2 is not synchronized
> 
>
> Key: HIVE-20814
> URL: https://issues.apache.org/jira/browse/HIVE-20814
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 1.1.0
>Reporter: leozhang
>Assignee: leozhang
>Priority: Major
> Attachments: image-2018-10-26-10-23-48-101.png, 
> image-2018-10-26-10-24-16-669.png, image-2018-10-26-10-24-54-904.png, 
> image-2018-10-26-10-26-00-591.png, image-2018-10-26-10-27-32-291.png
>
>
> I am using CDH open source version 5.15.0, where the Hive version is 1.1.0.
> I have 3 Metastore services and 3 HiveServer2 services in the cluster; 
> Metastore and HiveServer2 do not currently have HA turned on. The cluster 
> runs the Sentry service and configures the Hive Auxiliary JARs Directory 
> attribute.
> I am having a problem: when I use Beeline to connect to HiveServer2 on 
> node 1 and create a function, the creation succeeds and the function can be 
> queried normally. But if I then connect to HiveServer2 on node 2 via Beeline, 
> I cannot see the function just created on node 1 through _show functions_. I 
> have to restart HiveServer2 or run the _reload function_ command on node 2 to 
> see the function created on node 1.
> Is there a way to make a function created on one node visible on every node? 
> I don't know the cause of or solution to this problem; I hope I can get help. 
> Thank you!
>  
> Example:
> Connecting to dn1.test.com:1 to create function _zzytest_trans_; with _show 
> functions_ I can see it:
> !image-2018-10-26-10-24-16-669.png!
> !image-2018-10-26-10-24-54-904.png!
>  
> At this point I used Beeline to connect to dn4.test.com:1 and ran the 
> show functions command; I could not find the zzytest_trans function. I have 
> to restart HiveServer2 or run the _reload 
> function_ command on the dn4.test.com 

[jira] [Resolved] (HIVE-20814) New create function between HiveServer2 is not synchronized

2018-10-25 Thread leozhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leozhang resolved HIVE-20814.
-
Resolution: Fixed

use reload functions
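For reference, the workaround can be sketched as a Beeline session. The JAR path and UDF class name below are hypothetical placeholders, not taken from the report; the hostnames are the ones mentioned above.

```sql
-- On dn1: register the function (JAR path and class are illustrative only)
CREATE FUNCTION zzytest_trans AS 'com.example.udf.ZzytestTrans'
  USING JAR 'hdfs:///user/hive/udfs/zzytest.jar';

-- On dn4: the new function is not yet visible to this HiveServer2 instance,
-- because each HiveServer2 caches the function registry. Refresh it from the
-- metastore instead of restarting the service:
RELOAD FUNCTION;   -- newer Hive releases also accept RELOAD FUNCTIONS;

SHOW FUNCTIONS LIKE '*zzytest*';
```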

>  New create function between HiveServer2 is not synchronized
> 
>
> Key: HIVE-20814
> URL: https://issues.apache.org/jira/browse/HIVE-20814
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 1.1.0
>Reporter: leozhang
>Assignee: leozhang
>Priority: Major
> Attachments: image-2018-10-26-10-23-48-101.png, 
> image-2018-10-26-10-24-16-669.png, image-2018-10-26-10-24-54-904.png, 
> image-2018-10-26-10-26-00-591.png, image-2018-10-26-10-27-32-291.png
>
>
> I am using CDH open source version 5.15.0, where the Hive version is 1.1.0.
> I have 3 Metastore services and 3 HiveServer2 services in the cluster, 
> Metastore and HiveServer2 don't currently have HA turned on. The cluster 
> opens the Sentry service and configures the Hive Auxiliary JARs Directory 
> attribute.
> Now I am having a problem, I am using beeline to connect to HiveServer2 on 
> node 1 to create a function that is successful and can be queried normally. 
> But if I now connect to HiveServer2 of node 2 via Beeline, I can't see the 
> function I just created on node 1 through _show function_. I have to restart 
> HiveServer2 on node 2 to see the function created on node 1 just now.
> I don't know the reason and solution for this problem, I hope I can get help, 
> thank you!
>  
> exp:
> connect dn1.test.com:1 to create function _zzytest_trans_ , use _show 
> functions_ I can see it
> !image-2018-10-26-10-24-16-669.png!
> !image-2018-10-26-10-24-54-904.png!
>  
> At this point I used beeline to connect to dn4.test.com:1 and use the 
> show functions command to view the function and I can't find the 
> zzytest_trans function. I have to restart HiveServer2 on the dn4.test.com 
> node to see the zzytest_trans function.
> !image-2018-10-26-10-26-00-591.png!
> !image-2018-10-26-10-27-32-291.png!
>  
>  
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20814) New create function between HiveServer2 is not synchronized

2018-10-25 Thread leozhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leozhang reassigned HIVE-20814:
---

Assignee: leozhang

>  New create function between HiveServer2 is not synchronized
> 
>
> Key: HIVE-20814
> URL: https://issues.apache.org/jira/browse/HIVE-20814
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 1.1.0
>Reporter: leozhang
>Assignee: leozhang
>Priority: Major
> Attachments: image-2018-10-26-10-23-48-101.png, 
> image-2018-10-26-10-24-16-669.png, image-2018-10-26-10-24-54-904.png, 
> image-2018-10-26-10-26-00-591.png, image-2018-10-26-10-27-32-291.png
>
>
> I am using CDH open source version 5.15.0, where the Hive version is 1.1.0.
> I have 3 Metastore services and 3 HiveServer2 services in the cluster, 
> Metastore and HiveServer2 don't currently have HA turned on. The cluster 
> opens the Sentry service and configures the Hive Auxiliary JARs Directory 
> attribute.
> Now I am having a problem, I am using beeline to connect to HiveServer2 on 
> node 1 to create a function that is successful and can be queried normally. 
> But if I now connect to HiveServer2 of node 2 via Beeline, I can't see the 
> function I just created on node 1 through _show function_. I have to restart 
> HiveServer2 on node 2 to see the function created on node 1 just now.
> I don't know the reason and solution for this problem, I hope I can get help, 
> thank you!
>  
> exp:
> connect dn1.test.com:1 to create function _zzytest_trans_ , use _show 
> functions_ I can see it
> !image-2018-10-26-10-24-16-669.png!
> !image-2018-10-26-10-24-54-904.png!
>  
> At this point I used beeline to connect to dn4.test.com:1 and use the 
> show functions command to view the function and I can't find the 
> zzytest_trans function. I have to restart HiveServer2 on the dn4.test.com 
> node to see the zzytest_trans function.
> !image-2018-10-26-10-26-00-591.png!
> !image-2018-10-26-10-27-32-291.png!
>  
>  
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20617) Fix type of constants in IN expressions to have correct type

2018-10-25 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-20617:
--

Assignee: Jesus Camacho Rodriguez  (was: Zoltan Haindrich)

> Fix type of constants in IN expressions to have correct type
> 
>
> Key: HIVE-20617
> URL: https://issues.apache.org/jira/browse/HIVE-20617
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20617.01.patch, HIVE-20617.02.patch, 
> HIVE-20617.03.patch, HIVE-20617.05.patch, HIVE-20617.06.patch, 
> HIVE-20617.07.patch, HIVE-20617.08.patch, HIVE-20617.08.patch, 
> HIVE-20617.08.patch, HIVE-20617.08.patch, HIVE-20617.08.patch, 
> HIVE-20617.08.patch, HIVE-20617.08.patch, HIVE-20617.08.patch, 
> HIVE-20617.09.patch, HIVE-20617.10.patch, HIVE-20617.10.patch, 
> HIVE-20617.11.patch, HIVE-20617.11.patch, HIVE-20617.12.patch
>
>
> In statements like {{struct(a,b) IN (const struct('x','y'), ... )}} the 
> comparision in UDFIn may fail because if a or b is of char/varchar type the 
> constants will retain string type - especially after PointlookupOptimizer 
> compaction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20617) Fix type of constants in IN expressions to have correct type

2018-10-25 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20617:
---
Attachment: HIVE-20617.12.patch

> Fix type of constants in IN expressions to have correct type
> 
>
> Key: HIVE-20617
> URL: https://issues.apache.org/jira/browse/HIVE-20617
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20617.01.patch, HIVE-20617.02.patch, 
> HIVE-20617.03.patch, HIVE-20617.05.patch, HIVE-20617.06.patch, 
> HIVE-20617.07.patch, HIVE-20617.08.patch, HIVE-20617.08.patch, 
> HIVE-20617.08.patch, HIVE-20617.08.patch, HIVE-20617.08.patch, 
> HIVE-20617.08.patch, HIVE-20617.08.patch, HIVE-20617.08.patch, 
> HIVE-20617.09.patch, HIVE-20617.10.patch, HIVE-20617.10.patch, 
> HIVE-20617.11.patch, HIVE-20617.11.patch, HIVE-20617.12.patch
>
>
> In statements like {{struct(a,b) IN (const struct('x','y'), ... )}} the 
> comparision in UDFIn may fail because if a or b is of char/varchar type the 
> constants will retain string type - especially after PointlookupOptimizer 
> compaction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20796) jdbc URL can contain sensitive information that should not be logged

2018-10-25 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664464#comment-16664464
 ] 

Hive QA commented on HIVE-20796:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12945596/HIVE-20796.04.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15506 tests 
executed
*Failed tests:*
{noformat}
TestMiniDruidCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=196)

[druidmini_masking.q,druidmini_test1.q,druidkafkamini_basic.q,druidmini_joins.q,druid_timestamptz.q]
org.apache.hive.spark.client.rpc.TestRpc.testClientTimeout (batchId=325)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/14642/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14642/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14642/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12945596 - PreCommit-HIVE-Build

> jdbc URL can contain sensitive information that should not be logged
> 
>
> Key: HIVE-20796
> URL: https://issues.apache.org/jira/browse/HIVE-20796
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Laszlo Pinter
>Assignee: Laszlo Pinter
>Priority: Major
> Attachments: HIVE-20796.01.patch, HIVE-20796.02.patch, 
> HIVE-20796.03.patch, HIVE-20796.04.patch
>
>
> It is possible to put passwords in the jdbc connection url and some jdbc 
> drivers will supposedly use that. (derby, mysql). This information is 
> considered sensitive, and should be masked out, while logging the connection 
> url.
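The masking the issue calls for can be sketched with a small regex-based helper. This is an illustrative assumption about the approach, not the exact property list or code from the Hive patch:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: masks password-style key/value pairs in a JDBC connection URL
// before it is logged. The covered property names are assumptions.
public class UrlMasker {
    private static final Pattern SENSITIVE = Pattern.compile(
        "([?&;](?:password|javax\\.jdo\\.option\\.ConnectionPassword)=)[^&;]*",
        Pattern.CASE_INSENSITIVE);

    public static String mask(String url) {
        Matcher m = SENSITIVE.matcher(url);
        return m.replaceAll("$1****");  // keep the key, hide the value
    }

    public static void main(String[] args) {
        System.out.println(mask("jdbc:mysql://db:3306/hive?user=hive&password=s3cret"));
        // prints: jdbc:mysql://db:3306/hive?user=hive&password=****
    }
}
```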



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-14516) OrcInputFormat.SplitGenerator.callInternal() can be optimized

2018-10-25 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14516:
--
   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

committed to master
thanks Igor for the contribution

> OrcInputFormat.SplitGenerator.callInternal() can be optimized
> -
>
> Key: HIVE-14516
> URL: https://issues.apache.org/jira/browse/HIVE-14516
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Igor Kryvenko
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-14516.01.patch
>
>
> callIntenal() has 
> // We can't eliminate stripes if there are deltas because the
> // deltas may change the rows making them match the predicate.
> but in Acid 2.0, the deltas only have delete events thus eliminating stripes 
> from  "base" of split should be safe.
> cc [~gopalv]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20796) jdbc URL can contain sensitive information that should not be logged

2018-10-25 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664426#comment-16664426
 ] 

Hive QA commented on HIVE-20796:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
47s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 7s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m  
8s{color} | {color:blue} standalone-metastore/metastore-server in master has 
181 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
15s{color} | {color:red} standalone-metastore/metastore-server generated 2 new 
+ 181 unchanged - 0 fixed = 183 total (was 181) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
14s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 50s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:standalone-metastore/metastore-server |
|  |  Call to 
String.equals(org.apache.hadoop.hive.metastore.conf.MetastoreConf$ConfVars) in 
org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(Configuration)  
At ObjectStore.java: At ObjectStore.java:[line 478] |
|  |  Return value of String.trim() ignored in 
org.apache.hadoop.hive.metastore.utils.MetaStoreServerUtils.anonymizeConnectionURL(String)
  At MetaStoreServerUtils.java:in 
org.apache.hadoop.hive.metastore.utils.MetaStoreServerUtils.anonymizeConnectionURL(String)
  At MetaStoreServerUtils.java:[line 1163] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-14642/dev-support/hive-personality.sh
 |
| git revision | master / a99be34 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14642/yetus/new-findbugs-standalone-metastore_metastore-server.html
 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14642/yetus/patch-asflicense-problems.txt
 |
| modules | C: standalone-metastore/metastore-server U: 
standalone-metastore/metastore-server |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14642/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> jdbc URL can contain sensitive information that should not be logged
> 
>
> Key: HIVE-20796
> URL: https://issues.apache.org/jira/browse/HIVE-20796
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Laszlo Pinter
>Assignee: Laszlo Pinter
>Priority: Major
> Attachments: HIVE-20796.01.patch, HIVE-20796.02.patch, 
> HIVE-20796.03.patch, HIVE-20796.04.patch
>
>
> It is possible to put passwords in the jdbc connection url and some jdbc 
> drivers will 

[jira] [Commented] (HIVE-20813) udf to_epoch_milli need to support timestamp without time zone as well

2018-10-25 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664422#comment-16664422
 ] 

Jesus Camacho Rodriguez commented on HIVE-20813:


+1 (pending tests)

> udf to_epoch_milli need to support timestamp without time zone as well
> --
>
> Key: HIVE-20813
> URL: https://issues.apache.org/jira/browse/HIVE-20813
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-20813.patch
>
>
> Currently the following query will fail with a cast exception (tries to cast 
> timestamp to timestamp with local timezone).
> {code}
>  select to_epoch_milli(current_timestamp)
> {code}
> As a simple fix we need to add support for timestamp object inspector.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key

2018-10-25 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664417#comment-16664417
 ] 

Gopal V commented on HIVE-20419:


No, right now it just duplicates the desc for each partition and makes the plan 
object bigger than it should be.
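The underlying hazard described in this issue, mutating an object after it has been used as a HashMap key, can be reproduced in isolation. This is a generic illustration, not Hive's VectorPartitionDesc:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Generic demo of the bug class: once a key's hashCode-relevant state
// changes, the entry is stranded in the bucket chosen at insertion time.
public class MutableKeyDemo {
    static final class Key {
        String name;
        Key(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && Objects.equals(((Key) o).name, name);
        }
        @Override public int hashCode() { return Objects.hashCode(name); }
    }

    public static void main(String[] args) {
        Map<Key, String> map = new HashMap<>();
        Key k = new Key("vectorized");
        map.put(k, "desc");

        k.name = "row-mode";  // mutation after insertion, as in the issue
        System.out.println(map.containsKey(k));               // wrong bucket now
        System.out.println(map.containsKey(new Key("vectorized"))); // equals fails
        System.out.println(map.size()); // entry still present but unreachable
    }
}
```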

> Vectorization: Prevent mutation of VectorPartitionDesc after being used in a 
> hashmap key
> 
>
> Key: HIVE-20419
> URL: https://issues.apache.org/jira/browse/HIVE-20419
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Gopal V
>Priority: Major
>
> This is going into the loop because the VectorPartitionDesc is modified after 
> it is used in the HashMap key - resulting in a hashcode & equals modification 
> after it has been placed in the hashmap.
> {code}
> HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: 
> 621ms
> java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869  <7 
> recursive calls>
> java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, 
> Object) HashMap.java:1989
> java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 
> HashMap.java:637
> java.util.HashMap.put(Object, Object) HashMap.java:611
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc,
>  VectorPartitionDesc, Map) Vectorizer.java:1272
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc,
>  boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork,
>  String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) 
> Vectorizer.java:1654
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork,
>  Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork,
>  boolean) Vectorizer.java:1109
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node,
>  Stack, Object[]) Vectorizer.java:961
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, 
> TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) 
> TaskGraphWalker.java:180
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, 
> HashMap) TaskGraphWalker.java:125
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext)
>  Vectorizer.java:2442
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, 
> ParseContext, Context) TezCompiler.java:717
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, 
> HashSet, HashSet) TaskCompiler.java:258
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, 
> SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) 
> CalcitePlanner.java:358
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key

2018-10-25 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664417#comment-16664417
 ] 

Gopal V edited comment on HIVE-20419 at 10/25/18 11:24 PM:
---

No, right now it just duplicates the desc for each partition and makes the plan 
object bigger than it should be.

And takes like 300ms of compile time.


was (Author: gopalv):
No, right now it just duplicates the desc for each partition and makes the plan 
object bigger than it should be.

> Vectorization: Prevent mutation of VectorPartitionDesc after being used in a 
> hashmap key
> 
>
> Key: HIVE-20419
> URL: https://issues.apache.org/jira/browse/HIVE-20419
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Gopal V
>Priority: Major
>
> This is going into the loop because the VectorPartitionDesc is modified after 
> it is used in the HashMap key - resulting in a hashcode & equals modification 
> after it has been placed in the hashmap.
> {code}
> HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: 
> 621ms
> java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869  <7 
> recursive calls>
> java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, 
> Object) HashMap.java:1989
> java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 
> HashMap.java:637
> java.util.HashMap.put(Object, Object) HashMap.java:611
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc,
>  VectorPartitionDesc, Map) Vectorizer.java:1272
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc,
>  boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork,
>  String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) 
> Vectorizer.java:1654
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork,
>  Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork,
>  boolean) Vectorizer.java:1109
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node,
>  Stack, Object[]) Vectorizer.java:961
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, 
> TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) 
> TaskGraphWalker.java:180
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, 
> HashMap) TaskGraphWalker.java:125
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext)
>  Vectorizer.java:2442
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, 
> ParseContext, Context) TezCompiler.java:717
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, 
> HashSet, HashSet) TaskCompiler.java:258
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, 
> SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) 
> CalcitePlanner.java:358
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20813) udf to_epoch_milli need to support timestamp without time zone as well

2018-10-25 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664416#comment-16664416
 ] 

slim bouguerra commented on HIVE-20813:
---

[~jcamachorodriguez] can you please check this?

> udf to_epoch_milli need to support timestamp without time zone as well
> --
>
> Key: HIVE-20813
> URL: https://issues.apache.org/jira/browse/HIVE-20813
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-20813.patch
>
>
> Currently the following query will fail with a cast exception (tries to cast 
> timestamp to timestamp with local timezone).
> {code}
>  select to_epoch_milli(current_timestamp)
> {code}
> As a simple fix we need to add support for timestamp object inspector.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20813) udf to_epoch_milli need to support timestamp without time zone as well

2018-10-25 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-20813:
--
Status: Patch Available  (was: Open)

> udf to_epoch_milli need to support timestamp without time zone as well
> --
>
> Key: HIVE-20813
> URL: https://issues.apache.org/jira/browse/HIVE-20813
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-20813.patch
>
>
> Currently the following query will fail with a cast exception (tries to cast 
> timestamp to timestamp with local timezone).
> {code}
>  select to_epoch_milli(current_timestamp)
> {code}
> As a simple fix we need to add support for timestamp object inspector.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20813) udf to_epoch_milli need to support timestamp without time zone as well

2018-10-25 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-20813:
--
Attachment: HIVE-20813.patch

> udf to_epoch_milli need to support timestamp without time zone as well
> --
>
> Key: HIVE-20813
> URL: https://issues.apache.org/jira/browse/HIVE-20813
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
> Attachments: HIVE-20813.patch
>
>
> Currently the following query will fail with a cast exception (tries to cast 
> timestamp to timestamp with local timezone).
> {code}
>  select to_epoch_milli(current_timestamp)
> {code}
> As a simple fix we need to add support for timestamp object inspector.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key

2018-10-25 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664409#comment-16664409
 ] 

Sergey Shelukhin commented on HIVE-20419:
-

Hmm. Can't this also cause incorrect results, if the key is not found later?

> Vectorization: Prevent mutation of VectorPartitionDesc after being used in a 
> hashmap key
> 
>
> Key: HIVE-20419
> URL: https://issues.apache.org/jira/browse/HIVE-20419
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Gopal V
>Priority: Major
>
> This is going into the loop because the VectorPartitionDesc is modified after 
> it is used in the HashMap key - resulting in a hashcode & equals modification 
> after it has been placed in the hashmap.
> {code}
> HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: 
> 621ms
> java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869  <7 
> recursive calls>
> java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, 
> Object) HashMap.java:1989
> java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 
> HashMap.java:637
> java.util.HashMap.put(Object, Object) HashMap.java:611
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc,
>  VectorPartitionDesc, Map) Vectorizer.java:1272
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc,
>  boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork,
>  String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) 
> Vectorizer.java:1654
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork,
>  Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork,
>  boolean) Vectorizer.java:1109
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node,
>  Stack, Object[]) Vectorizer.java:961
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, 
> TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) 
> TaskGraphWalker.java:180
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, 
> HashMap) TaskGraphWalker.java:125
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext)
>  Vectorizer.java:2442
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, 
> ParseContext, Context) TezCompiler.java:717
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, 
> HashSet, HashSet) TaskCompiler.java:258
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, 
> SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) 
> CalcitePlanner.java:358
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key

2018-10-25 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664410#comment-16664410
 ] 

Sergey Shelukhin commented on HIVE-20419:
-

cc [~teddy.choi]

> Vectorization: Prevent mutation of VectorPartitionDesc after being used in a 
> hashmap key
> 
>
> Key: HIVE-20419
> URL: https://issues.apache.org/jira/browse/HIVE-20419
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Gopal V
>Priority: Major
>
> This is going into the loop because the VectorPartitionDesc is modified after 
> it is used in the HashMap key - resulting in a hashcode & equals modification 
> after it has been placed in the hashmap.
> {code}
> HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: 
> 621ms
> java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869  <7 
> recursive calls>
> java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, 
> Object) HashMap.java:1989
> java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 
> HashMap.java:637
> java.util.HashMap.put(Object, Object) HashMap.java:611
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc,
>  VectorPartitionDesc, Map) Vectorizer.java:1272
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc,
>  boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork,
>  String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) 
> Vectorizer.java:1654
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork,
>  Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork,
>  boolean) Vectorizer.java:1109
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node,
>  Stack, Object[]) Vectorizer.java:961
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, 
> TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) 
> TaskGraphWalker.java:180
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, 
> HashMap) TaskGraphWalker.java:125
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext)
>  Vectorizer.java:2442
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, 
> ParseContext, Context) TezCompiler.java:717
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, 
> HashSet, HashSet) TaskCompiler.java:258
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, 
> SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) 
> CalcitePlanner.java:358
> {code}
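The bug pattern above is easy to reproduce outside of Hive: once an object has been used as a HashMap key, mutating any field that feeds its hashCode/equals strands the entry in the wrong bucket. A minimal sketch follows; {{MutableKey}} is a hypothetical stand-in for VectorPartitionDesc, not Hive code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for VectorPartitionDesc: hashCode/equals
// depend on a mutable field.
class MutableKey {
    String inputFormat;

    MutableKey(String inputFormat) {
        this.inputFormat = inputFormat;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey
                && Objects.equals(((MutableKey) o).inputFormat, inputFormat);
    }

    @Override
    public int hashCode() {
        return Objects.hash(inputFormat);
    }
}

public class HashMapKeyMutation {
    public static void main(String[] args) {
        Map<MutableKey, String> cache = new HashMap<>();
        MutableKey key = new MutableKey("orc");
        cache.put(key, "plan-A");

        key.inputFormat = "parquet"; // mutated AFTER being used as a key

        // The entry is stranded in the bucket chosen by the old hash code:
        System.out.println(cache.get(key));                   // null
        System.out.println(cache.get(new MutableKey("orc"))); // null
        System.out.println(cache.containsValue("plan-A"));    // true
    }
}
```

Making the key effectively immutable (or copying it before insertion), as the issue title suggests, removes the mismatch between the stored hash and the live object.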





[jira] [Assigned] (HIVE-20813) udf to_epoch_milli need to support timestamp without time zone as well

2018-10-25 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra reassigned HIVE-20813:
-


> udf to_epoch_milli need to support timestamp without time zone as well
> --
>
> Key: HIVE-20813
> URL: https://issues.apache.org/jira/browse/HIVE-20813
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>
> Currently the following query will fail with a cast exception (tries to cast 
> timestamp to timestamp with local timezone).
> {code}
>  select to_epoch_milli(current_timestamp)
> {code}
> As a simple fix, we need to add support for the timestamp object inspector.
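For illustration only (this is plain java.time code, not Hive's ObjectInspector-based UDF implementation): the two timestamp flavors differ only in how the instant is obtained, so supporting plain TIMESTAMP amounts to choosing a zone in which to interpret the wall-clock value. The class and method names below are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class EpochMilli {
    // TIMESTAMP WITH LOCAL TIME ZONE semantics: the instant is already known.
    static long toEpochMilli(ZonedDateTime ts) {
        return ts.toInstant().toEpochMilli();
    }

    // Plain TIMESTAMP semantics: no zone is attached, so one must be chosen
    // to interpret the wall-clock value (UTC here, as an assumption).
    static long toEpochMilli(LocalDateTime ts) {
        return ts.atZone(ZoneId.of("UTC")).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2018, 10, 25, 0, 0);
        System.out.println(toEpochMilli(t)); // 1540425600000
    }
}
```

In the UDF itself the analogous change is to accept a timestamp object inspector in addition to the timestamp-with-local-time-zone one, rather than casting one to the other.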





[jira] [Updated] (HIVE-20811) Turn on dynamic partitioned hash join

2018-10-25 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-20811:
---
Description: 
Currently it is off by default.

Turning it ON by default will help uncover and fix issues, if there are any.

  was:
Currently it is off by default.

Turning it ON by default will help fix correctness and other issues.


> Turn on dynamic partitioned hash join
> -
>
> Key: HIVE-20811
> URL: https://issues.apache.org/jira/browse/HIVE-20811
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-20811.1.patch
>
>
> Currently it is off by default.
> Turning it ON by default will help uncover and fix issues, if there are any.





[jira] [Commented] (HIVE-20811) Turn on dynamic partitioned hash join

2018-10-25 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664374#comment-16664374
 ] 

Vineet Garg commented on HIVE-20811:


[~sershe] I have retracted my statement about correctness. I thought I saw an 
issue where dynamic hash join produced a wrong result, but I probably confused 
it with some other issue.

Anyhow, I believe it should be turned on by default: customers have been using 
it, and keeping it ON for all tests will help uncover issues (e.g. HIVE-20514).

> Turn on dynamic partitioned hash join
> -
>
> Key: HIVE-20811
> URL: https://issues.apache.org/jira/browse/HIVE-20811
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-20811.1.patch
>
>
> Currently it is off by default.
> Turning it ON by default will help fix correctness and other issues.





[jira] [Commented] (HIVE-20812) Update jetty dependency to 9.3.25.v20180904

2018-10-25 Thread Nishant Bangarwa (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664361#comment-16664361
 ] 

Nishant Bangarwa commented on HIVE-20812:
-

+1

> Update jetty dependency to 9.3.25.v20180904
> ---
>
> Key: HIVE-20812
> URL: https://issues.apache.org/jira/browse/HIVE-20812
> Project: Hive
>  Issue Type: Task
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Critical
> Attachments: HIVE-20812.1.patch
>
>
> The jetty version 9.3.20.v20170531 currently used in master has several 
> CVEs associated with it.
> Version 9.3.25.v20180904 has those issues resolved.





[jira] [Commented] (HIVE-20811) Turn on dynamic partitioned hash join

2018-10-25 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664346#comment-16664346
 ] 

Sergey Shelukhin commented on HIVE-20811:
-

Hmm... what correctness issues are those? If there can be correctness issues 
without it, should we remove the config and have it always on?

> Turn on dynamic partitioned hash join
> -
>
> Key: HIVE-20811
> URL: https://issues.apache.org/jira/browse/HIVE-20811
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-20811.1.patch
>
>
> Currently it is off by default.
> Turning it ON by default will help fix correctness and other issues.





[jira] [Updated] (HIVE-20812) Update jetty dependency to 9.3.25.v20180904

2018-10-25 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-20812:
-
Status: Patch Available  (was: Open)

> Update jetty dependency to 9.3.25.v20180904
> ---
>
> Key: HIVE-20812
> URL: https://issues.apache.org/jira/browse/HIVE-20812
> Project: Hive
>  Issue Type: Task
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Critical
> Attachments: HIVE-20812.1.patch
>
>
> The jetty version 9.3.20.v20170531 currently used in master has several 
> CVEs associated with it.
> Version 9.3.25.v20180904 has those issues resolved.





[jira] [Updated] (HIVE-20812) Update jetty dependency to 9.3.25.v20180904

2018-10-25 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-20812:
-
Attachment: HIVE-20812.1.patch

> Update jetty dependency to 9.3.25.v20180904
> ---
>
> Key: HIVE-20812
> URL: https://issues.apache.org/jira/browse/HIVE-20812
> Project: Hive
>  Issue Type: Task
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Critical
> Attachments: HIVE-20812.1.patch
>
>
> The jetty version 9.3.20.v20170531 currently used in master has several 
> CVEs associated with it.
> Version 9.3.25.v20180904 has those issues resolved.





[jira] [Updated] (HIVE-20812) Update jetty dependency to 9.3.25.v20180904

2018-10-25 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-20812:
-
Priority: Critical  (was: Major)

> Update jetty dependency to 9.3.25.v20180904
> ---
>
> Key: HIVE-20812
> URL: https://issues.apache.org/jira/browse/HIVE-20812
> Project: Hive
>  Issue Type: Task
>Reporter: Thejas M Nair
>Priority: Critical
>
> The jetty version 9.3.20.v20170531 currently used in master has several 
> CVEs associated with it.
> Version 9.3.25.v20180904 has those issues resolved.





[jira] [Updated] (HIVE-20812) Update jetty dependency to 9.3.25.v20180904

2018-10-25 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-20812:
-
Issue Type: Task  (was: Improvement)

> Update jetty dependency to 9.3.25.v20180904
> ---
>
> Key: HIVE-20812
> URL: https://issues.apache.org/jira/browse/HIVE-20812
> Project: Hive
>  Issue Type: Task
>Reporter: Thejas M Nair
>Priority: Major
>
> The jetty version 9.3.20.v20170531 currently used in master has several 
> CVEs associated with it.
> Version 9.3.25.v20180904 has those issues resolved.





[jira] [Assigned] (HIVE-20812) Update jetty dependency to 9.3.25.v20180904

2018-10-25 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair reassigned HIVE-20812:


Assignee: Thejas M Nair

> Update jetty dependency to 9.3.25.v20180904
> ---
>
> Key: HIVE-20812
> URL: https://issues.apache.org/jira/browse/HIVE-20812
> Project: Hive
>  Issue Type: Task
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Critical
>
> The jetty version 9.3.20.v20170531 currently used in master has several 
> CVEs associated with it.
> Version 9.3.25.v20180904 has those issues resolved.





[jira] [Updated] (HIVE-20793) add RP namespacing to workload management

2018-10-25 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-20793:

Attachment: HIVE-20793.01.patch

> add RP namespacing to workload management
> -
>
> Key: HIVE-20793
> URL: https://issues.apache.org/jira/browse/HIVE-20793
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-20793.01.nogen.patch, HIVE-20793.01.patch, 
> HIVE-20793.nogen.patch, HIVE-20793.patch
>
>
> The idea is to be able to use the same warehouse for multiple clusters in the 
> cloud use cases. This scenario is not currently supported by WM.





[jira] [Updated] (HIVE-20793) add RP namespacing to workload management

2018-10-25 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-20793:

Attachment: HIVE-20793.01.nogen.patch

> add RP namespacing to workload management
> -
>
> Key: HIVE-20793
> URL: https://issues.apache.org/jira/browse/HIVE-20793
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-20793.01.nogen.patch, HIVE-20793.nogen.patch, 
> HIVE-20793.patch
>
>
> The idea is to be able to use the same warehouse for multiple clusters in the 
> cloud use cases. This scenario is not currently supported by WM.





[jira] [Commented] (HIVE-20793) add RP namespacing to workload management

2018-10-25 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664318#comment-16664318
 ] 

Sergey Shelukhin commented on HIVE-20793:
-

Fixed some bugs, further updated the sysdb views, and fixed the sysdb version 
in the test file. 

> add RP namespacing to workload management
> -
>
> Key: HIVE-20793
> URL: https://issues.apache.org/jira/browse/HIVE-20793
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-20793.01.nogen.patch, HIVE-20793.nogen.patch, 
> HIVE-20793.patch
>
>
> The idea is to be able to use the same warehouse for multiple clusters in the 
> cloud use cases. This scenario is not currently supported by WM.





[jira] [Commented] (HIVE-20807) Refactor LlapStatusServiceDriver

2018-10-25 Thread Miklos Gergely (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664269#comment-16664269
 ] 

Miklos Gergely commented on HIVE-20807:
---

This patch is the first step of refactoring the program: all of its 
components are now moved under the package org.apache.hadoop.hive.llap.cli.status, 
each class and enum is put into a separate file, the overcomplicated 
parts of the command line parsing are replaced with a simpler structure, 
and the findbugs and checkstyle warnings are fixed.

> Refactor LlapStatusServiceDriver
> 
>
> Key: HIVE-20807
> URL: https://issues.apache.org/jira/browse/HIVE-20807
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20807.01.patch, HIVE-20807.02.patch
>
>
> LlapStatusServiceDriver is the class used to determine if LLAP has started. 
> The following problems should be solved by refactoring:
> 1. The main class is more than 800 lines long, and should be cut into multiple 
> smaller classes.
> 2. The current design makes it extremely hard to write unit tests.
> 3. There are some overcomplicated, over-engineered parts of the code.
> 4. Most of the code is under org.apache.hadoop.hive.llap.cli, but some parts 
> are under org.apache.hadoop.hive.llap.cli.status. The whole program could be 
> moved to the latter.
> 5. LlapStatusHelpers serves as a class for holding classes, which doesn't 
> make much sense.





[jira] [Updated] (HIVE-20807) Refactor LlapStatusServiceDriver

2018-10-25 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-20807:
--
Attachment: HIVE-20807.02.patch

> Refactor LlapStatusServiceDriver
> 
>
> Key: HIVE-20807
> URL: https://issues.apache.org/jira/browse/HIVE-20807
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20807.01.patch, HIVE-20807.02.patch
>
>
> LlapStatusServiceDriver is the class used to determine if LLAP has started. 
> The following problems should be solved by refactoring:
> 1. The main class is more than 800 lines long, and should be cut into multiple 
> smaller classes.
> 2. The current design makes it extremely hard to write unit tests.
> 3. There are some overcomplicated, over-engineered parts of the code.
> 4. Most of the code is under org.apache.hadoop.hive.llap.cli, but some parts 
> are under org.apache.hadoop.hive.llap.cli.status. The whole program could be 
> moved to the latter.
> 5. LlapStatusHelpers serves as a class for holding classes, which doesn't 
> make much sense.





[jira] [Updated] (HIVE-20807) Refactor LlapStatusServiceDriver

2018-10-25 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-20807:
--
Status: Open  (was: Patch Available)

> Refactor LlapStatusServiceDriver
> 
>
> Key: HIVE-20807
> URL: https://issues.apache.org/jira/browse/HIVE-20807
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20807.01.patch, HIVE-20807.02.patch
>
>
> LlapStatusServiceDriver is the class used to determine if LLAP has started. 
> The following problems should be solved by refactoring:
> 1. The main class is more than 800 lines long, and should be cut into multiple 
> smaller classes.
> 2. The current design makes it extremely hard to write unit tests.
> 3. There are some overcomplicated, over-engineered parts of the code.
> 4. Most of the code is under org.apache.hadoop.hive.llap.cli, but some parts 
> are under org.apache.hadoop.hive.llap.cli.status. The whole program could be 
> moved to the latter.
> 5. LlapStatusHelpers serves as a class for holding classes, which doesn't 
> make much sense.





[jira] [Updated] (HIVE-20807) Refactor LlapStatusServiceDriver

2018-10-25 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-20807:
--
Status: Patch Available  (was: Open)

> Refactor LlapStatusServiceDriver
> 
>
> Key: HIVE-20807
> URL: https://issues.apache.org/jira/browse/HIVE-20807
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20807.01.patch, HIVE-20807.02.patch
>
>
> LlapStatusServiceDriver is the class used to determine if LLAP has started. 
> The following problems should be solved by refactoring:
> 1. The main class is more than 800 lines long, and should be cut into multiple 
> smaller classes.
> 2. The current design makes it extremely hard to write unit tests.
> 3. There are some overcomplicated, over-engineered parts of the code.
> 4. Most of the code is under org.apache.hadoop.hive.llap.cli, but some parts 
> are under org.apache.hadoop.hive.llap.cli.status. The whole program could be 
> moved to the latter.
> 5. LlapStatusHelpers serves as a class for holding classes, which doesn't 
> make much sense.





[jira] [Updated] (HIVE-20811) Turn on dynamic partitioned hash join

2018-10-25 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-20811:
---
Attachment: HIVE-20811.1.patch

> Turn on dynamic partitioned hash join
> -
>
> Key: HIVE-20811
> URL: https://issues.apache.org/jira/browse/HIVE-20811
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-20811.1.patch
>
>
> Currently it is off by default.
> Turning it ON by default will help fix correctness and other issues.





[jira] [Updated] (HIVE-20811) Turn on dynamic partitioned hash join

2018-10-25 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-20811:
---
Status: Patch Available  (was: Open)

> Turn on dynamic partitioned hash join
> -
>
> Key: HIVE-20811
> URL: https://issues.apache.org/jira/browse/HIVE-20811
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-20811.1.patch
>
>
> Currently it is off by default.
> Turning it ON by default will help fix correctness and other issues.





[jira] [Assigned] (HIVE-20811) Turn on dynamic partitioned hash join

2018-10-25 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-20811:
--


> Turn on dynamic partitioned hash join
> -
>
> Key: HIVE-20811
> URL: https://issues.apache.org/jira/browse/HIVE-20811
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-20811.1.patch
>
>
> Currently it is off by default.
> Turning it ON by default will help fix correctness and other issues.





[jira] [Commented] (HIVE-14557) Nullpointer When both SkewJoin and Mapjoin Enabled

2018-10-25 Thread Yongzhi Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664242#comment-16664242
 ] 

Yongzhi Chen commented on HIVE-14557:
-

[~lirui], are you still working on this JIRA? If so, could you attach an 
up-to-date patch? Thanks

> Nullpointer When both SkewJoin  and Mapjoin Enabled
> ---
>
> Key: HIVE-14557
> URL: https://issues.apache.org/jira/browse/HIVE-14557
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 1.1.0, 2.1.0
>Reporter: Nemon Lou
>Assignee: Rui Li
>Priority: Major
> Attachments: HIVE-14557.2.patch, HIVE-14557.patch
>
>
> The following sql failed with return code 2 on mr.
> {noformat}
> create table a(id int,id1 int);
> create table b(id int,id1 int);
> create table c(id int,id1 int);
> set hive.optimize.skewjoin=true;
> select a.id,b.id,c.id1 from a,b,c where a.id=b.id and a.id1=c.id1;
> {noformat}
> Error log as follows:
> {noformat}
> 2016-08-17 21:13:42,081 INFO [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: 
> Id =0
>   
> Id =21
>   
> Id =28
>   
> Id =16
>   
>   <\Children>
>   Id = 28 null<\Parent>
> <\FS>
>   <\Children>
>   Id = 21 nullId = 33 
> Id =33
>   null
>   <\Children>
>   <\Parent>
> <\HASHTABLEDUMMY><\Parent>
> <\MAPJOIN>
>   <\Children>
>   Id = 0 null<\Parent>
> <\TS>
>   <\Children>
>   <\Parent>
> <\MAP>
> 2016-08-17 21:13:42,084 INFO [main] 
> org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing operator TS[21]
> 2016-08-17 21:13:42,084 INFO [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Initializing dummy operator
> 2016-08-17 21:13:42,086 INFO [main] 
> org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0, 
> RECORDS_IN:0, 
> 2016-08-17 21:13:42,087 ERROR [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Hit error while closing 
> operators - failing tree
> 2016-08-17 21:13:42,088 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.RuntimeException: Hive Runtime Error 
> while closing operators
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:474)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:682)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:696)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:696)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189)
>   ... 8 more
> {noformat}





[jira] [Commented] (HIVE-20304) When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, and the execution engine is mr, same stage may launch twice due to the wrong generated plan

2018-10-25 Thread Yongzhi Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664241#comment-16664241
 ] 

Yongzhi Chen commented on HIVE-20304:
-

[~belugabehr], the fix is exactly the same as in HIVE-14557

> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, 
> and the execution engine is mr, same stage may launch twice due to the wrong 
> generated plan
> 
>
> Key: HIVE-20304
> URL: https://issues.apache.org/jira/browse/HIVE-20304
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Major
> Fix For: 1.2.1
>
> Attachments: HIVE-20304.1.patch, HIVE-20304.patch
>
>
> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, 
> and the execution engine is set to mr, the same stage of a query may launch 
> twice due to the wrongly generated plan. If hive.exec.parallel is also true, 
> both copies of the stage will launch at the same time and the job will fail, 
> because the first stage to complete clears the map.xml/reduce.xml files 
> stored in HDFS.
> Use the following SQL to reproduce the issue:
> {code:java}
> CREATE TABLE `tbl1`(
>   `fence` string);
> CREATE TABLE `tbl2`(
>   `order_id` string,
>   `phone` string,
>   `search_id` string
> )
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl3`(
>   `order_id` string,
>   `platform` string)
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl4`(
>   `groupname` string,
>   `phone` string)
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl5`(
>   `search_id` string,
>   `fence` string)
> PARTITIONED BY (
>   `dt` string);
> SET hive.exec.parallel = TRUE;
> SET hive.auto.convert.join = TRUE;
> SET hive.optimize.skewjoin = TRUE;
> SELECT dt,
>platform,
>groupname,
>count(1) as cnt
> FROM
> (SELECT dt,
> platform,
> groupname
>  FROM
>  (SELECT fence
>   FROM tbl1)ta
>JOIN
>(SELECT a0.dt,
>a1.platform,
>a2.groupname,
>a3.fence
> FROM
> (SELECT dt,
> order_id,
> phone,
> search_id
>  FROM tbl2
>  WHERE dt =20180703 )a0
>   JOIN
>   (SELECT order_id,
>   platform,
>   dt
>FROM tbl3
>WHERE dt =20180703 )a1 ON a0.order_id = a1.order_id
>   INNER JOIN
>   (SELECT groupname,
>   phone,
>   dt
>FROM tbl4
>WHERE dt =20180703 )a2 ON a0.phone = a2.phone
>   LEFT JOIN
>   (SELECT search_id,
>   fence,
>   dt
>FROM tbl5
>WHERE dt =20180703)a3 ON a0.search_id = a3.search_id)t0 ON 
> ta.fence = t0.fence)t11
> GROUP BY dt,
>  platform,
>  groupname;
> DROP TABLE tbl1;
> DROP TABLE tbl2;
> DROP TABLE tbl3;
> DROP TABLE tbl4;
> DROP TABLE tbl5;
> {code}
> We will get some error message like this:
> Examining task ID: task_1531284442065_3637_m_00 (and more) from job 
> job_1531284442065_3637
> Task with the most failures(4):
> 
> Task ID:
>  task_1531284442065_3637_m_00
> URL:
>  
> [http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1531284442065_3637=task_1531284442065_3637_m_00]
> 
> Diagnostic Messages for this Task:
>  File does not exist: 
> hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
>  java.io.FileNotFoundException: File does not exist: 
> hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
> Looking into the plan via EXPLAIN, I found that Stage-4 and Stage-5 can be 
> reached from multiple root tasks.
> {code:java}
> Explain
> STAGE DEPENDENCIES:
>   Stage-21 is a root stage , consists of Stage-34, Stage-5
>   Stage-34 has a backup stage: Stage-5
>   Stage-20 depends on stages: Stage-34
>   Stage-17 depends on stages: Stage-5, Stage-18, Stage-20 , consists of 
> Stage-32, Stage-33, Stage-1
>   Stage-32 has a backup stage: Stage-1
>   Stage-15 depends on stages: Stage-32
>   Stage-10 depends on stages: Stage-1, Stage-15, Stage-16 , consists of 
> Stage-31, Stage-2
>   

[jira] [Commented] (HIVE-20250) Option to allow external tables to use query results cache

2018-10-25 Thread Thai Bui (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664233#comment-16664233
 ] 

Thai Bui commented on HIVE-20250:
-

[~jdere] Thanks for the reply, and no worries about the time to work on the 
ticket. Currently, I'm hacking Hive branch-3.0 with the least amount of work 
(nothing that should be contributed back upstream) to get the cache and 
materialized views to work on external tables, and so far so good.

I will try to follow the suggested approach, since that's the simplest and most 
straightforward as well. And thanks for pointing out 
https://issues.apache.org/jira/browse/HIVE-19154; that's what I need as well.

 

> Option to allow external tables to use query results cache
> --
>
> Key: HIVE-20250
> URL: https://issues.apache.org/jira/browse/HIVE-20250
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jason Dere
>Priority: Major
> Attachments: HIVE-20250.1.patch
>
>






[jira] [Updated] (HIVE-20804) Further improvements to group by optimization with constraints

2018-10-25 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-20804:
---
Status: Patch Available  (was: Open)

> Further improvements to group by optimization with constraints
> --
>
> Key: HIVE-20804
> URL: https://issues.apache.org/jira/browse/HIVE-20804
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-20804.1.patch, HIVE-20804.2.patch
>
>
> Continuation of HIVE-17043





[jira] [Updated] (HIVE-20804) Further improvements to group by optimization with constraints

2018-10-25 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-20804:
---
Status: Open  (was: Patch Available)

> Further improvements to group by optimization with constraints
> --
>
> Key: HIVE-20804
> URL: https://issues.apache.org/jira/browse/HIVE-20804
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-20804.1.patch, HIVE-20804.2.patch
>
>
> Continuation of HIVE-17043





[jira] [Updated] (HIVE-20804) Further improvements to group by optimization with constraints

2018-10-25 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-20804:
---
Attachment: HIVE-20804.2.patch

> Further improvements to group by optimization with constraints
> --
>
> Key: HIVE-20804
> URL: https://issues.apache.org/jira/browse/HIVE-20804
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-20804.1.patch, HIVE-20804.2.patch
>
>
> Continuation of HIVE-17043





[jira] [Updated] (HIVE-20807) Refactor LlapStatusServiceDriver

2018-10-25 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-20807:
--
Status: Patch Available  (was: Open)

> Refactor LlapStatusServiceDriver
> 
>
> Key: HIVE-20807
> URL: https://issues.apache.org/jira/browse/HIVE-20807
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20807.01.patch
>
>
> LlapStatusServiceDriver is the class used to determine if LLAP has started. 
> The following problems should be solved by refactoring:
> 1. The main class is more than 800 lines long and should be cut into multiple 
> smaller classes.
> 2. The current design makes it extremely hard to write unit tests.
> 3. There are some overcomplicated, over-engineered parts of the code.
> 4. Most of the code is under org.apache.hadoop.hive.llap.cli, but some parts 
> are under org.apache.hadoop.hive.llap.cli.status. The whole program could be 
> moved to the latter.
> 5. LlapStatusHelpers serves as a class for holding classes, which doesn't 
> make much sense.





[jira] [Updated] (HIVE-20807) Refactor LlapStatusServiceDriver

2018-10-25 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-20807:
--
Attachment: HIVE-20807.01.patch

> Refactor LlapStatusServiceDriver
> 
>
> Key: HIVE-20807
> URL: https://issues.apache.org/jira/browse/HIVE-20807
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20807.01.patch
>
>
> LlapStatusServiceDriver is the class used to determine if LLAP has started. 
> The following problems should be solved by refactoring:
> 1. The main class is more than 800 lines long and should be cut into multiple 
> smaller classes.
> 2. The current design makes it extremely hard to write unit tests.
> 3. There are some overcomplicated, over-engineered parts of the code.
> 4. Most of the code is under org.apache.hadoop.hive.llap.cli, but some parts 
> are under org.apache.hadoop.hive.llap.cli.status. The whole program could be 
> moved to the latter.
> 5. LlapStatusHelpers serves as a class for holding classes, which doesn't 
> make much sense.





[jira] [Updated] (HIVE-20617) Fix type of constants in IN expressions to have correct type

2018-10-25 Thread Zoltan Haindrich (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-20617:

Attachment: HIVE-20617.11.patch

> Fix type of constants in IN expressions to have correct type
> 
>
> Key: HIVE-20617
> URL: https://issues.apache.org/jira/browse/HIVE-20617
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-20617.01.patch, HIVE-20617.02.patch, 
> HIVE-20617.03.patch, HIVE-20617.05.patch, HIVE-20617.06.patch, 
> HIVE-20617.07.patch, HIVE-20617.08.patch, HIVE-20617.08.patch, 
> HIVE-20617.08.patch, HIVE-20617.08.patch, HIVE-20617.08.patch, 
> HIVE-20617.08.patch, HIVE-20617.08.patch, HIVE-20617.08.patch, 
> HIVE-20617.09.patch, HIVE-20617.10.patch, HIVE-20617.10.patch, 
> HIVE-20617.11.patch, HIVE-20617.11.patch
>
>
> In statements like {{struct(a,b) IN (const struct('x','y'), ... )}}, the 
> comparison in UDFIn may fail because, if a or b is of char/varchar type, the 
> constants will retain string type, especially after PointlookupOptimizer 
> compaction.
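The mismatch can be illustrated with a small sketch (plain Python, not Hive code, and `char_pad` is a hypothetical helper approximating CHAR(n) blank-padding):

```python
# Illustrative sketch: why an IN comparison can miss when a CHAR(n) column
# value meets a constant that kept plain STRING type.

def char_pad(value: str, n: int) -> str:
    """Approximates Hive CHAR(n) storage: values are blank-padded to length n."""
    return value.ljust(n)

column_value = char_pad("x", 5)   # stored as 'x    ' (blank-padded CHAR(5))
string_constant = "x"             # constant that retained STRING type

# Raw, type-unaware equality misses the match:
assert column_value != string_constant

# A type-aware comparison coerces the constant to CHAR(5) first:
assert column_value == char_pad(string_constant, 5)
```

Coercing the constants to the column type at compile time avoids the padded-vs-unpadded mismatch.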





[jira] [Commented] (HIVE-20259) Cleanup of results cache directory

2018-10-25 Thread Jason Dere (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664135#comment-16664135
 ] 

Jason Dere commented on HIVE-20259:
---

RB at https://reviews.apache.org/r/69173/

> Cleanup of results cache directory
> --
>
> Key: HIVE-20259
> URL: https://issues.apache.org/jira/browse/HIVE-20259
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-20259.1.patch, HIVE-20259.2.patch
>
>
> The query results cache directory is currently deleted at process exit. This 
> does not work in the case of a kill -9 or a sudden process exit of Hive. 
> There should be some cleanup mechanism in place to take care of any old cache 
> directories that were not deleted at process exit.
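Such a cleanup mechanism could be sketched as a scan for cache directories older than a threshold. This is a minimal sketch with an assumed directory layout, not Hive's actual implementation:

```python
import os
import shutil
import time

def cleanup_stale_cache_dirs(cache_root: str, max_age_seconds: float) -> list:
    """Remove cache directories not modified within max_age_seconds.

    Hypothetical sketch: the cache root layout and age threshold are
    assumptions for illustration.
    """
    removed = []
    now = time.time()
    for name in os.listdir(cache_root):
        path = os.path.join(cache_root, name)
        # A directory untouched for longer than the threshold is treated
        # as left over from a process that exited without cleanup.
        if os.path.isdir(path) and now - os.path.getmtime(path) > max_age_seconds:
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
    return removed
```

A periodic task (or a startup hook) could invoke this against the results cache root so directories orphaned by a kill -9 are eventually reclaimed.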





[jira] [Commented] (HIVE-20259) Cleanup of results cache directory

2018-10-25 Thread Jason Dere (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664134#comment-16664134
 ] 

Jason Dere commented on HIVE-20259:
---

Re-attaching the patch, though this actually had a green run last time.
[~gopalv], can you review this one?

> Cleanup of results cache directory
> --
>
> Key: HIVE-20259
> URL: https://issues.apache.org/jira/browse/HIVE-20259
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-20259.1.patch, HIVE-20259.2.patch
>
>
> The query results cache directory is currently deleted at process exit. This 
> does not work in the case of a kill -9 or a sudden process exit of Hive. 
> There should be some cleanup mechanism in place to take care of any old cache 
> directories that were not deleted at process exit.





[jira] [Updated] (HIVE-20259) Cleanup of results cache directory

2018-10-25 Thread Jason Dere (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-20259:
--
Attachment: HIVE-20259.2.patch

> Cleanup of results cache directory
> --
>
> Key: HIVE-20259
> URL: https://issues.apache.org/jira/browse/HIVE-20259
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-20259.1.patch, HIVE-20259.2.patch
>
>
> The query results cache directory is currently deleted at process exit. This 
> does not work in the case of a kill -9 or a sudden process exit of Hive. 
> There should be some cleanup mechanism in place to take care of any old cache 
> directories that were not deleted at process exit.





[jira] [Comment Edited] (HIVE-20250) Option to allow external tables to use query results cache

2018-10-25 Thread Jason Dere (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664120#comment-16664120
 ] 

Jason Dere edited comment on HIVE-20250 at 10/25/18 6:27 PM:
-

Hey [~thai.bui], sorry I didn't get around to working on this one.

How about going with the approach outlined 
[here|https://issues.apache.org/jira/browse/HIVE-20250?focusedCommentId=16565715=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16565715]?
That would enable your case with the timeout approach. Feel free to try to 
take it up, or I will eventually get around to it when I get a chance.

Regarding the ability to receive notifications to invalidate the cache, there 
is also HIVE-19154, where HS2 is able to listen on metastore notification 
events to invalidate result cache entries. If you are able to generate a basic 
ALTER TABLE or ALTER PARTITION statement on the table which results in the 
metastore generating the notification event, this could suit your purposes. 
Something like ALTER TABLE TOUCH might be useful here.
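The invalidation idea can be sketched as a results cache keyed by the tables each entry reads, dropping entries when a notification event names one of those tables. All names here are illustrative, not HS2's actual classes:

```python
# Toy model of notification-driven cache invalidation. A cached result
# remembers which tables it read; a metastore event (e.g. triggered by
# ALTER TABLE ... TOUCH) invalidates every entry that read that table.

class ResultsCache:
    def __init__(self):
        self._entries = {}   # query text -> cached result
        self._tables = {}    # query text -> set of tables the query read

    def put(self, query: str, tables: set, result) -> None:
        self._entries[query] = result
        self._tables[query] = set(tables)

    def get(self, query: str):
        return self._entries.get(query)

    def on_metastore_event(self, table: str) -> None:
        """Drop every cached result that depends on the altered table."""
        stale = [q for q, ts in self._tables.items() if table in ts]
        for q in stale:
            self._entries.pop(q, None)
            self._tables.pop(q, None)
```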

 


was (Author: jdere):
Hey [~thai.bui], sorry I didn't get around to working on this one.

How about going with the approach outlined 
[here|https://issues.apache.org/jira/browse/HIVE-20250?focusedCommentId=16565715=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16565715]?
That would enable your case with the timeout approach. Feel free to try to 
take it up, or I will eventually get around to it when I get a chance.

Regarding the ability to receive notifications to invalidate the cache, there 
is also HIVE-19154, where HS2 is able to listen on metastore notification 
events to invalidate result cache entries. If you are able to generate a basic 
ALTER TABLE or ALTER PARTITION statement on the table which results in the 
metastore generating the notification event, this could suit your purposes.

 

> Option to allow external tables to use query results cache
> --
>
> Key: HIVE-20250
> URL: https://issues.apache.org/jira/browse/HIVE-20250
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jason Dere
>Priority: Major
> Attachments: HIVE-20250.1.patch
>
>






[jira] [Commented] (HIVE-20250) Option to allow external tables to use query results cache

2018-10-25 Thread Jason Dere (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664120#comment-16664120
 ] 

Jason Dere commented on HIVE-20250:
---

Hey [~thai.bui], sorry I didn't get around to working on this one.

How about going with the approach outlined 
[here|https://issues.apache.org/jira/browse/HIVE-20250?focusedCommentId=16565715=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16565715]?
That would enable your case with the timeout approach. Feel free to try to 
take it up, or I will eventually get around to it when I get a chance.

Regarding the ability to receive notifications to invalidate the cache, there 
is also HIVE-19154, where HS2 is able to listen on metastore notification 
events to invalidate result cache entries. If you are able to generate a basic 
ALTER TABLE or ALTER PARTITION statement on the table which results in the 
metastore generating the notification event, this could suit your purposes.

 

> Option to allow external tables to use query results cache
> --
>
> Key: HIVE-20250
> URL: https://issues.apache.org/jira/browse/HIVE-20250
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jason Dere
>Priority: Major
> Attachments: HIVE-20250.1.patch
>
>






[jira] [Commented] (HIVE-20486) Kafka: Use Row SerDe + vectorization

2018-10-25 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664118#comment-16664118
 ] 

slim bouguerra commented on HIVE-20486:
---

[~gopalv] I reverted the extra refactoring.

> Kafka: Use Row SerDe + vectorization
> 
>
> Key: HIVE-20486
> URL: https://issues.apache.org/jira/browse/HIVE-20486
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20486.3.patch, HIVE-20486.patch
>
>
> KafkaHandler returns unvectorized rows which causes the operators downstream 
> to be slower and sub-optimal.
> Hive has a vectorization shim which allows Kafka streams without complex 
> projections to be wrapped into a vectorized reader via 
> {{hive.vectorized.use.row.serde.deserialize}}.
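Conceptually, such a shim wraps a row-at-a-time reader so downstream operators see fixed-size batches. A toy sketch of the idea (not Hive's VectorizedRowBatch machinery):

```python
# Toy model of a row-serde vectorization shim: group rows from a
# row-at-a-time reader into fixed-size batches for downstream consumers.

def batched(row_reader, batch_size=1024):
    """Yield lists of at most batch_size rows drawn from an iterator."""
    batch = []
    for row in row_reader:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:                 # flush the final partial batch
        yield batch
```

Downstream operators then process one batch per call instead of one row per call, which is where the speedup comes from.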





[jira] [Updated] (HIVE-20486) Kafka: Use Row SerDe + vectorization

2018-10-25 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-20486:
--
Attachment: HIVE-20486.3.patch

> Kafka: Use Row SerDe + vectorization
> 
>
> Key: HIVE-20486
> URL: https://issues.apache.org/jira/browse/HIVE-20486
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20486.3.patch, HIVE-20486.patch
>
>
> KafkaHandler returns unvectorized rows which causes the operators downstream 
> to be slower and sub-optimal.
> Hive has a vectorization shim which allows Kafka streams without complex 
> projections to be wrapped into a vectorized reader via 
> {{hive.vectorized.use.row.serde.deserialize}}.





[jira] [Updated] (HIVE-20486) Kafka: Use Row SerDe + vectorization

2018-10-25 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-20486:
--
Attachment: (was: HIVE-20486.2.patch)

> Kafka: Use Row SerDe + vectorization
> 
>
> Key: HIVE-20486
> URL: https://issues.apache.org/jira/browse/HIVE-20486
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20486.3.patch, HIVE-20486.patch
>
>
> KafkaHandler returns unvectorized rows which causes the operators downstream 
> to be slower and sub-optimal.
> Hive has a vectorization shim which allows Kafka streams without complex 
> projections to be wrapped into a vectorized reader via 
> {{hive.vectorized.use.row.serde.deserialize}}.





[jira] [Updated] (HIVE-20486) Kafka: Use Row SerDe + vectorization

2018-10-25 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-20486:
--
Attachment: HIVE-20486.2.patch

> Kafka: Use Row SerDe + vectorization
> 
>
> Key: HIVE-20486
> URL: https://issues.apache.org/jira/browse/HIVE-20486
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20486.2.patch, HIVE-20486.patch
>
>
> KafkaHandler returns unvectorized rows which causes the operators downstream 
> to be slower and sub-optimal.
> Hive has a vectorization shim which allows Kafka streams without complex 
> projections to be wrapped into a vectorized reader via 
> {{hive.vectorized.use.row.serde.deserialize}}.





[jira] [Commented] (HIVE-16839) Unbalanced calls to openTransaction/commitTransaction when alter the same partition concurrently

2018-10-25 Thread Guang Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664114#comment-16664114
 ] 

Guang Yang commented on HIVE-16839:
---

We have seen a similar issue running Hive 0.13 and opened a PR: 
https://github.com/apache/hive/pull/453

> Unbalanced calls to openTransaction/commitTransaction when alter the same 
> partition concurrently
> 
>
> Key: HIVE-16839
> URL: https://issues.apache.org/jira/browse/HIVE-16839
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Nemon Lou
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> SQL to reproduce:
> prepare:
> {noformat}
>  hdfs dfs -mkdir -p 
> /hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627
>  1,create external table tb_ltgsm_external (id int) PARTITIONED by (cp 
> string,ld string);
> {noformat}
> open one beeline session and run these two SQL statements many times 
> {noformat} 2,ALTER TABLE tb_ltgsm_external ADD IF NOT EXISTS PARTITION 
> (cp=2017060513,ld=2017060610);
>  3,ALTER TABLE tb_ltgsm_external PARTITION (cp=2017060513,ld=2017060610) SET 
> LOCATION 
> 'hdfs://hacluster/hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627';
> {noformat}
> open another beeline session to run this SQL many times at the same time.
> {noformat}
>  4,ALTER TABLE tb_ltgsm_external DROP PARTITION (cp=2017060513,ld=2017060610);
> {noformat}
> MetaStore logs:
> {noformat}
> 2017-06-06 21:58:34,213 | ERROR | pool-6-thread-197 | Retrying HMSHandler 
> after 2000 ms (attempt 1 of 10) with error: 
> javax.jdo.JDOObjectNotFoundException: No such database row
> FailedObject:49[OID]org.apache.hadoop.hive.metastore.model.MStorageDescriptor
>   at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:475)
>   at 
> org.datanucleus.api.jdo.JDOAdapter.getApiExceptionForNucleusException(JDOAdapter.java:1158)
>   at 
> org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3231)
>   at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoGetcd(MStorageDescriptor.java)
>   at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.getCD(MStorageDescriptor.java:184)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1282)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1299)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToPart(ObjectStore.java:1680)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:1586)
>   at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
>   at com.sun.proxy.$Proxy0.getPartition(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:538)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions(HiveMetaStore.java:3317)
>   at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
>   at com.sun.proxy.$Proxy12.alter_partitions(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9963)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9947)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

[jira] [Updated] (HIVE-18778) Needs to capture input/output entities in explain

2018-10-25 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-18778:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to branch-2/branch-2.3.

> Needs to capture input/output entities in explain
> -
>
> Key: HIVE-18778
> URL: https://issues.apache.org/jira/browse/HIVE-18778
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Fix For: 4.0.0, 3.2.0, 3.1.1, 2.3.4
>
> Attachments: HIVE-18778-SparkPositive.patch, 
> HIVE-18778.1.branch-2.patch, HIVE-18778.1.patch, 
> HIVE-18778.10.branch-3.patch, HIVE-18778.11.branch-3.1.patch, 
> HIVE-18778.11.branch-3.patch, HIVE-18778.12.branch-3.1.patch, 
> HIVE-18778.2.branch-2.patch, HIVE-18778.2.patch, HIVE-18778.3.branch-2.patch, 
> HIVE-18778.3.patch, HIVE-18778.4.patch, HIVE-18778.5.patch, 
> HIVE-18778.6.patch, HIVE-18778.7.patch, HIVE-18778.8.patch, 
> HIVE-18778.9.branch-3.patch, HIVE-18778.9.patch, 
> HIVE-18778_TestCliDriver.patch, HIVE-18788_SparkNegative.patch, 
> HIVE-18788_SparkPerf.patch
>
>
> With Sentry enabled, commands like explain drop table foo fail with {{explain 
> drop table foo;}}
> {code}
> Error: Error while compiling statement: FAILED: SemanticException No valid 
> privileges
>  Required privilege( Table) not available in input privileges
>  The required privileges: (state=42000,code=4)
> {code}
> Sentry fails to authorize because the ExplainSemanticAnalyzer uses an 
> instance of DDLSemanticAnalyzer to analyze the explain query.
> {code}
> BaseSemanticAnalyzer sem = SemanticAnalyzerFactory.get(conf, input);
> sem.analyze(input, ctx);
> sem.validate()
> {code}
> The inputs/outputs entities for this query are set in the above code. 
> However, these are never set on the instance of ExplainSemanticAnalyzer 
> itself and thus is not propagated into the HookContext in the calling Driver 
> code.
> {code}
> sem.analyze(tree, ctx); --> this results in calling the above code that uses 
> DDLSA
> hookCtx.update(sem); --> sem is an instance of ExplainSemanticAnalyzer, this 
> code attempts to update the HookContext with the input/output info from ESA 
> which is never set.
> {code}
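The bug pattern above can be modeled with a small sketch: a delegating analyzer whose own inputs/outputs stay empty because they are never copied back from the inner analyzer it delegates to. The classes below are illustrative stand-ins, not Hive's actual analyzers:

```python
# Toy model of the ExplainSemanticAnalyzer bug: delegation without
# propagating the inner analyzer's input/output entities back up.

class InnerAnalyzer:
    """Stand-in for DDLSemanticAnalyzer."""
    def __init__(self):
        self.inputs, self.outputs = set(), set()

    def analyze(self, query: str) -> None:
        self.inputs.add("table:foo")   # pretend the query reads table foo

class ExplainAnalyzer:
    """Stand-in for ExplainSemanticAnalyzer."""
    def __init__(self):
        self.inputs, self.outputs = set(), set()

    def analyze(self, query: str, propagate: bool = True) -> None:
        inner = InnerAnalyzer()
        inner.analyze(query)
        if propagate:                  # the fix: copy entities back up so
            self.inputs |= inner.inputs    # hooks see the real inputs/outputs
            self.outputs |= inner.outputs
```

Without the copy-back step, the hook context built from the outer analyzer sees empty entity sets, which is why authorization (e.g. Sentry) has nothing to check.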





[jira] [Commented] (HIVE-18778) Needs to capture input/output entities in explain

2018-10-25 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664095#comment-16664095
 ] 

Hive QA commented on HIVE-18778:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12945558/HIVE-18778.3.branch-2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10705 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=227)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_tableproperty_optimize]
 (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[table_nonprintable]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_parquet_types]
 (batchId=155)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[merge_negative_5]
 (batchId=88)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_input_format_excludes]
 (batchId=117)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] 
(batchId=125)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/14640/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14640/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14640/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12945558 - PreCommit-HIVE-Build

> Needs to capture input/output entities in explain
> -
>
> Key: HIVE-18778
> URL: https://issues.apache.org/jira/browse/HIVE-18778
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Fix For: 4.0.0, 3.2.0, 3.1.1, 2.3.4
>
> Attachments: HIVE-18778-SparkPositive.patch, 
> HIVE-18778.1.branch-2.patch, HIVE-18778.1.patch, 
> HIVE-18778.10.branch-3.patch, HIVE-18778.11.branch-3.1.patch, 
> HIVE-18778.11.branch-3.patch, HIVE-18778.12.branch-3.1.patch, 
> HIVE-18778.2.branch-2.patch, HIVE-18778.2.patch, HIVE-18778.3.branch-2.patch, 
> HIVE-18778.3.patch, HIVE-18778.4.patch, HIVE-18778.5.patch, 
> HIVE-18778.6.patch, HIVE-18778.7.patch, HIVE-18778.8.patch, 
> HIVE-18778.9.branch-3.patch, HIVE-18778.9.patch, 
> HIVE-18778_TestCliDriver.patch, HIVE-18788_SparkNegative.patch, 
> HIVE-18788_SparkPerf.patch
>
>
> With Sentry enabled, commands like explain drop table foo fail with {{explain 
> drop table foo;}}
> {code}
> Error: Error while compiling statement: FAILED: SemanticException No valid 
> privileges
>  Required privilege( Table) not available in input privileges
>  The required privileges: (state=42000,code=4)
> {code}
> Sentry fails to authorize because the ExplainSemanticAnalyzer uses an 
> instance of DDLSemanticAnalyzer to analyze the explain query.
> {code}
> BaseSemanticAnalyzer sem = SemanticAnalyzerFactory.get(conf, input);
> sem.analyze(input, ctx);
> sem.validate()
> {code}
> The inputs/outputs entities for this query are set in the above code. 
> However, these are never set on the instance of ExplainSemanticAnalyzer 
> itself and thus is not propagated into the HookContext in the calling Driver 
> code.
> {code}
> sem.analyze(tree, ctx); --> this results in calling the above code that uses 
> DDLSA
> hookCtx.update(sem); --> sem is an instance of ExplainSemanticAnalyzer, this 
> code attempts to update the HookContext with the input/output info from ESA 
> which is never set.
> {code}





[jira] [Commented] (HIVE-20486) Kafka: Use Row SerDe + vectorization

2018-10-25 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664069#comment-16664069
 ] 

slim bouguerra commented on HIVE-20486:
---

[~gopalv] please look at https://issues.apache.org/jira/browse/HIVE-20782; it 
seems this code is dead and not used at all, which makes the APIs very hard to read.


> Kafka: Use Row SerDe + vectorization
> 
>
> Key: HIVE-20486
> URL: https://issues.apache.org/jira/browse/HIVE-20486
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20486.patch
>
>
> KafkaHandler returns unvectorized rows which causes the operators downstream 
> to be slower and sub-optimal.
> Hive has a vectorization shim which allows Kafka streams without complex 
> projections to be wrapped into a vectorized reader via 
> {{hive.vectorized.use.row.serde.deserialize}}.





[jira] [Commented] (HIVE-14516) OrcInputFormat.SplitGenerator.callInternal() can be optimized

2018-10-25 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664066#comment-16664066
 ] 

Eugene Koifman commented on HIVE-14516:
---

+1

> OrcInputFormat.SplitGenerator.callInternal() can be optimized
> -
>
> Key: HIVE-14516
> URL: https://issues.apache.org/jira/browse/HIVE-14516
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Igor Kryvenko
>Priority: Major
> Attachments: HIVE-14516.01.patch
>
>
> callInternal() has 
> // We can't eliminate stripes if there are deltas because the
> // deltas may change the rows making them match the predicate.
> but in ACID 2.0, the deltas only have delete events, thus eliminating stripes 
> from the "base" of the split should be safe.
> cc [~gopalv]
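The safety argument can be sketched directly: a delete can only shrink the set of matching rows, never grow it, so a stripe whose min/max range cannot satisfy the predicate stays safe to skip. A toy model of min/max stripe elimination (not the ORC reader's actual code):

```python
# Toy model of predicate-based stripe elimination. With delete-only deltas,
# skipping a stripe whose value range cannot match the predicate is safe,
# because deletes never make a previously non-matching row match.

def stripe_can_match(stripe_min, stripe_max, pred_value) -> bool:
    """Keep a stripe only if 'col == pred_value' could match its range."""
    return stripe_min <= pred_value <= stripe_max

def select_stripes(stripes, pred_value):
    """Filter (min, max) stripe ranges against an equality predicate."""
    return [s for s in stripes if stripe_can_match(s[0], s[1], pred_value)]
```

Under pre-2.0 ACID semantics, where deltas could also insert or update rows, this elimination would be unsound, which is what the quoted comment guards against.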





[jira] [Updated] (HIVE-20792) Inserting timestamp with zones truncates the data

2018-10-25 Thread Jaume M (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaume M updated HIVE-20792:
---
Status: Open  (was: Patch Available)

> Inserting timestamp with zones truncates the data
> -
>
> Key: HIVE-20792
> URL: https://issues.apache.org/jira/browse/HIVE-20792
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.0
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Major
> Attachments: HIVE-20792.1.patch, HIVE-20792.2.patch, 
> HIVE-20792.3.patch
>
>
> For example with the table:
> {code}
> CREATE TABLE myTable
> (
> a TIMESTAMP
> )
> STORED AS ORC
> tblproperties("transactional"="true");
> {code}
> The following inserts store the wrong data:
> {code}
> INSERT INTO myTable VALUES("2018-10-19 10:35:00 UTC"); -> 2018-10-19 
> 00:00:00.0
> INSERT INTO myTable VALUES("2018-10-19 10:35:00 ZZZ"); -> 2018-10-19 
> 00:00:00.0
> {code}
> The second one should fail since ZZZ is not a time zone.
> Similarly if the column is of type DATE,
> {code}
> INSERT INTO myTableDate VALUES("2018-10-19 "); -> 2018-10-19
> {code}
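The desired behavior is to reject a timestamp literal with trailing text (like an unknown zone token) instead of silently truncating it. A minimal sketch in Python, where strptime raises ValueError on unconsumed input:

```python
# Sketch of strict timestamp parsing: any trailing characters after the
# timestamp (such as an unrecognized zone token like "ZZZ") cause an error
# rather than a silent truncation to the date.
from datetime import datetime

def parse_timestamp_strict(text: str) -> datetime:
    # strptime raises ValueError if the input has unconverted data remaining.
    return datetime.strptime(text.strip(), "%Y-%m-%d %H:%M:%S")
```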



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20792) Inserting timestamp with zones truncates the data

2018-10-25 Thread Jaume M (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaume M updated HIVE-20792:
---
Attachment: HIVE-20792.3.patch
Status: Patch Available  (was: Open)

> Inserting timestamp with zones truncates the data
> -
>
> Key: HIVE-20792
> URL: https://issues.apache.org/jira/browse/HIVE-20792
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.0
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Major
> Attachments: HIVE-20792.1.patch, HIVE-20792.2.patch, 
> HIVE-20792.3.patch
>
>
> For example with the table:
> {code}
> CREATE TABLE myTable
> (
> a TIMESTAMP
> )
> STORED AS ORC
> tblproperties("transactional"="true");
> {code}
> The following inserts store the wrong data:
> {code}
> INSERT INTO myTable VALUES("2018-10-19 10:35:00 UTC"); -> 2018-10-19 
> 00:00:00.0
> INSERT INTO myTable VALUES("2018-10-19 10:35:00 ZZZ"); -> 2018-10-19 
> 00:00:00.0
> {code}
> The second one should fail since ZZZ is not a time zone.
> Similarly if the column is of type DATE,
> {code}
> INSERT INTO myTableDate VALUES("2018-10-19 "); -> 2018-10-19
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20486) Kafka: Use Row SerDe + vectorization

2018-10-25 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664008#comment-16664008
 ] 

Gopal V commented on HIVE-20486:


The ORC code has been modified and its interfaces removed.

I'll take a closer look, but it looks like you've disabled the ability to write a 
VectorFileSinkOperator for ORC.

> Kafka: Use Row SerDe + vectorization
> 
>
> Key: HIVE-20486
> URL: https://issues.apache.org/jira/browse/HIVE-20486
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20486.patch
>
>
> KafkaHandler returns unvectorized rows which causes the operators downstream 
> to be slower and sub-optimal.
> Hive has a vectorization shim which allows Kafka streams without complex 
> projections to be wrapped into a vectorized reader via 
> {{hive.vectorized.use.row.serde.deserialize}}.
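To see why this matters, recall that a vectorized reader hands operators columnar batches instead of single rows, so downstream operators loop over arrays rather than per-row objects. A rough, Hive-agnostic sketch of the row-to-batch wrapping idea (names are illustrative, not Hive's API):

```python
def rows_to_batches(rows, ncols, batch_size=1024):
    """Group row tuples into columnar batches: one value list per column."""
    batch = [[] for _ in range(ncols)]
    for row in rows:
        for c in range(ncols):
            batch[c].append(row[c])
        if len(batch[0]) == batch_size:   # batch is full: emit and start anew
            yield batch
            batch = [[] for _ in range(ncols)]
    if batch[0]:                          # emit the final partial batch
        yield batch

# Example: 3 rows, 2 columns -> one batch holding two column vectors
batches = list(rows_to_batches([(1, "a"), (2, "b"), (3, "c")], ncols=2))
# batches[0] == [[1, 2, 3], ["a", "b", "c"]]
```

The wrapper amortizes per-row overhead across a whole batch, which is the gain the shim provides for streams without complex projections.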



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20486) Kafka: Use Row SerDe + vectorization

2018-10-25 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664001#comment-16664001
 ] 

slim bouguerra commented on HIVE-20486:
---

[~teddy.choi]/[~gopalv], can you please take a look? My knowledge of the 
vectorization code is very limited.


> Kafka: Use Row SerDe + vectorization
> 
>
> Key: HIVE-20486
> URL: https://issues.apache.org/jira/browse/HIVE-20486
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20486.patch
>
>
> KafkaHandler returns unvectorized rows which causes the operators downstream 
> to be slower and sub-optimal.
> Hive has a vectorization shim which allows Kafka streams without complex 
> projections to be wrapped into a vectorized reader via 
> {{hive.vectorized.use.row.serde.deserialize}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20486) Kafka: Use Row SerDe + vectorization

2018-10-25 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-20486:
--
Attachment: HIVE-20486.patch

> Kafka: Use Row SerDe + vectorization
> 
>
> Key: HIVE-20486
> URL: https://issues.apache.org/jira/browse/HIVE-20486
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20486.patch
>
>
> KafkaHandler returns unvectorized rows which causes the operators downstream 
> to be slower and sub-optimal.
> Hive has a vectorization shim which allows Kafka streams without complex 
> projections to be wrapped into a vectorized reader via 
> {{hive.vectorized.use.row.serde.deserialize}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20486) Kafka: Use Row SerDe + vectorization

2018-10-25 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-20486:
--
Status: Patch Available  (was: In Progress)

> Kafka: Use Row SerDe + vectorization
> 
>
> Key: HIVE-20486
> URL: https://issues.apache.org/jira/browse/HIVE-20486
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
>
> KafkaHandler returns unvectorized rows which causes the operators downstream 
> to be slower and sub-optimal.
> Hive has a vectorization shim which allows Kafka streams without complex 
> projections to be wrapped into a vectorized reader via 
> {{hive.vectorized.use.row.serde.deserialize}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20486) Kafka: Use Row SerDe + vectorization

2018-10-25 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-20486:
--
Component/s: kafka integration

> Kafka: Use Row SerDe + vectorization
> 
>
> Key: HIVE-20486
> URL: https://issues.apache.org/jira/browse/HIVE-20486
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 4.0.0
>
>
> KafkaHandler returns unvectorized rows which causes the operators downstream 
> to be slower and sub-optimal.
> Hive has a vectorization shim which allows Kafka streams without complex 
> projections to be wrapped into a vectorized reader via 
> {{hive.vectorized.use.row.serde.deserialize}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (HIVE-20486) Kafka: Use Row SerDe + vectorization

2018-10-25 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-20486 started by slim bouguerra.
-
> Kafka: Use Row SerDe + vectorization
> 
>
> Key: HIVE-20486
> URL: https://issues.apache.org/jira/browse/HIVE-20486
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 4.0.0
>
>
> KafkaHandler returns unvectorized rows which causes the operators downstream 
> to be slower and sub-optimal.
> Hive has a vectorization shim which allows Kafka streams without complex 
> projections to be wrapped into a vectorized reader via 
> {{hive.vectorized.use.row.serde.deserialize}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (HIVE-20486) Kafka: Use Row SerDe + vectorization

2018-10-25 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-20486 started by slim bouguerra.
-
> Kafka: Use Row SerDe + vectorization
> 
>
> Key: HIVE-20486
> URL: https://issues.apache.org/jira/browse/HIVE-20486
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 4.0.0
>
>
> KafkaHandler returns unvectorized rows which causes the operators downstream 
> to be slower and sub-optimal.
> Hive has a vectorization shim which allows Kafka streams without complex 
> projections to be wrapped into a vectorized reader via 
> {{hive.vectorized.use.row.serde.deserialize}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20486) Kafka: Use Row SerDe + vectorization

2018-10-25 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-20486:
--
Fix Version/s: 4.0.0

> Kafka: Use Row SerDe + vectorization
> 
>
> Key: HIVE-20486
> URL: https://issues.apache.org/jira/browse/HIVE-20486
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 4.0.0
>
>
> KafkaHandler returns unvectorized rows which causes the operators downstream 
> to be slower and sub-optimal.
> Hive has a vectorization shim which allows Kafka streams without complex 
> projections to be wrapped into a vectorized reader via 
> {{hive.vectorized.use.row.serde.deserialize}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work stopped] (HIVE-20486) Kafka: Use Row SerDe + vectorization

2018-10-25 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-20486 stopped by slim bouguerra.
-
> Kafka: Use Row SerDe + vectorization
> 
>
> Key: HIVE-20486
> URL: https://issues.apache.org/jira/browse/HIVE-20486
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 4.0.0
>
>
> KafkaHandler returns unvectorized rows which causes the operators downstream 
> to be slower and sub-optimal.
> Hive has a vectorization shim which allows Kafka streams without complex 
> projections to be wrapped into a vectorized reader via 
> {{hive.vectorized.use.row.serde.deserialize}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20486) Kafka: Use Row SerDe + vectorization

2018-10-25 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra reassigned HIVE-20486:
-

Assignee: slim bouguerra

> Kafka: Use Row SerDe + vectorization
> 
>
> Key: HIVE-20486
> URL: https://issues.apache.org/jira/browse/HIVE-20486
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Gopal V
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 4.0.0
>
>
> KafkaHandler returns unvectorized rows which causes the operators downstream 
> to be slower and sub-optimal.
> Hive has a vectorization shim which allows Kafka streams without complex 
> projections to be wrapped into a vectorized reader via 
> {{hive.vectorized.use.row.serde.deserialize}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20486) Kafka: Use Row SerDe + vectorization

2018-10-25 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663962#comment-16663962
 ] 

slim bouguerra commented on HIVE-20486:
---

I looked at the suggested option and the code base:
https://github.com/apache/hive/blob/37c7fd7833eba087eadd8048dbc63b403b272104/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L1462

I see that it is only possible to enable this for some pre-selected input 
formats, so I have created a new vectorized Kafka reader.



> Kafka: Use Row SerDe + vectorization
> 
>
> Key: HIVE-20486
> URL: https://issues.apache.org/jira/browse/HIVE-20486
> Project: Hive
>  Issue Type: Improvement
>Reporter: Gopal V
>Priority: Major
>
> KafkaHandler returns unvectorized rows which causes the operators downstream 
> to be slower and sub-optimal.
> Hive has a vectorization shim which allows Kafka streams without complex 
> projections to be wrapped into a vectorized reader via 
> {{hive.vectorized.use.row.serde.deserialize}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HIVE-20485) Test Storage Handler with Secured Kafka Cluster

2018-10-25 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra resolved HIVE-20485.
---
Resolution: Resolved

> Test Storage Handler with Secured Kafka Cluster
> ---
>
> Key: HIVE-20485
> URL: https://issues.apache.org/jira/browse/HIVE-20485
> Project: Hive
>  Issue Type: Sub-task
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>
> We need to test this with a secured Kafka cluster:
> * Kerberos
> * SSL support



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20792) Inserting timestamp with zones truncates the data

2018-10-25 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663856#comment-16663856
 ] 

Hive QA commented on HIVE-20792:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12945538/HIVE-20792.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15508 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/14639/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14639/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14639/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12945538 - PreCommit-HIVE-Build

> Inserting timestamp with zones truncates the data
> -
>
> Key: HIVE-20792
> URL: https://issues.apache.org/jira/browse/HIVE-20792
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.0
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Major
> Attachments: HIVE-20792.1.patch, HIVE-20792.2.patch
>
>
> For example with the table:
> {code}
> CREATE TABLE myTable
> (
> a TIMESTAMP
> )
> STORED AS ORC
> tblproperties("transactional"="true");
> {code}
> The following inserts store the wrong data:
> {code}
> INSERT INTO myTable VALUES("2018-10-19 10:35:00 UTC"); -> 2018-10-19 00:00:00.0
> INSERT INTO myTable VALUES("2018-10-19 10:35:00 ZZZ"); -> 2018-10-19 00:00:00.0
> {code}
> The second one should fail since ZZZ is not a time zone.
> Similarly if the column is of type DATE,
> {code}
> INSERT INTO myTableDate VALUES("2018-10-19 "); -> 2018-10-19
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20782) Cleaning some unused code

2018-10-25 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-20782:
--
Attachment: HIVE-20768.2.patch

> Cleaning some unused code
> -
>
> Key: HIVE-20782
> URL: https://issues.apache.org/jira/browse/HIVE-20782
> Project: Hive
>  Issue Type: Improvement
>Reporter: slim bouguerra
>Assignee: Teddy Choi
>Priority: Major
> Attachments: HIVE-20768.2.patch, HIVE-20782.2.patch, 
> HIVE-20782.2.patch, HIVE-20782.patch
>
>
> I am making my way into the vectorization code and trying to understand the 
> APIs. I ran into this unused class; I guess it is not used anymore.
> [~ashutoshc], maybe you can explain, as you are the main contributor to this 
> file: 
> {code}a/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedSerde.java{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20792) Inserting timestamp with zones truncates the data

2018-10-25 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663805#comment-16663805
 ] 

Hive QA commented on HIVE-20792:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
47s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 1s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
31s{color} | {color:blue} common in master has 65 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
39s{color} | {color:blue} serde in master has 198 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
43s{color} | {color:blue} ql in master has 2317 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
11s{color} | {color:red} common: The patch generated 1 new + 2 unchanged - 0 
fixed = 3 total (was 2) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
14s{color} | {color:red} serde: The patch generated 3 new + 191 unchanged - 1 
fixed = 194 total (was 192) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
12s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-14639/dev-support/hive-personality.sh
 |
| git revision | master / a99be34 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14639/yetus/diff-checkstyle-common.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14639/yetus/diff-checkstyle-serde.txt
 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14639/yetus/patch-asflicense-problems.txt
 |
| modules | C: common serde ql U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14639/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Inserting timestamp with zones truncates the data
> -
>
> Key: HIVE-20792
> URL: https://issues.apache.org/jira/browse/HIVE-20792
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.0
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Major
> Attachments: HIVE-20792.1.patch, HIVE-20792.2.patch
>
>
> For example with the table:
> {code}
> 

[jira] [Assigned] (HIVE-20807) Refactor LlapStatusServiceDriver

2018-10-25 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely reassigned HIVE-20807:
-


> Refactor LlapStatusServiceDriver
> 
>
> Key: HIVE-20807
> URL: https://issues.apache.org/jira/browse/HIVE-20807
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
>
> LlapStatusServiceDriver is the class used to determine if LLAP has started. 
> The following problems should be solved by refactoring:
> 1. The main class is more than 800 lines long and should be cut into multiple 
> smaller classes.
> 2. The current design makes it extremely hard to write unit tests.
> 3. There are some overcomplicated, over-engineered parts of the code.
> 4. Most of the code is under org.apache.hadoop.hive.llap.cli, but some parts 
> are under org.apache.hadoop.hive.llap.cli.status. The whole program could be 
> moved to the latter.
> 5. LlapStatusHelpers serves as a class for holding classes, which doesn't 
> make much sense.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20796) jdbc URL can contain sensitive information that should not be logged

2018-10-25 Thread Peter Vary (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663764#comment-16663764
 ] 

Peter Vary commented on HIVE-20796:
---

+1 pending tests

> jdbc URL can contain sensitive information that should not be logged
> 
>
> Key: HIVE-20796
> URL: https://issues.apache.org/jira/browse/HIVE-20796
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Laszlo Pinter
>Assignee: Laszlo Pinter
>Priority: Major
> Attachments: HIVE-20796.01.patch, HIVE-20796.02.patch, 
> HIVE-20796.03.patch, HIVE-20796.04.patch
>
>
> It is possible to put passwords in the JDBC connection URL, and some JDBC 
> drivers (Derby, MySQL) will supposedly use them. This information is 
> considered sensitive and should be masked out when logging the connection 
> URL.
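One common way to do such masking (a sketch under the assumption that the password travels as a `password=` URL parameter; not necessarily what the patch itself does) is a regex substitution applied before the URL reaches the log:

```python
import re

def mask_jdbc_url(url):
    """Replace the value of any password= parameter with *** before logging."""
    # Stop at ';' (Derby-style) or '&' (MySQL-style) parameter separators.
    return re.sub(r'(password=)[^;&]*', r'\1***', url, flags=re.IGNORECASE)

print(mask_jdbc_url("jdbc:mysql://host:3306/db?user=hive&password=s3cret&ssl=true"))
# jdbc:mysql://host:3306/db?user=hive&password=***&ssl=true
```

The key property is that masking happens at the logging boundary, so the real URL is still available for opening the connection.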



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20796) jdbc URL can contain sensitive information that should not be logged

2018-10-25 Thread Laszlo Pinter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Pinter updated HIVE-20796:
-
Attachment: HIVE-20796.04.patch

> jdbc URL can contain sensitive information that should not be logged
> 
>
> Key: HIVE-20796
> URL: https://issues.apache.org/jira/browse/HIVE-20796
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Laszlo Pinter
>Assignee: Laszlo Pinter
>Priority: Major
> Attachments: HIVE-20796.01.patch, HIVE-20796.02.patch, 
> HIVE-20796.03.patch, HIVE-20796.04.patch
>
>
> It is possible to put passwords in the JDBC connection URL, and some JDBC 
> drivers (Derby, MySQL) will supposedly use them. This information is 
> considered sensitive and should be masked out when logging the connection 
> URL.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20796) jdbc URL can contain sensitive information that should not be logged

2018-10-25 Thread Laszlo Pinter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Pinter updated HIVE-20796:
-
Attachment: HIVE-20796.03.patch

> jdbc URL can contain sensitive information that should not be logged
> 
>
> Key: HIVE-20796
> URL: https://issues.apache.org/jira/browse/HIVE-20796
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Laszlo Pinter
>Assignee: Laszlo Pinter
>Priority: Major
> Attachments: HIVE-20796.01.patch, HIVE-20796.02.patch, 
> HIVE-20796.03.patch
>
>
> It is possible to put passwords in the JDBC connection URL, and some JDBC 
> drivers (Derby, MySQL) will supposedly use them. This information is 
> considered sensitive and should be masked out when logging the connection 
> URL.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20796) jdbc URL can contain sensitive information that should not be logged

2018-10-25 Thread Laszlo Pinter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Pinter updated HIVE-20796:
-
Attachment: HIVE-20796.02.patch

> jdbc URL can contain sensitive information that should not be logged
> 
>
> Key: HIVE-20796
> URL: https://issues.apache.org/jira/browse/HIVE-20796
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Laszlo Pinter
>Assignee: Laszlo Pinter
>Priority: Major
> Attachments: HIVE-20796.01.patch, HIVE-20796.02.patch
>
>
> It is possible to put passwords in the JDBC connection URL, and some JDBC 
> drivers (Derby, MySQL) will supposedly use them. This information is 
> considered sensitive and should be masked out when logging the connection 
> URL.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-10-25 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663663#comment-16663663
 ] 

Hive QA commented on HIVE-20682:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12945528/HIVE-20682.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 15502 tests 
executed
*Failed tests:*
{noformat}
TestMiniDruidCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=195)

[druidmini_dynamic_partition.q,druidmini_test_ts.q,druidmini_expressions.q,druidmini_test_alter.q,druidmini_test_insert.q]
org.apache.hadoop.hive.ql.parse.authorization.TestSessionUserName.testSessionConstructorUser
 (batchId=300)
org.apache.hadoop.hive.ql.parse.authorization.TestSessionUserName.testSessionDefaultUser
 (batchId=300)
org.apache.hadoop.hive.ql.parse.authorization.TestSessionUserName.testSessionGetGroupNames
 (batchId=300)
org.apache.hadoop.hive.ql.parse.authorization.TestSessionUserName.testSessionNullUser
 (batchId=300)
org.apache.hive.service.cli.session.TestSessionManagerMetrics.testAbandonedSessionMetrics
 (batchId=234)
org.apache.hive.service.cli.session.TestSessionManagerMetrics.testActiveSessionMetrics
 (batchId=234)
org.apache.hive.service.cli.session.TestSessionManagerMetrics.testActiveSessionTimeMetrics
 (batchId=234)
org.apache.hive.service.cli.session.TestSessionManagerMetrics.testOpenSessionMetrics
 (batchId=234)
org.apache.hive.service.cli.session.TestSessionManagerMetrics.testOpenSessionTimeMetrics
 (batchId=234)
org.apache.hive.service.cli.session.TestSessionManagerMetrics.testThreadPoolMetrics
 (batchId=234)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/14638/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14638/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14638/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12945528 - PreCommit-HIVE-Build

> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch
>
>
> *Problem description:*
> The master thread initializes the *sessionHive* object in the *HiveSessionImpl* 
> class when we open a new session for a client connection, and by default all 
> queries from this connection share the same sessionHive object.
> If the master thread executes a *synchronous* query, it closes the 
> sessionHive object (referred to via the thread-local hiveDb) if 
> {{Hive.isCompatible}} returns false and sets a new Hive object in the 
> thread-local hiveDb, but it doesn't change the sessionHive object in the 
> session. In contrast, *asynchronous* query execution via async threads never 
> closes the sessionHive object; it just creates a new one if needed and sets 
> it as the thread-local hiveDb.
> So, the problem can happen when an *asynchronous* query being executed by an 
> async thread refers to the sessionHive object while the master thread 
> receives a *synchronous* query that closes that same sessionHive object.
> Also, each query execution overwrites the thread-local hiveDb object with the 
> sessionHive object, which potentially leaks a metastore connection if the 
> previous synchronous query execution re-created the Hive object.
> *Possible Fix:*
> We shall maintain an atomic reference counter in the Hive object. We 
> increment the counter when somebody sets it in a thread-local hiveDb and 
> decrement it when somebody releases it. Only when the counter is down to 0 
> should we close the connection.
> A couple of cases where the thread-local hiveDb is released:
>  * When synchronous query execution in the master thread re-creates the Hive 
> object due to a config change. We also need to update the sessionHive object 
> in the current session when we release it from the thread-local hiveDb of 
> the master thread.
>  * When an async thread exits after completing execution or due to an 
> exception.
> If the session is getting 
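The reference-counting scheme described above can be sketched as follows (illustrative Python, not Hive's code): every thread that adopts the shared handle acquires it, and the underlying connection closes only when the last holder releases it.

```python
import threading

class RefCountedHandle:
    """Close the wrapped resource only when the last holder releases it."""
    def __init__(self, close_fn):
        self._count = 0
        self._lock = threading.Lock()
        self._close_fn = close_fn
        self.closed = False

    def acquire(self):
        with self._lock:
            self._count += 1

    def release(self):
        with self._lock:
            self._count -= 1
            if self._count == 0 and not self.closed:
                self._close_fn()  # safe: no other thread still holds it
                self.closed = True

h = RefCountedHandle(close_fn=lambda: None)
h.acquire()   # master thread sets it as its thread-local hiveDb
h.acquire()   # async thread shares the same sessionHive
h.release()   # master re-creates Hive on config change: not closed yet
h.release()   # last holder exits: now the connection may be closed
```

This mirrors the two release cases listed above: a config-change re-creation in the master thread and an async thread exiting, with the close deferred until both have let go.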

[jira] [Commented] (HIVE-20796) jdbc URL can contain sensitive information that should not be logged

2018-10-25 Thread Peter Vary (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663634#comment-16663634
 ] 

Peter Vary commented on HIVE-20796:
---

Sounds like a good place :D

> jdbc URL can contain sensitive information that should not be logged
> 
>
> Key: HIVE-20796
> URL: https://issues.apache.org/jira/browse/HIVE-20796
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Laszlo Pinter
>Assignee: Laszlo Pinter
>Priority: Major
> Attachments: HIVE-20796.01.patch
>
>
> It is possible to put passwords in the JDBC connection URL, and some JDBC 
> drivers (Derby, MySQL) will supposedly use them. This information is 
> considered sensitive and should be masked out when logging the connection 
> URL.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

