[jira] [Created] (HIVE-20815) JdbcRecordReader.next shall not eat exception

2018-10-25 Thread Daniel Dai (JIRA)
Daniel Dai created HIVE-20815:
-

 Summary: JdbcRecordReader.next shall not eat exception
 Key: HIVE-20815
 URL: https://issues.apache.org/jira/browse/HIVE-20815
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler
Reporter: Daniel Dai
Assignee: Daniel Dai
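The issue title refers to JdbcRecordReader.next() catching exceptions and returning false, which makes a read failure look like a normal end of input. A minimal illustrative sketch of the fix, with hypothetical class and method shapes (not the actual Hive code):

```java
import java.io.IOException;
import java.util.Iterator;

// Hypothetical sketch, not the actual Hive class: a record reader whose
// next() propagates failures instead of "eating" them. Silently returning
// false on an exception makes a read error indistinguishable from a normal
// end-of-data condition.
public class RecordReaderSketch {
    private final Iterator<String> rows;

    public RecordReaderSketch(Iterator<String> rows) {
        this.rows = rows;
    }

    public boolean next(StringBuilder value) throws IOException {
        try {
            if (rows.hasNext()) {
                value.setLength(0);
                value.append(rows.next());
                return true;
            }
            return false; // genuine end of data
        } catch (RuntimeException e) {
            // Wrap and rethrow so the caller sees the root cause.
            throw new IOException("Error reading next record", e);
        }
    }
}
```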






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20814) New create function between HiveServer2 is not synchronized

2018-10-25 Thread leozhang (JIRA)
leozhang created HIVE-20814:
---

 Summary:  New create function between HiveServer2 is not 
synchronized
 Key: HIVE-20814
 URL: https://issues.apache.org/jira/browse/HIVE-20814
 Project: Hive
  Issue Type: Improvement
  Components: Beeline
Affects Versions: 1.1.0
Reporter: leozhang
 Attachments: image-2018-10-26-10-23-48-101.png, 
image-2018-10-26-10-24-16-669.png, image-2018-10-26-10-24-54-904.png, 
image-2018-10-26-10-26-00-591.png, image-2018-10-26-10-27-32-291.png

I am using the CDH open source version 5.15.0, where the Hive version is 1.1.0.

I have 3 Metastore services and 3 HiveServer2 services in the cluster; 
Metastore and HiveServer2 do not currently have HA turned on. The cluster has 
the Sentry service enabled and the Hive Auxiliary JARs Directory attribute 
configured.

Now I am having a problem. Using Beeline, I connect to HiveServer2 on node 1 
and create a function; the creation succeeds and the function can be queried 
normally. But if I then connect to HiveServer2 on node 2 via Beeline, I cannot 
see the function I just created on node 1 through _show functions_. I have to 
restart HiveServer2 on node 2 before the function created on node 1 becomes 
visible.

I don't know the cause of or the solution to this problem; I hope I can get 
some help. Thank you!

 

Example:

I connect to dn1.test.com:1 and create the function _zzytest_trans_; using 
_show functions_ I can see it.

!image-2018-10-26-10-24-16-669.png!

!image-2018-10-26-10-24-54-904.png!

 

At this point I used Beeline to connect to dn4.test.com:1 and ran the _show 
functions_ command; I could not find the zzytest_trans function. I had to 
restart HiveServer2 on the dn4.test.com node to see the zzytest_trans 
function.

!image-2018-10-26-10-26-00-591.png!

!image-2018-10-26-10-27-32-291.png!




Re: Review Request 69019: HIVE-20617 Fix type of constants in IN expressions to have correct type

2018-10-25 Thread Jesús Camacho Rodríguez


> On Oct. 20, 2018, 6:57 p.m., Ashutosh Chauhan wrote:
> >

Overall, the latest version of the patch LGTM. It is a bit messy that all this 
logic lives here; we could probably rely on inspectors and existing logic to 
make everything cleaner. Let's get this one in, and we can tackle that in a 
follow-up.


> On Oct. 20, 2018, 6:57 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/alter_partition_coltype.q.out
> > Line 163 (original), 163 (patched)
> > 
> >
> > String and int comparison happens in double. So, should this be 3.0D ?

This has to do with the way that Calcite generates the SQL. I will create a 
follow-up.


> On Oct. 20, 2018, 6:57 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/in_typecheck_pointlook.q.out
> > Lines 56 (patched)
> > 
> >
> > I expected 'Unknown' to be a char of length 6. Is there a reason to 
> > expand the length to 10?
> > As I mentioned previously, if the constant is of smaller length it 
> > doesn't make a difference (though it is unnecessary), but if the constant 
> > is of bigger length than the LHS, then char::compare() actually truncates 
> > the constant, so it is better to create the char with the original length 
> > of the constant.
> 
> Zoltan Haindrich wrote:
> It worked the other way around before: constants are expanded to the 
> target type. The addition I've made is that if the constant is longer, then 
> it is made invalid.
> 
> Ashutosh Chauhan wrote:
> I think it's better to let the runtime dictate the semantics in such 
> cases. So, we just create a constant char of its original length, and then 
> whatever the runtime does with it, we will get that, instead of us 
> "pre-processing" the value at compile time. Another way to think about this 
> is: if there are 2 cols of char(5) and char(10), what will the runtime do? 
> The runtime already has logic to handle such cases; we let it handle that.

bq. I've made is that if the constant is longer; then its made invalid
I have moved this logic out in the latest patch; we can add it in a follow-up, 
as it seems to me that it should not be enforced in the creation of the 
HiveChar object.


> On Oct. 20, 2018, 6:57 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/join45.q.out
> > Line 717 (original), 717 (patched)
> > 
> >
> > As discussed this should have been
> > (struct(cast (_col0 as double), cast(_col2 as double))) IN (const 
> > struct(100.0D,100.0D), const struct(101.0D,101.0D), const 
> > struct(102.0D,102.0D))
> 
> Zoltan Haindrich wrote:
> * The column has type string.
> * The IN statement may have the same type for all elements; in this case 
> it's int...
> * In this case, changing the left-hand side to double may work.
> 
> 
> But what should happen in case the IN operands contain 1 int, 1 decimal 
> and 1 string?
> 
> note: during UDF evaluation there is a Set with all the values, and 
> "contains" is used - so if the inferred type is not numeric: 
> 
> I've just checked that even the standard IN doesn't work like this:
> 
> ```
> create table t (a string);
> 
> insert into t values ('1'),('x'),('2.0');
> 
> select * from t where a in (1.0,'x',2);
> 1
> 2.0
> -- it doesn't return 'x' because it's cast to double
> 
> ```
> 
> I'm starting to think that it would be better to do this with the IN 
> unwound into ORs: then we could do the one-on-one constant checks, and 
> the pointlookupoptimizer might collapse them if the types are the same - in 
> this case I think we would not lose 'x' in the above case - and it would 
> also make this whole recursive typecheck unnecessary.
> 
> Ashutosh Chauhan wrote:
> We shall strive to match the semantics which are already there. In Hive, 
> for the expr str_col = 12, we get a cast to double on both sides. So, by 
> extension, the IN clause should do the same IFF all types are the same. If 
> the types aren't the same, I agree with what you are proposing, i.e. keep 
> it as unwound ORs.
> 
> Zoltan Haindrich wrote:
> Added logic to unwind ORs earlier; it fixes the double casts by relying 
> on the existing logic for equals, and makes the weird type `(1.0,'x',2)` 
> issue go away, as well as the `struct(null,1)` cases.

Logic to unwind ORs LGTM
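The 'x' row disappearing in Zoltan's example can be reproduced outside Hive. A small simulation (an editor's sketch, not Hive code) of evaluating `a IN (1.0, 'x', 2)` over a string column with every operand coerced to double:

```java
// Editor's sketch of why a single numeric cast loses the 'x' row: when every
// IN operand is coerced to double, a non-numeric string becomes null and can
// never match. Unwinding into per-constant OR comparisons keeps the string
// comparison intact for non-numeric constants.
public class InCoercionSketch {
    static Double toDouble(String s) {
        try {
            return Double.valueOf(s);
        } catch (NumberFormatException e) {
            return null; // mirrors SQL: a failed cast yields null
        }
    }

    // a IN (...) evaluated with everything coerced to double
    static boolean inAsDouble(String a, String... operands) {
        Double left = toDouble(a);
        if (left == null) {
            return false; // 'x' casts to null, so it never matches
        }
        for (String op : operands) {
            Double right = toDouble(op);
            if (right != null && right.equals(left)) {
                return true;
            }
        }
        return false;
    }
}
```

This matches the output observed in the thread: rows 1 and 2.0 are returned, while 'x' is lost because it casts to null.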


> On Oct. 20, 2018, 6:57 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/parquet_vectorization_13.q.out
> > Line 86 (original), 86 (patched)
> > 
> >
> > Dont we print f for float constant suffix? ie 3569.0f ?

Same as above, this has to do with the way that Calcite generates the SQL. I 
will create a follow-up.


- Jesús


---
This is an automatically generated e-mail.
[jira] [Created] (HIVE-20813) udf to_epoch_milli need to support timestamp without time zone as well

2018-10-25 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20813:
-

 Summary: udf to_epoch_milli need to support timestamp without time 
zone as well
 Key: HIVE-20813
 URL: https://issues.apache.org/jira/browse/HIVE-20813
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently the following query will fail with a cast exception (it tries to 
cast timestamp to timestamp with local time zone).
{code}
 select to_epoch_milli(current_timestamp)
{code}
As a simple fix, we need to add support for the timestamp object inspector.
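The distinction the fix has to handle can be sketched with java.time (an editor's illustration; the method names here are hypothetical and not the UDF's actual API). A TIMESTAMP WITH LOCAL TIME ZONE already identifies an instant, while a plain TIMESTAMP only fixes the wall-clock fields, so a zone must be supplied before an epoch value exists:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Editor's sketch of the two conversions the UDF needs to support.
public class ToEpochMilliSketch {
    // TIMESTAMP WITH LOCAL TIME ZONE: the instant is fully determined.
    static long fromZoned(ZonedDateTime zdt) {
        return zdt.toInstant().toEpochMilli();
    }

    // Plain TIMESTAMP (no zone): interpret the wall-clock value in an
    // explicitly chosen zone before converting to an instant.
    static long fromLocal(LocalDateTime ldt, ZoneId zone) {
        return ldt.atZone(zone).toInstant().toEpochMilli();
    }
}
```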






Re: Review Request 69167: HIVE-20796: jdbc URL can contain sensitive information that should not be logged

2018-10-25 Thread Denys Kuzmenko via Review Board


> On Oct. 25, 2018, 10:46 p.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
> > Lines 1172 (patched)
> > 
> >
> > Could be optimized to:
> > 
> > String[] sensitiveData = {"user", "password"};
> > String regex = "([;,\?&]" + String.join("|", sensitiveData) + 
> > ")=.*?([;,&\)]+)";
> > 
> > String result = 
> > Pattern.compile(regex).matcher(connectionURL).replaceAll("$1=***$2");
> 
> Denys Kuzmenko wrote:
> or just connectionURL.replaceAll(regex, "$1=***$2");

regex = "([;,?&]" + String.join("|", sensitiveData) + ")=.*?([;,&)]?)";
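A self-contained version of the masking idea (an editor's sketch, not the committed Hive code). Bounding the value with a negated character class avoids the lazy `.*?` followed by an optional group, which can match zero characters and leave the value unmasked at the end of the URL:

```java
import java.util.regex.Pattern;

// Editor's sketch: mask the values of sensitive JDBC URL parameters. The
// value is taken to run from "=" until the next delimiter (or end of string).
public class JdbcUrlSanitizer {
    private static final String[] SENSITIVE_KEYS = {"user", "password"};
    private static final Pattern SENSITIVE = Pattern.compile(
        "([;,?&](?:" + String.join("|", SENSITIVE_KEYS) + "))=[^;,&)]*");

    public static String sanitize(String url) {
        // $1 preserves the delimiter and key; the value is replaced.
        return SENSITIVE.matcher(url).replaceAll("$1=***");
    }
}
```

For example, `sanitize("jdbc:mysql://host/db?user=hive&password=secret&ssl=true")` yields `jdbc:mysql://host/db?user=***&password=***&ssl=true`.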


- Denys


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69167/#review210059
---


On Oct. 25, 2018, 1:36 p.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69167/
> ---
> 
> (Updated Oct. 25, 2018, 1:36 p.m.)
> 
> 
> Review request for hive and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20796: jdbc URL can contain sensitive information that should not be 
> logged
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  9c158040497cd3d2762620ce35e2b46bb6d5fffe 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
>  f3b38665676391fec9b85eb9a405c14632340dc6 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/utils/TestMetaStoreServerUtils.java
>  f4bdd734dc4e731dda01e6031a4115cde5571baf 
> 
> 
> Diff: https://reviews.apache.org/r/69167/diff/1/
> 
> 
> Testing
> ---
> 
> New unit test created.
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Re: Review Request 69167: HIVE-20796: jdbc URL can contain sensitive information that should not be logged

2018-10-25 Thread Denys Kuzmenko via Review Board


> On Oct. 25, 2018, 10:46 p.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
> > Lines 1172 (patched)
> > 
> >
> > Could be optimized to:
> > 
> > String[] sensitiveData = {"user", "password"};
> > String regex = "([;,\?&]" + String.join("|", sensitiveData) + 
> > ")=.*?([;,&\)]+)";
> > 
> > String result = 
> > Pattern.compile(regex).matcher(connectionURL).replaceAll("$1=***$2");

or just connectionURL.replaceAll(regex, "$1=***$2");


- Denys


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69167/#review210059
---





Re: Review Request 69167: HIVE-20796: jdbc URL can contain sensitive information that should not be logged

2018-10-25 Thread Denys Kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69167/#review210059
---




standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
Lines 1172 (patched)


Could be optimized to:

String[] sensitiveData = {"user", "password"};
String regex = "([;,\?&]" + String.join("|", sensitiveData) + 
")=.*?([;,&\)]+)";

String result = 
Pattern.compile(regex).matcher(connectionURL).replaceAll("$1=***$2");


- Denys Kuzmenko





[jira] [Created] (HIVE-20812) Update jetty dependency to 9.3.25.v20180904

2018-10-25 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-20812:


 Summary: Update jetty dependency to 9.3.25.v20180904
 Key: HIVE-20812
 URL: https://issues.apache.org/jira/browse/HIVE-20812
 Project: Hive
  Issue Type: Improvement
Reporter: Thejas M Nair


The jetty version 9.3.20.v20170531 currently used in master has several CVEs 
associated with it.
Version 9.3.25.v20180904 has those issues resolved.






Re: Review Request 69148: HIVE-20793 add RP namespacing to workload management

2018-10-25 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69148/
---

(Updated Oct. 25, 2018, 9:43 p.m.)


Review request for hive, Jason Dere and Prasanth_J.


Repository: hive-git


Description
---

see jira


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
e226a1f82d44550f389308f91d578e7aa4ea170a 
  
itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
 c3e1e8e88c69d7713e16c7061ce8cf73a0d5e833 
  metastore/scripts/upgrade/hive/hive-schema-4.0.0.hive.sql 
a69046f961cdf0fff7989492c489bb62f2a66d72 
  metastore/scripts/upgrade/hive/upgrade-3.1.0-to-4.0.0.hive.sql 
4c770206fe3dcceb8570be1c1ef078b376f5cafd 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
807f159daa98d40e667914adc6c53fb8ecabf998 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
4de038913a5c9a2c199f71702b8f70ca84d0856b 
  ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java 
e57db935d9420508ed6091e12ca6b6cd3382db5d 
  ql/src/test/queries/clientpositive/resourceplan.q 
fae9701ebaeaa521904a383f5fb741c13be08d8e 
  ql/src/test/results/clientpositive/llap/resourceplan.q.out 
c11daf728cdd5bd6fe36618aff113b3d60579129 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
 aba63f050b5b98a2aeeb0df6ff2de5e6e06761f2 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
 d1c0c4d1f60016f28cea69348b1b30ecb61bf083 
  standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 
4b7b61520a2d55635f474317053a17410f3a4bb7 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 8cd46e3f44e7c4e47fbf7f2ce2b6350a5814106f 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
 9c158040497cd3d2762620ce35e2b46bb6d5fffe 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
 c3914b668fac18ead6196a4fc449e909f5af01b1 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
 47ac68c667bea8f09f5301a6364c854bc18b3c0d 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/model/MWMResourcePlan.java
 ac51f2d77145b37da468ce8df2ac5c42f4d6c538 
  standalone-metastore/metastore-server/src/main/resources/package.jdo 
fef6a42038bb2aa0cba6dfda8d710fd37cb720e7 
  
standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql
 c889bbdf96b887b29be858e41ee854f0731cd5cd 
  
standalone-metastore/metastore-server/src/main/sql/derby/upgrade-3.2.0-to-4.0.0.derby.sql
 aca5227a5bb6192da6c5f070c04d2941d636bad2 
  
standalone-metastore/metastore-server/src/main/sql/mssql/hive-schema-4.0.0.mssql.sql
 91ba134325094e413887a89e1d605efa99218288 
  
standalone-metastore/metastore-server/src/main/sql/mssql/upgrade-3.2.0-to-4.0.0.mssql.sql
 f0d861b3a9bc982c1e24fa49415dcfc6c105cd68 
  
standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql
 3af2ebb253f82bb85976d229d4ac2225deffdbde 
  
standalone-metastore/metastore-server/src/main/sql/mysql/upgrade-3.2.0-to-4.0.0.mysql.sql
 ee0f691b524a6e822ac14e09d24d3a49ae8565b1 
  
standalone-metastore/metastore-server/src/main/sql/oracle/hive-schema-4.0.0.oracle.sql
 33aa08015a9e17585c42d64d44b364be96e69eaf 
  
standalone-metastore/metastore-server/src/main/sql/oracle/upgrade-3.2.0-to-4.0.0.oracle.sql
 bbb4a39ec4f6f616c9a1a9042a35cafd45cf9796 
  
standalone-metastore/metastore-server/src/main/sql/postgres/hive-schema-4.0.0.postgres.sql
 ea088d77fdaec85834b8fd3f01eacdfac58dd245 
  
standalone-metastore/metastore-server/src/main/sql/postgres/upgrade-3.2.0-to-4.0.0.postgres.sql
 2a2d70ae802eb6f5b9ab7f4f9519a0af30d2c5b4 
  
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
 966979891b71f1cbfe50f56c40c35af8b304c47f 
  
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
 593d562c3498660861201f58d83c27d59d184046 
  
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java
 4293579ad8b55d59f2230040f23e9a693d838ca7 


Diff: https://reviews.apache.org/r/69148/diff/2/

Changes: https://reviews.apache.org/r/69148/diff/1-2/


Testing
---


Thanks,

Sergey Shelukhin



Re: Review Request 69174: Refactor LlapStatusServiceDriver

2018-10-25 Thread Miklos Gergely

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69174/
---

(Updated Oct. 25, 2018, 8:57 p.m.)


Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-20807
https://issues.apache.org/jira/browse/HIVE-20807


Repository: hive-git


Description
---

LlapStatusServiceDriver is the class used to determine if LLAP has started. The 
following problems should be solved by refactoring:

1. The main class is more than 800 lines long and should be cut into multiple 
smaller classes.
2. The current design makes it extremely hard to write unit tests.
3. There are some overcomplicated, over-engineered parts of the code.
4. Most of the code is under org.apache.hadoop.hive.llap.cli, but some parts 
are under org.apache.hadoop.hive.llap.cli.status. The whole program could be 
moved to the latter.
5. LlapStatusHelpers serves as a class for holding classes, which doesn't make 
much sense.

This is the first step of refactoring the program: all of its components are 
moved under the package org.apache.hadoop.hive.llap.cli.status, all the classes 
and enums are put into separate files, the overcomplicated parts of the 
command line parsing are replaced with a simpler structure, and the findbugs 
and checkstyle warnings are fixed.


Diffs (updated)
-

  bin/ext/llapstatus.sh 2d2c8f4 
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapSliderUtils.java 
af47b26 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapStatusOptionsProcessor.java
 dca0c7b 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapStatusServiceDriver.java
 a521799 
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/AmInfo.java 
PRE-CREATION 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/AppStatusBuilder.java
 PRE-CREATION 
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/ExitCode.java 
PRE-CREATION 
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapInstance.java 
PRE-CREATION 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapStatusCliException.java
 PRE-CREATION 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapStatusHelpers.java
 5c8aeb0 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapStatusServiceCommandLine.java
 PRE-CREATION 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapStatusServiceDriver.java
 PRE-CREATION 
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/State.java 
PRE-CREATION 
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/package-info.java 
PRE-CREATION 
  
llap-server/src/test/org/apache/hadoop/hive/llap/cli/TestLlapStatusServiceDriver.java
 54166d5 
  
llap-server/src/test/org/apache/hadoop/hive/llap/cli/status/TestLlapStatusServiceCommandLine.java
 PRE-CREATION 
  llap-server/src/test/org/apache/hadoop/hive/llap/cli/status/package-info.java 
PRE-CREATION 


Diff: https://reviews.apache.org/r/69174/diff/2/

Changes: https://reviews.apache.org/r/69174/diff/1-2/


Testing
---

Tested on clusters that


Thanks,

Miklos Gergely



[jira] [Created] (HIVE-20811) Turn on dynamic partitioned hash join

2018-10-25 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-20811:
--

 Summary: Turn on dynamic partitioned hash join
 Key: HIVE-20811
 URL: https://issues.apache.org/jira/browse/HIVE-20811
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Vineet Garg
Assignee: Vineet Garg
 Attachments: HIVE-20811.1.patch

Currently it is off by default.

Turning it ON by default will help fix correctness and other issues.





[jira] [Created] (HIVE-20810) option for q files to create sysdb without hardcoding the path

2018-10-25 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-20810:
---

 Summary: option for q files to create sysdb without hardcoding the 
path
 Key: HIVE-20810
 URL: https://issues.apache.org/jira/browse/HIVE-20810
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


Just noticed the master sysdb and resourceplan tests (maybe others too) still 
run the 3.1 versions of the sysdb script, because the only way to run it right 
now is to hardcode the path to some sql file. I'm going to fix that for now in 
some other JIRA.
There should be a better way to init sysdb for tests, like we do for other 
datasets such as src.





Review Request 69174: Refactor LlapStatusServiceDriver

2018-10-25 Thread Miklos Gergely

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69174/
---

Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-20807
https://issues.apache.org/jira/browse/HIVE-20807


Repository: hive-git


Description
---

LlapStatusServiceDriver is the class used to determine if LLAP has started. The 
following problems should be solved by refactoring:

1. The main class is more than 800 lines long and should be cut into multiple 
smaller classes.
2. The current design makes it extremely hard to write unit tests.
3. There are some overcomplicated, over-engineered parts of the code.
4. Most of the code is under org.apache.hadoop.hive.llap.cli, but some parts 
are under org.apache.hadoop.hive.llap.cli.status. The whole program could be 
moved to the latter.
5. LlapStatusHelpers serves as a class for holding classes, which doesn't make 
much sense.

This is the first step of refactoring the program: all of its components are 
moved under the package org.apache.hadoop.hive.llap.cli.status, all the classes 
and enums are put into separate files, the overcomplicated parts of the 
command line parsing are replaced with a simpler structure, and the findbugs 
and checkstyle warnings are fixed.


Diffs
-

  bin/ext/llapstatus.sh 2d2c8f4 
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapSliderUtils.java 
af47b26 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapStatusOptionsProcessor.java
 dca0c7b 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapStatusServiceDriver.java
 a521799 
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/AmInfo.java 
PRE-CREATION 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/AppStatusBuilder.java
 PRE-CREATION 
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/ExitCode.java 
PRE-CREATION 
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapInstance.java 
PRE-CREATION 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapStatusCliException.java
 PRE-CREATION 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapStatusHelpers.java
 5c8aeb0 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapStatusServiceCommandLine.java
 PRE-CREATION 
  
llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapStatusServiceDriver.java
 PRE-CREATION 
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/State.java 
PRE-CREATION 
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/package-info.java 
PRE-CREATION 
  
llap-server/src/test/org/apache/hadoop/hive/llap/cli/TestLlapStatusServiceDriver.java
 54166d5 
  
llap-server/src/test/org/apache/hadoop/hive/llap/cli/status/TestLlapStatusServiceCommandLine.java
 PRE-CREATION 
  llap-server/src/test/org/apache/hadoop/hive/llap/cli/status/package-info.java 
PRE-CREATION 


Diff: https://reviews.apache.org/r/69174/diff/1/


Testing
---

Tested on clusters that


Thanks,

Miklos Gergely



Review Request 69173: HIVE-20259 Cleanup of results cache directory

2018-10-25 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69173/
---

Review request for hive and Gopal V.


Bugs: HIVE-20259
https://issues.apache.org/jira/browse/HIVE-20259


Repository: hive-git


Description
---

Attached patch with utility DirectoryMarkerUpdate/Cleanup classes that create 
.cacheupdate files in the cache directory, to indicate that the directory 
should not be cleaned up by any other process performing 
DirectoryMarkerCleanup. The last-modified date of the .cacheupdate file 
determines whether the directory should be cleaned up: if the instance running 
cleanup determines this date is too old, then the directory will be deleted.
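The age check described above amounts to comparing the marker's last-modified time against a maximum age. An editor's sketch (class and method names here are hypothetical, not the patch's actual API):

```java
import java.io.File;

// Editor's sketch of the described cleanup rule: a cache directory is
// eligible for deletion when its .cacheupdate marker is older than maxAgeMs.
public class MarkerAgeCheck {
    static boolean isExpired(long markerLastModifiedMs, long nowMs, long maxAgeMs) {
        return nowMs - markerLastModifiedMs > maxAgeMs;
    }

    static boolean shouldDelete(File cacheDir, long nowMs, long maxAgeMs) {
        File marker = new File(cacheDir, ".cacheupdate");
        // A directory without a marker has no live updater claiming it.
        return !marker.exists()
            || isExpired(marker.lastModified(), nowMs, maxAgeMs);
    }
}
```

The updater side would simply touch the marker file periodically, so a crashed or stopped process eventually stops refreshing it and the directory ages out.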


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e226a1f82d 
  common/src/java/org/apache/hive/common/util/DirectoryMarkerCleanup.java 
PRE-CREATION 
  common/src/java/org/apache/hive/common/util/DirectoryMarkerUpdate.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java 
a51b7e750b 


Diff: https://reviews.apache.org/r/69173/diff/1/


Testing
---


Thanks,

Jason Dere



Re: Review Request 69170: HIVE-20486 Adding vectorized record reader for Kafka input format

2018-10-25 Thread Slim Bouguerra

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69170/
---

(Updated Oct. 25, 2018, 6:18 p.m.)


Review request for hive, Gopal V and Teddy Choi.


Changes
---

remove extra refactoring


Bugs: HIVE-20486
https://issues.apache.org/jira/browse/HIVE-20486


Repository: hive-git


Description
---

This PR adds a vectorized record reader to the Kafka Storage Handler.
Minor documentation fixup.
Some small refactors to the SerDe index management.


Diffs (updated)
-

  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 5924d06371 
  kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaInputFormat.java 
c401df9850 
  kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaRecordIterator.java 
2225f19a4d 
  kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaRecordReader.java 
7f8353c9f0 
  kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java 
6b2ca1056e 
  kafka-handler/src/java/org/apache/hadoop/hive/kafka/SimpleKafkaWriter.java 
678e190b3f 
  
kafka-handler/src/java/org/apache/hadoop/hive/kafka/VectorizedKafkaRecordReader.java
 PRE-CREATION 
  
kafka-handler/src/test/org/apache/hadoop/hive/kafka/SimpleKafkaWriterTest.java 
8a9bbc7f66 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java e3e329f89b 
  ql/src/test/queries/clientpositive/kafka_storage_handler.q e6cd276f95 
  ql/src/test/results/clientpositive/druid/kafka_storage_handler.q.out 
8ea2aa9d3a 


Diff: https://reviews.apache.org/r/69170/diff/3/

Changes: https://reviews.apache.org/r/69170/diff/2-3/


Testing
---


Thanks,

Slim Bouguerra



Re: Review Request 69170: HIVE-20486 Adding vectorized record reader for Kafka input format

2018-10-25 Thread Slim Bouguerra

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69170/
---

(Updated Oct. 25, 2018, 6:17 p.m.)


Review request for hive, Gopal V and Teddy Choi.


Changes
---

removed the extra refactoring


Bugs: HIVE-20486
https://issues.apache.org/jira/browse/HIVE-20486


Repository: hive-git


Description
---

This PR adds a vectorized record reader to the Kafka Storage Handler.
Minor documentation fixup.
Some small refactors to the SerDe index management.


Diffs (updated)
-

  
accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloRecordReader.java
 45607cbecf 
  
accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/predicate/AccumuloPredicateHandler.java
 0774d842ef 
  
accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/predicate/TestAccumuloPredicateHandler.java
 0bb50e8784 
  
accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/predicate/TestAccumuloRangeGenerator.java
 4975fa0d5e 
  common/src/java/org/apache/hadoop/hive/common/GcTimeMonitor.java edba6f9ad6 
  common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java 3c988da310 
  
druid-handler/src/java/org/apache/hadoop/hive/druid/DruidStorageHandlerUtils.java
 c3e7e5df8d 
  druid-handler/src/java/org/apache/hadoop/hive/druid/io/DruidRecordWriter.java 
400262a107 
  
druid-handler/src/java/org/apache/hadoop/hive/druid/json/KafkaSupervisorReport.java
 5a6756ecbf 
  druid-handler/src/java/org/apache/hadoop/hive/druid/serde/DruidWritable.java 
7390647c4b 
  
druid-handler/src/test/org/apache/hadoop/hive/druid/TestDruidStorageHandler.java
 510330d5d0 
  
druid-handler/src/test/org/apache/hadoop/hive/ql/io/TestDruidRecordWriter.java 
cb8fa3919b 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/io/TestHadoopFileStatus.java
 55877bea15 
  
itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestTriggersMoveWorkloadManager.java
 ad5aa180bf 
  
itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestTriggersTezSessionPoolManager.java
 faab11aa80 
  itests/qtest-druid/src/main/java/org/apache/hive/druid/ForkingDruidNode.java 
f81a0cae6b 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 5924d06371 
  kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaInputFormat.java 
c401df9850 
  kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaRecordIterator.java 
2225f19a4d 
  kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaRecordReader.java 
7f8353c9f0 
  kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java 
6b2ca1056e 
  kafka-handler/src/java/org/apache/hadoop/hive/kafka/SimpleKafkaWriter.java 
678e190b3f 
  
kafka-handler/src/java/org/apache/hadoop/hive/kafka/VectorizedKafkaRecordReader.java
 PRE-CREATION 
  
kafka-handler/src/test/org/apache/hadoop/hive/kafka/SimpleKafkaWriterTest.java 
8a9bbc7f66 
  
llap-client/src/java/org/apache/hadoop/hive/llap/ext/LlapTaskUmbilicalExternalClient.java
 945474f540 
  
llap-client/src/test/org/apache/hadoop/hive/llap/registry/impl/TestSlotZnode.java
 0569505855 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java e3e329f89b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 92775107bc 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 1a88b77fee 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileRecordProcessor.java 
c55a3940c2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/UserPoolMapping.java 
b14c8e4476 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java 
f8fa0cd1dd 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/NoOperatorReuseCheckerHook.java 
494459abd7 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/PostExecOrcFileDump.java 
df99674f2c 
  ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java 
ed82d2d01e 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 25b2d483d7 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/StreamUtils.java 
215cefcd01 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java
 4180dc471d 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/RedundantDynamicPruningConditionsRemoval.java
 4a60158892 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
 4d9963a061 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionTimeGranularityOptimizer.java
 4297537adb 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/TablePropertyEnrichmentOptimizer.java
 a904182f91 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveSubQRemoveRelBuilder.java
 05d1dc6cf2 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java 
fc9178f156 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateJoinTransposeRule.java
 ed6659c6cc 
  

Re: Review Request 69167: HIVE-20796: jdbc URL can contain sensitive information that should not be logged

2018-10-25 Thread denys kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69167/#review210045
---




standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
Lines 1164 (patched)


Opinion: I would go with a list of attributes to be masked
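To illustrate the attribute-list approach, a minimal sketch (the class name, the key list, and the `&`-separated query-string format are assumptions for illustration, not the actual patch):

```java
import java.util.Arrays;
import java.util.List;

public class JdbcUrlMasker {
  // Hypothetical key list; the real patch may mask a different set of attributes.
  private static final List<String> MASKED_KEYS =
      Arrays.asList("password", "trustStorePassword");

  // Replaces the value of each masked attribute in the URL's query string with "***".
  public static String mask(String url) {
    int q = url.indexOf('?');
    if (q < 0) {
      return url;
    }
    StringBuilder sb = new StringBuilder(url.substring(0, q + 1));
    String[] params = url.substring(q + 1).split("&");
    for (int i = 0; i < params.length; i++) {
      String[] kv = params[i].split("=", 2);
      if (kv.length == 2 && MASKED_KEYS.contains(kv[0])) {
        sb.append(kv[0]).append("=***");
      } else {
        sb.append(params[i]);
      }
      if (i < params.length - 1) {
        sb.append('&');
      }
    }
    return sb.toString();
  }
}
```

A URL without a query string passes through unchanged, so the helper is safe to call on every URL before logging.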


- denys kuzmenko


On Oct. 25, 2018, 1:36 p.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69167/
> ---
> 
> (Updated Oct. 25, 2018, 1:36 p.m.)
> 
> 
> Review request for hive and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20796: jdbc URL can contain sensitive information that should not be 
> logged
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  9c158040497cd3d2762620ce35e2b46bb6d5fffe 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
>  f3b38665676391fec9b85eb9a405c14632340dc6 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/utils/TestMetaStoreServerUtils.java
>  f4bdd734dc4e731dda01e6031a4115cde5571baf 
> 
> 
> Diff: https://reviews.apache.org/r/69167/diff/1/
> 
> 
> Testing
> ---
> 
> New unit test created.
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Re: Review Request 69167: HIVE-20796: jdbc URL can contain sensitive information that should not be logged

2018-10-25 Thread denys kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69167/#review210044
---




standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
Lines 1163 (patched)


Strings are immutable


- denys kuzmenko


On Oct. 25, 2018, 1:36 p.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69167/
> ---
> 
> (Updated Oct. 25, 2018, 1:36 p.m.)
> 
> 
> Review request for hive and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20796: jdbc URL can contain sensitive information that should not be 
> logged
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  9c158040497cd3d2762620ce35e2b46bb6d5fffe 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
>  f3b38665676391fec9b85eb9a405c14632340dc6 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/utils/TestMetaStoreServerUtils.java
>  f4bdd734dc4e731dda01e6031a4115cde5571baf 
> 
> 
> Diff: https://reviews.apache.org/r/69167/diff/1/
> 
> 
> Testing
> ---
> 
> New unit test created.
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Re: Review Request 69167: HIVE-20796: jdbc URL can contain sensitive information that should not be logged

2018-10-25 Thread Andrew Sherman via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69167/#review210043
---




standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
Lines 1176 (patched)


Nit: all the trendy kids use StringBuilder now


- Andrew Sherman


On Oct. 25, 2018, 1:36 p.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69167/
> ---
> 
> (Updated Oct. 25, 2018, 1:36 p.m.)
> 
> 
> Review request for hive and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20796: jdbc URL can contain sensitive information that should not be 
> logged
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  9c158040497cd3d2762620ce35e2b46bb6d5fffe 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
>  f3b38665676391fec9b85eb9a405c14632340dc6 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/utils/TestMetaStoreServerUtils.java
>  f4bdd734dc4e731dda01e6031a4115cde5571baf 
> 
> 
> Diff: https://reviews.apache.org/r/69167/diff/1/
> 
> 
> Testing
> ---
> 
> New unit test created.
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Review Request 69170: HIVE-20486 Adding vectorized record reader for Kafka input format

2018-10-25 Thread Slim Bouguerra

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69170/
---

Review request for hive, Gopal V and Teddy Choi.


Bugs: HIVE-20486
https://issues.apache.org/jira/browse/HIVE-20486


Repository: hive-git


Description
---

This PR adds a vectorized record reader to the Kafka Storage Handler.
Minor documentation fixup
Some small refactors to the Serde index managements.


Diffs
-

  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 5924d06371 
  kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaInputFormat.java 
c401df9850 
  kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaRecordIterator.java 
2225f19a4d 
  kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaRecordReader.java 
7f8353c9f0 
  kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java 
6b2ca1056e 
  kafka-handler/src/java/org/apache/hadoop/hive/kafka/SimpleKafkaWriter.java 
678e190b3f 
  
kafka-handler/src/java/org/apache/hadoop/hive/kafka/VectorizedKafkaRecordReader.java
 PRE-CREATION 
  
kafka-handler/src/test/org/apache/hadoop/hive/kafka/SimpleKafkaWriterTest.java 
8a9bbc7f66 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java e3e329f89b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedSerde.java 
c97143c633 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java 7b788350b2 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcSerde.java 
f2295d6015 
  ql/src/test/queries/clientpositive/kafka_storage_handler.q e6cd276f95 
  ql/src/test/results/clientpositive/druid/kafka_storage_handler.q.out 
8ea2aa9d3a 


Diff: https://reviews.apache.org/r/69170/diff/1/


Testing
---


Thanks,

Slim Bouguerra



[jira] [Created] (HIVE-20809) Parse Spark error blacklist errors

2018-10-25 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20809:
---

 Summary: Parse Spark error blacklist errors
 Key: HIVE-20809
 URL: https://issues.apache.org/jira/browse/HIVE-20809
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar


Spark has an executor blacklist feature that throws errors similar to the 
following:

{code}
Aborting TaskSet 52.0 because task 0 (partition 0) cannot run anywhere due to 
node and executor blacklist.  Blacklisting behavior can be configured via 
spark.blacklist.*.
{code}

I think the message changed in Spark 2.4.0, but it's similar to the one above.

It would be good to have some custom parsing logic and a custom {{ErrorMsg}} for 
Spark blacklist errors.
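A hedged sketch of what such parsing logic could look like; the class name is hypothetical and the pattern is based only on the 2.3-era message quoted above (Spark 2.4 reworded it, so a real matcher would need both variants):

```java
import java.util.regex.Pattern;

public class BlacklistErrorParser {
  // Assumption: matches the Spark 2.3-style message quoted in the JIRA description.
  private static final Pattern BLACKLIST_ERROR = Pattern.compile(
      "Aborting TaskSet (\\S+) because task (\\d+) \\(partition (\\d+)\\) "
          + "cannot run anywhere due to node and executor blacklist");

  // Returns true when the error text is a Spark executor-blacklist failure.
  public static boolean isBlacklistError(String msg) {
    return BLACKLIST_ERROR.matcher(msg).find();
  }
}
```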



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20808) Queries with map() constructor are slow with vectorization

2018-10-25 Thread Matthew Barr (JIRA)
Matthew Barr created HIVE-20808:
---

 Summary: Queries with map() constructor are slow with vectorization
 Key: HIVE-20808
 URL: https://issues.apache.org/jira/browse/HIVE-20808
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Matthew Barr


Queries involving the map() constructor with vectorization enabled appear to slow 
down due to the vectorized UDF adaptor.

Corresponding jstack for slow task:
{code:java}
"TezChild" #23 daemon prio=5 os_prio=0 tid=0x7f1e44f1b080 nid=0x9419 
runnable [0x7f1e28137000] 
java.lang.Thread.State: RUNNABLE 
at 
org.apache.hadoop.hive.ql.exec.vector.ColumnVector.ensureSize(ColumnVector.java:232)
 
at 
org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector.ensureSize(DecimalColumnVector.java:208)
 
at 
org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:587)
 
at 
org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:350)
 
at 
org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:205)
 
at 
org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:146)
 
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:271)
 
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFMapIndexBaseScalar.evaluate(VectorUDFMapIndexBaseScalar.java:57)
 
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
 
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965) 
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938) 
at 
org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:136)
 
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965) 
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938) 
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
 
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:812)
 
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:845)
 
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
 
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
 
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
 
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
 
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) 
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
 
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
 
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
 
at java.security.AccessController.doPrivileged(Native Method) 
at javax.security.auth.Subject.doAs(Subject.java:422) 
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
 
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
 
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
 
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) 
at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
 
at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
 
at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
 
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
at java.lang.Thread.run(Thread.java:748)
{code}





Re: Review Request 69107: HIVE-20512

2018-10-25 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review210041
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Line 63 (original), 67 (patched)


initialize this above and mark it as final; since it's accessed by the 
MemoryInfoLogger thread it needs to be thread safe.

use a custom `ThreadFactory` for the pool. You can use Guava's 
`ThreadFactoryBuilder` - the pool should use daemon threads, specify a name 
format that includes something like `MemoryAndRowLogger`, and a custom 
uncaught exception handler that should just log any exceptions that are caught
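A sketch of the suggested factory using only the JDK (the actual patch would use Guava's `ThreadFactoryBuilder` directly, which expresses the same three settings fluently; the class and pool names here are illustrative):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class MemoryInfoLoggerFactory implements ThreadFactory {
  private final AtomicInteger count = new AtomicInteger();

  @Override
  public Thread newThread(Runnable r) {
    // Daemon thread with a recognizable name, mirroring
    // ThreadFactoryBuilder's setDaemon(true)/setNameFormat(...).
    Thread t = new Thread(r, "MemoryAndRowLogger-" + count.getAndIncrement());
    t.setDaemon(true);
    // Just log uncaught exceptions instead of letting them die silently.
    t.setUncaughtExceptionHandler(
        (thread, e) -> System.err.println("Uncaught in " + thread.getName() + ": " + e));
    return t;
  }

  public static ScheduledExecutorService newLoggerPool() {
    return Executors.newSingleThreadScheduledExecutor(new MemoryInfoLoggerFactory());
  }
}
```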



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Lines 113 (patched)


instead of just calling `shutdownNow` you should call `shutdown` and then 
run `awaitTermination` with a wait time of say 30 seconds, and then call 
`shutdownNow`. This allows for orderly shutdown of the executor. All in 
progress tasks are allowed to complete.

this will also require handling the race condition where the 
`MemoryInfoLogger` tries to schedule a task on a shutdown executor. You will 
probably have to use a custom `RejectedExecutionHandler` - probably the 
`ThreadPoolExecutor.DiscardPolicy`
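The two suggestions combine into a sketch like this (the class name and the 30-second timeout are taken as illustrative values from the comment, not from the actual patch):

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class OrderlyShutdown {
  public static ScheduledThreadPoolExecutor newExecutor() {
    ScheduledThreadPoolExecutor e = new ScheduledThreadPoolExecutor(1);
    // Silently drop tasks submitted after shutdown, avoiding the race where
    // the logger re-schedules itself against a closed executor.
    e.setRejectedExecutionHandler(new ThreadPoolExecutor.DiscardPolicy());
    return e;
  }

  public static void close(ScheduledThreadPoolExecutor executor) throws InterruptedException {
    executor.shutdown();                              // stop accepting new tasks
    if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
      executor.shutdownNow();                         // cancel anything still running
    }
  }
}
```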


- Sahil Takiar


On Oct. 24, 2018, 8:55 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated Oct. 24, 2018, 8:55 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 69107: HIVE-20512

2018-10-25 Thread Sahil Takiar


> On Oct. 23, 2018, 7:50 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
> > Line 49 (original), 52 (patched)
> > 
> >
> > i think volatile long is sufficient here and is probably cheaper. 
> > atomics might be expensive when done per row
> 
> Bharathkrishna Guruvayoor Murali wrote:
> I first used volatile, but I replaced it with AtomicLong because the 
> rowNumber needs to be incremented and rowNumber++ on a volatile variable is 
> not considered a safe operation. What do you think about that?

i think volatile should still be fine because there is no contention on the 
variable - e.g. it is only updated by a single thread at a time. as long as we 
maintain that invariant we should be fine. would be good to add some javadocs 
saying that we only expect this variable to be updated by a single thread at a 
time.
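The single-writer pattern being discussed can be sketched as follows (class and method names are illustrative, not the actual SparkRecordHandler code): `rowNumber++` on a volatile is a non-atomic read-modify-write in general, but it is safe when exactly one thread writes and other threads only read.

```java
public class RowCounter {
  /**
   * Updated only by the single record-processing thread; the memory-info
   * logger thread merely reads it, so volatile (not AtomicLong) suffices.
   */
  private volatile long rowNumber = 0;

  public void processRow() {
    rowNumber++;  // safe only under the single-writer invariant documented above
  }

  public long getRowNumber() {
    return rowNumber;  // any thread may read; volatile guarantees visibility
  }
}
```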


> On Oct. 23, 2018, 7:50 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
> > Line 50 (original), 53 (patched)
> > 
> >
> > this need to be volatile since it is modified by the timer task
> 
> Bharathkrishna Guruvayoor Murali wrote:
> This variable is also used as 
> logThresholdInterval = Math.min(maxLogThresholdInterval, 2 * 
> logThresholdInterval);
> 
> Non-atomic operation. So should I make this variable atomic as well?

same as above, i think volatile should be ok as long as a single thread accesses 
logThresholdInterval at a time.


- Sahil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review209935
---


On Oct. 24, 2018, 8:55 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated Oct. 24, 2018, 8:55 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 69107: HIVE-20512

2018-10-25 Thread Sahil Takiar


> On Oct. 24, 2018, 8:58 p.m., Bharathkrishna Guruvayoor Murali wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
> > Line 67 (original), 67 (patched)
> > 
> >
> > Creating this as a threadPool of size 1. I guess that is fine, as we 
> > know only one thread will be used at any point?

yes the size should be 1


- Sahil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review209986
---


On Oct. 24, 2018, 8:55 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated Oct. 24, 2018, 8:55 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



[jira] [Created] (HIVE-20807) Refactor LlapStatusServiceDriver

2018-10-25 Thread Miklos Gergely (JIRA)
Miklos Gergely created HIVE-20807:
-

 Summary: Refactor LlapStatusServiceDriver
 Key: HIVE-20807
 URL: https://issues.apache.org/jira/browse/HIVE-20807
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 4.0.0
Reporter: Miklos Gergely
Assignee: Miklos Gergely
 Fix For: 4.0.0


LlapStatusServiceDriver is the class used to determine if LLAP has started. The 
following problems should be solved by refactoring:

1. The main class is more than 800 lines long, and should be cut into multiple 
smaller classes.
2. The current design makes it extremely hard to write unit tests.
3. There are some overcomplicated, over-engineered parts of the code.
4. Most of the code is under org.apache.hadoop.hive.llap.cli, but some parts 
are under org.apache.hadoop.hive.llap.cli.status. The whole program could be 
moved to the latter.
5. LlapStatusHelpers serves as a class for holding classes, which doesn't make 
much sense.





Re: Review Request 69167: HIVE-20796: jdbc URL can contain sensitive information that should not be logged

2018-10-25 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69167/#review210036
---


Ship it!




Ship It!

- Peter Vary


On Oct. 25, 2018, 1:36 p.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69167/
> ---
> 
> (Updated Oct. 25, 2018, 1:36 p.m.)
> 
> 
> Review request for hive and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20796: jdbc URL can contain sensitive information that should not be 
> logged
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  9c158040497cd3d2762620ce35e2b46bb6d5fffe 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
>  f3b38665676391fec9b85eb9a405c14632340dc6 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/utils/TestMetaStoreServerUtils.java
>  f4bdd734dc4e731dda01e6031a4115cde5571baf 
> 
> 
> Diff: https://reviews.apache.org/r/69167/diff/1/
> 
> 
> Testing
> ---
> 
> New unit test created.
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Review Request 69167: HIVE-20796: jdbc URL can contain sensitive information that should not be logged

2018-10-25 Thread Laszlo Pinter via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69167/
---

Review request for hive and Peter Vary.


Repository: hive-git


Description
---

HIVE-20796: jdbc URL can contain sensitive information that should not be logged


Diffs
-

  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
 9c158040497cd3d2762620ce35e2b46bb6d5fffe 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
 f3b38665676391fec9b85eb9a405c14632340dc6 
  
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/utils/TestMetaStoreServerUtils.java
 f4bdd734dc4e731dda01e6031a4115cde5571baf 


Diff: https://reviews.apache.org/r/69167/diff/1/


Testing
---

New unit test created.


Thanks,

Laszlo Pinter



Re: Review Request 69155: HIVE-20760: Reducing memory overhead due to multiple HiveConfs

2018-10-25 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69155/#review210029
---



Thanks for the patch Barna!
Indeed this will be a sizeable memory saving!


common/src/java/org/apache/hadoop/hive/common/HiveConfProperties.java
Lines 56 (patched)


Why are we using interner here?
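For context, what interning buys here: when many HiveConf copies hold equal key/value strings, interning collapses them to one shared instance. A minimal JDK-only stand-in for Guava's `Interners.newWeakInterner()` (class and method names are illustrative, not the patch's actual code):

```java
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

public class PropertyInterner {
  // A weak interner (as in Guava) would let unused strings be collected;
  // this strong-reference pool keeps the sketch self-contained.
  private static final Map<String, String> POOL = new ConcurrentHashMap<>();

  private static String intern(String s) {
    String prev = POOL.putIfAbsent(s, s);
    return prev != null ? prev : s;
  }

  // Returns a copy of the source whose keys and values are all pooled instances.
  public static Properties internAll(Properties source) {
    Properties interned = new Properties();
    for (String name : source.stringPropertyNames()) {
      interned.setProperty(intern(name), intern(source.getProperty(name)));
    }
    return interned;
  }
}
```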



common/src/java/org/apache/hadoop/hive/common/HiveConfProperties.java
Lines 84 (patched)


Can we use interned.getProperty(key, default)?



common/src/java/org/apache/hadoop/hive/common/HiveConfProperties.java
Lines 198 (patched)


I think there are situations when this is not true; if we overwrite 
something, the size will be smaller?


- Peter Vary


On Oct. 25, 2018, 8:31 a.m., Barnabas Maidics wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69155/
> ---
> 
> (Updated Oct. 25, 2018, 8:31 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The issue is that every Hive task has to load its own version of HiveConf. 
> When running with a large number of cores per executor (HoS), there is a 
> significant (~10%) amount of memory wasted due to this duplication. 
> See more: https://issues.apache.org/jira/browse/HIVE-20760
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/HiveConfProperties.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 07d5205bed 
>   common/src/test/org/apache/hadoop/hive/conf/TestHiveConfProperties.java 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/69155/diff/1/
> 
> 
> Testing
> ---
> 
> Created unit tests for the new Properties implementation.
> Tested multiple queries output.
> 
> 
> Thanks,
> 
> Barnabas Maidics
> 
>



[GitHub] hive pull request #454: findBestMatch() tests the inclusion of default parti...

2018-10-25 Thread glapark
GitHub user glapark opened a pull request:

https://github.com/apache/hive/pull/454

findBestMatch() tests the inclusion of default partition name

This pull request implements the change discussed in the Hive user mailing 
list regarding non-deterministic behavior of Hive in generating DAGs. From the 
discussion thread:

I have been looking further into this issue, and have found that the 
non-deterministic behavior of Hive in generating DAGs is actually due to the 
logic in AggregateStatsCache.findBestMatch() called from 
AggregateStatsCache.get(), as well as the disproportionate distribution of 
Nulls in __HIVE_DEFAULT_PARTITION__ (in the case of the TPC-DS dataset).

Here is what is happening. Let me use web_sales table and ws_web_site_sk 
column in the 10TB TPC-DS dataset as a running example.

1. In the course of running TPC-DS queries, Hive asks MetaStore about the 
column statistics of 1823 partNames in the web_sales/ws_web_site_sk 
combination, either without __HIVE_DEFAULT_PARTITION__ or with 
__HIVE_DEFAULT_PARTITION__.

  --- Without __HIVE_DEFAULT_PARTITION__, it reports a total of 901180 
nulls.

  --- With __HIVE_DEFAULT_PARTITION__, however, it reports a total of 
1800087 nulls, almost twice as many.

2. The first call to MetaStore returns the correct result, but all 
subsequent requests are likely to return the same result from the cache, 
irrespective of the inclusion of __HIVE_DEFAULT_PARTITION__. This is because 
AggregateStatsCache.findBestMatch() treats __HIVE_DEFAULT_PARTITION__ in the 
same way as other partNames, and the difference in the size of partNames[] is 
just 1. The outcome depends on the duration of intervening queries, so 
everything is now non-deterministic. 

3. If a wrong value of numNulls is returned, Hive generates a different 
DAG, which usually takes much longer than the correct one (e.g., 150s to 1000s 
for the first part of Query 24, and 40s to 120s for Query 5).  I guess the 
problem is particularly pronounced here because of the huge number of nulls in 
__HIVE_DEFAULT_PARTITION__. It is ironic to see that the query optimizer is so 
efficient that a single wrong guess of numNulls creates a very inefficient DAG. 

Note that this behavior cannot be avoided by setting 
hive.metastore.aggregate.stats.cache.max.variance to zero because the 
difference in the number of partNames[] between the argument and the entry in 
the cache is just 1.

I think that AggregateStatsCache.findBestMatch() should treat 
__HIVE_DEFAULT_PARTITION__ in a special way, by not returning the result in the 
cache if there is a difference in the inclusion of partName 
__HIVE_DEFAULT_PARTITION__ (or should provide the user with an option to 
activate this feature). 
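The proposed guard can be sketched as a simple agreement check on default-partition inclusion (class, method, and partName values below are illustrative; the real change lives inside AggregateStatsCache.findBestMatch()):

```java
import java.util.List;

public class DefaultPartitionCheck {
  private static final String DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

  // A cached aggregate is reusable only when the requested partName list and
  // the cached one agree on whether the default partition is included.
  public static boolean sameDefaultPartitionInclusion(List<String> requested,
                                                      List<String> cached) {
    boolean reqHasDefault =
        requested.stream().anyMatch(p -> p.contains(DEFAULT_PARTITION));
    boolean cacheHasDefault =
        cached.stream().anyMatch(p -> p.contains(DEFAULT_PARTITION));
    return reqHasDefault == cacheHasDefault;
  }
}
```

With this check, the ~901180-null and ~1800087-null aggregates in the example above would never substitute for each other, even though their partName lists differ by only one entry.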

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mr3-project/hive 
compare.default.partition.findBestMatch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/454.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #454


commit 00034ddb4fd8b7e0615c991a5d15233a798a1968
Author: gla 
Date:   2018-10-25T10:46:32Z

findBestMatch() tests the inclusion of default partition name




---


[jira] [Created] (HIVE-20806) Add ASF license for files added in HIVE-20679

2018-10-25 Thread anishek (JIRA)
anishek created HIVE-20806:
--

 Summary: Add ASF license for files added in HIVE-20679
 Key: HIVE-20806
 URL: https://issues.apache.org/jira/browse/HIVE-20806
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 4.0.0
Reporter: anishek
 Fix For: 4.0.0


HIVE-20679 added a couple of new files (Deserializer/Serializer) that need the ASF 
license header.





Review Request 69155: HIVE-20760: Reducing memory overhead due to multiple HiveConfs

2018-10-25 Thread Barnabas Maidics via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69155/
---

Review request for hive.


Repository: hive-git


Description
---

The issue is that every Hive task has to load its own version of HiveConf. When 
running with a large number of cores per executor (HoS), there is a significant 
(~10%) amount of memory wasted due to this duplication. 
See more: https://issues.apache.org/jira/browse/HIVE-20760


Diffs
-

  common/src/java/org/apache/hadoop/hive/common/HiveConfProperties.java 
PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 07d5205bed 
  common/src/test/org/apache/hadoop/hive/conf/TestHiveConfProperties.java 
PRE-CREATION 


Diff: https://reviews.apache.org/r/69155/diff/1/


Testing
---

Created unit tests for the new Properties implementation.
Tested multiple queries output.


Thanks,

Barnabas Maidics