Re: Review Request 69262: HIVE-20853

2018-11-07 Thread Jaume Marhuenda

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69262/
---

(Updated Nov. 7, 2018, 5:17 p.m.)


Review request for hive.


Repository: hive-git


Description
---

Expose ShuffleHandler.registerDag in the llap daemon API


Diffs (updated)
---

  llap-client/src/java/org/apache/hadoop/hive/llap/tez/LlapProtocolClientProxy.java 211696a0b5
  llap-common/src/gen/protobuf/gen-java/org/apache/hadoop/hive/llap/daemon/rpc/LlapDaemonProtocolProtos.java 8fecc1e920
  llap-common/src/java/org/apache/hadoop/hive/llap/LlapUtil.java 82776abea2
  llap-common/src/java/org/apache/hadoop/hive/llap/impl/LlapProtocolClientImpl.java bdffbbfc22
  llap-common/src/protobuf/LlapDaemonProtocol.proto d70dd41a83
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/ContainerRunner.java 035960e347
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java ef5922ef41
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java 52990c5f05
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java d856b2580a
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java ab84dcc5b3
  llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java 18a37a2adc
  llap-server/src/test/org/apache/hadoop/hive/llap/daemon/LlapDaemonTestUtils.java PRE-CREATION
  llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestContainerRunnerImpl.java PRE-CREATION
  llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/comparator/TestFirstInFirstOutComparator.java d3aa53942b
  llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java 5d4ce223d9
  llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java 7e8299d156
  llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTezUtils.java e4af660fff


Diff: https://reviews.apache.org/r/69262/diff/2/

Changes: https://reviews.apache.org/r/69262/diff/1-2/


Testing
---


File Attachments (updated)


HIVE-20853.5.patch
  
https://reviews.apache.org/media/uploaded/files/2018/11/07/c032625d-ab63-4df3-b778-b390ab1168e4__HIVE-20853.5.patch


Thanks,

Jaume Marhuenda



Re: Review Request 69202: HIVE-20804 Further improvements to group by optimization with constraints

2018-11-07 Thread Vineet Garg


> On Nov. 7, 2018, 2:09 a.m., Jesús Camacho Rodríguez wrote:
> > ql/src/test/queries/clientpositive/constraints_optimization.q
> > Lines 355 (patched)
> > 
> >
> > Can we add two more tests:
> > - One with column swapping before GroupBy (probably if you use group by 
> > b,c,a and table contains a,b,c, it should work and add the Project in 
> > between the TS and the GroupBy).
> > - One with a join and a group by on one column for other table that is 
> > also the join key of the table where all columns are coming from (as in the 
> > whiteboard).

I have added the first test, but the one with the join doesn't work, e.g.:

-- transitive equivalence on pk column, therefore all other columns should be removed
EXPLAIN CBO
SELECT
    C_FIRST_NAME
FROM
    CUSTOMER
,   STORE_SALES
WHERE
    C_CUSTOMER_SK   =   SS_CUSTOMER_SK
GROUP BY
    SS_CUSTOMER_SK
,   C_FIRST_NAME
,   C_LAST_NAME
,   C_PREFERRED_CUST_FLAG
,   C_BIRTH_COUNTRY
,   C_LOGIN
,   C_EMAIL_ADDRESS
;

C_CUSTOMER_SK is the key here, so ideally we should remove all columns from the
group by except SS_CUSTOMER_SK and C_FIRST_NAME, but getExpressionLineage returns
only STORE_SALES as the ref for the SS_CUSTOMER_SK column.
I looked at the RelMdExpressionLineage logic for joins, and it doesn't look like
it takes the join condition into account when determining lineage.
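
To illustrate the reduction being discussed, here is a minimal Python sketch (not Hive or Calcite code). It assumes the optimizer knows the unique key of one table, the columns that key determines, the columns still referenced above the aggregate, and the column equalities implied by the join condition. `reduce_group_by` and all of its parameters are hypothetical names.

```python
def reduce_group_by(group_cols, key_cols, determined_cols, needed_cols, equalities):
    """Drop GROUP BY columns that a unique key already determines.

    group_cols:      ordered list of GROUP BY column names
    key_cols:        set of columns forming a unique key of one table
    determined_cols: set of columns functionally determined by that key
    needed_cols:     set of columns still referenced above the aggregate
    equalities:      (left, right) column pairs taken from equi-join predicates
    """
    parent = {}

    def find(col):
        # Union-find lookup: columns joined by '=' share one representative.
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]
            col = parent[col]
        return col

    for left, right in equalities:
        parent[find(left)] = find(right)

    grouped_classes = {find(c) for c in group_cols}
    key_classes = {find(k) for k in key_cols}

    # The key only helps if every key column (or an equivalent column)
    # is present in the GROUP BY.
    if not key_classes <= grouped_classes:
        return list(group_cols)

    return [c for c in group_cols
            if find(c) in key_classes      # stands in for a key column
            or c in needed_cols            # still referenced by the query
            or c not in determined_cols]   # not covered by this key


reduced = reduce_group_by(
    ["SS_CUSTOMER_SK", "C_FIRST_NAME", "C_LAST_NAME", "C_LOGIN"],
    key_cols={"C_CUSTOMER_SK"},
    determined_cols={"C_CUSTOMER_SK", "C_FIRST_NAME", "C_LAST_NAME", "C_LOGIN"},
    needed_cols={"C_FIRST_NAME"},
    equalities=[("C_CUSTOMER_SK", "SS_CUSTOMER_SK")],
)
```

With the equality C_CUSTOMER_SK = SS_CUSTOMER_SK supplied, the sketch keeps only SS_CUSTOMER_SK and C_FIRST_NAME. Without it, the key is not recognized as covered and the GROUP BY is left unchanged, which mirrors the behavior described above.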


- Vineet


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69202/#review210362
---


On Nov. 7, 2018, 1:49 a.m., Vineet Garg wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69202/
> ---
> 
> (Updated Nov. 7, 2018, 1:49 a.m.)
> 
> 
> Review request for hive and Jesús Camacho Rodríguez.
> 
> 
> Bugs: HIVE-20804
> https://issues.apache.org/jira/browse/HIVE-20804
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See Jira
> 
> 
> Diffs
> ---
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java 9aa30129b6
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java b7c31bdfca
>   ql/src/test/queries/clientpositive/constraints_optimization.q 70ab8509c5
>   ql/src/test/results/clientpositive/llap/constraints_optimization.q.out 96caa4d6dd
> 
> 
> Diff: https://reviews.apache.org/r/69202/diff/4/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Vineet Garg
> 
>



[jira] [Created] (HIVE-20884) Support bootstrap of tables to target with hive.strict.managed.tables enabled.

2018-11-07 Thread Sankar Hariappan (JIRA)
Sankar Hariappan created HIVE-20884:
---

 Summary: Support bootstrap of tables to target with 
hive.strict.managed.tables enabled.
 Key: HIVE-20884
 URL: https://issues.apache.org/jira/browse/HIVE-20884
 Project: Hive
  Issue Type: Sub-task
  Components: repl
Affects Versions: 4.0.0
Reporter: Sankar Hariappan
Assignee: Sankar Hariappan


Hive2 supports replication of managed tables. In Hive3, however, some of these 
managed tables are converted to ACID or MM tables, and others are converted to 
external tables, based on the rules below. 
 # Tables that use Avro format, storage handlers, or list bucketing are 
converted to external tables.
 # Tables whose location is not owned by the "hive" user are converted to 
external tables.
 # Hive-owned ORC-format tables are converted to full ACID transactional tables.
 # Hive-owned non-ORC-format tables are converted to MM transactional tables.

REPL LOAD should apply these rules during bootstrap and convert the tables 
accordingly.
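
The rules above can be read as a small decision function. The following Python sketch is illustrative only and is not Hive code: the function name, its parameters, and the returned labels are hypothetical, and applying the rules in the listed order is an assumption, since the issue does not state their precedence.

```python
def target_table_type(file_format, has_storage_handler, is_list_bucketed, location_owner):
    """Pick the Hive3 table type for a Hive2 managed table (illustrative only)."""
    fmt = file_format.lower()
    # Rule 1: Avro, storage-handler, and list-bucketed tables become external.
    if fmt == "avro" or has_storage_handler or is_list_bucketed:
        return "EXTERNAL"
    # Rule 2: locations not owned by the "hive" user become external.
    if location_owner != "hive":
        return "EXTERNAL"
    # Rule 3: Hive-owned ORC tables become full ACID transactional tables.
    if fmt == "orc":
        return "FULL_ACID"
    # Rule 4: every other Hive-owned format becomes an MM (insert-only) table.
    return "MM"
```

For example, a Hive-owned Parquet table would map to "MM" under these assumptions, while the same table in a directory owned by another user would map to "EXTERNAL".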



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20882) Support Hive replication to a target cluster with hive.strict.managed.tables enabled.

2018-11-07 Thread Sankar Hariappan (JIRA)
Sankar Hariappan created HIVE-20882:
---

 Summary: Support Hive replication to a target cluster with 
hive.strict.managed.tables enabled.
 Key: HIVE-20882
 URL: https://issues.apache.org/jira/browse/HIVE-20882
 Project: Hive
  Issue Type: New Feature
  Components: repl
Affects Versions: 4.0.0
Reporter: Sankar Hariappan
Assignee: Sankar Hariappan


*Requirements:*
 - Support Hive replication with Hive2 as master and Hive3 as slave where 
hive.strict.managed.tables is enabled.
 - The non-ACID managed tables from Hive2 should be converted to appropriate 
ACID or MM tables or to an external table based on Hive3 table type rules.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 69107: HIVE-20512

2018-11-07 Thread Bharathkrishna Guruvayoor Murali via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/
---

(Updated Nov. 7, 2018, 8:52 p.m.)


Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang Karajgaonkar.


Changes
---

Adding scheduledFuture.cancel with shutDown


Repository: hive-git


Description
---

Improve record and memory usage logging in SparkRecordHandler


Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 88dd12c05ade417aca4cdaece4448d31d4e1d65f
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java 8880bb604e088755dcfb0bcb39689702fab0cb77
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7


Diff: https://reviews.apache.org/r/69107/diff/6/

Changes: https://reviews.apache.org/r/69107/diff/5-6/


Testing
---


Thanks,

Bharathkrishna Guruvayoor Murali



[jira] [Created] (HIVE-20883) REPL DUMP to dump the default warehouse directory of source.

2018-11-07 Thread Sankar Hariappan (JIRA)
Sankar Hariappan created HIVE-20883:
---

 Summary: REPL DUMP to dump the default warehouse directory of 
source.
 Key: HIVE-20883
 URL: https://issues.apache.org/jira/browse/HIVE-20883
 Project: Hive
  Issue Type: Sub-task
  Components: repl
Affects Versions: 4.0.0
Reporter: Sankar Hariappan
Assignee: Sankar Hariappan


The default warehouse directory of the source is needed by the target to detect 
whether a DB or table location was set by the user or assigned by Hive. 
Using this information, REPL LOAD will decide whether to preserve the path or 
move the data to the default managed-table warehouse directory.
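
One way to picture this decision is the Python sketch below. It is not Hive code, and the detection rule is an assumption: a location is treated as assigned by Hive when it lies under the source's default warehouse directory. The function names are hypothetical, and paths are assumed to be plain absolute paths without a URI scheme.

```python
import posixpath


def is_hive_assigned(location, source_warehouse_dir):
    """Assumed rule: a location under the source warehouse dir was assigned by Hive."""
    loc = posixpath.normpath(location)
    wh = posixpath.normpath(source_warehouse_dir)
    return loc == wh or loc.startswith(wh + "/")


def target_location(location, source_warehouse_dir, target_warehouse_dir):
    """Preserve user-set paths; re-root Hive-assigned paths on the target."""
    if not is_hive_assigned(location, source_warehouse_dir):
        return location  # set by the user, keep as-is
    relative = posixpath.relpath(posixpath.normpath(location),
                                 posixpath.normpath(source_warehouse_dir))
    return posixpath.normpath(posixpath.join(target_warehouse_dir, relative))
```

Under these assumptions, a table at /warehouse/db1.db/t1 on the source would land under the target's managed warehouse directory, while a table at a custom path would keep its location.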



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 69202: HIVE-20804 Further improvements to group by optimization with constraints

2018-11-07 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69202/
---

(Updated Nov. 7, 2018, 7:39 p.m.)


Review request for hive and Jesús Camacho Rodríguez.


Bugs: HIVE-20804
https://issues.apache.org/jira/browse/HIVE-20804


Repository: hive-git


Description
---

See Jira


Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java 9aa30129b6
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java b7c31bdfca
  ql/src/test/queries/clientpositive/constraints_optimization.q 70ab8509c5
  ql/src/test/results/clientpositive/llap/constraints_optimization.q.out 96caa4d6dd


Diff: https://reviews.apache.org/r/69202/diff/5/

Changes: https://reviews.apache.org/r/69202/diff/4-5/


Testing
---


Thanks,

Vineet Garg



[SECURITY] CVE-2018-11777: Blocking local resource access in HiveServer2

2018-11-07 Thread Daniel Dai
CVE-2018-11777: Blocking local resource access in HiveServer2

Severity: Important

Vendor: The Apache Software Foundation

Versions Affected: This vulnerability affects all versions of Hive,
including 2.3.3, 3.1.0 and earlier

Description: Local resources on HiveServer2 machines are not properly
protected against malicious users if the Ranger, Sentry, or SQL standard
authorizer is not in use.

Mitigation: It is recommended to upgrade to 2.3.4 or 3.1.1 or later if
HiveServer2 is used, and ranger, sentry or sql standard authorizer
is not in use. Admin needs to specify the following entries in
hiveserver2-site.xml:


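<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.fallback.FallbackHiveAuthorizerFactory</value>
</property>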


FallbackHiveAuthorizerFactory will do the following to mitigate the
above-mentioned threat:
1. Disallow local file locations in SQL statements except for admin
2. Allow "set" only for selected whitelisted parameters
3. Disallow dfs commands except for admin
4. Disallow "ADD JAR" statement
5. Disallow "COMPILE" statement
6. Disallow "TRANSFORM" statement

Credit: This issue was discovered by Mithun Radhakrishnan of Oath Inc


[jira] [Created] (HIVE-20885) ql.txn.compactor.TestCompactor runs most tests 2 times

2018-11-07 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20885:
-

 Summary: ql.txn.compactor.TestCompactor runs most tests 2 times
 Key: HIVE-20885
 URL: https://issues.apache.org/jira/browse/HIVE-20885
 Project: Hive
  Issue Type: Improvement
  Components: Streaming, Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman


HIVE-19211 added {{@RunWith(Parameterized.class)}} so that it runs once with 
{{newStreamingAPI=true}} and once with {{newStreamingAPI=false}}, but only 
about 5 tests out of 23 make use of this variable.  All other tests are 
executed 2 times for no reason.

 

cc [~prasanth_j]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[ANNOUNCE] Apache Hive 2.3.4 Released

2018-11-07 Thread Daniel Dai
The Apache Hive team is proud to announce the release of Apache Hive
version 2.3.4.

The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top
of Apache Hadoop (TM), it provides, among others:

* Tools to enable easy data extract/transform/load (ETL)

* A mechanism to impose structure on a variety of data formats

* Access to files stored either directly in Apache HDFS (TM) or in other
  data storage systems such as Apache HBase (TM)

* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark
frameworks.

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html

Hive 2.3.4 Release Notes are available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12344319&styleName=Text&projectId=12310843

We would like to thank the many contributors who made this release
possible.

Regards,

The Apache Hive Team


[SECURITY] CVE-2018-1314: Hive explain query not being authorized

2018-11-07 Thread Daniel Dai
CVE-2018-1314: Hive explain query not being authorized

Severity: Important

Vendor: The Apache Software Foundation

Versions Affected: This vulnerability affects all versions of Hive,
including 2.3.3, 3.1.0 and earlier

Description: Hive "EXPLAIN" operation does not check for necessary
authorization of involved entities in a query. An unauthorized user
can do "EXPLAIN" on arbitrary table or view and expose table metadata
and statistics.

Mitigation: all Hive users shall upgrade to 2.3.4 or 3.1.1 or later


[jira] [Created] (HIVE-20890) ACID: Allow whole table ReadLocks to skip all partition locks

2018-11-07 Thread Gopal V (JIRA)
Gopal V created HIVE-20890:
--

 Summary: ACID: Allow whole table ReadLocks to skip all partition 
locks
 Key: HIVE-20890
 URL: https://issues.apache.org/jira/browse/HIVE-20890
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Reporter: Gopal V


HIVE-19369 proposes adding an EXCL_WRITE lock which does not wait for any 
SHARED_READ locks for insert operations. In the presence of that lock, the 
insert overwrite no longer takes an exclusive lock.

The only exclusive operation will be a schema change or drop table, which 
should take an exclusive lock on the entire table directly.

{code}
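explain locks select * from tpcds_bin_partitioned_orc_1000.store_sales where ss_sold_date_sk=2452626

+-----------------------------------------------------------------------------------+
|                                      Explain                                      |
+-----------------------------------------------------------------------------------+
| LOCK INFORMATION:                                                                 |
| tpcds_bin_partitioned_orc_1000.store_sales -> SHARED_READ                         |
| tpcds_bin_partitioned_orc_1000.store_sales.ss_sold_date_sk=2452626 -> SHARED_READ |
+-----------------------------------------------------------------------------------+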
{code}

So the per-partition SHARED_READ locks are no longer necessary if the lock 
builder already includes the table-wide SHARED_READ lock.

The removal of entire partitions is the only part that needs special care 
under these semantics: it should be handled as row removal instead of directory 
removal (i.e. "drop partition" -> "truncate partition", with the truncation 
triggering a whole-directory cleaner, so that the partition disappears when 
there are 0 rows left).
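
The pruning proposed here can be sketched in a few lines of Python. This is not the Hive lock builder: representing each lock as a `(database, table, partition, lock_type)` tuple and the function name `prune_partition_locks` are assumptions made purely for illustration.

```python
def prune_partition_locks(locks):
    """Drop per-partition SHARED_READ locks covered by a table-wide SHARED_READ.

    Each lock is a (database, table, partition, lock_type) tuple;
    partition is None for a table-level lock.
    """
    # Tables that already hold a table-wide shared read lock.
    table_reads = {
        (db, table)
        for db, table, partition, lock_type in locks
        if partition is None and lock_type == "SHARED_READ"
    }
    return [
        (db, table, partition, lock_type)
        for db, table, partition, lock_type in locks
        if not (partition is not None
                and lock_type == "SHARED_READ"
                and (db, table) in table_reads)
    ]
```

Applied to the two locks in the explain output above, only the table-level SHARED_READ on store_sales would remain. Partition locks on tables without a table-wide read lock, and any non-read locks, are left untouched.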



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20887) Tests: openjdk 8 has a bug that prevents surefire from working

2018-11-07 Thread Gopal V (JIRA)
Gopal V created HIVE-20887:
--

 Summary: Tests: openjdk 8 has a bug that prevents surefire from 
working
 Key: HIVE-20887
 URL: https://issues.apache.org/jira/browse/HIVE-20887
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V


The problem appears to be https://bugs.openjdk.java.net/browse/JDK-8030046. 
The failure looks like:

{code:bash}
[ERROR] Caused by: 
org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM 
terminated without properly saying goodbye. VM crash or System.exit called?
{code}

The surefire-reports/*.dumpstream looks like:
{code:bash}
Error: Could not find or load main class 
org.apache.maven.surefire.booter.ForkedBooter
{code}

 and we can work around the problem by changing the surefire configuration:

{code:bash}
+  false
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 69202: HIVE-20804 Further improvements to group by optimization with constraints

2018-11-07 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69202/#review210391
---


Ship it!




Ship It!

- Jesús Camacho Rodríguez


On Nov. 7, 2018, 7:39 p.m., Vineet Garg wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69202/
> ---
> 
> (Updated Nov. 7, 2018, 7:39 p.m.)
> 
> 
> Review request for hive and Jesús Camacho Rodríguez.
> 
> 
> Bugs: HIVE-20804
> https://issues.apache.org/jira/browse/HIVE-20804
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See Jira
> 
> 
> Diffs
> ---
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java 9aa30129b6
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java b7c31bdfca
>   ql/src/test/queries/clientpositive/constraints_optimization.q 70ab8509c5
>   ql/src/test/results/clientpositive/llap/constraints_optimization.q.out 96caa4d6dd
> 
> 
> Diff: https://reviews.apache.org/r/69202/diff/5/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Vineet Garg
> 
>



Re: Review Request 69254: HIVE-20818: Views created with a WHERE subquery will regard views referenced in the subquery as direct input

2018-11-07 Thread Karen Coppage via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69254/
---

(Updated Nov. 7, 2018, 8:56 a.m.)


Review request for hive.


Bugs: HIVE-20818
https://issues.apache.org/jira/browse/HIVE-20818


Repository: hive-git


Description
---

If Hive is configured with an authorization hook like Sentry, and a view is 
created with a WHERE clause referencing a different view (view') that the user 
has no access to, the user cannot access the view, because view' is considered 
a direct input.
WHERE IN and WHERE EXISTS cause the same issue.
Cascading views created with no WHERE clauses (i.e. with simple SELECTs and 
FROM clauses) work fine.

See Jira for example


Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java ab63ce2bc3 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1a2777bf45 
  ql/src/test/org/apache/hadoop/hive/ql/plan/TestViewEntity.java 6ad38b8467 


Diff: https://reviews.apache.org/r/69254/diff/2/

Changes: https://reviews.apache.org/r/69254/diff/1-2/


Testing
---

Added unit test


Thanks,

Karen Coppage



[jira] [Created] (HIVE-20877) StandardListObjectInspector ArrayIndexOutOfBoundsException when array is null

2018-11-07 Thread ulysses you (JIRA)
ulysses you created HIVE-20877:
--

 Summary: StandardListObjectInspector 
ArrayIndexOutOfBoundsException when array is null
 Key: HIVE-20877
 URL: https://issues.apache.org/jira/browse/HIVE-20877
 Project: Hive
  Issue Type: Bug
Reporter: ulysses you
 Attachments: 1a160fefacf9ba043f87a7588dee6154a9661bdb.patch

I created a table that has a column of `array` type and is related to an 
external table. When selecting from this table, Hive throws an exception. 

Here is the exception log:
{code:java}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"array": null}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
    ... 8 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.ArrayList.elementData(ArrayList.java:418)
    at java.util.ArrayList.remove(ArrayList.java:495)
    at org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.resize(StandardListObjectInspector.java:143)
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:345)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ReturnObjectInspectorResolver.convertIfNecessary(GenericUDFUtils.java:236)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ReturnObjectInspectorResolver.convertIfNecessary(GenericUDFUtils.java:202)
    at org.apache.hadoop.hive.ql.exec.UnionOperator.process(UnionOperator.java:137)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:111)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:122)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:132)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:167)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
    ... 9 more
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 68474: HIVE-20440

2018-11-07 Thread Antal Sinkovits via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/
---

(Updated Nov. 7, 2018, 2:38 p.m.)


Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu 
Zhang.


Repository: hive-git


Description
---

I've modified the SmallTableCache to use guava cache, with soft references.
By using a value loader, I've also eliminated the synchronization on the 
intern-ed string of the path.


Diffs (updated)
---

  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCacheEviction.java PRE-CREATION
  ql/pom.xml 8c3e55eaf4d0234a280b0936f6153d2f563bbe46
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b
  ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION


Diff: https://reviews.apache.org/r/68474/diff/5/

Changes: https://reviews.apache.org/r/68474/diff/4-5/


Testing
---


Thanks,

Antal Sinkovits



Re: Review Request 68474: HIVE-20440

2018-11-07 Thread Antal Sinkovits via Review Board


> On Oct. 16, 2018, 2:56 p.m., Sahil Takiar wrote:
> > Could we add some more E2E integration tests? I'm thinking they could be at 
> > the granularity of a `MapJoinOperator`? For example, confirm that starting 
> > a new query actually evicts everything from the cache? We want to make sure 
> > we aren't accidentally leaking small tables.
> 
> Antal Sinkovits wrote:
> MapJoinOperator cannot be tested easily. There is a TestMapJoinOperator, 
> but the test code is really complex. And the eviction happens at the 
> HivePairFlatMapFunction level. For every Map/Reduce the cache is 
> reinitialized. If we are in a new query the cache gets evicted.

I've added a new test to check this.


- Antal


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209628
---


On Nov. 7, 2018, 2:38 p.m., Antal Sinkovits wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> ---
> 
> (Updated Nov. 7, 2018, 2:38 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu 
> Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> I've modified the SmallTableCache to use guava cache, with soft references.
> By using a value loader, I've also eliminated the synchronization on the 
> intern-ed string of the path.
> 
> 
> Diffs
> ---
> 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCacheEviction.java PRE-CREATION
>   ql/pom.xml 8c3e55eaf4d0234a280b0936f6153d2f563bbe46
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION
> 
> 
> Diff: https://reviews.apache.org/r/68474/diff/5/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Antal Sinkovits
> 
>



[jira] [Created] (HIVE-20878) Hive Runner for Unit tests with Hive JDBC standalone jar issue with log4j slf4j

2018-11-07 Thread Carsten Steckel (JIRA)
Carsten Steckel created HIVE-20878:
--

 Summary: Hive Runner for Unit tests with Hive JDBC standalone jar 
issue with log4j slf4j
 Key: HIVE-20878
 URL: https://issues.apache.org/jira/browse/HIVE-20878
 Project: Hive
  Issue Type: Bug
  Components: Hive, JDBC
Affects Versions: 3.1.1, 3.1.0
 Environment: hive 3.1.1 and hive 3.1.0

backend hadoop 2.9.1

hive runner https://github.com/klarna/HiveRunner
Reporter: Carsten Steckel


I have a standalone Java application that uses hive-jdbc-standalone.jar to 
create and drop databases, tables, indexes, and views in a Hive DB via a JDBC 
connection. I want to unit test the executed DDL operations via HiveRunner.

The Hive JDBC standalone jar brings in a lot of dependencies (and shades them), 
but that causes issues when it is used in an application context where a 
logging infrastructure is already configured and in place:

java.lang.IncompatibleClassChangeError: Class org.apache.logging.slf4j.Log4jLoggerFactory does not implement the requested interface org.apache.hive.org.slf4j.ILoggerFactory
    at org.apache.hive.org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285)

How should dependencies or exclusions be set up properly? Shouldn't a library 
like hive-jdbc leave logging to the surrounding application context? Why does 
it have a dependency on logging at all?

Maybe related to [HIVE-20877] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 69174: Refactor LlapStatusServiceDriver

2018-11-07 Thread Miklos Gergely

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69174/
---

(Updated Nov. 7, 2018, 5 p.m.)


Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-20807
https://issues.apache.org/jira/browse/HIVE-20807


Repository: hive-git


Description
---

LlapStatusServiceDriver is the class used to determine if LLAP has started. The 
following problems should be solved by refactoring:

1. The main class is more than 800 lines long; it should be cut into multiple 
smaller classes.
2. The current design makes it extremely hard to write unit tests.
3. There are some overcomplicated, over-engineered parts of the code.
4. Most of the code is under org.apache.hadoop.hive.llap.cli, but some parts 
are under org.apache.hadoop.hive.llap.cli.status. The whole program could be 
moved to the latter.
5. LlapStatusHelpers serves as a class for holding classes, which doesn't make 
much sense.

This is the first step of refactoring the program: all of its components are 
moved under the package org.apache.hadoop.hive.llap.cli.status, all the classes 
and enums are put into separate files, the overcomplicated parts of the 
command line parsing are replaced with a simpler structure, and the 
findbugs and checkstyle warnings are fixed.


Diffs (updated)
---

  bin/ext/llapstatus.sh 2d2c8f4
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapSliderUtils.java af47b26
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapStatusOptionsProcessor.java dca0c7b
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapStatusServiceDriver.java a521799
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/AmInfo.java PRE-CREATION
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/AppStatusBuilder.java PRE-CREATION
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/ExitCode.java PRE-CREATION
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapInstance.java PRE-CREATION
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapStatusCliException.java PRE-CREATION
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapStatusHelpers.java 5c8aeb0
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapStatusServiceCommandLine.java PRE-CREATION
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapStatusServiceDriver.java PRE-CREATION
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/State.java PRE-CREATION
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/package-info.java PRE-CREATION
  llap-server/src/test/org/apache/hadoop/hive/llap/cli/TestLlapStatusServiceDriver.java 54166d5
  llap-server/src/test/org/apache/hadoop/hive/llap/cli/status/TestLlapStatusServiceCommandLine.java PRE-CREATION
  llap-server/src/test/org/apache/hadoop/hive/llap/cli/status/package-info.java PRE-CREATION
  service/src/java/org/apache/hive/http/LlapServlet.java 92264d2


Diff: https://reviews.apache.org/r/69174/diff/4/

Changes: https://reviews.apache.org/r/69174/diff/3-4/


Testing
---

Tested on clusters that


Thanks,

Miklos Gergely



[jira] [Created] (HIVE-20879) Using null in a projection expression leads to CastException

2018-11-07 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-20879:
---

 Summary: Using null in a projection expression leads to 
CastException
 Key: HIVE-20879
 URL: https://issues.apache.org/jira/browse/HIVE-20879
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


repro:
{code}
create table cx1(bool0 boolean);
select NULL or bool0 from cx1;
{code}

workaround (give NULL the correct type):
{code}
select cast(NULL as boolean) or bool0 from cx1;
{code}

exception:
{code}
2018-11-07T07:28:39,628 ERROR [3533166f-7174-45cd-9d9e-d487038cb6e0 main] ql.Driver: FAILED: ClassCastException org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableVoidObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.primitive.BooleanObjectInspector
java.lang.ClassCastException: org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableVoidObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.primitive.BooleanObjectInspector
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd.initialize(GenericUDFOPAnd.java:56)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:148)
    at org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:260)
    at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1251)
    at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1660)
    at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)

{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20880) Update default value for hive.stats.filter.in.min.ratio

2018-11-07 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-20880:
---

 Summary: Update default value for hive.stats.filter.in.min.ratio
 Key: HIVE-20880
 URL: https://issues.apache.org/jira/browse/HIVE-20880
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20881) Constant propagation oversimplifies projections

2018-11-07 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-20881:
---

 Summary: Constant propagation oversimplifies projections
 Key: HIVE-20881
 URL: https://issues.apache.org/jira/browse/HIVE-20881
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


{code:java}
create table cx2(bool1 boolean);
insert into cx2 values (true),(false),(null);

set hive.cbo.enable=true;
select bool1 IS TRUE OR (cast(NULL as boolean) AND bool1 IS NOT TRUE AND bool1 
IS NOT FALSE) from cx2;

+--------+
|  _c0   |
+--------+
| true   |
| false  |
| NULL   |
+--------+


set hive.cbo.enable=false;
select bool1 IS TRUE OR (cast(NULL as boolean) AND bool1 IS NOT TRUE AND bool1 
IS NOT FALSE) from cx2;

+-------+
|  _c0  |
+-------+
| true  |
| NULL  |
| NULL  |
+-------+

{code}

From the explain output, it seems the expression was simplified to: 
{{(_col0 is true or null)}}
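
As a sanity check, the expression can be evaluated by hand under SQL three-valued logic. The sketch below is plain Python, not Hive code: `None` stands for NULL and all helper names are made up for illustration. Under these assumptions the full expression yields true, false, NULL for the three rows, while the simplified form {{(_col0 is true or null)}} yields true, NULL, NULL.

```python
def sql_and(*operands):
    """Three-valued AND: FALSE dominates, then NULL, else TRUE."""
    if any(x is False for x in operands):
        return False
    if any(x is None for x in operands):
        return None
    return True


def sql_or(*operands):
    """Three-valued OR: TRUE dominates, then NULL, else FALSE."""
    if any(x is True for x in operands):
        return True
    if any(x is None for x in operands):
        return None
    return False


def is_true(x):
    return x is True          # IS TRUE never returns NULL


def is_not_true(x):
    return x is not True      # IS NOT TRUE: TRUE for FALSE and NULL


def is_not_false(x):
    return x is not False     # IS NOT FALSE: TRUE for TRUE and NULL


def full_expression(bool1):
    # bool1 IS TRUE OR (NULL AND bool1 IS NOT TRUE AND bool1 IS NOT FALSE)
    return sql_or(is_true(bool1),
                  sql_and(None, is_not_true(bool1), is_not_false(bool1)))


def simplified_expression(bool1):
    # (_col0 is true or null)
    return sql_or(is_true(bool1), None)


rows = [True, False, None]
full = [full_expression(b) for b in rows]
simplified = [simplified_expression(b) for b in rows]
```

The two forms differ only for the FALSE row: there the parenthesized conjunction contains a FALSE operand and evaluates to FALSE rather than NULL, so replacing it with a bare NULL changes the result.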



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)