[jira] [Created] (HIVE-17511) Error while populating orc cache in llap

2017-09-11 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-17511:
---

 Summary: Error while populating orc cache in llap
 Key: HIVE-17511
 URL: https://issues.apache.org/jira/browse/HIVE-17511
 Project: Hive
  Issue Type: Bug
  Components: ORC
Reporter: Ashutosh Chauhan


Observed that, while querying, an error is thrown when loading the cache in the 
LLAP daemons.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17510) Make comparison of filter predicates in q files deterministic

2017-09-11 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-17510:
--

 Summary: Make comparison of filter predicates in q files 
deterministic
 Key: HIVE-17510
 URL: https://issues.apache.org/jira/browse/HIVE-17510
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Affects Versions: 3.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


I have been hitting this issue while submitting patches to test HIVE-17432.

Basically, the order in which the rewriting creates the children of AND 
operations is not deterministic. Thus, tests might fail because the generated 
output does not match the golden file, though the test should pass because the 
children simply do not follow the same order:

{code}
predicate: ((d_year >= 1992) and (d_year <= 1997) and ((c_city = 'UNITED KI1') 
or (c_city = 'UNITED KI5')) and ((s_city = 'UNITED KI1') or (s_city = 'UNITED 
KI5'))) (type: boolean)
{code}
{code}
predicate: ((d_year <= 1997) and (d_year >= 1992) and ((c_city = 'UNITED KI1') 
or (c_city = 'UNITED KI5')) and ((s_city = 'UNITED KI1') or (s_city = 'UNITED 
KI5'))) (type: boolean)
{code}

This patch fixes the issue by sorting the children of some expressions 
(currently AND and OR children) when we run explain plan and we are running in 
test mode.
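
The idea can be illustrated with a minimal sketch. This is not the patch itself (Hive is written in Java); the `Leaf`, `BoolOp` and `render` names are hypothetical and only show how sorting the rendered children of AND/OR nodes makes the printed predicate independent of the order in which the children were created.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Leaf:
    text: str  # e.g. "d_year >= 1992"


@dataclass(frozen=True)
class BoolOp:
    op: str  # "and" or "or"
    children: Tuple["Expr", ...]


Expr = Union[Leaf, BoolOp]


def render(expr: Expr, sort_children: bool = False) -> str:
    """Render a predicate; optionally sort AND/OR children by their text."""
    if isinstance(expr, Leaf):
        return f"({expr.text})"
    parts = [render(child, sort_children) for child in expr.children]
    if sort_children:
        # AND and OR are commutative, so reordering does not change meaning.
        parts.sort()
    return "(" + f" {expr.op} ".join(parts) + ")"
```

With sorting enabled, two trees that differ only in child order render to the same string, which is what a golden-file comparison needs.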





Re: Review Request 62228: HIVE-17495: CachedStore: prewarm improvements, refactoring and caching some aggregate stats

2017-09-11 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62228/#review185129
---




metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
Lines 1513 (patched)


try... finally for closeAll?



metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
Lines 2137 (patched)


nit: final?


- Sergey Shelukhin


On Sept. 11, 2017, 9:25 p.m., Vaibhav Gumashta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62228/
> ---
> 
> (Updated Sept. 11, 2017, 9:25 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Thejas Nair.
> 
> 
> Bugs: HIVE-17495
> https://issues.apache.org/jira/browse/HIVE-17495
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> https://issues.apache.org/jira/browse/HIVE-17495
> 
> 
> Diffs
> -
> 
>   
> itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
>  8d861e4 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
> dc1245e 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 
> bbe13fd 
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
> 3053dcb 
>   metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 71982a0 
>   metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 
> 3ba81ce 
>   metastore/src/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java 
> 80b17e0 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/BinaryColumnStatsAggregator.java
>  e6c836b 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/BooleanColumnStatsAggregator.java
>  a34bc9f 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/ColumnStatsAggregator.java
>  a52e5e5 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/ColumnStatsAggregatorFactory.java
>  dfae708 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DateColumnStatsAggregator.java
>  ee95396 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DecimalColumnStatsAggregator.java
>  284c12c 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DoubleColumnStatsAggregator.java
>  bb4a725 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/LongColumnStatsAggregator.java
>  5b1145e 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/StringColumnStatsAggregator.java
>  1b29f92 
>   
> metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
>  4db203d 
>   
> metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
>  fb16cfc 
> 
> 
> Diff: https://reviews.apache.org/r/62228/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>



[jira] [Created] (HIVE-17509) CachedStore: investigate and fix bugs related to cache update thread

2017-09-11 Thread Vaibhav Gumashta (JIRA)
Vaibhav Gumashta created HIVE-17509:
---

 Summary: CachedStore: investigate and fix bugs related to cache 
update thread
 Key: HIVE-17509
 URL: https://issues.apache.org/jira/browse/HIVE-17509
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Vaibhav Gumashta


Reported by [~ashutoshc] in some internal tests.





[jira] [Created] (HIVE-17508) Implement pool rules and triggers based on counters

2017-09-11 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-17508:


 Summary: Implement pool rules and triggers based on counters
 Key: HIVE-17508
 URL: https://issues.apache.org/jira/browse/HIVE-17508
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


Workload management can define rules that are bound to a resource plan. Each 
rule can have a trigger expression and an action associated with it. Trigger 
expressions are evaluated at runtime after a configurable check interval, and 
based on the result, actions such as killing a query or moving a query to a 
different pool will get invoked. A simple rule could be something like
{code}
CREATE RULE slow_query IN resource_plan_name
WHEN execution_time_ms > 1
MOVE TO slow_queue
{code}
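
As a rough sketch of how such a rule might be evaluated at each check interval: the `Rule` fields and the `evaluate` helper below are hypothetical, not part of Hive, and the sketch assumes a trigger is a simple "counter greater than limit" comparison like the one in the example rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    name: str     # e.g. "slow_query"
    counter: str  # counter the trigger watches, e.g. "execution_time_ms"
    limit: int    # trigger fires when the counter exceeds this value
    action: str   # e.g. "MOVE TO slow_queue" or "KILL"


def evaluate(rules: List[Rule], counters: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return (rule name, action) for every rule whose trigger fires."""
    return [
        (rule.name, rule.action)
        for rule in rules
        if counters.get(rule.counter, 0) > rule.limit
    ]
```

A scheduler would call `evaluate` with a fresh counter snapshot once per check interval and apply the returned actions to the query.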





[GitHub] hive pull request #247: HIVE-17506 Moved standalone-metastore out from under...

2017-09-11 Thread alanfgates
GitHub user alanfgates opened a pull request:

https://github.com/apache/hive/pull/247

HIVE-17506 Moved standalone-metastore out from under Hive pom to its own 
top level pom.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alanfgates/hive hive17506

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/247.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #247


commit 9374a05dfa57470d7b986e875485670f8ce73276
Author: Alan Gates 
Date:   2017-09-11T22:03:22Z

Moved standalone-metastore out from under Hive pom to its own top level pom.




---


[jira] [Created] (HIVE-17507) Support Mesos for Hive on Spark

2017-09-11 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-17507:
--

 Summary: Support Mesos for Hive on Spark
 Key: HIVE-17507
 URL: https://issues.apache.org/jira/browse/HIVE-17507
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Xuefu Zhang


From the comment in HIVE-7292:
{quote}
I see the following case: I use Mesos DC/OS and Spark on Mesos. Because it's 
very convenient. But if I want to use Hive on Spark in Mesos DC/OS, I need 
special framework Apache Myriad to run YARN on Mesos. It's very cluttering 
because I run one Resource Manager on another Resource Manager, and it creates 
a lot of redundant abstraction levels.
And there are questions about that on the Internet (e.g. 
http://grokbase.com/t/hive/user/15997dye2q/hive-on-spark-on-mesos)
Can we create the new sub-task for this feature?
{quote}





Review Request 62228: HIVE-17495: CachedStore: prewarm improvements, refactoring and caching some aggregate stats

2017-09-11 Thread Vaibhav Gumashta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62228/
---

Review request for hive, Ashutosh Chauhan and Thejas Nair.


Bugs: HIVE-17495
https://issues.apache.org/jira/browse/HIVE-17495


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-17495


Diffs
-

  
itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
 8d861e4 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
dc1245e 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 
bbe13fd 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 3053dcb 
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 71982a0 
  metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 
3ba81ce 
  metastore/src/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java 
80b17e0 
  
metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/BinaryColumnStatsAggregator.java
 e6c836b 
  
metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/BooleanColumnStatsAggregator.java
 a34bc9f 
  
metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/ColumnStatsAggregator.java
 a52e5e5 
  
metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/ColumnStatsAggregatorFactory.java
 dfae708 
  
metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DateColumnStatsAggregator.java
 ee95396 
  
metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DecimalColumnStatsAggregator.java
 284c12c 
  
metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DoubleColumnStatsAggregator.java
 bb4a725 
  
metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/LongColumnStatsAggregator.java
 5b1145e 
  
metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/aggr/StringColumnStatsAggregator.java
 1b29f92 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
 4db203d 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
 fb16cfc 


Diff: https://reviews.apache.org/r/62228/diff/1/


Testing
---


Thanks,

Vaibhav Gumashta



[jira] [Created] (HIVE-17506) Fix standalone-metastore pom.xml to not depend on hive's main pom

2017-09-11 Thread Alan Gates (JIRA)
Alan Gates created HIVE-17506:
-

 Summary: Fix standalone-metastore pom.xml to not depend on hive's 
main pom
 Key: HIVE-17506
 URL: https://issues.apache.org/jira/browse/HIVE-17506
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Alan Gates
Assignee: Alan Gates


In order to be separately releasable the standalone metastore needs to have its 
own pom rather than inherit from Hive's.





[jira] [Created] (HIVE-17504) Skip ACID table for replication

2017-09-11 Thread Tao Li (JIRA)
Tao Li created HIVE-17504:
-

 Summary: Skip ACID table for replication
 Key: HIVE-17504
 URL: https://issues.apache.org/jira/browse/HIVE-17504
 Project: Hive
  Issue Type: Bug
  Components: repl
Reporter: Tao Li
Assignee: Tao Li


Currently we do not support replicating ACID tables (that will be future 
work).





[GitHub] hive pull request #102: HIVE-14309. Fix naming of classes in ORC module.

2017-09-11 Thread omalley
Github user omalley closed the pull request at:

https://github.com/apache/hive/pull/102


---


[GitHub] hive pull request #142: HIVE-15841. Upgrade to ORC 1.3.2.

2017-09-11 Thread omalley
Github user omalley closed the pull request at:

https://github.com/apache/hive/pull/142


---


[GitHub] hive pull request #191: HIVE-14309 Shade the contents of hive-orc so that th...

2017-09-11 Thread omalley
Github user omalley closed the pull request at:

https://github.com/apache/hive/pull/191


---


[jira] [Created] (HIVE-17503) CBO: Add "Explain CBO" to print Calcite trees

2017-09-11 Thread Gopal V (JIRA)
Gopal V created HIVE-17503:
--

 Summary: CBO: Add "Explain CBO" to print Calcite trees
 Key: HIVE-17503
 URL: https://issues.apache.org/jira/browse/HIVE-17503
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Affects Versions: 3.0.0
Reporter: Gopal V


The Calcite tree is only logged at debug level in Hive right now, which makes 
it inconvenient to debug CBO issues with selectivity and join rotations.

The Calcite plans, before being sent to the rest of the optimizers, end up 
looking like

{code}
HiveProject(s_store_name=[$0], s_company_id=[$1], s_street_number=[$2], s_street_name=[$3], s_street_type=[$4], s_suite_number=[$5], s_city=[$6], s_county=[$7], s_state=[$8], s_zip=[$9], 30days=[$10], 3160days=[$11], 6190days=[$12], 91120days=[$13], 120days=[$14])
  HiveAggregate(group=[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}], agg#0=[sum($10)], agg#1=[sum($11)], agg#2=[sum($12)], agg#3=[sum($13)], agg#4=[sum($14)])
    HiveProject($f0=[$14], $f1=[$15], $f2=[$16], $f3=[$17], $f4=[$18], $f5=[$19], $f6=[$20], $f7=[$21], $f8=[$22], $f9=[$23], $f10=[CASE(<=(-($8, $4), CAST(30):BIGINT), 1, 0)], $f11=[CASE(AND(>(-($8, $4), CAST(30):BIGINT), <=(-($8, $4), CAST(60):BIGINT)), 1, 0)], $f12=[CASE(AND(>(-($8, $4), CAST(60):BIGINT), <=(-($8, $4), CAST(90):BIGINT)), 1, 0)], $f13=[CASE(AND(>(-($8, $4), CAST(90):BIGINT), <=(-($8, $4), CAST(120):BIGINT)), 1, 0)], $f14=[CASE(>(-($8, $4), CAST(120):BIGINT), 1, 0)])
      HiveJoin(condition=[=($2, $13)], joinType=[inner], algorithm=[none], cost=[not available])
        HiveJoin(condition=[=($4, $12)], joinType=[inner], algorithm=[none], cost=[not available])
          HiveJoin(condition=[AND(=($0, $5), =($1, $6), =($3, $7))], joinType=[inner], algorithm=[none], cost=[not available])
            HiveProject(ss_item_sk=[$1], ss_customer_sk=[$2], ss_store_sk=[$6], ss_ticket_number=[$8], ss_sold_date_sk=[$22])
              HiveFilter(condition=[AND(IS NOT NULL($1), IS NOT NULL($2), IS NOT NULL($8), IS NOT NULL($6), IS NOT NULL($22))])
                HiveTableScan(table=[[tpcds_bin_partitioned_orc_1.store_sales]], table:alias=[store_sales])
            HiveJoin(condition=[=($3, $4)], joinType=[inner], algorithm=[none], cost=[not available])
              HiveProject(sr_item_sk=[$1], sr_customer_sk=[$2], sr_ticket_number=[$8], sr_returned_date_sk=[$19])
                HiveFilter(condition=[AND(IS NOT NULL($1), IS NOT NULL($2), IS NOT NULL($8), IS NOT NULL($19))])
                  HiveTableScan(table=[[tpcds_bin_partitioned_orc_1.store_returns]], table:alias=[store_returns])
              HiveProject(d_date_sk=[$0], d_year=[CAST(2000):INTEGER], d_moy=[CAST(9):INTEGER])
                HiveFilter(condition=[AND(=($6, 2000), =($8, 9), IS NOT NULL($0))])
                  HiveTableScan(table=[[tpcds_bin_partitioned_orc_1.date_dim]], table:alias=[d2])
          HiveProject(d_date_sk=[$0])
            HiveFilter(condition=[IS NOT NULL($0)])
              HiveTableScan(table=[[tpcds_bin_partitioned_orc_1.date_dim]], table:alias=[d1])
        HiveProject(s_store_sk=[$0], s_store_name=[$5], s_company_id=[$16], s_street_number=[$18], s_street_name=[$19], s_street_type=[$20], s_suite_number=[$21], s_city=[$22], s_county=[$23], s_state=[$24], s_zip=[$25])
          HiveFilter(condition=[IS NOT NULL($0)])
            HiveTableScan(table=[[tpcds_bin_partitioned_orc_1.store]], table:alias=[store])
{code}





[jira] [Created] (HIVE-17502) Reuse of default session should not throw an exception in LLAP w/ Tez

2017-09-11 Thread Thai Bui (JIRA)
Thai Bui created HIVE-17502:
---

 Summary: Reuse of default session should not throw an exception in 
LLAP w/ Tez
 Key: HIVE-17502
 URL: https://issues.apache.org/jira/browse/HIVE-17502
 Project: Hive
  Issue Type: Bug
  Components: llap, Tez
Affects Versions: 2.2.0, 2.1.1
 Environment: HDP 2.6.1.0-129, Hue 4
Reporter: Thai Bui
Assignee: Thai Bui


Hive2 w/ LLAP on Tez doesn't allow a currently used default session to be 
skipped, mostly because of this line: 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L365.

However, some clients, such as Hue 4, allow multiple sessions to be used per 
user. Under this configuration, a Thrift client will send a request to either 
reuse or open a new session. The reuse request could include the session id of 
a currently used snippet being executed in Hue; this causes HS2 to throw an 
exception:

{noformat}
2017-09-10T17:51:36,548 INFO  [Thread-89]: tez.TezSessionPoolManager 
(TezSessionPoolManager.java:canWorkWithSameSession(512)) - The current user: 
hive, session user: hive
2017-09-10T17:51:36,549 ERROR [Thread-89]: exec.Task 
(TezTask.java:execute(232)) - Failed to execute tez graph.
org.apache.hadoop.hive.ql.metadata.HiveException: The pool session 
sessionId=5b61a578-6336-41c5-860d-9838166f97fe, queueName=llap, user=hive, 
doAs=false, isOpen=true, isDefault=true, expires in 591015330ms should have 
been returned to the pool
at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.canWorkWithSameSession(TezSessionPoolManager.java:534)
 ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.getSession(TezSessionPoolManager.java:544)
 ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:147) 
[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) 
[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) 
[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79) 
[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
{noformat}

Note that every query is issued as a single 'hive' user to share the LLAP 
daemon pool, and a pre-determined number of AMs is initialized at setup 
time. Thus, HS2 should allow new sessions from a Thrift client to be used out 
of the pool, or an existing session to be skipped and an unused session from 
the pool to be returned. The logic that throws an exception in 
`canWorkWithSameSession` doesn't make sense to me.

I have a fix for this issue in my local branch at 
https://github.com/thaibui/hive/commit/078a521b9d0906fe6c0323b63e567f6eee2f3a70.
With it applied, the log looks like this:

{noformat}
2017-09-10T09:15:33,578 INFO  [Thread-239]: tez.TezSessionPoolManager 
(TezSessionPoolManager.java:canWorkWithSameSession(533)) - Skipping default 
session sessionId=6638b1da-0f8a-405e-85f0-9586f484e6de, queueName=llap, 
user=hive, doAs=false, isOpen=true, isDefault=true, expires in 591868732ms 
since it is being used.
{noformat}

A test case is provided in my branch to demonstrate how it works. If possible, 
I would like this patch to be applied to versions 2.1 and 2.2 and to master. 
Since we are using 2.1 LLAP in production with Hue 4, this patch is critical 
to our success.

Alternatively, if this patch is too broad in scope, I propose adding an option 
to allow "skipping of currently used default sessions". With this new option 
defaulting to "false", existing behavior won't change unless the option is 
turned on.
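
The proposed behavior can be sketched as follows. This is a simplified model, not the actual `TezSessionPoolManager` code (which is Java); `Session`, `SessionPool`, `PoolError` and the `skip_in_use` flag are hypothetical names used only to show the difference between throwing and skipping.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Session:
    session_id: str
    in_use: bool = False


class PoolError(Exception):
    """Raised when a session cannot be handed out."""


class SessionPool:
    def __init__(self, sessions: List[Session], skip_in_use: bool = False):
        self.sessions = list(sessions)
        # Hypothetical option; False models the existing behavior.
        self.skip_in_use = skip_in_use

    def get_session(self, requested: Optional[Session]) -> Session:
        if requested is not None and not requested.in_use:
            requested.in_use = True
            return requested
        if requested is not None and not self.skip_in_use:
            # Existing behavior: reusing a busy pool session is an error.
            raise PoolError(
                f"The pool session {requested.session_id} should have "
                "been returned to the pool"
            )
        # Proposed behavior: skip the busy session and hand out a free one.
        for session in self.sessions:
            if not session.in_use:
                session.in_use = True
                return session
        raise PoolError("no unused session left in the pool")
```

With `skip_in_use=False` a reuse request for a busy session fails as in the log above; with `skip_in_use=True` the caller gets another session from the pool instead.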

I will prepare an official patch if this change to master and/or the other 
branches is acceptable. I'm not a contributor or committer; this will be my 
first time contributing to Hive and the Apache foundation. Any early review is 
greatly appreciated, thanks!





[jira] [Created] (HIVE-17501) Hive should allow default Tez sessions to be skipped

2017-09-11 Thread Thai Bui (JIRA)
Thai Bui created HIVE-17501:
---

 Summary: Hive should allow default Tez sessions to be skipped
 Key: HIVE-17501
 URL: https://issues.apache.org/jira/browse/HIVE-17501
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 2.2.0, 2.1.1
 Environment: HDP 2.6.1.0-129, Hue 4.0
Reporter: Thai Bui


Hive2 w/ LLAP on Tez doesn't allow a currently used default session to be 
skipped, mostly because of this line: 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L365.

However, some clients, such as Hue 4, allow multiple sessions to be used per 
user. Under this configuration, a Thrift client will send a request to either 
reuse or open a new session. The reuse request could include the session id of 
a currently used snippet being executed in Hue; this causes HS2 to throw an 
exception:

{noformat}
2017-09-10T17:51:36,548 INFO  [Thread-89]: tez.TezSessionPoolManager 
(TezSessionPoolManager.java:canWorkWithSameSession(512)) - The current user: 
hive, session user: hive
2017-09-10T17:51:36,549 ERROR [Thread-89]: exec.Task 
(TezTask.java:execute(232)) - Failed to execute tez graph.
org.apache.hadoop.hive.ql.metadata.HiveException: The pool session 
sessionId=5b61a578-6336-41c5-860d-9838166f97fe, queueName=llap, user=hive, 
doAs=false, isOpen=true, isDefault=true, expires in 591015330ms should have 
been returned to the pool
at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.canWorkWithSameSession(TezSessionPoolManager.java:534)
 ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.getSession(TezSessionPoolManager.java:544)
 ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:147) 
[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) 
[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) 
[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79) 
[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
{noformat}

Note that every query is issued as a single 'hive' user to share the LLAP 
daemon pool, and a pre-determined number of AMs is initialized at setup 
time. Thus, HS2 should allow new sessions from a Thrift client to be used out 
of the pool, or an existing session to be skipped and an unused session from 
the pool to be returned. The logic that throws an exception in 
`canWorkWithSameSession` doesn't make sense to me.

I have a fix for this issue in my local branch at 
https://github.com/thaibui/hive/commit/078a521b9d0906fe6c0323b63e567f6eee2f3a70.
With it applied, the log looks like this:

{noformat}
2017-09-10T09:15:33,578 INFO  [Thread-239]: tez.TezSessionPoolManager 
(TezSessionPoolManager.java:canWorkWithSameSession(533)) - Skipping default 
session sessionId=6638b1da-0f8a-405e-85f0-9586f484e6de, queueName=llap, 
user=hive, doAs=false, isOpen=true, isDefault=true, expires in 591868732ms 
since it is being used.
{noformat}

A test case is provided in my branch to demonstrate how it works. If possible, 
I would like this patch to be applied to versions 2.1 and 2.2 and to master. 
Since we are using 2.1 LLAP in production with Hue 4, this patch is critical 
to our success.

Alternatively, if this patch is too broad in scope, I propose adding an option 
to allow "skipping of currently used default sessions". With this new option 
defaulting to "false", existing behavior won't change unless the option is 
turned on.

P.S.: I'm not a contributor or committer; this will be my first time 
contributing to Hive and the Apache foundation. Any early review is greatly 
appreciated, thanks!





[jira] [Created] (HIVE-17500) Hive LLAP service is not finding TEZ tar

2017-09-11 Thread Rajesh Narayanan (JIRA)
Rajesh Narayanan created HIVE-17500:
---

 Summary: Hive LLAP service is not finding TEZ tar
 Key: HIVE-17500
 URL: https://issues.apache.org/jira/browse/HIVE-17500
 Project: Hive
  Issue Type: Bug
  Components: Hive, HiveServer2, llap
Affects Versions: 2.2.0
 Environment: Linux 7
Java 8
Hadoop 2.7.3
Hive 2.2.2
Reporter: Rajesh Narayanan


I configured Hadoop 2.7.3, Hive 2.2.0 and Tez 0.9.0. Hive queries run 
successfully on YARN with Tez. When I try to start Hive LLAP with the command 
below, I get the following exception.
./hive --service llap --name @llap --instances 1 --cache 1024m --xmx 2048m 
--size 3225m  --loglevel DEBUG --args " -XX:+UseG1GC -XX:+ResizeTLAB 
-XX:+UseNUMA  -XX:-ResizePLAB"

Failed: java.io.FileNotFoundException: 
/tmp/staging-slider-HHIwk3/lib/tez.tar.gz (Is a directory)
java.util.concurrent.ExecutionException: java.io.FileNotFoundException: 
/tmp/staging-slider-HHIwk3/lib/tez.tar.gz (Is a directory)
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.hadoop.hive.llap.cli.LlapServiceDriver.run(LlapServiceDriver.java:605)
at 
org.apache.hadoop.hive.llap.cli.LlapServiceDriver.main(LlapServiceDriver.java:113)
Caused by: java.io.FileNotFoundException: 
/tmp/staging-slider-HHIwk3/lib/tez.tar.gz (Is a directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at 
org.apache.hadoop.hive.common.CompressionUtils.unTar(CompressionUtils.java:152)
at 
org.apache.hadoop.hive.llap.cli.LlapServiceDriver$1.call(LlapServiceDriver.java:361)
at 
org.apache.hadoop.hive.llap.cli.LlapServiceDriver$1.call(LlapServiceDriver.java:348)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
INFO cli.LlapServiceDriver: LLAP service driver finished

Basically, Hive LLAP copied the Tez tar to 
/tmp/staging-slider-HHIwk3/lib/tez.tar.gz/tez.tar.gz but refers to 
/tmp/staging-slider-HHIwk3/lib/tez.tar.gz, so the LLAP service does not start.
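
The failure mode described above can be reproduced outside Hive with a small sketch. It only assumes that the destination path already exists as a directory when the tar is copied; whether that is what `LlapServiceDriver` actually does is not established by this report, and the `stage` and `demo` helpers are hypothetical.

```python
import os
import shutil
import tempfile


def stage(src_file: str, dest_path: str) -> str:
    """Copy src_file to dest_path and return where the file really landed.

    If dest_path is an existing directory, shutil.copy places the file
    inside it instead of replacing it.
    """
    return shutil.copy(src_file, dest_path)


def demo():
    work = tempfile.mkdtemp()
    src = os.path.join(work, "tez.tar.gz")
    with open(src, "wb") as handle:
        handle.write(b"placeholder")

    lib = os.path.join(work, "lib")
    dest = os.path.join(lib, "tez.tar.gz")
    os.makedirs(dest)  # assumption: the target was created as a directory

    copied = stage(src, dest)
    try:
        open(dest, "rb").close()
        opened = True
    except OSError:
        # Opening a directory as a file fails, as in the stack trace above.
        opened = False

    shutil.rmtree(work)
    return os.path.relpath(copied, lib), opened
```

In this sketch the archive ends up one level deeper than the path that is later opened, which matches the "(Is a directory)" message in the exception.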





Re: Review Request 62152: HIVE-17317: Make Dbcp configurable using hive properties in hive-site.xml

2017-09-11 Thread Peter Vary

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62152/#review185067
---


Ship it!




Ship It!

- Peter Vary


On Sept. 8, 2017, 3 p.m., Barna Zsombor Klara wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62152/
> ---
> 
> (Updated Sept. 8, 2017, 3 p.m.)
> 
> 
> Review request for hive and Peter Vary.
> 
> 
> Bugs: HIVE-17317
> https://issues.apache.org/jira/browse/HIVE-17317
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-17317: Make Dbcp configurable using hive properties in hive-site.xml
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/Constants.java 
> 794b697dc005802a3403bd39499e13bcd8cb2f99 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> cf3f50ba64a28e63b58badcc2bce7738bf434245 
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
> 0db1bc059c0f6a36e721d441dbd466736d270eca 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/datasource/BoneCPDataSourceProvider.java
>  34765b0b2f34698a3ba29751a65a108e4c997502 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/datasource/DataSourceProviderFactory.java
>  1eb792ce4503dfd82ce5660a39a5f33c1db86913 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/datasource/DbCPDataSourceProvider.java
>  PRE-CREATION 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java
>  9b3d6d5d7078301254a4cff0a0d8e5de44d03bc3 
>   metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 
> 1887c052be1e535539cc5ba4c634fa28dfc22f9d 
>   
> metastore/src/test/org/apache/hadoop/hive/metastore/datasource/TestDataSourceProviderFactory.java
>  daea544c7126fad26f02e39a95ea0bc0e4847387 
> 
> 
> Diff: https://reviews.apache.org/r/62152/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Barna Zsombor Klara
> 
>



Re: Review Request 62152: HIVE-17317: Make Dbcp configurable using hive properties in hive-site.xml

2017-09-11 Thread Peter Vary


> On Sept. 8, 2017, 9:04 a.m., Peter Vary wrote:
> > metastore/src/test/org/apache/hadoop/hive/metastore/datasource/TestDataSourceProviderFactory.java
> > Lines 180 (patched)
> > 
> >
> > nit: Do we need testSetDbcpNumberProperty, testSetDbcpStringProperty, 
> > testSetDbcpBooleanProperty?
> > Wouldn't it be nice to have this as a parametrized test? Or would this 
> > be overkill?
> 
> Barna Zsombor Klara wrote:
> I would need to pass in the new value, the property name and the method 
> name to check on the created datasource, and then I would need to use 
> reflection to call the method. I think this would seriously reduce the 
> readability of the test method. But if you have an idea how I could keep 
> readability and have fewer cases, then I'd love to change it.

Yeah, I missed the point that we use specific methods to check the data 
source parameter settings.
In this case I agree with you that readability requires us to use specific 
methods.
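
For illustration, the parametrized variant discussed above would look roughly like the sketch below. It is not the Hive test: `FakeDataSource`, its getters and the property names are hypothetical stand-ins, and the sketch is in Python rather than Java. The point is only that each case has to carry the getter name and look it up reflectively.

```python
class FakeDataSource:
    """Stand-in for a pooled data source; not the real DBCP class."""

    def __init__(self, props):
        self._props = props

    def get_max_idle(self):
        return int(self._props["maxIdle"])

    def get_validation_query(self):
        return self._props["validationQuery"]

    def get_test_on_borrow(self):
        return self._props["testOnBorrow"] == "true"


# (property name, raw value, getter name, expected value)
CASES = [
    ("maxIdle", "5", "get_max_idle", 5),
    ("validationQuery", "select 1", "get_validation_query", "select 1"),
    ("testOnBorrow", "true", "get_test_on_borrow", True),
]


def run_cases(cases):
    defaults = {"maxIdle": "1", "validationQuery": "", "testOnBorrow": "false"}
    results = []
    for prop, raw, getter_name, expected in cases:
        source = FakeDataSource({**defaults, prop: raw})
        # Reflective lookup: the getter is only known by its name here.
        actual = getattr(source, getter_name)()
        results.append(actual == expected)
    return results
```

The table is compact, but a reader has to map each getter name back to the property by hand, which is the readability cost mentioned in the thread.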

Thanks,
Peter


- Peter


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62152/#review184959
---





[GitHub] hive pull request #246: HIVE-17426: Execution framework in hive to run tasks...

2017-09-11 Thread anishek
GitHub user anishek opened a pull request:

https://github.com/apache/hive/pull/246

HIVE-17426: Execution framework in hive to run tasks in parallel other than 
MR Tasks



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anishek/hive HIVE-17426

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/246.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #246


commit 6f37cae63de262b6fde9093c88c7ebafee52297f
Author: Anishek Agarwal 
Date:   2017-09-01T20:46:45Z

HIVE-17426: Execution framework in hive to run tasks in parallel other than 
MR Tasks




---


[jira] [Created] (HIVE-17499) Hive Cube Operator returns duplicate rows

2017-09-11 Thread Johannes Mayer (JIRA)
Johannes Mayer created HIVE-17499:
-

 Summary: Hive Cube Operator returns duplicate rows
 Key: HIVE-17499
 URL: https://issues.apache.org/jira/browse/HIVE-17499
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.1.0
 Environment: Hortonworks HDP 2.6.0.1


Reporter: Johannes Mayer
Priority: Critical


The cube operator returns duplicate rows when it shouldn't. I ran the same 
query in Pig and got the correct result (see the example below).


{code:sql}
insert overwrite table thesis.clickstream_export PARTITION (ds_year = '2016' , 
ds_month = '04' , ds_day = '01')
select year(ds), month(ds), day(ds), c8, c11, count(*)
from thesis.clickstream_landing
where ds = '2016-04-01'
group by year(ds), month(ds), day(ds), c8, c11
With Cube;
{code}

Then I check for duplicates:

{code:sql}
select year, month, day, country, city, count (*) from thesis.clickstream_export
where ds_year = '2016' and ds_month = '04' and ds_day = '01'
group by year, month, day, country, city
having count(*) > 1;
{code}

The result is:
year  month  day   country  city  _c5
null  null   null  null     null  4
null  null   1     null     null  4
null  4      null  null     null  4
null  4      1     null     null  4
2016  null   null  null     null  4
2016  null   1     null     null  4
2016  4      null  null     null  4
2016  4      1     null     null  4


When I do the same thing in Pig, everything is fine:
{code:pig}
DATA = LOAD 'thesis.clickstream_landing' USING 
org.apache.hive.hcatalog.pig.HCatLoader();

FILTERED = FOREACH DATA GENERATE GetYear(ToDate(ds, 'yyyy-MM-dd')) AS year, 
GetMonth(ToDate(ds, 'yyyy-MM-dd')) AS month, GetDay(ToDate(ds, 'yyyy-MM-dd')) 
AS day, c8 AS country, c11 AS city;

CUBED = CUBE FILTERED BY CUBE(year, month, day, country, city);

D = FOREACH CUBED GENERATE FLATTEN(group) AS (year, month, day, country, city), 
COUNT_STAR(cube) As click_count;

STORE D INTO 'thesis.clickstream_export' USING 
org.apache.hive.hcatalog.pig.HCatStorer('ds_year=2016, ds_month=04, ds_day=02');
{code}

Then again I check for duplicates:
{code:sql}
select year, month, day, country, city, count (*) from thesis.clickstream_export
where ds_year = '2016' and ds_month = '04' and ds_day = '02'
group by year, month, day, country, city
having count(*) > 1;
{code}

And the result is empty as it should be.
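
For readers following the report, the sketch below illustrates in Python (not Hive code) what WITH CUBE is expected to produce and how the duplicate check above works. All function names and the sample data are hypothetical. It also shows one way identical-looking keys could arise: when the grouped columns are already NULL in the source rows, a NULL from the data cannot be told apart from the NULL that marks a rolled-up column. Whether that is what happened in this report is an assumption; the report does not confirm it.

```python
from collections import Counter
from itertools import combinations


def cube_counts(rows, dims):
    """Emulate GROUP BY ... WITH CUBE with count(*).

    Returns a list of (group_key, count) pairs, one per group in each of
    the 2**len(dims) grouping sets. Rolled-up columns are shown as None.
    """
    n = len(dims)
    out = []
    for r in range(n + 1):
        for kept in combinations(range(n), r):
            counts = Counter(
                tuple(row[i] if i in kept else None for i in range(n))
                for row in rows
            )
            out.extend(counts.items())
    return out


def duplicate_keys(cubed):
    """Mirror the `having count(*) > 1` check: keys emitted more than once."""
    seen = Counter(key for key, _ in cubed)
    return {key: c for key, c in seen.items() if c > 1}


dims = ("year", "country", "city")

# Hypothetical rows with no NULLs in the grouped columns.
clean = [(2016, "DE", "Berlin"), (2016, "DE", "Munich")]

# Hypothetical rows where country and city are NULL in the source data.
with_nulls = [(2016, None, None), (2016, None, None)]

print(duplicate_keys(cube_counts(clean, dims)))
print(duplicate_keys(cube_counts(with_nulls, dims)))
```

In this sketch the clean data yields no repeated keys, while the rows with two always-NULL columns yield each key 2^2 = 4 times. That happens to match the count of 4 in the result above, but it is only suggestive.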




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17498) Does hive have mr-nativetask support refer to MAPREDUCE-2841

2017-09-11 Thread Feng Yuan (JIRA)
Feng Yuan created HIVE-17498:


 Summary: Does hive have mr-nativetask support refer to 
MAPREDUCE-2841
 Key: HIVE-17498
 URL: https://issues.apache.org/jira/browse/HIVE-17498
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Feng Yuan


I am trying to implement a HivePlatform class that extends 
org.apache.hadoop.mapred.nativetask.Platform.
{code}
//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by Fernflower decompiler)
//

package org.apache.hadoop.mapred.nativetask;

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.classification.InterfaceAudience.Public;
import org.apache.hadoop.classification.InterfaceStability.Evolving;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.nativetask.serde.INativeSerializer;
import org.apache.hadoop.mapred.nativetask.serde.NativeSerialization;

@Public
@Evolving
public abstract class Platform {
    private final NativeSerialization serialization = NativeSerialization.getInstance();
    protected Set keyClassNames = new HashSet();

    public Platform() {
    }

    public abstract void init() throws IOException;

    public abstract String name();

    protected void registerKey(String keyClassName, Class key) throws IOException {
        this.serialization.register(keyClassName, key);
        this.keyClassNames.add(keyClassName);
    }

    protected abstract boolean support(String var1, INativeSerializer var2, JobConf var3);

    protected abstract boolean define(Class var1);
}
{code}


