[jira] [Created] (HIVE-17474) Different physical plan of same query(TPC-DS/70) on HOS

2017-09-06 Thread liyunzhang_intel (JIRA)
liyunzhang_intel created HIVE-17474:
---

 Summary: Different physical plan of same query(TPC-DS/70) on HOS
 Key: HIVE-17474
 URL: https://issues.apache.org/jira/browse/HIVE-17474
 Project: Hive
  Issue Type: Bug
Reporter: liyunzhang_intel


In 
[DS/query70|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query70.sql],
 on Hive (commit d3b88f6), I found that the physical plan differs between runs 
even with the same settings.

Sometimes the physical plan is
{code}
TS[0]-FIL[63]-SEL[2]-RS[43]-JOIN[45]-RS[46]-JOIN[48]-SEL[49]-GBY[50]-RS[51]-GBY[52]-SEL[53]-RS[54]-SEL[55]-PTF[56]-SEL[57]-RS[59]-SEL[60]-LIM[61]-FS[62]
TS[3]-FIL[64]-SEL[5]-RS[44]-JOIN[45]
TS[6]-FIL[65]-SEL[8]-RS[39]-JOIN[41]-RS[47]-JOIN[48]
TS[9]-FIL[67]-SEL[11]-RS[18]-JOIN[20]-RS[21]-JOIN[23]-SEL[24]-GBY[25]-RS[26]-GBY[27]-RS[29]-SEL[30]-PTF[31]-FIL[66]-SEL[32]-GBY[38]-RS[40]-JOIN[41]
TS[12]-FIL[68]-SEL[14]-RS[19]-JOIN[20]
TS[15]-FIL[69]-SEL[17]-RS[22]-JOIN[23]
{code}
Here TS\[6\] connects with TS\[9\] on JOIN\[41\] and with TS\[0\] on 
JOIN\[48\].

At other times the plan is
{code}
TS[0]-FIL[63]-RS[3]-JOIN[6]-RS[8]-JOIN[11]-RS[41]-JOIN[44]-SEL[46]-GBY[47]-RS[48]-GBY[49]-RS[50]-GBY[51]-RS[52]-SEL[53]-PTF[54]-SEL[55]-RS[57]-SEL[58]-LIM[59]-FS[60]
TS[1]-FIL[64]-RS[5]-JOIN[6]
TS[2]-FIL[65]-RS[10]-JOIN[11]
TS[12]-FIL[68]-RS[16]-JOIN[19]-RS[20]-JOIN[23]-FIL[67]-SEL[25]-GBY[26]-RS[27]-GBY[28]-RS[29]-GBY[30]-RS[31]-SEL[32]-PTF[33]-FIL[66]-SEL[34]-GBY[39]-RS[43]-JOIN[44]
TS[13]-FIL[69]-RS[18]-JOIN[19]
TS[14]-FIL[70]-RS[22]-JOIN[23]
{code}
Here TS\[2\] connects with TS\[0\] on JOIN\[11\].

Although TS\[2\] and TS\[6\] have different operator ids, both are the scan of 
the {{store}} table in the query.

The difference leads to different Spark execution plans and different execution 
times. I'm very confused about why the physical plan differs under the same 
settings. Does anyone know where to start investigating the root cause?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17473) Hive WM: implement workload management pools

2017-09-06 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-17473:
---

 Summary: Hive WM: implement workload management pools
 Key: HIVE-17473
 URL: https://issues.apache.org/jira/browse/HIVE-17473
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin








[jira] [Created] (HIVE-17472) Drop-partition for multi-level partition fails, if data does not exist.

2017-09-06 Thread Mithun Radhakrishnan (JIRA)
Mithun Radhakrishnan created HIVE-17472:
---

 Summary: Drop-partition for multi-level partition fails, if data 
does not exist.
 Key: HIVE-17472
 URL: https://issues.apache.org/jira/browse/HIVE-17472
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Mithun Radhakrishnan
Assignee: Chris Drome


Raising this on behalf of [~cdrome] and [~selinazh]. 

Here's how to reproduce the problem:

{code:sql}
CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY ( dt STRING, 
region STRING ) STORED AS RCFILE LOCATION '/tmp/foobar';

ALTER TABLE foobar ADD PARTITION ( dt='1', region='A' ) ;

dfs -rm -R -skipTrash /tmp/foobar/dt=1;

ALTER TABLE foobar DROP PARTITION ( dt='1' );
{code}

This causes a client-side error as follows:
{code}
15/02/26 23:08:32 ERROR exec.DDLTask: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unknown error. Please check 
logs.
{code}







[jira] [Created] (HIVE-17471) Vectorization: Enable hive.vectorized.row.identifier.enabled to true by default

2017-09-06 Thread Matt McCline (JIRA)
Matt McCline created HIVE-17471:
---

 Summary: Vectorization: Enable 
hive.vectorized.row.identifier.enabled to true by default
 Key: HIVE-17471
 URL: https://issues.apache.org/jira/browse/HIVE-17471
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Teddy Choi


We disabled it in https://issues.apache.org/jira/browse/HIVE-17116 
("Vectorization: Add infrastructure for vectorization of ROW__ID struct"), but 
forgot to turn it back on (true by default) in Teddy's ACID ROW__ID work.





[jira] [Created] (HIVE-17470) eliminate potential vector copies when merging ACID deltas in LLAP IO path

2017-09-06 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-17470:
---

 Summary: eliminate potential vector copies when merging ACID 
deltas in LLAP IO path
 Key: HIVE-17470
 URL: https://issues.apache.org/jira/browse/HIVE-17470
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


See the comments on HIVE-12631. Probably LlapRecordReader should be able to 
receive VRBs directly; that or ACID reader should be able to operate on either 
CVB or VRB.





[jira] [Created] (HIVE-17469) The HiveMetaStoreClient should randomize the connection to HMS HA

2017-09-06 Thread Sergio Peña (JIRA)
Sergio Peña created HIVE-17469:
--

 Summary: The HiveMetaStoreClient should randomize the connection 
to HMS HA
 Key: HIVE-17469
 URL: https://issues.apache.org/jira/browse/HIVE-17469
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.1.1
Reporter: Sergio Peña


In an environment with multiple HMS servers, the HiveMetaStoreClient class 
always selects the first URI on every open() connection. We should randomize 
that selection to help balance load across the HMS servers.
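A minimal sketch of the idea (a hypothetical helper, not the actual HiveMetaStoreClient code): copy and shuffle the configured URI list once per client, so connections spread across the HMS servers while failover still walks the list in order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MetastoreUriShuffle {
    // Hypothetical helper: shuffle the configured metastore URIs so each
    // client instance tries the servers in a different order. Failover
    // behavior is unchanged: we still walk the (now shuffled) list in order.
    static List<String> randomizeUris(List<String> configuredUris) {
        List<String> uris = new ArrayList<>(configuredUris);
        Collections.shuffle(uris);
        return uris;
    }

    public static void main(String[] args) {
        List<String> configured = List.of(
            "thrift://hms1:9083", "thrift://hms2:9083", "thrift://hms3:9083");
        System.out.println(randomizeUris(configured));
    }
}
```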





[jira] [Created] (HIVE-17468) Shade and package appropriate jackson version for druid storage handler

2017-09-06 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17468:
-

 Summary: Shade and package appropriate jackson version for druid 
storage handler
 Key: HIVE-17468
 URL: https://issues.apache.org/jira/browse/HIVE-17468
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
 Fix For: 3.0.0


Currently we exclude all of the Jackson core dependencies coming from Druid. In 
my opinion this is wrong, since it leads to packaging unwanted Jackson 
libraries from other projects. As the file hive-druid-deps.txt shows, 
jackson-core currently comes from Calcite at version 2.6.3, which is very 
different from the 2.4.6 used by Druid. This patch excludes the unwanted jars 
and makes sure to bring in the Jackson dependency from Druid itself.
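In Maven terms, the change might look roughly like the following sketch (artifact coordinates are assumptions based on the description above, not copied from the actual patch):

{code:xml}
<!-- Illustrative only: stop jackson-core 2.6.3 leaking in via Calcite... -->
<dependency>
  <groupId>org.apache.calcite</groupId>
  <artifactId>calcite-core</artifactId>
  <exclusions>
    <exclusion>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- ...and declare the version Druid itself uses (2.4.6 per the description). -->
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-core</artifactId>
  <version>2.4.6</version>
</dependency>
{code}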






[jira] [Created] (HIVE-17467) HCatClient APIs for discovering partition key-values

2017-09-06 Thread Mithun Radhakrishnan (JIRA)
Mithun Radhakrishnan created HIVE-17467:
---

 Summary: HCatClient APIs for discovering partition key-values
 Key: HIVE-17467
 URL: https://issues.apache.org/jira/browse/HIVE-17467
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog, Metastore
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan


This is a followup to HIVE-17466, which adds the {{HiveMetaStore}} level call 
to retrieve unique combinations of part-key values that satisfy a specified 
predicate.

Attached herewith are the {{HCatClient}} APIs that will be used by Apache 
Oozie, before launching workflows.





Re: Review Request 62108: HIVE-17387 implement Tez AM registry in Hive

2017-09-06 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62108/
---

(Updated Sept. 6, 2017, 8:25 p.m.)


Review request for hive and Gunther Hagleitner.


Repository: hive-git


Description
---

see jira


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java cf3f50ba64 
  
llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java
 65f8f945aa 
  llap-client/src/java/org/apache/hadoop/hive/registry/impl/TezAmInstance.java 
PRE-CREATION 
  
llap-client/src/java/org/apache/hadoop/hive/registry/impl/TezAmRegistryImpl.java
 PRE-CREATION 
  llap-client/src/java/org/apache/hadoop/hive/registry/impl/ZkRegistryBase.java 
c7737706c6 
  
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
 cf8bd469dc 
  
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/endpoint/LlapPluginServerImpl.java
 f3c0d5213f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SessionExpirationTracker.java 
8bee77ea72 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 4f58565a4c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 
1f4705c083 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolSession.java 
005eeedc02 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java 
fe5c6a1e45 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java f1f10286a3 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java d7592bb966 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/SampleTezSessionState.java 
973c0cc630 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java 
d2b98c46ca 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 176692b6e5 


Diff: https://reviews.apache.org/r/62108/diff/2/

Changes: https://reviews.apache.org/r/62108/diff/1-2/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-17466) Metastore API to list unique partition-key-value combinations

2017-09-06 Thread Mithun Radhakrishnan (JIRA)
Mithun Radhakrishnan created HIVE-17466:
---

 Summary: Metastore API to list unique partition-key-value 
combinations
 Key: HIVE-17466
 URL: https://issues.apache.org/jira/browse/HIVE-17466
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 2.2.0, 3.0.0
Reporter: Mithun Radhakrishnan
Assignee: Thiruvel Thirumoolan


Raising this on behalf of [~thiruvel], who wrote this initially as part of a 
tangential "data-discovery" system.

Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch workflows 
based on the availability of table/partitions. Partitions are currently 
discovered by listing partitions using (what boils down to) 
{{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome, 
given that {{Partition}} objects are heavyweight and carry redundant 
information. The alternative is to use partition-names, which will need 
client-side parsing to extract part-key values.

When checking which hourly partitions for a particular day have been published 
already, it would be preferable to have an API that pushed down part-key 
extraction into the {{RawStore}} layer, and returned key-values as the result. 
This would be similar to how {{SELECT DISTINCT part_key FROM my_table;}} would 
run, but at the {{HiveMetaStoreClient}} level.
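For contrast with the pushed-down API described above, here is roughly the client-side parsing callers must do today (a sketch; the {{key1=val1/key2=val2}} name format is the standard Hive partition-name layout, but the helper itself is hypothetical):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PartitionNameParsing {
    // Hypothetical client-side helper: extract the distinct values of one
    // partition key from names like "dt=2017-09-06/hour=01". This is the
    // parsing work the proposed metastore API would push down to RawStore.
    static Set<String> distinctValues(List<String> partitionNames, String key) {
        Set<String> values = new LinkedHashSet<>();
        for (String name : partitionNames) {
            for (String component : name.split("/")) {
                String[] kv = component.split("=", 2);
                if (kv.length == 2 && kv[0].equals(key)) {
                    values.add(kv[1]);
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> names = List.of("dt=2017-09-06/hour=01", "dt=2017-09-06/hour=02");
        System.out.println(distinctValues(names, "hour")); // [01, 02]
    }
}
```

With a metastore-side API, the equivalent of {{SELECT DISTINCT hour FROM my_table WHERE dt='2017-09-06'}} would return these values directly, without shipping heavyweight {{Partition}} objects or full name lists to the client.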

Here's what we've been using at Yahoo.





[jira] [Created] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively

2017-09-06 Thread Gopal V (JIRA)
Gopal V created HIVE-17465:
--

 Summary: Statistics: Drill-down filters don't reduce row-counts 
progressively
 Key: HIVE-17465
 URL: https://issues.apache.org/jira/browse/HIVE-17465
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V


{code}
explain select count(d_date_sk) from date_dim where d_year=2001 ;
explain select count(d_date_sk) from date_dim where d_year=2001  and d_moy = 9;
explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9 
and d_dom = 21;
{code}

All 3 queries end up with the same row-count estimates after the filter.

{code}
Map Operator Tree:
TableScan
  alias: date_dim
  filterExpr: (d_year = 2001) (type: boolean)
  Statistics: Num rows: 73049 Data size: 82034027 Basic stats: 
COMPLETE Column stats: COMPLETE
  Filter Operator
predicate: (d_year = 2001) (type: boolean)
Statistics: Num rows: 363 Data size: 4356 Basic stats: 
COMPLETE Column stats: COMPLETE
 
Map 1 
Map Operator Tree:
TableScan
  alias: date_dim
  filterExpr: ((d_year = 2001) and (d_moy = 9)) (type: boolean)
  Statistics: Num rows: 73049 Data size: 82034027 Basic stats: 
COMPLETE Column stats: COMPLETE
  Filter Operator
predicate: ((d_year = 2001) and (d_moy = 9)) (type: boolean)
Statistics: Num rows: 363 Data size: 5808 Basic stats: 
COMPLETE Column stats: COMPLETE
Map 1 
Map Operator Tree:
TableScan
  alias: date_dim
  filterExpr: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
21)) (type: boolean)
  Statistics: Num rows: 73049 Data size: 82034027 Basic stats: 
COMPLETE Column stats: COMPLETE
  Filter Operator
predicate: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
21)) (type: boolean)
Statistics: Num rows: 363 Data size: 7260 Basic stats: 
COMPLETE Column stats: COMPLETE
{code}
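Under the usual attribute-independence assumption, each additional equality conjunct should divide the estimate by that column's NDV. A toy illustration of the expected progressive reduction (the NDVs of 12 for d_moy and 31 for d_dom are assumptions for a date dimension, not numbers taken from the actual column stats):

```java
public class ProgressiveSelectivity {
    // Toy cardinality estimator: divide the row count by the NDV of each
    // additional equality predicate (attribute-independence assumption),
    // never estimating below one row.
    static long estimate(long baseRows, long... ndvs) {
        double estimate = baseRows;
        for (long ndv : ndvs) {
            estimate /= ndv;
        }
        return Math.max(1L, Math.round(estimate));
    }

    public static void main(String[] args) {
        long afterYear = 363; // rows after (d_year = 2001), from the plan above
        System.out.println(estimate(afterYear));         // 363
        System.out.println(estimate(afterYear, 12));     // 30, adding d_moy = 9
        System.out.println(estimate(afterYear, 12, 31)); // 1, adding d_dom = 21
    }
}
```

Instead, all three plans keep the 363-row estimate, as if the extra conjuncts had selectivity 1.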





[jira] [Created] (HIVE-17464) Fix to be able to disable max shuffle size DHJ config

2017-09-06 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-17464:
--

 Summary: Fix to be able to disable max shuffle size DHJ config
 Key: HIVE-17464
 URL: https://issues.apache.org/jira/browse/HIVE-17464
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 3.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Setting {{hive.auto.convert.join.shuffle.max.size}} to -1 does not work as 
expected.





[jira] [Created] (HIVE-17463) ORC: include orc-shims in hive-exec.jar

2017-09-06 Thread Gopal V (JIRA)
Gopal V created HIVE-17463:
--

 Summary: ORC: include orc-shims in hive-exec.jar
 Key: HIVE-17463
 URL: https://issues.apache.org/jira/browse/HIVE-17463
 Project: Hive
  Issue Type: Bug
  Components: ORC
Affects Versions: 3.0.0
Reporter: Gopal V








[jira] [Created] (HIVE-17462) hive_1.2.1 memory leak

2017-09-06 Thread gehaijiang (JIRA)
gehaijiang created HIVE-17462:
-

 Summary: hive_1.2.1  memory leak
 Key: HIVE-17462
 URL: https://issues.apache.org/jira/browse/HIVE-17462
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.1
 Environment: hive  version  1.2.1

Reporter: gehaijiang


HiveServer2 leaks memory and file descriptors when users register third-party 
UDF jars (vs-1.0.2-SNAPSHOT.jar, 
alogdata-1.0.3-SNAPSHOT-jar-with-dependencies.jar, and so on). The process 
keeps descriptors open on deleted per-session resource jars:
lr-x-- 1 data data 64 Sep  5 18:37 964 -> 
/tmp/9e38cc04-5693-474b-9c7d-bfdd978bcbb4_resources/vs-1.0.2-SNAPSHOT.jar 
(deleted)
lr-x-- 1 data data 64 Sep  6 10:41 965 -> 
/tmp/188bbf2a-d8a5-48a7-81fc-b807f9ff201d_resources/alogdata-1.0.3-SNAPSHOT-jar-with-dependencies.jar
 (deleted)
lr-x-- 1 data data 64 Sep  6 17:41 97 -> 
/home/data/programs/hadoop-2.7.1/share/hadoop/hdfs/lib/jsr305-3.0.0.jar
lrwx-- 1 data data 64 Sep  5 18:37 975 -> socket:[1318353317]
lr-x-- 1 data data 64 Sep  6 02:38 977 -> 
/tmp/64e309dc-352f-4ba4-b871-1aa78fe05945_resources/alogdata-1.0.3-SNAPSHOT-jar-with-dependencies.jar
 (deleted)
lr-x-- 1 data data 64 Sep  6 17:41 98 -> 
/home/data/programs/hadoop-2.7.1/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar
lrwx-- 1 data data 64 Sep  6 08:40 983 -> socket:[1299459344]
lr-x-- 1 data data 64 Sep  5 19:37 987 -> 
/tmp/c3054987-c9c6-468a-8b5c-6e20b1972e0b_resources/alogdata-1.0.3-SNAPSHOT-jar-with-dependencies.jar
 (deleted)
lr-x-- 1 data data 64 Sep  6 17:41 99 -> 
/home/data/programs/hadoop-2.7.1/share/hadoop/hdfs/lib/guava-11.0.2.jar
lr-x-- 1 data data 64 Sep  6 08:40 994 -> 
/tmp/fc5c44b3-9bd8-4a32-a39a-66cd44032fee_resources/alogdata-1.0.3-SNAPSHOT-jar-with-dependencies.jar
 (deleted)
lr-x-- 1 data data 64 Sep  6 06:39 996 -> 
/tmp/3b3c2bd6-0a0e-4599-b757-4a048a968457_resources/alogdata-1.0.3-SNAPSHOT-jar-with-dependencies.jar
 (deleted)
lr-x-- 1 data data 64 Sep  5 17:36 999 -> 
/tmp/6ad76494-cdda-430b-b7d0-2213731655a8_resources/alogdata-1.0.3-SNAPSHOT-jar-with-dependencies.jar
 (deleted)
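The listing above can be checked programmatically. A sketch (hypothetical helper) that counts descriptors still held on deleted {{_resources}} jars, given lines in the {{ls -l /proc/<pid>/fd}} style shown above:

```java
import java.util.List;

public class LeakedFdCount {
    // Hypothetical helper: given lines from "ls -l /proc/<pid>/fd", count
    // descriptors whose target is a deleted per-session resources jar --
    // each one pins the jar's disk blocks and a descriptor until the
    // HiveServer2 process exits.
    static long countLeakedResourceJars(List<String> fdListing) {
        return fdListing.stream()
            .filter(line -> line.contains("_resources/"))
            .filter(line -> line.endsWith("(deleted)"))
            .count();
    }

    public static void main(String[] args) {
        List<String> listing = List.of(
            "lr-x------ 1 data data 64 Sep 5 18:37 964 -> /tmp/abc_resources/vs-1.0.2-SNAPSHOT.jar (deleted)",
            "lr-x------ 1 data data 64 Sep 6 17:41 97 -> /home/data/lib/jsr305-3.0.0.jar");
        System.out.println(countLeakedResourceJars(listing)); // 1
    }
}
```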

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20084 data  20   0 13.6g  11g 533m S 62.3  9.2   6619:16 java



/home/data/programs/jdk/jdk-current/bin/java -Djava.net.preferIPv4Stack=true 
-Dhadoop.log.dir=/home/data/hadoop/logs -Dhadoop.log.file=hadoop.log 
-Dhadoop.home.dir=/home/data/programs/hadoop-2.7.1 -Dhadoop.id.str=data 
-Dhadoop.root.logger=INFO,DRFA 
-Djava.library.path=/home/data/programs/hadoop-2.7.1/lib/native 
-Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true 
-XX:+UseConcMarkSweepGC -Xms8g -Xmx8g 
-Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar 
/home/data/programs/hive-current/lib/hive-service-1.2.1.jar 
org.apache.hive.service.server.HiveServer2 --hiveconf 
hive.log.file=hiveserver2.log





[jira] [Created] (HIVE-17461) Beeline should detect conflicting authentication methods

2017-09-06 Thread Peter Bacsko (JIRA)
Peter Bacsko created HIVE-17461:
---

 Summary: Beeline should detect conflicting authentication methods
 Key: HIVE-17461
 URL: https://issues.apache.org/jira/browse/HIVE-17461
 Project: Hive
  Issue Type: Improvement
  Components: Beeline, JDBC
Reporter: Peter Bacsko


In Oozie, we pass "-a delegationToken" in the command line when we invoke 
Beeline in a Hive action.

In one of our tests, we accidentally defined "principal=" in the JDBC URL. In 
this case, HiveConnection ignored the delegation token setting and tried to 
authenticate via Kerberos, which didn't work inside a YARN container. We found 
this behavior very confusing.

So either BeeLine itself should detect such inconsistencies, or alternatively 
HiveConnection could take care of it. Looking at the code, "-a 
delegationToken" does not matter much inside 
HiveConnection.createBinaryTransport(): if there's a principal, it will use 
Kerberos; then it looks for delegation tokens; and finally it falls back to 
plain authentication.
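One possible shape for the check (method and parameter names here are hypothetical; the real fix would live in BeeLine or HiveConnection):

```java
import java.sql.SQLException;
import java.util.Map;

public class AuthConflictCheck {
    // Hypothetical validation: fail fast when the JDBC URL carries a Kerberos
    // principal while the client also asked for delegation-token auth, instead
    // of silently preferring Kerberos as createBinaryTransport() does today.
    static void checkAuthSettings(Map<String, String> jdbcParams, String authType)
            throws SQLException {
        boolean hasPrincipal = jdbcParams.containsKey("principal");
        boolean wantsToken = "delegationToken".equalsIgnoreCase(authType);
        if (hasPrincipal && wantsToken) {
            throw new SQLException(
                "Conflicting authentication settings: 'principal' in the JDBC URL "
                + "and '-a delegationToken' cannot be used together");
        }
    }

    public static void main(String[] args) {
        try {
            checkAuthSettings(Map.of("principal", "hive/_HOST@EXAMPLE.COM"),
                              "delegationToken");
        } catch (SQLException e) {
            System.out.println(e.getMessage());
        }
    }
}
```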


