[jira] [Created] (HIVE-17474) Different physical plan of same query(TPC-DS/70) on HOS
liyunzhang_intel created HIVE-17474: --- Summary: Different physical plan of same query(TPC-DS/70) on HOS Key: HIVE-17474 URL: https://issues.apache.org/jira/browse/HIVE-17474 Project: Hive Issue Type: Bug Reporter: liyunzhang_intel For [DS/query70|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query70.sql], on Hive version d3b88f6, I found that the physical plan differs at runtime even with the same settings. Sometimes the physical plan is:
{code}
TS[0]-FIL[63]-SEL[2]-RS[43]-JOIN[45]-RS[46]-JOIN[48]-SEL[49]-GBY[50]-RS[51]-GBY[52]-SEL[53]-RS[54]-SEL[55]-PTF[56]-SEL[57]-RS[59]-SEL[60]-LIM[61]-FS[62]
TS[3]-FIL[64]-SEL[5]-RS[44]-JOIN[45]
TS[6]-FIL[65]-SEL[8]-RS[39]-JOIN[41]-RS[47]-JOIN[48]
TS[9]-FIL[67]-SEL[11]-RS[18]-JOIN[20]-RS[21]-JOIN[23]-SEL[24]-GBY[25]-RS[26]-GBY[27]-RS[29]-SEL[30]-PTF[31]-FIL[66]-SEL[32]-GBY[38]-RS[40]-JOIN[41]
TS[12]-FIL[68]-SEL[14]-RS[19]-JOIN[20]
TS[15]-FIL[69]-SEL[17]-RS[22]-JOIN[23]
{code}
Here TS\[6\] connects with TS\[9\] on JOIN\[41\] and with TS\[0\] on JOIN\[48\]. At other times it is:
{code}
TS[0]-FIL[63]-RS[3]-JOIN[6]-RS[8]-JOIN[11]-RS[41]-JOIN[44]-SEL[46]-GBY[47]-RS[48]-GBY[49]-RS[50]-GBY[51]-RS[52]-SEL[53]-PTF[54]-SEL[55]-RS[57]-SEL[58]-LIM[59]-FS[60]
TS[1]-FIL[64]-RS[5]-JOIN[6]
TS[2]-FIL[65]-RS[10]-JOIN[11]
TS[12]-FIL[68]-RS[16]-JOIN[19]-RS[20]-JOIN[23]-FIL[67]-SEL[25]-GBY[26]-RS[27]-GBY[28]-RS[29]-GBY[30]-RS[31]-SEL[32]-PTF[33]-FIL[66]-SEL[34]-GBY[39]-RS[43]-JOIN[44]
TS[13]-FIL[69]-RS[18]-JOIN[19]
TS[14]-FIL[70]-RS[22]-JOIN[23]
{code}
Here TS\[2\] connects with TS\[0\] on JOIN\[11\]. Although TS\[2\] and TS\[6\] have different operator IDs, they correspond to the same table, store, in the query. The difference leads to different Spark execution plans and different execution times. I am confused about why the physical plan can differ with identical settings. Does anyone know where to start investigating the root cause? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17473) Hive WM: implement workload management pools
Sergey Shelukhin created HIVE-17473: --- Summary: Hive WM: implement workload management pools Key: HIVE-17473 URL: https://issues.apache.org/jira/browse/HIVE-17473 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17472) Drop-partition for multi-level partition fails, if data does not exist.
Mithun Radhakrishnan created HIVE-17472: --- Summary: Drop-partition for multi-level partition fails, if data does not exist. Key: HIVE-17472 URL: https://issues.apache.org/jira/browse/HIVE-17472 Project: Hive Issue Type: Bug Components: Metastore Reporter: Mithun Radhakrishnan Assignee: Chris Drome Raising this on behalf of [~cdrome] and [~selinazh]. Here's how to reproduce the problem:
{code:sql}
CREATE TABLE foobar ( foo STRING, bar STRING )
PARTITIONED BY ( dt STRING, region STRING )
STORED AS RCFILE
LOCATION '/tmp/foobar';

ALTER TABLE foobar ADD PARTITION ( dt='1', region='A' ) ;

dfs -rm -R -skipTrash /tmp/foobar/dt=1;

ALTER TABLE foobar DROP PARTITION ( dt='1' );
{code}
This causes a client-side error as follows:
{code}
15/02/26 23:08:32 ERROR exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: Unknown error. Please check logs.
{code}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17471) Vectorization: Enable hive.vectorized.row.identifier.enabled to true by default
Matt McCline created HIVE-17471: --- Summary: Vectorization: Enable hive.vectorized.row.identifier.enabled to true by default Key: HIVE-17471 URL: https://issues.apache.org/jira/browse/HIVE-17471 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Teddy Choi We disabled this setting in https://issues.apache.org/jira/browse/HIVE-17116 "Vectorization: Add infrastructure for vectorization of ROW__ID struct", but forgot to turn it back on (to true by default) as part of Teddy's ACID ROW__ID work... -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17470) eliminate potential vector copies when merging ACID deltas in LLAP IO path
Sergey Shelukhin created HIVE-17470: --- Summary: eliminate potential vector copies when merging ACID deltas in LLAP IO path Key: HIVE-17470 URL: https://issues.apache.org/jira/browse/HIVE-17470 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin See the comments on HIVE-12631. Probably LlapRecordReader should be able to receive VRBs directly; that or ACID reader should be able to operate on either CVB or VRB. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17469) The HiveMetaStoreClient should randomize the connection to HMS HA
Sergio Peña created HIVE-17469: -- Summary: The HiveMetaStoreClient should randomize the connection to HMS HA Key: HIVE-17469 URL: https://issues.apache.org/jira/browse/HIVE-17469 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.1.1 Reporter: Sergio Peña In an environment with multiple HMS servers, the HiveMetaStoreClient class always selects the first URI on every open() connection. We should randomize that selection to help balance load across the HMS servers. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
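A minimal sketch of the proposed fix, assuming the client holds its metastore URIs as a list (the class and method names here are illustrative, not the actual HiveMetaStoreClient code): shuffle a copy of the configured URI list once per client, so each client's first connection attempt lands on a random HMS instance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: instead of always trying metastoreUris.get(0) first,
// return a per-client shuffled ordering so connections spread across the
// HMS HA instances. Failover can still walk the list in the returned order.
public class MetastoreUriSelector {
  public static List<String> randomizedOrder(List<String> metastoreUris) {
    List<String> order = new ArrayList<>(metastoreUris);
    Collections.shuffle(order); // random first choice, same set of URIs
    return order;
  }
}
```

The shuffle preserves the full URI set, so existing retry/failover logic that iterates the list is unaffected; only the starting point is randomized.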
[jira] [Created] (HIVE-17468) Shade and package appropriate jackson version for druid storage handler
slim bouguerra created HIVE-17468: - Summary: Shade and package appropriate jackson version for druid storage handler Key: HIVE-17468 URL: https://issues.apache.org/jira/browse/HIVE-17468 Project: Hive Issue Type: Bug Reporter: slim bouguerra Fix For: 3.0.0 Currently we exclude all the jackson-core dependencies coming from Druid. This is wrong in my opinion, since it leads to packaging unwanted jackson libraries from other projects. As you can see in the file hive-druid-deps.txt, jackson-core currently comes from Calcite at version 2.6.3, which is very different from the 2.4.6 used by Druid. This patch excludes the unwanted jars and makes sure to bring in the jackson dependency from Druid itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17467) HCatClient APIs for discovering partition key-values
Mithun Radhakrishnan created HIVE-17467: --- Summary: HCatClient APIs for discovering partition key-values Key: HIVE-17467 URL: https://issues.apache.org/jira/browse/HIVE-17467 Project: Hive Issue Type: New Feature Components: HCatalog, Metastore Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan This is a followup to HIVE-17466, which adds the {{HiveMetaStore}} level call to retrieve unique combinations of part-key values that satisfy a specified predicate. Attached herewith are the {{HCatClient}} APIs that will be used by Apache Oozie, before launching workflows. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
Re: Review Request 62108: HIVE-17387 implement Tez AM registry in Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/62108/ --- (Updated Sept. 6, 2017, 8:25 p.m.) Review request for hive and Gunther Hagleitner. Repository: hive-git Description --- see jira Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java cf3f50ba64 llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java 65f8f945aa llap-client/src/java/org/apache/hadoop/hive/registry/impl/TezAmInstance.java PRE-CREATION llap-client/src/java/org/apache/hadoop/hive/registry/impl/TezAmRegistryImpl.java PRE-CREATION llap-client/src/java/org/apache/hadoop/hive/registry/impl/ZkRegistryBase.java c7737706c6 llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java cf8bd469dc llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/endpoint/LlapPluginServerImpl.java f3c0d5213f ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SessionExpirationTracker.java 8bee77ea72 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 4f58565a4c ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 1f4705c083 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolSession.java 005eeedc02 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java fe5c6a1e45 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java f1f10286a3 ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java d7592bb966 ql/src/test/org/apache/hadoop/hive/ql/exec/tez/SampleTezSessionState.java 973c0cc630 ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java d2b98c46ca ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 176692b6e5 Diff: https://reviews.apache.org/r/62108/diff/2/ Changes: https://reviews.apache.org/r/62108/diff/1-2/ Testing --- Thanks, Sergey Shelukhin
[jira] [Created] (HIVE-17466) Metastore API to list unique partition-key-value combinations
Mithun Radhakrishnan created HIVE-17466: --- Summary: Metastore API to list unique partition-key-value combinations Key: HIVE-17466 URL: https://issues.apache.org/jira/browse/HIVE-17466 Project: Hive Issue Type: New Feature Components: Metastore Affects Versions: 2.2.0, 3.0.0 Reporter: Mithun Radhakrishnan Assignee: Thiruvel Thirumoolan Raising this on behalf of [~thiruvel], who wrote this initially as part of a tangential "data-discovery" system. Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch workflows based on the availability of table/partitions. Partitions are currently discovered by listing partitions using (what boils down to) {{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome, given that {{Partition}} objects are heavyweight and carry redundant information. The alternative is to use partition-names, which will need client-side parsing to extract part-key values. When checking which hourly partitions for a particular day have been published already, it would be preferable to have an API that pushed down part-key extraction into the {{RawStore}} layer, and returned key-values as the result. This would be similar to how {{SELECT DISTINCT part_key FROM my_table;}} would run, but at the {{HiveMetaStoreClient}} level. Here's what we've been using at Yahoo. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
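To make the motivation concrete, this is roughly the client-side parsing the proposed API would push down into the {{RawStore}}: today a caller must fetch partition names (strings like "dt=2017-09-05/hour=01") and extract key values itself. The class and method below are an illustrative sketch of that client-side work, not an existing Hive API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of what clients do today: given partition names in the
// "key1=val1/key2=val2" form, collect the distinct values of one part-key.
// The proposed metastore API would return these values directly.
public class PartKeyValues {
  public static Set<String> distinctValues(List<String> partitionNames, String key) {
    Set<String> values = new LinkedHashSet<>();
    for (String name : partitionNames) {
      for (String spec : name.split("/")) { // one "key=value" per partition level
        String[] kv = spec.split("=", 2);
        if (kv.length == 2 && kv[0].equals(key)) {
          values.add(kv[1]);
        }
      }
    }
    return values;
  }
}
```

Doing this over thousands of partition names per workflow check is exactly the overhead the server-side {{SELECT DISTINCT}}-style call would avoid.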
[jira] [Created] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively
Gopal V created HIVE-17465: -- Summary: Statistics: Drill-down filters don't reduce row-counts progressively Key: HIVE-17465 URL: https://issues.apache.org/jira/browse/HIVE-17465 Project: Hive Issue Type: Bug Reporter: Gopal V
{code}
explain select count(d_date_sk) from date_dim where d_year=2001;
explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9;
explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9 and d_dom = 21;
{code}
All 3 queries end up with the same row-count estimates after the filter.
{code}
  Map Operator Tree:
      TableScan
        alias: date_dim
        filterExpr: (d_year = 2001) (type: boolean)
        Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: COMPLETE
        Filter Operator
          predicate: (d_year = 2001) (type: boolean)
          Statistics: Num rows: 363 Data size: 4356 Basic stats: COMPLETE Column stats: COMPLETE

Map 1
  Map Operator Tree:
      TableScan
        alias: date_dim
        filterExpr: ((d_year = 2001) and (d_moy = 9)) (type: boolean)
        Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: COMPLETE
        Filter Operator
          predicate: ((d_year = 2001) and (d_moy = 9)) (type: boolean)
          Statistics: Num rows: 363 Data size: 5808 Basic stats: COMPLETE Column stats: COMPLETE

Map 1
  Map Operator Tree:
      TableScan
        alias: date_dim
        filterExpr: ((d_year = 2001) and (d_moy = 9) and (d_dom = 21)) (type: boolean)
        Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: COMPLETE
        Filter Operator
          predicate: ((d_year = 2001) and (d_moy = 9) and (d_dom = 21)) (type: boolean)
          Statistics: Num rows: 363 Data size: 7260 Basic stats: COMPLETE Column stats: COMPLETE
{code}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
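The expected behavior is progressive reduction: under a standard column-independence assumption, each additional equality conjunct should divide the running estimate by that column's NDV. A minimal sketch of that arithmetic (this is the textbook estimator, not necessarily Hive's exact code path, and the NDV values 201, 12, and 31 used in the examples below are illustrative guesses for d_year, d_moy, and d_dom):

```java
// Textbook cardinality estimate for a conjunction of equality predicates:
// each "col = const" conjunct has selectivity 1/ndv(col), so the running
// row-count estimate should shrink with every added drill-down filter.
public class FilterEstimate {
  public static long estimateRows(long tableRows, long... ndvPerConjunct) {
    double rows = tableRows;
    for (long ndv : ndvPerConjunct) {
      rows /= ndv; // independence assumption: multiply selectivities
    }
    return Math.max(1, Math.round(rows));
  }
}
```

With these illustrative NDVs, 73049 rows filtered on d_year alone gives ~363 (matching the plan above), but adding d_moy should drop the estimate to ~30 and adding d_dom to ~1, instead of staying at 363 for all three queries.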
[jira] [Created] (HIVE-17464) Fix to be able to disable max shuffle size DHJ config
Jesus Camacho Rodriguez created HIVE-17464: -- Summary: Fix to be able to disable max shuffle size DHJ config Key: HIVE-17464 URL: https://issues.apache.org/jira/browse/HIVE-17464 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 3.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Setting {{hive.auto.convert.join.shuffle.max.size}} to -1 does not work as expected. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17463) ORC: include orc-shims in hive-exec.jar
Gopal V created HIVE-17463: -- Summary: ORC: include orc-shims in hive-exec.jar Key: HIVE-17463 URL: https://issues.apache.org/jira/browse/HIVE-17463 Project: Hive Issue Type: Bug Components: ORC Affects Versions: 3.0.0 Reporter: Gopal V -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17462) hive_1.2.1 memory leak
gehaijiang created HIVE-17462: - Summary: hive_1.2.1 memory leak Key: HIVE-17462 URL: https://issues.apache.org/jira/browse/HIVE-17462 Project: Hive Issue Type: Bug Affects Versions: 1.2.1 Environment: hive version 1.2.1 Reporter: gehaijiang HiveServer2 leaks memory. The Hive user loads third-party UDF jars (vs-1.0.2-SNAPSHOT.jar, alogdata-1.0.3-SNAPSHOT-jar-with-dependencies.jar, and so on), and the process keeps file descriptors open to the deleted resource jars:
{code}
lr-x-- 1 data data 64 Sep 5 18:37 964 -> /tmp/9e38cc04-5693-474b-9c7d-bfdd978bcbb4_resources/vs-1.0.2-SNAPSHOT.jar (deleted)
lr-x-- 1 data data 64 Sep 6 10:41 965 -> /tmp/188bbf2a-d8a5-48a7-81fc-b807f9ff201d_resources/alogdata-1.0.3-SNAPSHOT-jar-with-dependencies.jar (deleted)
lr-x-- 1 data data 64 Sep 6 17:41 97 -> /home/data/programs/hadoop-2.7.1/share/hadoop/hdfs/lib/jsr305-3.0.0.jar
lrwx-- 1 data data 64 Sep 5 18:37 975 -> socket:[1318353317]
lr-x-- 1 data data 64 Sep 6 02:38 977 -> /tmp/64e309dc-352f-4ba4-b871-1aa78fe05945_resources/alogdata-1.0.3-SNAPSHOT-jar-with-dependencies.jar (deleted)
lr-x-- 1 data data 64 Sep 6 17:41 98 -> /home/data/programs/hadoop-2.7.1/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar
lrwx-- 1 data data 64 Sep 6 08:40 983 -> socket:[1299459344]
lr-x-- 1 data data 64 Sep 5 19:37 987 -> /tmp/c3054987-c9c6-468a-8b5c-6e20b1972e0b_resources/alogdata-1.0.3-SNAPSHOT-jar-with-dependencies.jar (deleted)
lr-x-- 1 data data 64 Sep 6 17:41 99 -> /home/data/programs/hadoop-2.7.1/share/hadoop/hdfs/lib/guava-11.0.2.jar
lr-x-- 1 data data 64 Sep 6 08:40 994 -> /tmp/fc5c44b3-9bd8-4a32-a39a-66cd44032fee_resources/alogdata-1.0.3-SNAPSHOT-jar-with-dependencies.jar (deleted)
lr-x-- 1 data data 64 Sep 6 06:39 996 -> /tmp/3b3c2bd6-0a0e-4599-b757-4a048a968457_resources/alogdata-1.0.3-SNAPSHOT-jar-with-dependencies.jar (deleted)
lr-x-- 1 data data 64 Sep 5 17:36 999 -> /tmp/6ad76494-cdda-430b-b7d0-2213731655a8_resources/alogdata-1.0.3-SNAPSHOT-jar-with-dependencies.jar (deleted)
{code}
{code}
PID   USER PR NI VIRT  RES SHR  S %CPU %MEM   TIME+ COMMAND
20084 data 20  0 13.6g 11g 533m S 62.3  9.2 6619:16 java
{code}
{code}
/home/data/programs/jdk/jdk-current/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/data/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/data/programs/hadoop-2.7.1 -Dhadoop.id.str=data -Dhadoop.root.logger=INFO,DRFA -Djava.library.path=/home/data/programs/hadoop-2.7.1/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -XX:+UseConcMarkSweepGC -Xms8g -Xmx8g -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /home/data/programs/hive-current/lib/hive-service-1.2.1.jar org.apache.hive.service.server.HiveServer2 --hiveconf hive.log.file=hiveserver2.log
{code}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17461) Beeline should detect conflicting authentication methods
Peter Bacsko created HIVE-17461: --- Summary: Beeline should detect conflicting authentication methods Key: HIVE-17461 URL: https://issues.apache.org/jira/browse/HIVE-17461 Project: Hive Issue Type: Improvement Components: Beeline, JDBC Reporter: Peter Bacsko In Oozie, we pass "-a delegationToken" on the command line when we invoke Beeline in a Hive action. In one of our tests, we accidentally defined "principal=" in the JDBC URL. In this case, HiveConnection ignored the delegation-token setting and tried to authenticate via Kerberos, which doesn't work inside a YARN container. We found this behavior very confusing, so either BeeLine itself should detect such inconsistencies, or alternatively HiveConnection could take care of it. Looking at the code, the "-a delegationToken" setting does not matter much inside HiveConnection.createBinaryTransport(): if there is a principal, it uses Kerberos; then it looks for delegation tokens; finally it falls back to plain authentication. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
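A minimal sketch of the kind of pre-flight check being proposed (the class, method, and parameter names are hypothetical, not existing BeeLine or HiveConnection code): before opening the connection, flag a JDBC URL that carries "principal=" when the client also requested delegation-token auth, since Kerberos would otherwise silently take precedence.

```java
// Hypothetical consistency check: a URL with principal= combined with
// "-a delegationToken" is a conflict, because createBinaryTransport()
// would pick Kerberos first and never use the delegation token.
public class AuthConflictCheck {
  public static boolean conflicts(String jdbcUrl, String authType) {
    boolean hasPrincipal = jdbcUrl != null && jdbcUrl.contains("principal=");
    boolean wantsDelegationToken = "delegationToken".equalsIgnoreCase(authType);
    return hasPrincipal && wantsDelegationToken;
  }
}
```

Either layer could run this check and fail fast with a clear message, instead of attempting Kerberos inside a YARN container and failing obscurely.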