[jira] [Commented] (HIVE-17220) Bloomfilter probing in semijoin reduction is thrashing L1 dcache

2017-08-02 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112232#comment-16112232
 ] 

Gopal V commented on HIVE-17220:


LGTM - +1 tests pending.

> Bloomfilter probing in semijoin reduction is thrashing L1 dcache
> 
>
> Key: HIVE-17220
> URL: https://issues.apache.org/jira/browse/HIVE-17220
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17220.1.patch, HIVE-17220.2.patch, 
> HIVE-17220.3.patch, HIVE-17220.WIP.patch
>
>
> [~gopalv] observed perf profiles showing bloomfilter probes as the bottleneck for 
> some of the TPC-DS queries, resulting in L1 data cache thrashing. 
> This is because the huge bitset in the bloom filter doesn't fit in any 
> level of cache, and the hash bits corresponding to a single key map to 
> different segments of the bitset, which are spread out. In the worst case this 
> can result in K-1 memory accesses (K being the number of hash functions) for 
> every key that gets probed, because of locality misses in the L1 cache. 
> Ran a JMH microbenchmark to verify the same. Following is the JMH perf 
> profile for bloom filter probing:
> {code}
> Perf stats:
> --
>     5101.935637  task-clock (msec)         #    0.461 CPUs utilized
>             346  context-switches          #    0.068 K/sec
>             336  cpu-migrations            #    0.066 K/sec
>           6,207  page-faults               #    0.001 M/sec
>  10,016,486,301  cycles                    #    1.963 GHz                      (26.90%)
>   5,751,692,176  stalled-cycles-frontend   #   57.42% frontend cycles idle     (27.05%)
>                  stalled-cycles-backend
>  14,359,914,397  instructions              #    1.43  insns per cycle
>                                            #    0.40  stalled cycles per insn  (33.78%)
>   2,200,632,861  branches                  #  431.333 M/sec                    (33.84%)
>       1,162,860  branch-misses             #    0.05% of all branches          (33.97%)
>   1,025,992,254  L1-dcache-loads           #  201.099 M/sec                    (26.56%)
>     432,663,098  L1-dcache-load-misses     #   42.17% of all L1-dcache hits    (14.49%)
>     331,383,297  LLC-loads                 #   64.952 M/sec                    (14.47%)
>         203,524  LLC-load-misses           #    0.06% of all LL-cache hits     (21.67%)
>                  L1-icache-loads
>       1,633,821  L1-icache-load-misses     #    0.320 M/sec                    (28.85%)
>     950,368,796  dTLB-loads                #  186.276 M/sec                    (28.61%)
>     246,813,393  dTLB-load-misses          #   25.97% of all dTLB cache hits   (14.53%)
>          25,451  iTLB-loads                #    0.005 M/sec                    (14.48%)
>          35,415  iTLB-load-misses          #  139.15% of all iTLB cache hits   (21.73%)
>                  L1-dcache-prefetches
>         175,958  L1-dcache-prefetch-misses #    0.034 M/sec                    (28.94%)
>    11.064783140 seconds time elapsed
> {code}
> This shows a 42.17% L1 data cache miss rate. 
> This jira is to use a cache-efficient bloom filter for semijoin probing.
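
A cache-friendlier layout for the probe pattern described above is a blocked bloom filter, where all K bits for a key are confined to one 64-byte block, so a membership test touches at most a single cache line instead of K scattered words. Below is a minimal, hypothetical Java sketch of the idea (class name, block layout, and hash mixer are invented for illustration; this is not Hive's actual implementation):

```java
public class BlockedBloomFilter {
    private static final int WORDS_PER_BLOCK = 8;   // 8 longs = 64 bytes = one cache line
    private final long[] bits;
    private final int numBlocks;
    private final int k;                            // number of hash functions

    public BlockedBloomFilter(int numBlocks, int k) {
        this.numBlocks = numBlocks;
        this.k = k;
        this.bits = new long[numBlocks * WORDS_PER_BLOCK];
    }

    // 64-bit finalizer-style mixer standing in for a real hash (e.g. Murmur).
    private static long mix(long x) {
        x ^= x >>> 33; x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33; x *= 0xc4ceb9fe1a85ec53L;
        return x ^ (x >>> 33);
    }

    public void add(long key) {
        long h = mix(key);
        int base = (int) Math.floorMod(h, (long) numBlocks) * WORDS_PER_BLOCK;
        for (int i = 1; i <= k; i++) {
            // All k bit positions fall inside the same 512-bit block.
            int bit = (int) ((h >>> (i * 9)) & 511);
            bits[base + (bit >>> 6)] |= 1L << (bit & 63);
        }
    }

    public boolean mightContain(long key) {
        long h = mix(key);
        int base = (int) Math.floorMod(h, (long) numBlocks) * WORDS_PER_BLOCK;
        for (int i = 1; i <= k; i++) {
            int bit = (int) ((h >>> (i * 9)) & 511);
            if ((bits[base + (bit >>> 6)] & (1L << (bit & 63))) == 0) {
                return false;   // definitely not present
            }
        }
        return true;            // possibly present (false positives allowed)
    }
}
```

A probe computes one block address and derives all K bit positions from the same 64-bit hash, so the worst case is one cache-line fill per key instead of K-1.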



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats

2017-08-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112228#comment-16112228
 ] 

Hive QA commented on HIVE-16811:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12880138/HIVE-16811.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 80 failed/errored test(s), 11139 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_filter] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_groupby] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_select] 
(batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_table] 
(batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_annotate_stats_groupby]
 (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnStatsUpdateForStatsOptimizer_2]
 (batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] 
(batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_47] 
(batchId=28)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[udaf_collect_set_2]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_use_op_stats]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=99)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[hybridgrace_hashjoin_1]
 (batchId=99)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[multi_count_distinct]
 (batchId=99)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[tez-tag] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query11] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query15] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query17] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query18] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query19] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query21] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query24] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query25] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query29] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query30] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query31] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query32] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query34] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query35] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query37] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query40] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query44] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query45] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query46] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query47] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query48] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query4] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query50] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query53] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query54] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query57] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query58] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query61] 
(batchId=236)

[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-08-02 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16998:

Fix Version/s: 3.0.0

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch, HIVE16998.5.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17115) MetaStoreUtils.getDeserializer doesn't catch the java.lang.ClassNotFoundException

2017-08-02 Thread Erik.fang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112216#comment-16112216
 ] 

Erik.fang commented on HIVE-17115:
--

ok, I will upload a test soon

> MetaStoreUtils.getDeserializer doesn't catch the 
> java.lang.ClassNotFoundException
> -
>
> Key: HIVE-17115
> URL: https://issues.apache.org/jira/browse/HIVE-17115
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1
>Reporter: Erik.fang
>Assignee: Erik.fang
> Attachments: HIVE-17115.1.patch, HIVE-17115.patch
>
>
> Suppose we create a table with a custom SerDe, then call 
> HiveMetaStoreClient.getSchema(String db, String tableName) to extract the 
> metadata from the HiveMetaStore service. 
> The thrift client hangs there, with an exception such as the following in the 
> HiveMetaStore service's log:
> {code:java}
> Exception in thread "pool-5-thread-129" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hbase/util/Bytes
> at 
> org.apache.hadoop.hive.hbase.HBaseSerDe.parseColumnsMapping(HBaseSerDe.java:184)
> at 
> org.apache.hadoop.hive.hbase.HBaseSerDeParameters.<init>(HBaseSerDeParameters.java:73)
> at 
> org.apache.hadoop.hive.hbase.HBaseSerDe.initialize(HBaseSerDe.java:117)
> at 
> org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:53)
> at 
> org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:401)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_fields_with_environment_context(HiveMetaStore.java:3556)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_schema_with_environment_context(HiveMetaStore.java:3636)
> at sun.reflect.GeneratedMethodAccessor104.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy4.get_schema_with_environment_context(Unknown 
> Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema_with_environment_context.getResult(ThriftHiveMetastore.java:9146)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema_with_environment_context.getResult(ThriftHiveMetastore.java:9130)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:551)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:546)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:546)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hbase.util.Bytes
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17115) MetaStoreUtils.getDeserializer doesn't catch the java.lang.ClassNotFoundException

2017-08-02 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112213#comment-16112213
 ] 

Daniel Dai commented on HIVE-17115:
---

Sorry for the delay. I am fine with the change. It is much better for the 
metastore to catch the throwable and send it back to the client than to 
silently eat the exception in the metastore. cc [~thejas].

For the test, you can inherit from an existing SerDe (e.g. RegexSerDe) and 
manually throw a NoClassDefFoundError in initialize.
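
The crux is that NoClassDefFoundError is a java.lang.Error, not an Exception, so it sails past a {{catch (Exception e)}} block; catching Throwable covers it. A self-contained illustration of the distinction (demo class and messages are invented, not Hive code):

```java
public class CatchDemo {
    // Mimics a handler that only catches Exception: the Error escapes
    // and would kill the worker thread, leaving the client hanging.
    static String exceptionCatch() {
        try {
            throw new NoClassDefFoundError("org/apache/hadoop/hbase/util/Bytes");
        } catch (Exception e) {      // NoClassDefFoundError is not an Exception
            return "handled";
        }
    }

    // Catching Throwable also covers Errors, so the failure can be
    // reported back to the client instead of being silently lost.
    static String throwableCatch() {
        try {
            throw new NoClassDefFoundError("org/apache/hadoop/hbase/util/Bytes");
        } catch (Throwable t) {
            return "handled: " + t.getMessage();
        }
    }
}
```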

> MetaStoreUtils.getDeserializer doesn't catch the 
> java.lang.ClassNotFoundException
> -
>
> Key: HIVE-17115
> URL: https://issues.apache.org/jira/browse/HIVE-17115
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1
>Reporter: Erik.fang
>Assignee: Erik.fang
> Attachments: HIVE-17115.1.patch, HIVE-17115.patch
>
>
> Suppose we create a table with a custom SerDe, then call 
> HiveMetaStoreClient.getSchema(String db, String tableName) to extract the 
> metadata from the HiveMetaStore service. 
> The thrift client hangs there, with an exception such as the following in the 
> HiveMetaStore service's log:
> {code:java}
> Exception in thread "pool-5-thread-129" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hbase/util/Bytes
> at 
> org.apache.hadoop.hive.hbase.HBaseSerDe.parseColumnsMapping(HBaseSerDe.java:184)
> at 
> org.apache.hadoop.hive.hbase.HBaseSerDeParameters.<init>(HBaseSerDeParameters.java:73)
> at 
> org.apache.hadoop.hive.hbase.HBaseSerDe.initialize(HBaseSerDe.java:117)
> at 
> org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:53)
> at 
> org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:401)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_fields_with_environment_context(HiveMetaStore.java:3556)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_schema_with_environment_context(HiveMetaStore.java:3636)
> at sun.reflect.GeneratedMethodAccessor104.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy4.get_schema_with_environment_context(Unknown 
> Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema_with_environment_context.getResult(ThriftHiveMetastore.java:9146)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema_with_environment_context.getResult(ThriftHiveMetastore.java:9130)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:551)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:546)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:546)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hbase.util.Bytes
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-08-02 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112210#comment-16112210
 ] 

Lefty Leverenz commented on HIVE-16998:
---

Doc note:  This adds *hive.spark.dynamic.partition.pruning.map.join.only* to 
HiveConf.java, so it needs to be documented in the wiki.

* [Configuration Properties -- Spark | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Spark]

Added a TODOC3.0 label.

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
>  Labels: TODOC3.0
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch, HIVE16998.5.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-16714) make Task Dependency on Repl Load more intuitive

2017-08-02 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek resolved HIVE-16714.

Resolution: Not A Problem

> make Task Dependency on Repl Load more intuitive
> 
>
> Key: HIVE-16714
> URL: https://issues.apache.org/jira/browse/HIVE-16714
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> *Primary warehouse*
> Create table a (name string, id int);
> Create table b as select name, id from a; 
> Repl dump default;
> *Replica warehouse*
> Repl load replica from '[location]';
> *Query Plan Generated*
> DDL0 =>  Copy a => move a
> DDL0 => DDL Create a => move a
> DDL0 => Copy  b => move b 
> DDL0 => DDL Create b => move b
> *Move to Query Plan :*
> DDL0 => Copy a => move a => DDL Create a
> DDL0 => Copy b => move b => DDL Create b



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-08-02 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112207#comment-16112207
 ] 

Lefty Leverenz commented on HIVE-16998:
---

[~stakiar], please set the fix version for this jira to 3.0.0.  Thanks.

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
>  Labels: TODOC3.0
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch, HIVE16998.5.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16714) make Task Dependency on Repl Load more intuitive

2017-08-02 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112208#comment-16112208
 ] 

anishek commented on HIVE-16714:


Not required, since the major rework done as part of HIVE-16896 corrected this. 

> make Task Dependency on Repl Load more intuitive
> 
>
> Key: HIVE-16714
> URL: https://issues.apache.org/jira/browse/HIVE-16714
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> *Primary warehouse*
> Create table a (name string, id int);
> Create table b as select name, id from a; 
> Repl dump default;
> *Replica warehouse*
> Repl load replica from '[location]';
> *Query Plan Generated*
> DDL0 =>  Copy a => move a
> DDL0 => DDL Create a => move a
> DDL0 => Copy  b => move b 
> DDL0 => DDL Create b => move b
> *Move to Query Plan :*
> DDL0 => Copy a => move a => DDL Create a
> DDL0 => Copy b => move b => DDL Create b



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-08-02 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-16998:
--
Labels: TODOC3.0  (was: )

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
>  Labels: TODOC3.0
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch, HIVE16998.5.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17072) Make the parallelized timeout configurable in BeeLine tests

2017-08-02 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112196#comment-16112196
 ] 

Lefty Leverenz commented on HIVE-17072:
---

The doc looks good, thanks Marta.  I added version information and a link to 
this jira.

Removed the TODOC3.0 label.

> Make the parallelized timeout configurable in BeeLine tests
> ---
>
> Key: HIVE-17072
> URL: https://issues.apache.org/jira/browse/HIVE-17072
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-17072.1.patch, HIVE-17072.2.patch
>
>
> When running the BeeLine tests in parallel, the timeout is hardcoded in 
> Parallelized.java:
> {noformat}
> @Override
> public void finished() {
>   executor.shutdown();
>   try {
> executor.awaitTermination(10, TimeUnit.MINUTES);
>   } catch (InterruptedException exc) {
> throw new RuntimeException(exc);
>   }
> }
> {noformat}
> It would be better to make it configurable.
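
One straightforward way to make it configurable is to read the timeout from a system property, keeping the current hardcoded value as the default. A sketch (the property name {{test.beeline.parallel.timeout.minutes}} is invented for illustration and may differ from what the patch actually uses):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

public class ConfigurableTimeout {
    // Reads the timeout from a system property; falls back to the old
    // hardcoded 10 minutes when the property is unset.
    static long timeoutMinutes() {
        return Long.getLong("test.beeline.parallel.timeout.minutes", 10L);
    }

    static void finish(ExecutorService executor) {
        executor.shutdown();
        try {
            executor.awaitTermination(timeoutMinutes(), TimeUnit.MINUTES);
        } catch (InterruptedException exc) {
            throw new RuntimeException(exc);
        }
    }
}
```

The property can then be passed on the Maven command line, e.g. {{-Dtest.beeline.parallel.timeout.minutes=30}}.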



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17072) Make the parallelized timeout configurable in BeeLine tests

2017-08-02 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-17072:
--
Labels:   (was: TODOC3.0)

> Make the parallelized timeout configurable in BeeLine tests
> ---
>
> Key: HIVE-17072
> URL: https://issues.apache.org/jira/browse/HIVE-17072
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-17072.1.patch, HIVE-17072.2.patch
>
>
> When running the BeeLine tests in parallel, the timeout is hardcoded in 
> Parallelized.java:
> {noformat}
> @Override
> public void finished() {
>   executor.shutdown();
>   try {
> executor.awaitTermination(10, TimeUnit.MINUTES);
>   } catch (InterruptedException exc) {
> throw new RuntimeException(exc);
>   }
> }
> {noformat}
> It would be better to make it configurable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-12369) Native Vector GroupBy

2017-08-02 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112186#comment-16112186
 ] 

Matt McCline edited comment on HIVE-12369 at 8/3/17 5:01 AM:
-

Yes, I think you should continue reviewing.  The path that is implemented is 
One Long Key and groupByMode == HASH.  There are UNDONEs for *subsequent* JIRAs 
that will later add Aggregation of non-Long data types, Fixed Length Keys / 
Variable Length Keys, and the other groupByModes, and later add Grouping Sets 
and Empty Aggregation (i.e. GroupBy on a key that has no aggregations, which 
does duplicate key elimination), too.


was (Author: mmccline):
Yes, I think you continue reviewing.  The path that is implemented is One Long 
Key and groupByMode == HASH.  There are UNDONEs for *subsequent* JIRAs that 
later adds Aggregation of non-Long data types, Fixed Length Keys / Variable 
Length Keys, and the other groupByModes.  And later adds Grouping Sets, Empty 
Aggregation (i.e. GroupBy on key that has no aggregations that does duplicate 
key elimination), too.

> Native Vector GroupBy
> -
>
> Key: HIVE-12369
> URL: https://issues.apache.org/jira/browse/HIVE-12369
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, 
> HIVE-12369.05.patch, HIVE-12369.06.patch
>
>
> Implement Native Vector GroupBy using fast hash table technology developed 
> for Native Vector MapJoin, etc.
> Patch is currently limited to a single Long key, aggregation on Long columns, 
> no more than 31 columns.
> 3 new classes are introduced that store the count in the slot table and don't 
> allocate hash elements:
> {noformat}
>   COUNT(column)  VectorGroupByHashOneLongKeyCountColumnOperator  
>   COUNT(key) VectorGroupByHashOneLongKeyCountKeyOperator
>   COUNT(*)   VectorGroupByHashOneLongKeyCountStarOperator   
> {noformat}
> And a new class that aggregates a single Long key:
> {noformat}
>   VectorGroupByHashOneLongKeyOperator
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-12369) Native Vector GroupBy

2017-08-02 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112186#comment-16112186
 ] 

Matt McCline commented on HIVE-12369:
-

Yes, I think you should continue reviewing.  The path that is implemented is 
One Long Key and groupByMode == HASH.  There are UNDONEs for *subsequent* JIRAs 
that will later add Aggregation of non-Long data types, Fixed Length Keys / 
Variable Length Keys, and the other groupByModes, and later add Grouping Sets 
and Empty Aggregation (i.e. GroupBy on a key that has no aggregations, which 
does duplicate key elimination), too.

> Native Vector GroupBy
> -
>
> Key: HIVE-12369
> URL: https://issues.apache.org/jira/browse/HIVE-12369
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, 
> HIVE-12369.05.patch, HIVE-12369.06.patch
>
>
> Implement Native Vector GroupBy using fast hash table technology developed 
> for Native Vector MapJoin, etc.
> Patch is currently limited to a single Long key, aggregation on Long columns, 
> no more than 31 columns.
> 3 new classes are introduced that store the count in the slot table and don't 
> allocate hash elements:
> {noformat}
>   COUNT(column)  VectorGroupByHashOneLongKeyCountColumnOperator  
>   COUNT(key) VectorGroupByHashOneLongKeyCountKeyOperator
>   COUNT(*)   VectorGroupByHashOneLongKeyCountStarOperator   
> {noformat}
> And a new class that aggregates a single Long key:
> {noformat}
>   VectorGroupByHashOneLongKeyOperator
> {noformat}
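
The "count stored in the slot table" idea above can be illustrated with a minimal open-addressing table keyed by a single long, where each slot holds the key and its count inline, so no per-group hash element object is ever allocated. This is a hypothetical sketch (class name and layout invented; not Hive's actual VectorGroupByHash* code, which also handles resizing, memory limits, etc.):

```java
public class OneLongKeyCountTable {
    private final long[] keys;
    private final long[] counts;     // count lives in the slot table, no boxing
    private final boolean[] used;
    private final int mask;

    // capacityPow2 must be a power of two so (hash & mask) is a valid index.
    public OneLongKeyCountTable(int capacityPow2) {
        keys = new long[capacityPow2];
        counts = new long[capacityPow2];
        used = new boolean[capacityPow2];
        mask = capacityPow2 - 1;
    }

    public void increment(long key) {
        int slot = (int) (mix(key) & mask);
        while (used[slot] && keys[slot] != key) {
            slot = (slot + 1) & mask;          // linear probing on collision
        }
        if (!used[slot]) { used[slot] = true; keys[slot] = key; }
        counts[slot]++;
    }

    public long countOf(long key) {
        int slot = (int) (mix(key) & mask);
        while (used[slot]) {
            if (keys[slot] == key) return counts[slot];
            slot = (slot + 1) & mask;
        }
        return 0;                              // key never seen
    }

    private static long mix(long x) {
        x *= 0x9E3779B97F4A7C15L;              // cheap multiplicative scramble
        return x ^ (x >>> 32);
    }
}
```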



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17234) Remove HBase metastore from master

2017-08-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112185#comment-16112185
 ] 

Hive QA commented on HIVE-17234:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12880133/HIVE-17234.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10961 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] 
(batchId=56)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
org.apache.hive.hcatalog.pig.TestRCFileHCatStorer.testWriteTimestamp 
(batchId=182)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteSmallint 
(batchId=182)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6240/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6240/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6240/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12880133 - PreCommit-HIVE-Build

> Remove HBase metastore from master
> --
>
> Key: HIVE-17234
> URL: https://issues.apache.org/jira/browse/HIVE-17234
> Project: Hive
>  Issue Type: Task
>  Components: HBase Metastore
>Affects Versions: 3.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17234.patch
>
>
> No new development has been done on the HBase metastore in at least a year, 
> and to my knowledge no one is using it (nor is it even in a state to be fully 
> usable).  Given the lack of interest in continuing to develop it, we should 
> remove it rather than leave dead code hanging around and extra tests taking 
> up time in test runs.





[jira] [Commented] (HIVE-17171) Remove old javadoc versions

2017-08-02 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112176#comment-16112176
 ] 

Lefty Leverenz commented on HIVE-17171:
---

Owen, three links to archived javadocs are broken:

* [Hive 2.0.1 Javadocs | 
https://svn.apache.org/repos/infra/websites/production/hive/content/javadocs/r2.0.1/api/index.html?p=1015623]
* [Hive 1.1.1 Javadocs | 
https://svn.apache.org/repos/infra/websites/production/hive/content/javadocs/r1.1.1/api/index.html?p=1015623]
* [Hive 1.0.1 Javadocs | 
https://svn.apache.org/repos/infra/websites/production/hive/content/javadocs/r1.0.1/api/index.html?p=1015623]

But the link to 0.13.1 is okay:

* [Hive 0.13.1 Javadocs | 
https://svn.apache.org/repos/infra/websites/production/hive/content/javadocs/r0.13.1/api/index.html?p=1015623]

Also, the Nexus link opens the hive-storage-api artifact, and finding the hive 
artifact isn't easy for a newbie.

Glitches aside, the changes look good.  Thanks.

> Remove old javadoc versions
> ---
>
> Key: HIVE-17171
> URL: https://issues.apache.org/jira/browse/HIVE-17171
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> We currently have a lot of old javadoc versions. I'd propose that we keep the 
> following versions:
> * r1.2.2
> * r2.1.1
> * r2.2.0
> (Note that 2.3.0 was not checked in to the site.) In particular, I'd suggest 
> we remove:
> * hcat-r0.5.0
> * r0.10.0
> * r0.11.0
> * r0.12.0
> * r0.13.1
> * r1.0.1
> * r1.1.1
> * r2.0.1
> Any concerns?





[jira] [Updated] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem

2017-08-02 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated HIVE-15794:
---
Status: Patch Available  (was: In Progress)

> Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
> --
>
> Key: HIVE-15794
> URL: https://issues.apache.org/jira/browse/HIVE-15794
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0, 1.1.0, 1.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
> Attachments: HIVE-15794.1.patch, HIVE-15794.2.patch
>
>
> *SQL*:
> {code:sql}
> hive> create table table2 as select * from table1;
> hive> show create table table2;
> OK
> CREATE TABLE `table2`(
>   `id` string)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.mapred.TextInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://cluster4/user/hive/warehouse/table2'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1486050317')
> {code}
> *LOG*:
> {noformat}
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> {noformat}
> Can't get hdfsEncryptionShim if {{FileSystem}} is a 
> [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html];
>  we should support it.
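The fix amounts to dispatch logic: ViewFileSystem is only a client-side mount table, so instead of giving up on any non-hdfs scheme, the session should resolve the underlying target filesystem and retry. A minimal sketch of that decision flow, with placeholder names (`get_encryption_shim`, the shim string, and the `resolve_mount` callback are assumptions for illustration, not Hadoop/Hive API calls):

```python
def get_encryption_shim(fs_scheme, resolve_mount=None):
    """Return a (mock) encryption shim for filesystems that can support HDFS
    encryption. Sketch of the proposed dispatch only -- the real fix would use
    Hadoop's ViewFileSystem APIs to resolve the backing filesystem.
    """
    if fs_scheme == "hdfs":
        return "HdfsEncryptionShim"           # placeholder for the real shim
    if fs_scheme == "viewfs" and resolve_mount is not None:
        # ViewFileSystem is a client-side mount table; ask it which concrete
        # filesystem backs the path, then retry with that scheme.
        return get_encryption_shim(resolve_mount())
    return None                               # non-HDFS: shim not applicable
```

With this shape, a `viewfs://` path backed by an HDFS mount gets a shim, while genuinely non-HDFS filesystems still correctly get none.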





[jira] [Updated] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem

2017-08-02 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated HIVE-15794:
---
Status: In Progress  (was: Patch Available)

> Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
> --
>
> Key: HIVE-15794
> URL: https://issues.apache.org/jira/browse/HIVE-15794
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0, 1.1.0, 1.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
> Attachments: HIVE-15794.1.patch, HIVE-15794.2.patch
>
>
> *SQL*:
> {code:sql}
> hive> create table table2 as select * from table1;
> hive> show create table table2;
> OK
> CREATE TABLE `table2`(
>   `id` string)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.mapred.TextInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://cluster4/user/hive/warehouse/table2'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1486050317')
> {code}
> *LOG*:
> {noformat}
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> {noformat}
> Can't get hdfsEncryptionShim if {{FileSystem}} is a 
> [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html];
>  we should support it.





[jira] [Updated] (HIVE-15705) Event replication for constraints

2017-08-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-15705:
--
Attachment: HIVE-15705.5.patch

> Event replication for constraints
> -
>
> Key: HIVE-15705
> URL: https://issues.apache.org/jira/browse/HIVE-15705
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15705.1.patch, HIVE-15705.2.patch, 
> HIVE-15705.3.patch, HIVE-15705.4.patch, HIVE-15705.5.patch
>
>
> Make event replication for primary key and foreign key work.





[jira] [Updated] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem

2017-08-02 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated HIVE-15794:
---
Attachment: HIVE-15794.2.patch

> Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
> --
>
> Key: HIVE-15794
> URL: https://issues.apache.org/jira/browse/HIVE-15794
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 1.2.0, 1.1.0, 2.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
> Attachments: HIVE-15794.1.patch, HIVE-15794.2.patch
>
>
> *SQL*:
> {code:sql}
> hive> create table table2 as select * from table1;
> hive> show create table table2;
> OK
> CREATE TABLE `table2`(
>   `id` string)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.mapred.TextInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://cluster4/user/hive/warehouse/table2'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1486050317')
> {code}
> *LOG*:
> {noformat}
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> {noformat}
> Can't get hdfsEncryptionShim if {{FileSystem}} is a 
> [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html];
>  we should support it.





[jira] [Commented] (HIVE-15705) Event replication for constraints

2017-08-02 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112164#comment-16112164
 ] 

Daniel Dai commented on HIVE-15705:
---

Uploaded a new patch to address most of the comments, including the issue with the 
general constraint implementation. A few items are not addressed or remain unclear:
7, 8, 10: We plan to remove the HBaseStore code, so I didn't include HBaseStore-only 
issues.
12. Changed the AddxxxHandler classes, but in DropConstraintHandler we only have the 
constraint name, not the constraint object, in the message. I still get db/table from 
the message in DropConstraintHandler, and I checked that those are valid during load.
15. I would leave table rename for constraints to another ticket. It is more 
involved and not appropriate to piggyback on this patch.

> Event replication for constraints
> -
>
> Key: HIVE-15705
> URL: https://issues.apache.org/jira/browse/HIVE-15705
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15705.1.patch, HIVE-15705.2.patch, 
> HIVE-15705.3.patch, HIVE-15705.4.patch
>
>
> Make event replication for primary key and foreign key work.





[jira] [Updated] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem

2017-08-02 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated HIVE-15794:
---
Attachment: (was: HIVE-15794.1.patch)

> Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
> --
>
> Key: HIVE-15794
> URL: https://issues.apache.org/jira/browse/HIVE-15794
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 1.2.0, 1.1.0, 2.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
> Attachments: HIVE-15794.1.patch
>
>
> *SQL*:
> {code:sql}
> hive> create table table2 as select * from table1;
> hive> show create table table2;
> OK
> CREATE TABLE `table2`(
>   `id` string)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.mapred.TextInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://cluster4/user/hive/warehouse/table2'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1486050317')
> {code}
> *LOG*:
> {noformat}
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> {noformat}
> Can't get hdfsEncryptionShim if {{FileSystem}} is a 
> [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html];
>  we should support it.





[jira] [Commented] (HIVE-15705) Event replication for constraints

2017-08-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112136#comment-16112136
 ] 

ASF GitHub Bot commented on HIVE-15705:
---

GitHub user daijyc opened a pull request:

https://github.com/apache/hive/pull/219

HIVE-15705: Event replication for constraints



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/daijyc/hive HIVE-15705

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/219.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #219


commit ee3bff683f35b0ecbc1e721faa511b6cac7b8189
Author: Daniel Dai 
Date:   2017-08-03T03:47:01Z

HIVE-15705: Event replication for constraints




> Event replication for constraints
> -
>
> Key: HIVE-15705
> URL: https://issues.apache.org/jira/browse/HIVE-15705
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15705.1.patch, HIVE-15705.2.patch, 
> HIVE-15705.3.patch, HIVE-15705.4.patch
>
>
> Make event replication for primary key and foreign key work.





[jira] [Updated] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem

2017-08-02 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated HIVE-15794:
---
Attachment: HIVE-15794.1.patch

> Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
> --
>
> Key: HIVE-15794
> URL: https://issues.apache.org/jira/browse/HIVE-15794
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 1.2.0, 1.1.0, 2.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
> Attachments: HIVE-15794.1.patch, HIVE-15794.1.patch
>
>
> *SQL*:
> {code:sql}
> hive> create table table2 as select * from table1;
> hive> show create table table2;
> OK
> CREATE TABLE `table2`(
>   `id` string)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.mapred.TextInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://cluster4/user/hive/warehouse/table2'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1486050317')
> {code}
> *LOG*:
> {noformat}
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> {noformat}
> Can't get hdfsEncryptionShim if {{FileSystem}} is a 
> [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html];
>  we should support it.





[jira] [Updated] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem

2017-08-02 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated HIVE-15794:
---
Status: Patch Available  (was: Open)

> Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
> --
>
> Key: HIVE-15794
> URL: https://issues.apache.org/jira/browse/HIVE-15794
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0, 1.1.0, 1.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
> Attachments: HIVE-15794.1.patch, HIVE-15794.1.patch
>
>
> *SQL*:
> {code:sql}
> hive> create table table2 as select * from table1;
> hive> show create table table2;
> OK
> CREATE TABLE `table2`(
>   `id` string)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.mapred.TextInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://cluster4/user/hive/warehouse/table2'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1486050317')
> {code}
> *LOG*:
> {noformat}
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> {noformat}
> Can't get hdfsEncryptionShim if {{FileSystem}} is a 
> [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html];
>  we should support it.





[jira] [Updated] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem

2017-08-02 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated HIVE-15794:
---
Status: Open  (was: Patch Available)

> Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
> --
>
> Key: HIVE-15794
> URL: https://issues.apache.org/jira/browse/HIVE-15794
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0, 1.1.0, 1.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
> Attachments: HIVE-15794.1.patch
>
>
> *SQL*:
> {code:sql}
> hive> create table table2 as select * from table1;
> hive> show create table table2;
> OK
> CREATE TABLE `table2`(
>   `id` string)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.mapred.TextInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://cluster4/user/hive/warehouse/table2'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1486050317')
> {code}
> *LOG*:
> {noformat}
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> {noformat}
> Can't get hdfsEncryptionShim if {{FileSystem}} is a 
> [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html];
>  we should support it.





[jira] [Commented] (HIVE-17213) HoS: file merging doesn't work for union all

2017-08-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112135#comment-16112135
 ] 

Hive QA commented on HIVE-17213:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12880115/HIVE-17213.5.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 11127 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] 
(batchId=56)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_union_merge]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testConnection (batchId=241)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValidNeg (batchId=241)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeTokenAuth 
(batchId=241)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testProxyAuth (batchId=241)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testTokenAuth (batchId=241)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6239/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6239/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6239/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12880115 - PreCommit-HIVE-Build

> HoS: file merging doesn't work for union all
> 
>
> Key: HIVE-17213
> URL: https://issues.apache.org/jira/browse/HIVE-17213
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-17213.0.patch, HIVE-17213.1.patch, 
> HIVE-17213.2.patch, HIVE-17213.3.patch, HIVE-17213.4.patch, HIVE-17213.5.patch
>
>
> HoS file merging doesn't work properly, since it doesn't set the linked file sinks 
> correctly; those are used to generate the move tasks.





[jira] [Commented] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem

2017-08-02 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112131#comment-16112131
 ] 

Lefty Leverenz commented on HIVE-15794:
---

[~q79969786], to retest a patch you have to resubmit it -- either use the 
Cancel Patch button at the top of the page and add it again or else rename the 
patch "HIVE-15794.2.patch" and submit it as if it were a new patch.  Then 
testing will be run automatically.

> Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
> --
>
> Key: HIVE-15794
> URL: https://issues.apache.org/jira/browse/HIVE-15794
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 1.2.0, 1.1.0, 2.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
> Attachments: HIVE-15794.1.patch
>
>
> *SQL*:
> {code:sql}
> hive> create table table2 as select * from table1;
> hive> show create table table2;
> OK
> CREATE TABLE `table2`(
>   `id` string)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.mapred.TextInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://cluster4/user/hive/warehouse/table2'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1486050317')
> {code}
> *LOG*:
> {noformat}
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> {noformat}
> Can't get hdfsEncryptionShim if {{FileSystem}} is a 
> [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html];
>  we should support it.





[jira] [Updated] (HIVE-17220) Bloomfilter probing in semijoin reduction is thrashing L1 dcache

2017-08-02 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17220:
-
Attachment: HIVE-17220.3.patch

Addressed [~gopalv]'s review comments. Also fixed test failures. 

> Bloomfilter probing in semijoin reduction is thrashing L1 dcache
> 
>
> Key: HIVE-17220
> URL: https://issues.apache.org/jira/browse/HIVE-17220
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17220.1.patch, HIVE-17220.2.patch, 
> HIVE-17220.3.patch, HIVE-17220.WIP.patch
>
>
> [~gopalv] observed perf profiles showing bloomfilter probes as the bottleneck for 
> some of the TPC-DS queries, resulting in L1 data cache thrashing. 
> This is because the huge bitset in the bloom filter doesn't fit in any 
> level of cache, and the hash bits corresponding to a single key map to 
> different segments of the bitset, which are spread out. This can result in K-1 
> memory accesses (K being the number of hash functions) in the worst case for every 
> key that gets probed, because of locality misses in the L1 cache. 
> Ran a JMH microbenchmark to verify this. The following is the JMH perf 
> profile for bloom filter probing:
> {code}
> Perf stats:
> --
>5101.935637  task-clock (msec) #0.461 CPUs utilized
>346  context-switches  #0.068 K/sec
>336  cpu-migrations#0.066 K/sec
>  6,207  page-faults   #0.001 M/sec
> 10,016,486,301  cycles#1.963 GHz  
> (26.90%)
>  5,751,692,176  stalled-cycles-frontend   #   57.42% frontend cycles 
> idle (27.05%)
>  stalled-cycles-backend
> 14,359,914,397  instructions  #1.43  insns per cycle
>   #0.40  stalled cycles 
> per insn  (33.78%)
>  2,200,632,861  branches  #  431.333 M/sec
> (33.84%)
>  1,162,860  branch-misses #0.05% of all branches  
> (33.97%)
>  1,025,992,254  L1-dcache-loads   #  201.099 M/sec
> (26.56%)
>432,663,098  L1-dcache-load-misses #   42.17% of all L1-dcache 
> hits(14.49%)
>331,383,297  LLC-loads #   64.952 M/sec
> (14.47%)
>203,524  LLC-load-misses   #0.06% of all LL-cache 
> hits (21.67%)
>  L1-icache-loads
>  1,633,821  L1-icache-load-misses #0.320 M/sec
> (28.85%)
>950,368,796  dTLB-loads#  186.276 M/sec
> (28.61%)
>246,813,393  dTLB-load-misses  #   25.97% of all dTLB 
> cache hits   (14.53%)
> 25,451  iTLB-loads#0.005 M/sec
> (14.48%)
> 35,415  iTLB-load-misses  #  139.15% of all iTLB 
> cache hits   (21.73%)
>  L1-dcache-prefetches
>175,958  L1-dcache-prefetch-misses #0.034 M/sec
> (28.94%)
>   11.064783140 seconds time elapsed
> {code}
> This shows an L1 data cache miss rate of 42.17%. 
> This jira is to use a cache-efficient bloom filter for semijoin probing.
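The cache-efficient design referred to here is commonly called a blocked bloom filter: all K bit positions for a key are derived within one cache-line-sized block, so a probe touches a single cache line instead of up to K scattered ones. A rough Python sketch of the idea (block size, hash mixing, and K are illustrative assumptions, not Hive's actual implementation):

```python
BITS_PER_BLOCK = 512         # one 64-byte cache line
K = 4                        # number of hash functions (illustrative)

def _mix(key, seed):
    # Cheap stand-in for the independent hashes a real filter would use.
    h = (key * 0x9E3779B97F4A7C15 + seed) & 0xFFFFFFFFFFFFFFFF
    return h ^ (h >> 31)

def positions(key, num_blocks):
    """All K bit positions land inside one block chosen by the first hash."""
    block = _mix(key, 1) % num_blocks
    base = block * BITS_PER_BLOCK
    return [base + (_mix(key, seed) % BITS_PER_BLOCK)
            for seed in range(2, 2 + K)]

class BlockedBloomFilter:
    def __init__(self, num_blocks=1024):
        self.num_blocks = num_blocks
        self.bits = bytearray(num_blocks * BITS_PER_BLOCK // 8)

    def add(self, key):
        for p in positions(key, self.num_blocks):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key):
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in positions(key, self.num_blocks))
```

The trade-off is a slightly higher false-positive rate for the same total size, in exchange for at most one data-cache line touched per probe, which directly targets the 42% L1 miss rate measured above.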





[jira] [Updated] (HIVE-17235) Add ORC Decimal64 Serialization/Deserialization

2017-08-02 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-17235:

Attachment: HIVE-17235.03.patch

> Add ORC Decimal64 Serialization/Deserialization
> ---
>
> Key: HIVE-17235
> URL: https://issues.apache.org/jira/browse/HIVE-17235
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-17235.03.patch
>
>
> The storage-api changes for ORC-209.





[jira] [Commented] (HIVE-12369) Native Vector GroupBy

2017-08-02 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112093#comment-16112093
 ] 

Sergey Shelukhin commented on HIVE-12369:
-

Hmm. Reviewed most of page one. Should I be reviewing this? It looks like lots 
of stuff is UNDONE (not implemented?).
I can also review stuff now and then the diff of the diffs, but I wonder if it 
makes sense, i.e. whether the upcoming changes are going to be reasonable in 
scope for that.

> Native Vector GroupBy
> -
>
> Key: HIVE-12369
> URL: https://issues.apache.org/jira/browse/HIVE-12369
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, 
> HIVE-12369.05.patch, HIVE-12369.06.patch
>
>
> Implement Native Vector GroupBy using fast hash table technology developed 
> for Native Vector MapJoin, etc.
> The patch is currently limited to a single Long key, aggregation on Long columns, 
> and no more than 31 columns.
> Three new classes are introduced that store the count in the slot table and don't 
> allocate hash elements:
> {noformat}
>   COUNT(column)  VectorGroupByHashOneLongKeyCountColumnOperator  
>   COUNT(key) VectorGroupByHashOneLongKeyCountKeyOperator
>   COUNT(*)   VectorGroupByHashOneLongKeyCountStarOperator   
> {noformat}
> And a new class that aggregates a single Long key:
> {noformat}
>   VectorGroupByHashOneLongKeyOperator
> {noformat}





[jira] [Commented] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem

2017-08-02 Thread Yuming Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112090#comment-16112090
 ] 

Yuming Wang commented on HIVE-15794:


retest this please.

> Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
> --
>
> Key: HIVE-15794
> URL: https://issues.apache.org/jira/browse/HIVE-15794
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 1.2.0, 1.1.0, 2.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
> Attachments: HIVE-15794.1.patch
>
>
> *SQL*:
> {code:sql}
> hive> create table table2 as select * from table1;
> hive> show create table table2;
> OK
> CREATE TABLE `table2`(
>   `id` string)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.mapred.TextInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://cluster4/user/hive/warehouse/table2'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1486050317')
> {code}
> *LOG*:
> {noformat}
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> 2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
> session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.
> {noformat}
> Can’t get hdfsEncryptionShim if the {{FileSystem}} is
> [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html];
> we should support it.





[jira] [Commented] (HIVE-17237) HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters

2017-08-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112087#comment-16112087
 ] 

Hive QA commented on HIVE-17237:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12880139/HIVE-17237.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11138 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=241)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] 
(batchId=56)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=236)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress (batchId=223)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6238/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6238/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6238/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12880139 - PreCommit-HIVE-Build

> HMS wastes 26.4% of memory due to dup strings in 
> metastore.api.Partition.parameters
> ---
>
> Key: HIVE-17237
> URL: https://issues.apache.org/jira/browse/HIVE-17237
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HIVE-17237.01.patch
>
>
> I've analyzed a heap dump from a production Hive installation using jxray
> (www.jxray.com). It turns out that there are a lot of duplicate strings in
> memory, wasting 26.4% of the heap. Most of them come from HashMaps
> referenced by org.apache.hadoop.hive.metastore.api.Partition.parameters.
> Below is the relevant section of the jxray report.
> Looking at Partition.java, I see that somebody has already added
> code to intern keys and values in the parameters table when it is first set
> up. However, key-value pairs added later are not interned, which
> probably explains all these duplicate strings. Also, when
> a Partition instance is deserialized, no interning of parameters is currently
> done.
> {code}
> 6. DUPLICATE STRINGS
> Total strings: 3,273,557  Unique strings: 460,390  Duplicate values: 110,232  
> Overhead: 3,220,458K (26.4%)
> 
> ===
> 7. REFERENCE CHAINS FOR DUPLICATE STRINGS
>   2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing 
> arrays:
> 39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", 
> 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of 
> "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length 
> 3560]"
> ... and 419200 more strings, of which 36376 are unique
> Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", 
> 28 of "2", 21 of "0"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>  <-- Java Local 
> (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result)
>  [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
>   463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing 
> arrays:
> 7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of 
> "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980"
> ... and 84009 more strings, of which 34065 are unique
> Also contains one-char strings: 42 of "7", 31 of "6", 20 of 

[jira] [Commented] (HIVE-12369) Native Vector GroupBy

2017-08-02 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112077#comment-16112077
 ] 

Sergey Shelukhin commented on HIVE-12369:
-

I started reviewing this... it will take some time, probably a couple of
days. I will publish the review in parts.

> Native Vector GroupBy
> -
>
> Key: HIVE-12369
> URL: https://issues.apache.org/jira/browse/HIVE-12369
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, 
> HIVE-12369.05.patch, HIVE-12369.06.patch
>
>
> Implement Native Vector GroupBy using fast hash table technology developed 
> for Native Vector MapJoin, etc.
> Patch is currently limited to a single Long key, aggregation on Long columns, 
> no more than 31 columns.
> 3 new classes are introduced that store the count in the slot table and don't
> allocate hash elements:
> {noformat}
>   COUNT(column)  VectorGroupByHashOneLongKeyCountColumnOperator  
>   COUNT(key) VectorGroupByHashOneLongKeyCountKeyOperator
>   COUNT(*)   VectorGroupByHashOneLongKeyCountStarOperator   
> {noformat}
> And a new class that aggregates a single Long key:
> {noformat}
>   VectorGroupByHashOneLongKeyOperator
> {noformat}
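The "count in the slot table" idea above can be illustrated with a minimal open-addressing hash keyed by a single long. This is only an illustrative sketch under simplifying assumptions (fixed power-of-two capacity, no resize, no spill handling); it is not the actual VectorGroupByHashOneLongKeyCountStarOperator implementation:

```java
// Sketch: COUNT(*) grouped by one long key, with the count stored
// directly in a parallel slot array instead of allocating a hash
// element object per group.
public class LongKeyCountHash {
    private final long[] keys;
    private final long[] counts;    // count lives in the slot table
    private final boolean[] used;
    private final int mask;

    LongKeyCountHash(int capacityPow2) {   // capacity must be a power of two
        keys = new long[capacityPow2];
        counts = new long[capacityPow2];
        used = new boolean[capacityPow2];
        mask = capacityPow2 - 1;
    }

    void add(long key) {
        int slot = Long.hashCode(key) & mask;
        while (used[slot] && keys[slot] != key) {
            slot = (slot + 1) & mask;       // linear probing
        }
        used[slot] = true;
        keys[slot] = key;
        counts[slot]++;                     // no per-row heap allocation
    }

    long count(long key) {
        int slot = Long.hashCode(key) & mask;
        while (used[slot]) {
            if (keys[slot] == key) {
                return counts[slot];
            }
            slot = (slot + 1) & mask;
        }
        return 0;
    }

    public static void main(String[] args) {
        LongKeyCountHash h = new LongKeyCountHash(16);
        for (long k : new long[]{1, 2, 1, 3, 1, 2}) {
            h.add(k);
        }
        System.out.println(h.count(1) + " " + h.count(2) + " " + h.count(3));
        // prints: 3 2 1
    }
}
```

Keeping the count in a primitive array beside the key slots is what lets the three new COUNT operators skip allocating aggregation buffers per group.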





[jira] [Commented] (HIVE-17213) HoS: file merging doesn't work for union all

2017-08-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112025#comment-16112025
 ] 

Hive QA commented on HIVE-17213:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12880115/HIVE-17213.5.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11127 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] 
(batchId=56)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_union_merge]
 (batchId=169)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6237/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6237/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6237/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12880115 - PreCommit-HIVE-Build

> HoS: file merging doesn't work for union all
> 
>
> Key: HIVE-17213
> URL: https://issues.apache.org/jira/browse/HIVE-17213
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-17213.0.patch, HIVE-17213.1.patch, 
> HIVE-17213.2.patch, HIVE-17213.3.patch, HIVE-17213.4.patch, HIVE-17213.5.patch
>
>
> HoS file merging doesn't work properly since it doesn't correctly set the
> linked file sinks, which are used to generate move tasks.





[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-08-02 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Open  (was: Patch Available)

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch, 
> HIVE-16811.3.patch, HIVE-16811.4.patch
>
>
> Currently, join ordering completely bails out in the absence of statistics,
> and this can lead to bad joins such as cross joins.
> E.g., the following select query will produce a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING)
> CREATE TABLE lineitem (L_ORDERKEY  INT,
> L_PARTKEY   INT,
> L_SUPPKEY   INT,
> L_LINENUMBERINT,
> L_QUANTITY  DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT  DOUBLE,
> L_TAX   DOUBLE,
> L_RETURNFLAGSTRING,
> L_LINESTATUSSTRING,
> l_shipdate  STRING,
> L_COMMITDATESTRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE  STRING,
> L_COMMENT   STRING) partitioned by (dl 
> int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out
> and let it come up with a join at least better than a cross join.





[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-08-02 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Patch Available  (was: Open)

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch, 
> HIVE-16811.3.patch, HIVE-16811.4.patch
>
>
> Currently, join ordering completely bails out in the absence of statistics,
> and this can lead to bad joins such as cross joins.
> E.g., the following select query will produce a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING)
> CREATE TABLE lineitem (L_ORDERKEY  INT,
> L_PARTKEY   INT,
> L_SUPPKEY   INT,
> L_LINENUMBERINT,
> L_QUANTITY  DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT  DOUBLE,
> L_TAX   DOUBLE,
> L_RETURNFLAGSTRING,
> L_LINESTATUSSTRING,
> l_shipdate  STRING,
> L_COMMITDATESTRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE  STRING,
> L_COMMENT   STRING) partitioned by (dl 
> int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out
> and let it come up with a join at least better than a cross join.





[jira] [Updated] (HIVE-17237) HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters

2017-08-02 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HIVE-17237:
--
Status: Patch Available  (was: Open)

> HMS wastes 26.4% of memory due to dup strings in 
> metastore.api.Partition.parameters
> ---
>
> Key: HIVE-17237
> URL: https://issues.apache.org/jira/browse/HIVE-17237
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HIVE-17237.01.patch
>
>
> I've analyzed a heap dump from a production Hive installation using jxray
> (www.jxray.com). It turns out that there are a lot of duplicate strings in
> memory, wasting 26.4% of the heap. Most of them come from HashMaps
> referenced by org.apache.hadoop.hive.metastore.api.Partition.parameters.
> Below is the relevant section of the jxray report.
> Looking at Partition.java, I see that somebody has already added
> code to intern keys and values in the parameters table when it is first set
> up. However, key-value pairs added later are not interned, which
> probably explains all these duplicate strings. Also, when
> a Partition instance is deserialized, no interning of parameters is currently
> done.
> {code}
> 6. DUPLICATE STRINGS
> Total strings: 3,273,557  Unique strings: 460,390  Duplicate values: 110,232  
> Overhead: 3,220,458K (26.4%)
> 
> ===
> 7. REFERENCE CHAINS FOR DUPLICATE STRINGS
>   2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing 
> arrays:
> 39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", 
> 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of 
> "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length 
> 3560]"
> ... and 419200 more strings, of which 36376 are unique
> Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", 
> 28 of "2", 21 of "0"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>  <-- Java Local 
> (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result)
>  [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
>   463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing 
> arrays:
> 7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of 
> "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980"
> ... and 84009 more strings, of which 34065 are unique
> Also contains one-char strings: 42 of "7", 31 of "6", 20 of "4", 8 of "5", 5 
> of "2", 3 of "0"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.TreeMap}.values <-- Java Local (j.u.TreeMap) [@6f084afa0,@73aac9e68]
>   233,384K (1.9%), 64601 dup strings (27295 unique), 64601 dup backing arrays:
> 4472 of "true", 4173 of "-1", 3798 of "1", 3591 of "8", 813 of "174528", 684 
> of "10" ... and 44568 more strings, of which 27285 are unique
> Also contains one-char strings: 305 of "7", 301 of "0", 277 of "4", 146 of 
> "6", 29 of "2", 23 of "5", 19 of "9", 2 of "3"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- Java Local (j.u.ArrayList) 
> [@4f4cfbd10,@536122408,@726616778]
> ...
>   52,916K (0.4%), 597058 dup strings (16 unique), 597058 dup backing arrays:
>  <--  {j.u.HashMap}.keys <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>  <-- Java Local 
> (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result)
>  [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
> {code}
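The fix the report suggests — interning not just at first set-up but also on later puts and on deserialization — amounts to something like the following sketch. `internedCopy` is a hypothetical helper for illustration, not a method from HIVE-17237.01.patch:

```java
import java.util.HashMap;
import java.util.Map;

public class InternDemo {
    // Intern both keys and values so that duplicate parameter strings
    // across many Partition instances share one canonical String
    // instead of thousands of copies with separate backing arrays.
    static Map<String, String> internedCopy(Map<String, String> params) {
        Map<String, String> out = new HashMap<>(params.size());
        for (Map.Entry<String, String> e : params.entrySet()) {
            out.put(e.getKey().intern(), e.getValue().intern());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> p1 = new HashMap<>();
        p1.put(new String("transient_lastDdlTime"), new String("1486050317"));
        Map<String, String> p2 = new HashMap<>();
        p2.put(new String("transient_lastDdlTime"), new String("1486050317"));

        Map<String, String> i1 = internedCopy(p1);
        Map<String, String> i2 = internedCopy(p2);
        // After interning, equal strings are the same object (reference-equal).
        System.out.println(i1.get("transient_lastDdlTime") == i2.get("transient_lastDdlTime"));
        // prints: true
    }
}
```

Applying the same interning in the Thrift deserialization path and in the incremental put path would close the two gaps the analysis identifies.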





[jira] [Updated] (HIVE-17237) HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters

2017-08-02 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HIVE-17237:
--
Attachment: HIVE-17237.01.patch

> HMS wastes 26.4% of memory due to dup strings in 
> metastore.api.Partition.parameters
> ---
>
> Key: HIVE-17237
> URL: https://issues.apache.org/jira/browse/HIVE-17237
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HIVE-17237.01.patch
>
>
> I've analyzed a heap dump from a production Hive installation using jxray
> (www.jxray.com). It turns out that there are a lot of duplicate strings in
> memory, wasting 26.4% of the heap. Most of them come from HashMaps
> referenced by org.apache.hadoop.hive.metastore.api.Partition.parameters.
> Below is the relevant section of the jxray report.
> Looking at Partition.java, I see that somebody has already added
> code to intern keys and values in the parameters table when it is first set
> up. However, key-value pairs added later are not interned, which
> probably explains all these duplicate strings. Also, when
> a Partition instance is deserialized, no interning of parameters is currently
> done.
> {code}
> 6. DUPLICATE STRINGS
> Total strings: 3,273,557  Unique strings: 460,390  Duplicate values: 110,232  
> Overhead: 3,220,458K (26.4%)
> 
> ===
> 7. REFERENCE CHAINS FOR DUPLICATE STRINGS
>   2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing 
> arrays:
> 39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", 
> 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of 
> "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length 
> 3560]"
> ... and 419200 more strings, of which 36376 are unique
> Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", 
> 28 of "2", 21 of "0"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>  <-- Java Local 
> (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result)
>  [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
>   463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing 
> arrays:
> 7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of 
> "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980"
> ... and 84009 more strings, of which 34065 are unique
> Also contains one-char strings: 42 of "7", 31 of "6", 20 of "4", 8 of "5", 5 
> of "2", 3 of "0"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.TreeMap}.values <-- Java Local (j.u.TreeMap) [@6f084afa0,@73aac9e68]
>   233,384K (1.9%), 64601 dup strings (27295 unique), 64601 dup backing arrays:
> 4472 of "true", 4173 of "-1", 3798 of "1", 3591 of "8", 813 of "174528", 684 
> of "10" ... and 44568 more strings, of which 27285 are unique
> Also contains one-char strings: 305 of "7", 301 of "0", 277 of "4", 146 of 
> "6", 29 of "2", 23 of "5", 19 of "9", 2 of "3"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- Java Local (j.u.ArrayList) 
> [@4f4cfbd10,@536122408,@726616778]
> ...
>   52,916K (0.4%), 597058 dup strings (16 unique), 597058 dup backing arrays:
>  <--  {j.u.HashMap}.keys <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>  <-- Java Local 
> (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result)
>  [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
> {code}





[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-08-02 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Attachment: HIVE-16811.4.patch

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch, 
> HIVE-16811.3.patch, HIVE-16811.4.patch
>
>
> Currently, join ordering completely bails out in the absence of statistics,
> and this can lead to bad joins such as cross joins.
> E.g., the following select query will produce a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING)
> CREATE TABLE lineitem (L_ORDERKEY  INT,
> L_PARTKEY   INT,
> L_SUPPKEY   INT,
> L_LINENUMBERINT,
> L_QUANTITY  DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT  DOUBLE,
> L_TAX   DOUBLE,
> L_RETURNFLAGSTRING,
> L_LINESTATUSSTRING,
> l_shipdate  STRING,
> L_COMMITDATESTRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE  STRING,
> L_COMMENT   STRING) partitioned by (dl 
> int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out
> and let it come up with a join at least better than a cross join.





[jira] [Assigned] (HIVE-17237) HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters

2017-08-02 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev reassigned HIVE-17237:
-


> HMS wastes 26.4% of memory due to dup strings in 
> metastore.api.Partition.parameters
> ---
>
> Key: HIVE-17237
> URL: https://issues.apache.org/jira/browse/HIVE-17237
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>
> I've analyzed a heap dump from a production Hive installation using jxray
> (www.jxray.com). It turns out that there are a lot of duplicate strings in
> memory, wasting 26.4% of the heap. Most of them come from HashMaps
> referenced by org.apache.hadoop.hive.metastore.api.Partition.parameters.
> Below is the relevant section of the jxray report.
> Looking at Partition.java, I see that somebody has already added
> code to intern keys and values in the parameters table when it is first set
> up. However, key-value pairs added later are not interned, which
> probably explains all these duplicate strings. Also, when
> a Partition instance is deserialized, no interning of parameters is currently
> done.
> {code}
> 6. DUPLICATE STRINGS
> Total strings: 3,273,557  Unique strings: 460,390  Duplicate values: 110,232  
> Overhead: 3,220,458K (26.4%)
> 
> ===
> 7. REFERENCE CHAINS FOR DUPLICATE STRINGS
>   2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing 
> arrays:
> 39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", 
> 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of 
> "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length 
> 3560]"
> ... and 419200 more strings, of which 36376 are unique
> Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", 
> 28 of "2", 21 of "0"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>  <-- Java Local 
> (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result)
>  [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
>   463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing 
> arrays:
> 7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of 
> "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980"
> ... and 84009 more strings, of which 34065 are unique
> Also contains one-char strings: 42 of "7", 31 of "6", 20 of "4", 8 of "5", 5 
> of "2", 3 of "0"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.TreeMap}.values <-- Java Local (j.u.TreeMap) [@6f084afa0,@73aac9e68]
>   233,384K (1.9%), 64601 dup strings (27295 unique), 64601 dup backing arrays:
> 4472 of "true", 4173 of "-1", 3798 of "1", 3591 of "8", 813 of "174528", 684 
> of "10" ... and 44568 more strings, of which 27285 are unique
> Also contains one-char strings: 305 of "7", 301 of "0", 277 of "4", 146 of 
> "6", 29 of "2", 23 of "5", 19 of "9", 2 of "3"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- Java Local (j.u.ArrayList) 
> [@4f4cfbd10,@536122408,@726616778]
> ...
>   52,916K (0.4%), 597058 dup strings (16 unique), 597058 dup backing arrays:
>  <--  {j.u.HashMap}.keys <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>  <-- Java Local 
> (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result)
>  [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
> {code}





[jira] [Updated] (HIVE-17234) Remove HBase metastore from master

2017-08-02 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17234:
--
Status: Patch Available  (was: Open)

> Remove HBase metastore from master
> --
>
> Key: HIVE-17234
> URL: https://issues.apache.org/jira/browse/HIVE-17234
> Project: Hive
>  Issue Type: Task
>  Components: HBase Metastore
>Affects Versions: 3.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17234.patch
>
>
> No new development has been done on the HBase metastore in at least a year, 
> and to my knowledge no one is using it (nor is it even in a state to be fully 
> usable).  Given the lack of interest in continuing to develop it, we should 
> remove it rather than leave dead code hanging around and extra tests taking 
> up time in test runs.





[jira] [Updated] (HIVE-17234) Remove HBase metastore from master

2017-08-02 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17234:
--
Attachment: HIVE-17234.patch

This patch removes all of the unused parts of the HBase metastore.  The aggregate
stats work is kept and moves into directories that match the already changed
packages.  Two methods that were still used from HBaseUtils move into
MetaStoreUtils.  FileMetadata remains in the hbase package, since I believe
Sergey wants to use it sometime in the future.

> Remove HBase metastore from master
> --
>
> Key: HIVE-17234
> URL: https://issues.apache.org/jira/browse/HIVE-17234
> Project: Hive
>  Issue Type: Task
>  Components: HBase Metastore
>Affects Versions: 3.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17234.patch
>
>
> No new development has been done on the HBase metastore in at least a year, 
> and to my knowledge no one is using it (nor is it even in a state to be fully 
> usable).  Given the lack of interest in continuing to develop it, we should 
> remove it rather than leave dead code hanging around and extra tests taking 
> up time in test runs.





[jira] [Commented] (HIVE-17234) Remove HBase metastore from master

2017-08-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111981#comment-16111981
 ] 

ASF GitHub Bot commented on HIVE-17234:
---

GitHub user alanfgates opened a pull request:

https://github.com/apache/hive/pull/218

HIVE-17234

This patch removes all of the unused parts of the HBase metastore.  The
aggregate stats work is kept and moves into directories that match the already
changed packages.  Two methods that were still used from HBaseUtils move into
MetaStoreUtils.  FileMetadata remains in the hbase package, since I believe
Sergey wants to use it sometime in the future.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alanfgates/hive hive17234

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/218.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #218


commit 6afb3e1335e3541b6775a58d00b383cc36024d18
Author: Alan Gates 
Date:   2017-08-03T00:12:09Z

HIVE-17234 Remove HBase metastore from master

commit fbce5296c95f38bb56ff2ba3a670f02533ed6661
Author: Alan Gates 
Date:   2017-08-03T00:32:59Z

Removed one more file I missed previously.




> Remove HBase metastore from master
> --
>
> Key: HIVE-17234
> URL: https://issues.apache.org/jira/browse/HIVE-17234
> Project: Hive
>  Issue Type: Task
>  Components: HBase Metastore
>Affects Versions: 3.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> No new development has been done on the HBase metastore in at least a year, 
> and to my knowledge no one is using it (nor is it even in a state to be fully 
> usable).  Given the lack of interest in continuing to develop it, we should 
> remove it rather than leave dead code hanging around and extra tests taking 
> up time in test runs.





[jira] [Commented] (HIVE-17230) Timestamp format different in HiveCLI and Beeline

2017-08-02 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111975#comment-16111975
 ] 

Aihua Xu commented on HIVE-17230:
-

The output from the beeline version is more accurate to me. I feel it makes 
sense to make that change.
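For context, the trailing {{.0}} is exactly what java.sql.Timestamp.toString() produces: it always emits at least one fractional-second digit, even when the nanos are zero, which is presumably what Beeline is passing through. A minimal check (the class name is mine, not Hive code):

```java
import java.sql.Timestamp;

public class TimestampFormatDemo {

    // java.sql.Timestamp.toString() always emits at least one
    // fractional-second digit, even when the nanos are zero.
    public static String render(String literal) {
        return Timestamp.valueOf(literal).toString();
    }

    public static void main(String[] args) {
        // prints 2000-01-01 01:00:00.0
        System.out.println(render("2000-01-01 01:00:00"));
    }
}
```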

> Timestamp format different in HiveCLI and Beeline
> -
>
> Key: HIVE-17230
> URL: https://issues.apache.org/jira/browse/HIVE-17230
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, CLI
>Reporter: Peter Vary
>Assignee: Peter Vary
>
> The issue can be reproduced with the following commands:
> {code}
> create table timestamp_test(t timestamp);
> insert into table timestamp_test values('2000-01-01 01:00:00');
> select * from timestamp_test;
> {code}
> The timestamp is displayed without nanoseconds in HiveCLI:
> {code}
> 2000-01-01 01:00:00
> {code}
> When the exact same timestamp is displayed in BeeLine it displays:
> {code}
> 2000-01-01 01:00:00.0
> {code}





[jira] [Updated] (HIVE-17236) Add support for wild card LOCATION for external table

2017-08-02 Thread nirav patel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nirav patel updated HIVE-17236:
---
Description: 
Hive currently doesn't support wild card in path for external table. I think it 
should considering mapreduce framework supports it and it's a common 
requirement.

Following should work
CREATE EXTERNAL TABLE testTable (val map)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/mycomp/customers/\*/departments/partition/\*/sales.tsv';


  was:
Hive currently doesn't support wild card in path for external table. I think it 
should considering mapreduce framework supports it and it's a common 
requirement.

Following should work
CREATE EXTERNAL TABLE testTable (val map)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/mycomp/customers/\\*/departments/partition/\*/sales.tsv';



> Add support for wild card LOCATION for external table
> -
>
> Key: HIVE-17236
> URL: https://issues.apache.org/jira/browse/HIVE-17236
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 1.2.2
>Reporter: nirav patel
>
> Hive currently doesn't support wild card in path for external table. I think 
> it should considering mapreduce framework supports it and it's a common 
> requirement.
> Following should work
> CREATE EXTERNAL TABLE testTable (val map)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> LOCATION '/user/mycomp/customers/\*/departments/partition/\*/sales.tsv';
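For reference, the glob semantics the request relies on (Hadoop's FileSystem.globStatus, which MapReduce input formats use) treat {{*}} as matching a single path segment; java.nio's glob syntax behaves the same way, so the requested LOCATION can be sketched as follows (the class name is mine, and this only illustrates the matching, not Hive behavior):

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobLocationDemo {

    // '*' matches within a single path segment, both in java.nio glob
    // syntax and in Hadoop's FileSystem.globStatus globbing.
    public static boolean matches(String glob, String path) {
        PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return m.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        String glob = "/user/mycomp/customers/*/departments/partition/*/sales.tsv";
        // true: one directory level per '*'
        System.out.println(matches(glob,
            "/user/mycomp/customers/acme/departments/partition/p1/sales.tsv"));
        // false: '*' does not cross segment boundaries
        System.out.println(matches(glob,
            "/user/mycomp/customers/acme/extra/departments/partition/p1/sales.tsv"));
    }
}
```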






[jira] [Updated] (HIVE-17236) Add support for wild card LOCATION for external table

2017-08-02 Thread nirav patel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nirav patel updated HIVE-17236:
---
Description: 
Hive currently doesn't support wild card in path for external table. I think it 
should considering mapreduce framework supports it and it's a common 
requirement.

Following should work
CREATE EXTERNAL TABLE testTable (val map)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/mycomp/customers/*/departments/partition/*/sales.tsv';


  was:
Hive currently doesn't support wild card in path for external table. I think it 
should considering mapreduce framework supports it and it's a common 
requirement.

Following should work
CREATE EXTERNAL TABLE testTable (val map)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/mycomp/customers/*/departments/partition/*/sales.tsv';



> Add support for wild card LOCATION for external table
> -
>
> Key: HIVE-17236
> URL: https://issues.apache.org/jira/browse/HIVE-17236
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 1.2.2
>Reporter: nirav patel
>
> Hive currently doesn't support wild card in path for external table. I think 
> it should considering mapreduce framework supports it and it's a common 
> requirement.
> Following should work
> CREATE EXTERNAL TABLE testTable (val map)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> LOCATION '/user/mycomp/customers/*/departments/partition/*/sales.tsv';





[jira] [Updated] (HIVE-17236) Add support for wild card LOCATION for external table

2017-08-02 Thread nirav patel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nirav patel updated HIVE-17236:
---
Description: 
Hive currently doesn't support wild card in path for external table. I think it 
should considering mapreduce framework supports it and it's a common 
requirement.

Following should work
CREATE EXTERNAL TABLE testTable (val map)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/mycomp/customers/\\*/departments/partition/\*/sales.tsv';


  was:
Hive currently doesn't support wild card in path for external table. I think it 
should considering mapreduce framework supports it and it's a common 
requirement.

Following should work
CREATE EXTERNAL TABLE testTable (val map)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/mycomp/customers/*/departments/partition/*/sales.tsv';



> Add support for wild card LOCATION for external table
> -
>
> Key: HIVE-17236
> URL: https://issues.apache.org/jira/browse/HIVE-17236
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 1.2.2
>Reporter: nirav patel
>
> Hive currently doesn't support wild card in path for external table. I think 
> it should considering mapreduce framework supports it and it's a common 
> requirement.
> Following should work
> CREATE EXTERNAL TABLE testTable (val map)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> LOCATION '/user/mycomp/customers/\\*/departments/partition/\*/sales.tsv';





[jira] [Updated] (HIVE-17217) SMB Join : Assert if paths are different in TezGroupedSplit in KeyValueInputMerger

2017-08-02 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17217:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> SMB Join : Assert if paths are different in TezGroupedSplit in 
> KeyValueInputMerger
> --
>
> Key: HIVE-17217
> URL: https://issues.apache.org/jira/browse/HIVE-17217
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Fix For: 3.0.0
>
> Attachments: HIVE-17217.1.patch, HIVE-17217.2.patch
>
>
> In KeyValueInputMerger, a TezGroupedSplit may contain more than one split. 
> However, the splits should all belong to the same path.
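The invariant described in the issue, that every split grouped into one TezGroupedSplit must come from the same path, can be sketched as a simple check (class and method names are mine, with paths represented as strings for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class GroupedSplitCheck {

    // Returns true when every grouped split comes from the same path,
    // the invariant the new assert is meant to enforce.
    public static boolean samePath(List<String> splitPaths) {
        return splitPaths.stream().distinct().count() <= 1;
    }

    public static void main(String[] args) {
        // true: both splits share one path
        System.out.println(samePath(
            Arrays.asList("/warehouse/t/000000_0", "/warehouse/t/000000_0")));
        // false: two different paths grouped together
        System.out.println(samePath(
            Arrays.asList("/warehouse/t/000000_0", "/warehouse/t/000001_0")));
    }
}
```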





[jira] [Assigned] (HIVE-17235) Add ORC Decimal64 Serialization/Deserialization

2017-08-02 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-17235:
---


> Add ORC Decimal64 Serialization/Deserialization
> ---
>
> Key: HIVE-17235
> URL: https://issues.apache.org/jira/browse/HIVE-17235
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>
> The storage-api changes for ORC-209.





[jira] [Commented] (HIVE-17213) HoS: file merging doesn't work for union all

2017-08-02 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111946#comment-16111946
 ] 

Xuefu Zhang commented on HIVE-17213:


+1, pending tests.

> HoS: file merging doesn't work for union all
> 
>
> Key: HIVE-17213
> URL: https://issues.apache.org/jira/browse/HIVE-17213
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-17213.0.patch, HIVE-17213.1.patch, 
> HIVE-17213.2.patch, HIVE-17213.3.patch, HIVE-17213.4.patch, HIVE-17213.5.patch
>
>
> HoS file merging doesn't work properly, since it doesn't set linked file sinks 
> properly; these are used to generate the move tasks.





[jira] [Commented] (HIVE-14343) HiveDriverRunHookContext's command is null in HS2 mode

2017-08-02 Thread Peng Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111932#comment-16111932
 ] 

Peng Cheng commented on HIVE-14343:
---

Apparently this bug also affects the Hive 1.2.1 branch:

https://stackoverflow.com/questions/45450066/why-hivedriverrunhook-cannot-read-any-command-when-it-is-submitted-is-it-a-bug

Would you like me to open a new ticket?

> HiveDriverRunHookContext's command is null in HS2 mode
> --
>
> Key: HIVE-14343
> URL: https://issues.apache.org/jira/browse/HIVE-14343
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Fix For: 2.3.0
>
> Attachments: HIVE-14343.0.patch, HIVE-14343.1.patch
>
>
> Looking at the {{Driver#runInternal(String command, boolean 
> alreadyCompiled)}}:
> {code}
> HiveDriverRunHookContext hookContext = new 
> HiveDriverRunHookContextImpl(conf, command);
> // Get all the driver run hooks and pre-execute them.
> List<HiveDriverRunHook> driverRunHooks;
> {code}
> The context is initialized with the {{command}} passed in to the method. 
> However, this command is always null if {{alreadyCompiled}} is true, which is 
> the case for HS2 mode.
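A minimal sketch of the null propagation described above (the names loosely mirror Driver and its hook context, but this is illustrative, not Hive code):

```java
public class DriverHookDemo {

    static class HookContext {
        private final String command;
        HookContext(String command) { this.command = command; }
        String getCommand() { return command; }
    }

    // On the already-compiled (HS2) path, callers pass command = null,
    // so the hook context ends up holding a null command string.
    public static HookContext runInternal(String command, boolean alreadyCompiled) {
        return new HookContext(command);
    }

    public static void main(String[] args) {
        // prints null
        System.out.println(runInternal(null, true).getCommand());
    }
}
```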





[jira] [Commented] (HIVE-13989) Extended ACLs are not handled according to specification

2017-08-02 Thread Chris Drome (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111879#comment-16111879
 ] 

Chris Drome commented on HIVE-13989:


[~vgumashta], I checked the behavior of hadoop-2.7 and hadoop-2.8, which 
matches what you describe about zeroing out the 'other' permissions.

My intention was to let HDFS create and manage the child directories where 
possible.
However, the reason for this patch was that early versions of ACL support in 
hadoop, combined with the original treatment of ACLs in hive/hcat, were 
generating incorrect results.

Let me revisit the patch and submit a new version.

> Extended ACLs are not handled according to specification
> 
>
> Key: HIVE-13989
> URL: https://issues.apache.org/jira/browse/HIVE-13989
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13989.1-branch-1.patch, HIVE-13989.1.patch, 
> HIVE-13989-branch-1.patch, HIVE-13989-branch-2.2.patch, 
> HIVE-13989-branch-2.2.patch, HIVE-13989-branch-2.2.patch
>
>
> Hive takes two approaches to working with extended ACLs depending on whether 
> data is being produced via a Hive query or HCatalog APIs. A Hive query will 
> run an FsShell command to recursively set the extended ACLs for a directory 
> sub-tree. HCatalog APIs will attempt to build up the directory sub-tree 
> programmatically and runs some code to set the ACLs to match the parent 
> directory.
> Some incorrect assumptions were made when implementing the extended ACLs 
> support. Refer to https://issues.apache.org/jira/browse/HDFS-4685 for the 
> design documents of extended ACLs in HDFS. These documents model the 
> implementation after the POSIX implementation on Linux, which can be found at 
> http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html.
> The code for setting extended ACLs via HCatalog APIs is found in 
> HdfsUtils.java:
> {code}
> if (aclEnabled) {
>   aclStatus =  sourceStatus.getAclStatus();
>   if (aclStatus != null) {
> LOG.trace(aclStatus.toString());
> aclEntries = aclStatus.getEntries();
> removeBaseAclEntries(aclEntries);
> //the ACL api's also expect the tradition user/group/other permission 
> in the form of ACL
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.USER, 
> sourcePerm.getUserAction()));
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.GROUP, 
> sourcePerm.getGroupAction()));
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.OTHER, 
> sourcePerm.getOtherAction()));
>   }
> }
> {code}
> We found that DEFAULT extended ACL rules were not being inherited properly by 
> the directory sub-tree, so the above code is incomplete because it 
> effectively drops the DEFAULT rules. The second problem is with the call to 
> {{sourcePerm.getGroupAction()}}, which is incorrect in the case of extended 
> ACLs. When extended ACLs are used the GROUP permission is replaced with the 
> extended ACL mask. So the above code will apply the wrong permissions to the 
> GROUP. Instead the correct GROUP permissions now need to be pulled from the 
> AclEntry as returned by {{getAclStatus().getEntries()}}. See the 
> implementation of the new method {{getDefaultAclEntries}} for details.
> Similar issues exist with the HCatalog API. None of the API accounts for 
> setting extended ACLs on the directory sub-tree. The changes to the HCatalog 
> API allow the extended ACLs to be passed into the required methods similar to 
> how basic permissions are passed in. When building the directory sub-tree the 
> extended ACLs of the table directory are inherited by all sub-directories, 
> including the DEFAULT rules.
> Replicating the problem:
> Create a table to write data into (I will use acl_test as the destination and 
> words_text as the source) and set the ACLs as follows:
> {noformat}
> $ hdfs dfs -setfacl -m 
> default:user::rwx,default:group::r-x,default:mask::rwx,default:user:hdfs:rwx,group::r-x,user:hdfs:rwx
>  /user/cdrome/hive/acl_test
> $ hdfs dfs -ls -d /user/cdrome/hive/acl_test
> drwxrwx---+  - cdrome hdfs  0 2016-07-13 20:36 
> /user/cdrome/hive/acl_test
> $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test
> # file: /user/cdrome/hive/acl_test
> # owner: cdrome
> # group: hdfs
> user::rwx
> user:hdfs:rwx
> group::r-x
> mask::rwx
> other::---
> default:user::rwx
> default:user:hdfs:rwx
> default:group::r-x
> default:mask::rwx
> default:other::---
> {noformat}
> Note that the basic GROUP permission is set to {{rwx}} after setting the 
> ACLs. The ACLs explicitly set the DEFAULT rules and a rule specifically for 
> the {{hdfs}} user.
> Run the following query to populate the 
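The mask behavior described above can be illustrated with plain rwx bitmasks (a sketch of POSIX ACL semantics only, not Hive code; the class name is mine):

```java
public class AclMaskDemo {

    // rwx bits: r=4, w=2, x=1. With an extended ACL present, the basic
    // "group" bits actually hold the mask, and a named entry's effective
    // rights are (entry & mask).
    public static int effective(int entryPerms, int maskPerms) {
        return entryPerms & maskPerms;
    }

    public static void main(String[] args) {
        int groupEntry = 5; // group::r-x from the ACL entry
        int mask = 7;       // mask::rwx (what getGroupAction() would report)
        // prints 5, i.e. r-x: the entry, not the mask, governs
        System.out.println(effective(groupEntry, mask));
    }
}
```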

[jira] [Updated] (HIVE-17222) Llap: Iotrace throws java.lang.UnsupportedOperationException with IncompleteCb

2017-08-02 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-17222:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Thanks [~sershe]. Committed to master.

> Llap: Iotrace throws  java.lang.UnsupportedOperationException with 
> IncompleteCb
> ---
>
> Key: HIVE-17222
> URL: https://issues.apache.org/jira/browse/HIVE-17222
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-17222.1.patch
>
>
> branch: hive master 
> Running Q76 at 1 TB generates the following exception.
> {noformat}
> Caused by: java.io.IOException: java.lang.UnsupportedOperationException
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.rethrowErrorIfAny(LlapRecordReader.java:349)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.nextCvb(LlapRecordReader.java:304)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.next(LlapRecordReader.java:244)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.next(LlapRecordReader.java:67)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
> ... 23 more
> Caused by: java.lang.UnsupportedOperationException
> at 
> org.apache.hadoop.hive.common.io.DiskRange.getData(DiskRange.java:86)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.IoTrace.logRange(IoTrace.java:304)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.IoTrace.logRanges(IoTrace.java:291)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:328)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:426)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:250)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:247)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:247)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:96)
> ... 6 more
> {noformat}
> When {{IncompleteCb}} is encountered, it ends up throwing this error.
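A hypothetical mirror of the failure mode (the class names echo the stack trace, but the bodies are mine): the base class throws from getData(), IncompleteCb never overrides it, so one plausible shape of the fix is for the trace logger to check before dereferencing:

```java
import java.nio.ByteBuffer;

public class IoTraceDemo {

    // Base class throws from getData(); only buffer-backed subclasses
    // would override it.
    static class DiskRange {
        ByteBuffer getData() { throw new UnsupportedOperationException(); }
        boolean hasData() { return false; }
    }

    // No data yet, so it inherits the throwing getData().
    static class IncompleteCb extends DiskRange { }

    // Guarded logging path: never calls getData() on an incomplete range.
    static String logRange(DiskRange r) {
        return r.hasData() ? "range with data" : "incomplete range";
    }

    public static void main(String[] args) {
        // prints "incomplete range" instead of throwing
        System.out.println(logRange(new IncompleteCb()));
    }
}
```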





[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration

2017-08-02 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-17160:
--
Attachment: HIVE-17160.2.patch

> Adding kerberos Authorization to the Druid hive integration
> ---
>
> Key: HIVE-17160
> URL: https://issues.apache.org/jira/browse/HIVE-17160
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17160.2.patch, HIVE-17160.patch
>
>
> The goal of this feature is to allow Hive to query a secured Druid cluster 
> using Kerberos credentials.





[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration

2017-08-02 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-17160:
--
Status: Patch Available  (was: In Progress)

> Adding kerberos Authorization to the Druid hive integration
> ---
>
> Key: HIVE-17160
> URL: https://issues.apache.org/jira/browse/HIVE-17160
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17160.2.patch, HIVE-17160.patch
>
>
> The goal of this feature is to allow Hive to query a secured Druid cluster 
> using Kerberos credentials.





[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration

2017-08-02 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-17160:
--
Attachment: (was: HIVE-17160.2.patch)

> Adding kerberos Authorization to the Druid hive integration
> ---
>
> Key: HIVE-17160
> URL: https://issues.apache.org/jira/browse/HIVE-17160
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17160.2.patch, HIVE-17160.patch
>
>
> The goal of this feature is to allow Hive to query a secured Druid cluster 
> using Kerberos credentials.





[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration

2017-08-02 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-17160:
--
Status: Open  (was: Patch Available)

> Adding kerberos Authorization to the Druid hive integration
> ---
>
> Key: HIVE-17160
> URL: https://issues.apache.org/jira/browse/HIVE-17160
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17160.2.patch, HIVE-17160.patch
>
>
> The goal of this feature is to allow Hive to query a secured Druid cluster 
> using Kerberos credentials.





[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration

2017-08-02 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-17160:
--
Status: Patch Available  (was: Open)

> Adding kerberos Authorization to the Druid hive integration
> ---
>
> Key: HIVE-17160
> URL: https://issues.apache.org/jira/browse/HIVE-17160
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17160.2.patch, HIVE-17160.patch
>
>
> The goal of this feature is to allow Hive to query a secured Druid cluster 
> using Kerberos credentials.





[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration

2017-08-02 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-17160:
--
Status: In Progress  (was: Patch Available)

> Adding kerberos Authorization to the Druid hive integration
> ---
>
> Key: HIVE-17160
> URL: https://issues.apache.org/jira/browse/HIVE-17160
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17160.2.patch, HIVE-17160.patch
>
>
> The goal of this feature is to allow Hive to query a secured Druid cluster 
> using Kerberos credentials.





[jira] [Updated] (HIVE-17213) HoS: file merging doesn't work for union all

2017-08-02 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-17213:

Attachment: HIVE-17213.5.patch

Patch v4 used INSERT OVERWRITE DIRECTORY, which didn't work for the mini 
SparkOnYarn test. It seems that currently all the qfiles in 
{{spark.only.query.files}} are by default run using the 
{{MiniSparkOnYarnCliDriver}}, which causes a problem for this case, since we 
want it to run using {{TestSparkCliDriver}}.

Patch v5 further divides the existing property {{spark.only.query.files}} into 
two:
- {{spark.only.query.files}}: contains all qfiles that are Spark only AND 
should only be tested using {{TestSparkCliDriver}},
- {{miniSparkOnYarn.only.query.files}}: contains all qfiles that are Spark only 
AND should only be tested using {{MiniSparkOnYarnCliDriver}}.
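Concretely, the split could look something like this in the test properties file (the qfile names and the comment layout are illustrative; only the two property names come from the patch description):

```properties
# Spark-only qfiles, run with TestSparkCliDriver
spark.only.query.files=spark_union_merge.q,\
  spark_some_other.q

# Spark-only qfiles, run with MiniSparkOnYarnCliDriver
miniSparkOnYarn.only.query.files=spark_insert_overwrite_dir.q
```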


> HoS: file merging doesn't work for union all
> 
>
> Key: HIVE-17213
> URL: https://issues.apache.org/jira/browse/HIVE-17213
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-17213.0.patch, HIVE-17213.1.patch, 
> HIVE-17213.2.patch, HIVE-17213.3.patch, HIVE-17213.4.patch, HIVE-17213.5.patch
>
>
> HoS file merging doesn't work properly, since it doesn't set linked file sinks 
> properly; these are used to generate the move tasks.





[jira] [Updated] (HIVE-17226) Use strong hashing as security improvement

2017-08-02 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17226:
--
Status: Patch Available  (was: Open)

> Use strong hashing as security improvement
> --
>
> Key: HIVE-17226
> URL: https://issues.apache.org/jira/browse/HIVE-17226
> Project: Hive
>  Issue Type: Improvement
>  Components: Security
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17226.1.patch
>
>
> Two places have been identified where weak hashing needs to be replaced 
> by SHA-256.
> 1. CookieSigner.java uses MessageDigest.getInstance("SHA"). "SHA" is usually 
> mapped to SHA-1, which is not secure enough by today's standards. 
> We should use SHA-256 instead.
> 2. GenericUDFMaskHash.java uses DigestUtils.md5Hex. MD5 is considered weak 
> and should be replaced by DigestUtils.sha256Hex.
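The suggested replacement can be sketched as follows (the class name is mine; commons-codec's DigestUtils.sha256Hex would produce the same hex string):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha256Demo {

    // Ask for "SHA-256" explicitly instead of the ambiguous "SHA" alias.
    public static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every compliant JCA provider must supply SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
        System.out.println(sha256Hex(""));
    }
}
```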





[jira] [Updated] (HIVE-17226) Use strong hashing as security improvement

2017-08-02 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17226:
--
Attachment: HIVE-17226.1.patch

> Use strong hashing as security improvement
> --
>
> Key: HIVE-17226
> URL: https://issues.apache.org/jira/browse/HIVE-17226
> Project: Hive
>  Issue Type: Improvement
>  Components: Security
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17226.1.patch
>
>
> Two places have been identified where weak hashing needs to be replaced 
> by SHA-256.
> 1. CookieSigner.java uses MessageDigest.getInstance("SHA"). "SHA" is usually 
> mapped to SHA-1, which is not secure enough by today's standards. 
> We should use SHA-256 instead.
> 2. GenericUDFMaskHash.java uses DigestUtils.md5Hex. MD5 is considered weak 
> and should be replaced by DigestUtils.sha256Hex.





[jira] [Commented] (HIVE-17232) "No match found" Compactor finds a bucket file thinking it's a directory

2017-08-02 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111781#comment-16111781
 ] 

Eugene Koifman commented on HIVE-17232:
---

In the case of this test, at least, it's due to compaction being invoked via
txnHandler.compact(new CompactionRequest("default", 
Table.ACIDTBLPART.name(), CompactionType.MAJOR));
while the target table is partitioned.  Thus, in CompactorMR.run(),
AcidUtils.Directory dir = AcidUtils.getAcidState(new 
Path(sd.getLocation()), conf, txns, false, true);

ends up finding the delta.../bucket.. files but treats them as "original" 
files, because they are not at the directory level where they are expected.

Compacting all partitions of a partitioned table in one command is not supported.
If compaction is invoked via ALTER TABLE (as it should be), DDLTask.compact() will 
check that a partition spec is supplied when the table is partitioned.

TODO: add sanity checks to getAcidState() so that it raises an error if it 
finds an unexpected directory layout. 
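The IllegalStateException in the log comes from java.util.regex.Matcher.group() being called without a preceding successful match. A minimal reproduction of that failure shape (the class and method names are mine):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherGroupDemo {

    // Calling Matcher.group() when find() did not succeed throws
    // IllegalStateException("No match found"), the error in the trace.
    public static String classify(String fileName) {
        Matcher m = Pattern.compile("^[0-9]{6}").matcher(fileName);
        try {
            // Bug shape: group() is called without checking find()'s result.
            m.find();
            return "matched " + m.group();
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("bucket_1")); // No match found
        System.out.println(classify("000123_0")); // matched 000123
    }
}
```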



>  "No match found"  Compactor finds a bucket file thinking it's a directory
> --
>
> Key: HIVE-17232
> URL: https://issues.apache.org/jira/browse/HIVE-17232
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> {noformat}
> 2017-08-02T12:38:11,996  WARN [main] compactor.CompactorMR: Found a 
> non-bucket file that we thought matched the bucket pattern! 
> file:/Users/ekoifman/dev/hiv\
> erwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands2-1501702264311/warehouse/acidtblpart/p=1/delta_013_013_/bucket_1
>  Matcher=java\
> .util.regex.Matcher[pattern=^[0-9]{6} region=0,12 lastmatch=]
> 2017-08-02T12:38:11,996  INFO [main] mapreduce.JobSubmitter: Cleaning up the 
> staging area 
> file:/tmp/hadoop/mapred/staging/ekoifman1723152463/.staging/job_lo\
> cal1723152463_0183
> 2017-08-02T12:38:11,997 ERROR [main] compactor.Worker: Caught exception while 
> trying to compact 
> id:1,dbname:default,tableName:ACIDTBLPART,partName:null,stat\
> e:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0.
>   Marking failed to avoid repeated failures, java.lang.IllegalStateException: 
> \
> No match found
> at java.util.regex.Matcher.group(Matcher.java:536)
> at java.util.regex.Matcher.group(Matcher.java:496)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.addFileToMap(CompactorMR.java:577)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.getSplits(CompactorMR.java:549)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:330)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
> at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:320)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:275)
> at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:166)
> at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runWorker(TestTxnCommands2.java:1138)
> at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.updateDeletePartitioned(TestTxnCommands2.java:894)
> {noformat}
> the stack trace points to 1st runWorker() in updateDeletePartitioned() though 
> the test run was TestTxnCommands2WithSplitUpdateAndVectorization





[jira] [Comment Edited] (HIVE-17232) "No match found" Compactor finds a bucket file thinking it's a directory

2017-08-02 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111693#comment-16111693
 ] 

Eugene Koifman edited comment on HIVE-17232 at 8/2/17 9:40 PM:
---

The loop in CompactorInputFormat.getSplits() is expecting a delta/base or original 
file but ends up seeing a bucket file in an acid delta dir


was (Author: ekoifman):
The loop in CopactorInputSplit.getSplits() is expecting delta/base or original 
file but ends up seeing a bucket file in an acid delta dir

>  "No match found"  Compactor finds a bucket file thinking it's a directory
> --
>
> Key: HIVE-17232
> URL: https://issues.apache.org/jira/browse/HIVE-17232
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> {noformat}
> 2017-08-02T12:38:11,996  WARN [main] compactor.CompactorMR: Found a 
> non-bucket file that we thought matched the bucket pattern! 
> file:/Users/ekoifman/dev/hiv\
> erwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands2-1501702264311/warehouse/acidtblpart/p=1/delta_013_013_/bucket_1
>  Matcher=java\
> .util.regex.Matcher[pattern=^[0-9]{6} region=0,12 lastmatch=]
> 2017-08-02T12:38:11,996  INFO [main] mapreduce.JobSubmitter: Cleaning up the 
> staging area 
> file:/tmp/hadoop/mapred/staging/ekoifman1723152463/.staging/job_lo\
> cal1723152463_0183
> 2017-08-02T12:38:11,997 ERROR [main] compactor.Worker: Caught exception while 
> trying to compact 
> id:1,dbname:default,tableName:ACIDTBLPART,partName:null,stat\
> e:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0.
>   Marking failed to avoid repeated failures, java.lang.IllegalStateException: 
> \
> No match found
> at java.util.regex.Matcher.group(Matcher.java:536)
> at java.util.regex.Matcher.group(Matcher.java:496)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.addFileToMap(CompactorMR.java:577)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.getSplits(CompactorMR.java:549)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:330)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
> at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:320)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:275)
> at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:166)
> at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runWorker(TestTxnCommands2.java:1138)
> at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.updateDeletePartitioned(TestTxnCommands2.java:894)
> {noformat}
> The stack trace points to the 1st runWorker() call in updateDeletePartitioned(), though 
> the test that actually ran was TestTxnCommands2WithSplitUpdateAndVectorization





[jira] [Commented] (HIVE-17226) Use strong hashing as security improvement

2017-08-02 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111770#comment-16111770
 ] 

Tao Li commented on HIVE-17226:
---

[~asherman] I think changing the hash function for GenericUDFMaskHash should 
not cause compatibility issues, since there is no expectation/assumption that 
the masking result has to be the same. But we should include this change in the 
release notes so users are aware of it.
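For reference, the JCA already ships SHA-256 under that exact algorithm name, so the CookieSigner side of the change is essentially a one-line algorithm swap. A minimal hex-digest sketch using only the JDK (the helper name sha256Hex merely mirrors the commons-codec method the UDF would use; this class is illustrative, not Hive code):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha256Demo {
    // "SHA-256" is an unambiguous standard JCA algorithm name, unlike "SHA",
    // which most providers quietly map to SHA-1.
    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory in every conforming JRE
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Known NIST test vector for SHA-256("abc")
        System.out.println(sha256Hex("abc"));
        // prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
    }
}
```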

> Use strong hashing as security improvement
> --
>
> Key: HIVE-17226
> URL: https://issues.apache.org/jira/browse/HIVE-17226
> Project: Hive
>  Issue Type: Improvement
>  Components: Security
>Reporter: Tao Li
>Assignee: Tao Li
>
> Two places have been identified where weak hashing should be replaced by 
> SHA-256.
> 1. CookieSigner.java uses MessageDigest.getInstance("SHA"). "SHA" typically 
> maps to SHA-1, which is not secure enough by today's standards. We should use 
> SHA-256 instead.
> 2. GenericUDFMaskHash.java uses DigestUtils.md5Hex. MD5 is considered weak 
> and should be replaced by DigestUtils.sha256Hex.





[jira] [Commented] (HIVE-17226) Use strong hashing as security improvement

2017-08-02 Thread Andrew Sherman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111758#comment-16111758
 ] 

Andrew Sherman commented on HIVE-17226:
---

Hi [~taoli-hwx] I have no idea, sorry. That's why I'm asking, in the hope of 
learning something. 

> Use strong hashing as security improvement
> --
>
> Key: HIVE-17226
> URL: https://issues.apache.org/jira/browse/HIVE-17226
> Project: Hive
>  Issue Type: Improvement
>  Components: Security
>Reporter: Tao Li
>Assignee: Tao Li
>
> Two places have been identified where weak hashing should be replaced by 
> SHA-256.
> 1. CookieSigner.java uses MessageDigest.getInstance("SHA"). "SHA" typically 
> maps to SHA-1, which is not secure enough by today's standards. We should use 
> SHA-256 instead.
> 2. GenericUDFMaskHash.java uses DigestUtils.md5Hex. MD5 is considered weak 
> and should be replaced by DigestUtils.sha256Hex.





[jira] [Commented] (HIVE-16820) TezTask may not shut down correctly before submit

2017-08-02 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111755#comment-16111755
 ] 

Sergey Shelukhin commented on HIVE-16820:
-

Yeah, what I was saying is that we don't need the same bugfix for that task, 
because there's no implementation. It probably does need an implementation 
(without bugs like this).

> TezTask may not shut down correctly before submit
> -
>
> Key: HIVE-16820
> URL: https://issues.apache.org/jira/browse/HIVE-16820
> Project: Hive
>  Issue Type: Bug
>Reporter: Visakh Nair
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-16820.01.patch, HIVE-16820.patch
>
>
> The query will run and only fail at the very end when the driver checks its 
> own shutdown flag.





[jira] [Commented] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111753#comment-16111753
 ] 

Mithun Radhakrishnan commented on HIVE-15686:
-

For the record, this patch is only for {{branch-2}} and {{branch-2.2}}. 

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15686.branch-2.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.





[jira] [Assigned] (HIVE-17234) Remove HBase metastore from master

2017-08-02 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned HIVE-17234:
-


> Remove HBase metastore from master
> --
>
> Key: HIVE-17234
> URL: https://issues.apache.org/jira/browse/HIVE-17234
> Project: Hive
>  Issue Type: Task
>  Components: HBase Metastore
>Affects Versions: 3.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> No new development has been done on the HBase metastore in at least a year, 
> and to my knowledge no one is using it (nor is it even in a state to be fully 
> usable).  Given the lack of interest in continuing to develop it, we should 
> remove it rather than leave dead code hanging around and extra tests taking 
> up time in test runs.





[jira] [Commented] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration

2017-08-02 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111738#comment-16111738
 ] 

slim bouguerra commented on HIVE-17160:
---

Addressed the comments and uploaded a new patch.

> Adding kerberos Authorization to the Druid hive integration
> ---
>
> Key: HIVE-17160
> URL: https://issues.apache.org/jira/browse/HIVE-17160
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17160.2.patch, HIVE-17160.patch
>
>
> The goal of this feature is to allow Hive to query a secured Druid cluster 
> using Kerberos credentials.





[jira] [Updated] (HIVE-17233) Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17233:

Status: Patch Available  (was: Open)

> Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.
> 
>
> Key: HIVE-17233
> URL: https://issues.apache.org/jira/browse/HIVE-17233
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17233.1.patch
>
>
> This has to do with {{HIVE-15575}}. {{TezCompiler}} seems to set 
> {{mapred.input.dir.recursive}} to {{true}}. This is acceptable for Hive jobs, 
> since this allows Hive to consume its peculiar {{UNION ALL}} output, where 
> the output of each relation is stored in a separate sub-directory of the 
> output-dir.
> For such output to be readable through HCatalog (via Pig/HCatLoader), 
> {{mapred.input.dir.recursive}} should be set from {{HCatInputFormat}} as 
> well. Otherwise, one gets zero records for that input.





[jira] [Updated] (HIVE-17233) Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17233:

Attachment: HIVE-17233.1.patch

The proposed fix.
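The property in question can also be set site-wide; an equivalent configuration fragment is shown below. This is illustrative only — the actual patch presumably sets the flag per-job from HCatInputFormat rather than globally:

```xml
<!-- Illustrative: enables recursive input-directory listing, which the
     HIVE-17233 patch is expected to do per-job from HCatInputFormat. -->
<property>
  <name>mapred.input.dir.recursive</name>
  <value>true</value>
</property>
```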

> Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.
> 
>
> Key: HIVE-17233
> URL: https://issues.apache.org/jira/browse/HIVE-17233
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17233.1.patch
>
>
> This has to do with {{HIVE-15575}}. {{TezCompiler}} seems to set 
> {{mapred.input.dir.recursive}} to {{true}}. This is acceptable for Hive jobs, 
> since this allows Hive to consume its peculiar {{UNION ALL}} output, where 
> the output of each relation is stored in a separate sub-directory of the 
> output-dir.
> For such output to be readable through HCatalog (via Pig/HCatLoader), 
> {{mapred.input.dir.recursive}} should be set from {{HCatInputFormat}} as 
> well. Otherwise, one gets zero records for that input.





[jira] [Commented] (HIVE-17226) Use strong hashing as security improvement

2017-08-02 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111703#comment-16111703
 ] 

Tao Li commented on HIVE-17226:
---

[~asherman] Regarding the CookieSigner, I don't see an incompatibility issue. 
Regarding the UDF hash, do you have any concerns related to compatibility?

> Use strong hashing as security improvement
> --
>
> Key: HIVE-17226
> URL: https://issues.apache.org/jira/browse/HIVE-17226
> Project: Hive
>  Issue Type: Improvement
>  Components: Security
>Reporter: Tao Li
>Assignee: Tao Li
>
> Two places have been identified where weak hashing should be replaced by 
> SHA-256.
> 1. CookieSigner.java uses MessageDigest.getInstance("SHA"). "SHA" typically 
> maps to SHA-1, which is not secure enough by today's standards. We should use 
> SHA-256 instead.
> 2. GenericUDFMaskHash.java uses DigestUtils.md5Hex. MD5 is considered weak 
> and should be replaced by DigestUtils.sha256Hex.





[jira] [Commented] (HIVE-17232) "No match found" Compactor finds a bucket file thinking it's a directory

2017-08-02 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111693#comment-16111693
 ] 

Eugene Koifman commented on HIVE-17232:
---

The loop in CompactorInputFormat.getSplits() expects a delta/base directory or an 
original file, but ends up seeing a bucket file inside an acid delta dir
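The IllegalStateException in the log comes from calling Matcher.group() after a match attempt that failed: the bucket pattern expects six leading digits, which a bucket file name does not have. A minimal reproduction of that pattern from the log (illustrative only — probe() and the file names are assumptions, not the actual CompactorMR code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NoMatchDemo {
    // Pattern from the log: six leading digits, as in original files
    // like "000001_0"; "bucket_00001" in an acid delta dir does not match.
    static String probe(String fileName) {
        Matcher m = Pattern.compile("^[0-9]{6}").matcher(fileName);
        m.find(); // ignore the boolean result, as the failing code path effectively does
        try {
            return "matched " + m.group();
        } catch (IllegalStateException e) {
            // group() after a failed match throws IllegalStateException
            return "IllegalStateException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(probe("bucket_00001"));
        // prints IllegalStateException: No match found
    }
}
```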

>  "No match found"  Compactor finds a bucket file thinking it's a directory
> --
>
> Key: HIVE-17232
> URL: https://issues.apache.org/jira/browse/HIVE-17232
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> {noformat}
> 2017-08-02T12:38:11,996  WARN [main] compactor.CompactorMR: Found a 
> non-bucket file that we thought matched the bucket pattern! 
> file:/Users/ekoifman/dev/hiv\
> erwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands2-1501702264311/warehouse/acidtblpart/p=1/delta_013_013_/bucket_1
>  Matcher=java\
> .util.regex.Matcher[pattern=^[0-9]{6} region=0,12 lastmatch=]
> 2017-08-02T12:38:11,996  INFO [main] mapreduce.JobSubmitter: Cleaning up the 
> staging area 
> file:/tmp/hadoop/mapred/staging/ekoifman1723152463/.staging/job_lo\
> cal1723152463_0183
> 2017-08-02T12:38:11,997 ERROR [main] compactor.Worker: Caught exception while 
> trying to compact 
> id:1,dbname:default,tableName:ACIDTBLPART,partName:null,stat\
> e:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0.
>   Marking failed to avoid repeated failures, java.lang.IllegalStateException: 
> \
> No match found
> at java.util.regex.Matcher.group(Matcher.java:536)
> at java.util.regex.Matcher.group(Matcher.java:496)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.addFileToMap(CompactorMR.java:577)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.getSplits(CompactorMR.java:549)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:330)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
> at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:320)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:275)
> at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:166)
> at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runWorker(TestTxnCommands2.java:1138)
> at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.updateDeletePartitioned(TestTxnCommands2.java:894)
> {noformat}
> The stack trace points to the 1st runWorker() call in updateDeletePartitioned(), though 
> the test that actually ran was TestTxnCommands2WithSplitUpdateAndVectorization





[jira] [Assigned] (HIVE-17232) "No match found" Compactor finds a bucket file thinking it's a directory

2017-08-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17232:
-

Assignee: Eugene Koifman

>  "No match found"  Compactor finds a bucket file thinking it's a directory
> --
>
> Key: HIVE-17232
> URL: https://issues.apache.org/jira/browse/HIVE-17232
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> {noformat}
> 2017-08-02T12:38:11,996  WARN [main] compactor.CompactorMR: Found a 
> non-bucket file that we thought matched the bucket pattern! 
> file:/Users/ekoifman/dev/hiv\
> erwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands2-1501702264311/warehouse/acidtblpart/p=1/delta_013_013_/bucket_1
>  Matcher=java\
> .util.regex.Matcher[pattern=^[0-9]{6} region=0,12 lastmatch=]
> 2017-08-02T12:38:11,996  INFO [main] mapreduce.JobSubmitter: Cleaning up the 
> staging area 
> file:/tmp/hadoop/mapred/staging/ekoifman1723152463/.staging/job_lo\
> cal1723152463_0183
> 2017-08-02T12:38:11,997 ERROR [main] compactor.Worker: Caught exception while 
> trying to compact 
> id:1,dbname:default,tableName:ACIDTBLPART,partName:null,stat\
> e:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0.
>   Marking failed to avoid repeated failures, java.lang.IllegalStateException: 
> \
> No match found
> at java.util.regex.Matcher.group(Matcher.java:536)
> at java.util.regex.Matcher.group(Matcher.java:496)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.addFileToMap(CompactorMR.java:577)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.getSplits(CompactorMR.java:549)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:330)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
> at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:320)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:275)
> at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:166)
> at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runWorker(TestTxnCommands2.java:1138)
> at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.updateDeletePartitioned(TestTxnCommands2.java:894)
> {noformat}
> The stack trace points to the 1st runWorker() call in updateDeletePartitioned(), though 
> the test that actually ran was TestTxnCommands2WithSplitUpdateAndVectorization





[jira] [Assigned] (HIVE-17233) Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17233:
---

Assignee: Mithun Radhakrishnan

> Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.
> 
>
> Key: HIVE-17233
> URL: https://issues.apache.org/jira/browse/HIVE-17233
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> This has to do with {{HIVE-15575}}. {{TezCompiler}} seems to set 
> {{mapred.input.dir.recursive}} to {{true}}. This is acceptable for Hive jobs, 
> since this allows Hive to consume its peculiar {{UNION ALL}} output, where 
> the output of each relation is stored in a separate sub-directory of the 
> output-dir.
> For such output to be readable through HCatalog (via Pig/HCatLoader), 
> {{mapred.input.dir.recursive}} should be set from {{HCatInputFormat}} as 
> well. Otherwise, one gets zero records for that input.





[jira] [Updated] (HIVE-17232) "No match found" Compactor finds a bucket file thinking it's a directory

2017-08-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17232:
--
Summary:  "No match found"  Compactor finds a bucket file thinking it's a 
directory  (was:  No match found  Compactor finds a bucket file thinking it's a 
directory)

>  "No match found"  Compactor finds a bucket file thinking it's a directory
> --
>
> Key: HIVE-17232
> URL: https://issues.apache.org/jira/browse/HIVE-17232
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>
> {noformat}
> 2017-08-02T12:38:11,996  WARN [main] compactor.CompactorMR: Found a 
> non-bucket file that we thought matched the bucket pattern! 
> file:/Users/ekoifman/dev/hiv\
> erwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands2-1501702264311/warehouse/acidtblpart/p=1/delta_013_013_/bucket_1
>  Matcher=java\
> .util.regex.Matcher[pattern=^[0-9]{6} region=0,12 lastmatch=]
> 2017-08-02T12:38:11,996  INFO [main] mapreduce.JobSubmitter: Cleaning up the 
> staging area 
> file:/tmp/hadoop/mapred/staging/ekoifman1723152463/.staging/job_lo\
> cal1723152463_0183
> 2017-08-02T12:38:11,997 ERROR [main] compactor.Worker: Caught exception while 
> trying to compact 
> id:1,dbname:default,tableName:ACIDTBLPART,partName:null,stat\
> e:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0.
>   Marking failed to avoid repeated failures, java.lang.IllegalStateException: 
> \
> No match found
> at java.util.regex.Matcher.group(Matcher.java:536)
> at java.util.regex.Matcher.group(Matcher.java:496)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.addFileToMap(CompactorMR.java:577)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.getSplits(CompactorMR.java:549)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:330)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
> at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:320)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:275)
> at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:166)
> at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runWorker(TestTxnCommands2.java:1138)
> at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.updateDeletePartitioned(TestTxnCommands2.java:894)
> {noformat}
> The stack trace points to the 1st runWorker() call in updateDeletePartitioned(), though 
> the test that actually ran was TestTxnCommands2WithSplitUpdateAndVectorization





[jira] [Updated] (HIVE-17172) add ordering checks to DiskRangeList

2017-08-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17172:

Fix Version/s: 2.4.0
   3.0.0

> add ordering checks to DiskRangeList
> 
>
> Key: HIVE-17172
> URL: https://issues.apache.org/jira/browse/HIVE-17172
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17172.01.patch, HIVE-17172.02.patch, 
> HIVE-17172.patch
>
>






[jira] [Updated] (HIVE-17172) add ordering checks to DiskRangeList

2017-08-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17172:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to master and branch-2. Thanks for the reviews!

> add ordering checks to DiskRangeList
> 
>
> Key: HIVE-17172
> URL: https://issues.apache.org/jira/browse/HIVE-17172
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17172.01.patch, HIVE-17172.02.patch, 
> HIVE-17172.patch
>
>






[jira] [Updated] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15686:

Status: Patch Available  (was: Open)

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15686.branch-2.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.
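The "Wrong FS" failure above happens because Hadoop's {{FileSystem.checkPath}} rejects any path whose scheme/authority differs from the filesystem instance performing the check. The following self-contained sketch (not Hive's actual code; the class and method names are hypothetical) reproduces that comparison to show why the metastore's default filesystem cannot probe the remote partition path, and notes the direction of the fix:

```java
import java.net.URI;

// Sketch of the scheme/authority comparison behind the
// "Wrong FS: ... expected: ..." error. Names here are illustrative.
public class WrongFsSketch {
    // Returns true only if the path belongs to the given filesystem's
    // scheme and authority -- the check that throws in FileSystem.checkPath.
    static boolean sameFileSystem(URI fsUri, URI pathUri) {
        return fsUri.getScheme().equalsIgnoreCase(pathUri.getScheme())
            && fsUri.getAuthority().equalsIgnoreCase(pathUri.getAuthority());
    }

    public static void main(String[] args) {
        URI localFs = URI.create("hdfs://local-cluster-n1.myth.net:8020");
        URI partition = URI.create(
            "hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120");
        // false: the metastore's default FileSystem does not own the remote
        // partition path, so the encryption-zone check must first resolve a
        // FileSystem from the path's own URI (e.g. FileSystem.get(path.toUri(), conf)).
        System.out.println(sameFileSystem(localFs, partition));
    }
}
```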



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15686:

Attachment: HIVE-15686.branch-2.patch

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15686.branch-2.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.





[jira] [Updated] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15686:

 Assignee: Mithun Radhakrishnan
Affects Version/s: (was: 1.2.1)
   2.2.0
   Status: Open  (was: Patch Available)

Attaching patch for {{branch-2}}.

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15686.branch-2.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.






[jira] [Updated] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15686:

Attachment: (was: HADOOP-14015.1.patch)

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Mithun Radhakrishnan
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.





[jira] [Commented] (HIVE-17213) HoS: file merging doesn't work for union all

2017-08-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111656#comment-16111656
 ] 

Hive QA commented on HIVE-17213:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12880072/HIVE-17213.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11138 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat6]
 (batchId=7)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_union_merge]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=236)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6236/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6236/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6236/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12880072 - PreCommit-HIVE-Build

> HoS: file merging doesn't work for union all
> 
>
> Key: HIVE-17213
> URL: https://issues.apache.org/jira/browse/HIVE-17213
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-17213.0.patch, HIVE-17213.1.patch, 
> HIVE-17213.2.patch, HIVE-17213.3.patch, HIVE-17213.4.patch
>
>
> HoS file merging doesn't work properly because it doesn't set linked file 
> sinks correctly; these are used to generate move tasks.





[jira] [Commented] (HIVE-16820) TezTask may not shut down correctly before submit

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111648#comment-16111648
 ] 

Mithun Radhakrishnan commented on HIVE-16820:
-

bq. MergeFileTask doesn't appear to implement shutdown at all.

:] Ah, but doesn't it need one? It is conceivable that a user might cancel a 
query between the TezTask and the MergeFileTask (or simply interrupt an 
{{ALTER TABLE CONCATENATE}}). In that case, the merge will run through to the 
end, in spite of cancellation. 

I wonder if there isn't value in applying the HIVE-12556 + HIVE-16820 treatment 
for MergeFileTask as well.
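The cooperative-shutdown pattern the comment asks for can be sketched as below. This is a hypothetical illustration, not MergeFileTask's code: the task checks a shutdown flag between units of work, so a cancelled query stops promptly instead of merging through to the end.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative cancellable task: a shutdown() call is honored between
// units of work rather than being ignored until the task finishes.
public class CancellableMerge {
    private final AtomicBoolean shutdown = new AtomicBoolean(false);
    private int filesMerged = 0;

    public void shutdown() {
        shutdown.set(true);
    }

    // Merge up to 'files' inputs, bailing out early if shutdown was requested.
    public int run(int files) {
        for (int i = 0; i < files; i++) {
            if (shutdown.get()) {
                break; // honor cancellation between files
            }
            filesMerged++; // stand-in for merging one file
        }
        return filesMerged;
    }
}
```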

> TezTask may not shut down correctly before submit
> -
>
> Key: HIVE-16820
> URL: https://issues.apache.org/jira/browse/HIVE-16820
> Project: Hive
>  Issue Type: Bug
>Reporter: Visakh Nair
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-16820.01.patch, HIVE-16820.patch
>
>
> The query will run and only fail at the very end when the driver checks its 
> own shutdown flag.





[jira] [Commented] (HIVE-14786) Beeline displays binary column data as string instead of byte array

2017-08-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111550#comment-16111550
 ] 

Hive QA commented on HIVE-14786:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12880064/HIVE-14786.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11139 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=236)
org.apache.hive.beeline.TestBufferedRows.testNormalizeWidths (batchId=177)
org.apache.hive.beeline.TestTableOutputFormat.testPrint (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6235/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6235/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6235/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12880064 - PreCommit-HIVE-Build

> Beeline displays binary column data as string instead of byte array
> ---
>
> Key: HIVE-14786
> URL: https://issues.apache.org/jira/browse/HIVE-14786
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 1.2.1
>Reporter: Ram Mettu
>Assignee: Barna Zsombor Klara
>Priority: Minor
> Attachments: HIVE-14786.01.patch
>
>
> In Beeline, doing a SELECT binaryColName FROM tableName; results in the data 
> being displayed as a string (which looks corrupted due to unprintable chars). 
> Instead, Beeline should display binary columns as byte arrays. 
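One common way to render binary values legibly is hex encoding. The sketch below is a minimal illustration of the idea (it is not Beeline's implementation): each byte maps to two hex digits, so unprintable bytes no longer look corrupted.

```java
// Minimal sketch of rendering a binary column value as hex digits
// instead of passing the raw bytes through as a string.
public class HexDisplay {
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b)); // two hex digits per byte
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unprintable bytes that would look corrupted if shown as a string:
        byte[] value = {0x00, 0x1f, (byte) 0xff};
        System.out.println(toHex(value)); // prints "001fff"
    }
}
```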





[jira] [Assigned] (HIVE-17225) HoS DPP pruning sink ops can target parallel work objects

2017-08-02 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani reassigned HIVE-17225:
--

Assignee: Janaki Lahorani  (was: Sahil Takiar)

> HoS DPP pruning sink ops can target parallel work objects
> -
>
> Key: HIVE-17225
> URL: https://issues.apache.org/jira/browse/HIVE-17225
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
>
> Setup:
> {code:sql}
> SET hive.spark.dynamic.partition.pruning=true;
> SET hive.strict.checks.cartesian.product=false;
> SET hive.auto.convert.join=true;
> CREATE TABLE partitioned_table1 (col int) PARTITIONED BY (part_col int);
> CREATE TABLE regular_table1 (col int);
> CREATE TABLE regular_table2 (col int);
> ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 1);
> ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 2);
> ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 3);
> INSERT INTO table regular_table1 VALUES (1), (2), (3), (4), (5), (6);
> INSERT INTO table regular_table2 VALUES (1), (2), (3), (4), (5), (6);
> INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 1) VALUES (1);
> INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 2) VALUES (2);
> INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 3) VALUES (3);
> SELECT *
> FROM   partitioned_table1,
>regular_table1 rt1,
>regular_table2 rt2
> WHERE  rt1.col = partitioned_table1.part_col
>AND rt2.col = partitioned_table1.part_col;
> {code}
> Exception:
> {code}
> 2017-08-01T13:27:47,483 ERROR [b0d354a8-4cdb-4ba9-acec-27d14926aaf4 main] 
> ql.Driver: FAILED: Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.FileNotFoundException: File 
> file:/Users/stakiar/Documents/idea/apache-hive/itests/qtest-spark/target/tmp/scratchdir/stakiar/b0d354a8-4cdb-4ba9-acec-27d14926aaf4/hive_2017-08-01_13-27-45_553_1088589686371686526-1/-mr-10004/3/5
>  does not exist
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:408)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:498)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
>   at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82)
>   at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:82)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
>   at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82)
>   at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:82)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 

[jira] [Commented] (HIVE-17208) Repl dump should pass in db/table information to authorization API

2017-08-02 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111523#comment-16111523
 ] 

Thejas M Nair commented on HIVE-17208:
--

+1

> Repl dump should pass in db/table information to authorization API
> --
>
> Key: HIVE-17208
> URL: https://issues.apache.org/jira/browse/HIVE-17208
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-17208.1.patch, HIVE-17208.2.patch, 
> HIVE-17208.3.patch
>
>
> "repl dump" does not provide db/table information, which is necessary for 
> authorization replication in Ranger.





[jira] [Assigned] (HIVE-17229) HiveMetastore HMSHandler locks during initialization, even though its static variable threadPool is not null

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17229:
---

Assignee: Zac Zhou  (was: Mithun Radhakrishnan)

> HiveMetastore HMSHandler locks during initialization, even though its static 
> variable threadPool is not null
> 
>
> Key: HIVE-17229
> URL: https://issues.apache.org/jira/browse/HIVE-17229
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Zac Zhou
>Assignee: Zac Zhou
> Attachments: HIVE-17229.2.patch, HIVE-17229.patch
>
>
> A thread pool has been used to accelerate the add-partitions operation since 
> [HIVE-13901|https://issues.apache.org/jira/browse/HIVE-13901]. 
> However, HMSHandler takes a lock during initialization every time, even 
> when its static variable threadPool is already non-null
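The improvement the issue describes corresponds to the classic double-checked initialization pattern: test the static field before taking the lock, so only the first initializer pays for synchronization. A minimal sketch (with hypothetical names, not HMSHandler's code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Double-checked lazy initialization of a shared thread pool:
// once the pool exists, callers skip the synchronized block entirely.
public class PoolHolder {
    private static volatile ExecutorService threadPool; // volatile for safe publication

    static ExecutorService getPool(int size) {
        ExecutorService pool = threadPool;
        if (pool == null) {                     // fast path: no lock once initialized
            synchronized (PoolHolder.class) {
                if (threadPool == null) {       // re-check under the lock
                    threadPool = Executors.newFixedThreadPool(size);
                }
                pool = threadPool;
            }
        }
        return pool;
    }
}
```

The `volatile` modifier matters: without it, another thread could observe a partially constructed pool through the unsynchronized fast path.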





[jira] [Updated] (HIVE-17229) HiveMetastore HMSHandler locks during initialization, even though its static variable threadPool is not null

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17229:

Status: Patch Available  (was: Open)

> HiveMetastore HMSHandler locks during initialization, even though its static 
> variable threadPool is not null
> 
>
> Key: HIVE-17229
> URL: https://issues.apache.org/jira/browse/HIVE-17229
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Zac Zhou
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17229.2.patch, HIVE-17229.patch
>
>
> A thread pool has been used to accelerate the add-partitions operation since 
> [HIVE-13901|https://issues.apache.org/jira/browse/HIVE-13901]. 
> However, HMSHandler takes a lock during initialization every time, even 
> when its static variable threadPool is already non-null





[jira] [Updated] (HIVE-17229) HiveMetastore HMSHandler locks during initialization, even though its static variable threadPool is not null

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17229:

Status: Open  (was: Patch Available)

> HiveMetastore HMSHandler locks during initialization, even though its static 
> variable threadPool is not null
> 
>
> Key: HIVE-17229
> URL: https://issues.apache.org/jira/browse/HIVE-17229
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Zac Zhou
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17229.2.patch, HIVE-17229.patch
>
>
> A thread pool has been used to accelerate the add-partitions operation since 
> [HIVE-13901|https://issues.apache.org/jira/browse/HIVE-13901]. 
> However, HMSHandler takes a lock during initialization every time, even 
> when its static variable threadPool is already non-null





[jira] [Updated] (HIVE-17229) HiveMetastore HMSHandler locks during initialization, even though its static variable threadPool is not null

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17229:

Attachment: HIVE-17229.2.patch

Attempting to fix the patch so that the tests run.

> HiveMetastore HMSHandler locks during initialization, even though its static 
> variable threadPool is not null
> 
>
> Key: HIVE-17229
> URL: https://issues.apache.org/jira/browse/HIVE-17229
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Zac Zhou
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17229.2.patch, HIVE-17229.patch
>
>
> A thread pool has been used to accelerate the add-partitions operation since 
> [HIVE-13901|https://issues.apache.org/jira/browse/HIVE-13901]. 
> However, HMSHandler takes a lock during initialization every time, even 
> when its static variable threadPool is already non-null





[jira] [Assigned] (HIVE-17229) HiveMetastore HMSHandler locks during initialization, even though its static variable threadPool is not null

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17229:
---

Assignee: Mithun Radhakrishnan  (was: Zac Zhou)

> HiveMetastore HMSHandler locks during initialization, even though its static 
> variable threadPool is not null
> 
>
> Key: HIVE-17229
> URL: https://issues.apache.org/jira/browse/HIVE-17229
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Zac Zhou
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17229.patch
>
>
> A thread pool has been used to accelerate the add-partitions operation since 
> [HIVE-13901|https://issues.apache.org/jira/browse/HIVE-13901]. 
> However, HMSHandler takes a lock during initialization every time, even 
> when its static variable threadPool is already non-null





[jira] [Updated] (HIVE-17208) Repl dump should pass in db/table information to authorization API

2017-08-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-17208:
--
Attachment: HIVE-17208.3.patch

The repl_dump_requires_admin/repl_load_requires_admin test failures are related. 
Attaching a new patch.

> Repl dump should pass in db/table information to authorization API
> --
>
> Key: HIVE-17208
> URL: https://issues.apache.org/jira/browse/HIVE-17208
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-17208.1.patch, HIVE-17208.2.patch, 
> HIVE-17208.3.patch
>
>
> "repl dump" does not provide db/table information, which is necessary for 
> authorization replication in Ranger.





[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration

2017-08-02 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-17160:
--
Attachment: HIVE-17160.2.patch

> Adding kerberos Authorization to the Druid hive integration
> ---
>
> Key: HIVE-17160
> URL: https://issues.apache.org/jira/browse/HIVE-17160
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17160.2.patch, HIVE-17160.patch
>
>
> The goal of this feature is to allow Hive to query a secured Druid cluster 
> using Kerberos credentials.





[jira] [Updated] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-08-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17188:

   Resolution: Fixed
Fix Version/s: 2.2.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Committed to {{master}}, {{branch-2}}, and {{branch-2.2}}, as advised by 
[~owen.omalley].

Thank you, [~cdrome], for the patch. Thanks, [~vihangk1], for the review.

> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HIVE-17188.1.patch
>
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> Note: The problem being addressed here isn't so much the size of the hundreds 
> of Partition objects as the cruft that builds up in the PersistenceManager, in 
> the JDO layer, as confirmed through memory profiling.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)
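The periodic-flush idea can be sketched generically: process the batch in order and invoke a flush callback every few items and once at the end, mirroring periodic calls to the JDO PersistenceManager's flush(). `BatchFlusher` and its parameters are hypothetical stand-ins, not the actual ObjectStore code:

```java
import java.util.List;
import java.util.function.Consumer;

public class BatchFlusher {
  // Process items in order, invoking flush after every `interval` items and
  // once more for any trailing remainder. Returns the number of flushes.
  static <T> int processWithFlush(List<T> items, int interval,
                                  Consumer<T> process, Runnable flush) {
    int flushes = 0;
    for (int i = 0; i < items.size(); i++) {
      process.accept(items.get(i));
      if ((i + 1) % interval == 0) {
        flush.run();                 // clear accumulated per-batch state
        flushes++;
      }
    }
    if (items.size() % interval != 0) {
      flush.run();                   // flush the final partial batch
      flushes++;
    }
    return flushes;
  }
}
```

The flush interval trades memory headroom against flush overhead; the point is that pending JDO state never accumulates across the whole batch.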





[jira] [Commented] (HIVE-17228) Bump tez version to 0.9.0

2017-08-02 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111430#comment-16111430
 ] 

Gunther Hagleitner commented on HIVE-17228:
---

+1

> Bump tez version to 0.9.0
> -
>
> Key: HIVE-17228
> URL: https://issues.apache.org/jira/browse/HIVE-17228
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-17228.1.patch, HIVE-17228.1.patch
>
>






[jira] [Commented] (HIVE-16820) TezTask may not shut down correctly before submit

2017-08-02 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111429#comment-16111429
 ] 

Sergey Shelukhin commented on HIVE-16820:
-

MergeFileTask doesn't appear to implement shutdown at all. So, execute is safe 
from interference :)

> TezTask may not shut down correctly before submit
> -
>
> Key: HIVE-16820
> URL: https://issues.apache.org/jira/browse/HIVE-16820
> Project: Hive
>  Issue Type: Bug
>Reporter: Visakh Nair
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-16820.01.patch, HIVE-16820.patch
>
>
> The query will run and only fail at the very end when the driver checks its 
> own shutdown flag.
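Checking the shutdown flag before submission, rather than only at the very end, can be sketched as follows; `SubmitGuard` is a hypothetical illustration of the pattern, not TezTask's actual code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SubmitGuard {
  private final AtomicBoolean shutdown = new AtomicBoolean(false);

  // Returns true if the work was submitted, false if shutdown won the race.
  boolean submitIfRunning(Runnable submit) {
    if (shutdown.get()) {
      return false;        // bail out *before* submitting, not after the run
    }
    submit.run();
    return true;
  }

  void requestShutdown() {
    shutdown.set(true);
  }
}
```

With the early check, a query cancelled before submit fails fast instead of running to completion and only then noticing the driver's shutdown flag.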





[jira] [Commented] (HIVE-17228) Bump tez version to 0.9.0

2017-08-02 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111428#comment-16111428
 ] 

Zhiyuan Yang commented on HIVE-17228:
-

Test failures are unrelated. Please help review, [~hagleitn], [~sseth]

> Bump tez version to 0.9.0
> -
>
> Key: HIVE-17228
> URL: https://issues.apache.org/jira/browse/HIVE-17228
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-17228.1.patch, HIVE-17228.1.patch
>
>





