[jira] [Commented] (HIVE-17220) Bloomfilter probing in semijoin reduction is thrashing L1 dcache
[ https://issues.apache.org/jira/browse/HIVE-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112232#comment-16112232 ] Gopal V commented on HIVE-17220: LGTM - +1, tests pending. > Bloomfilter probing in semijoin reduction is thrashing L1 dcache > > > Key: HIVE-17220 > URL: https://issues.apache.org/jira/browse/HIVE-17220 > Project: Hive > Issue Type: Bug > Affects Versions: 3.0.0 > Reporter: Prasanth Jayachandran > Assignee: Prasanth Jayachandran > Attachments: HIVE-17220.1.patch, HIVE-17220.2.patch, HIVE-17220.3.patch, HIVE-17220.WIP.patch > > > [~gopalv] observed perf profiles showing bloomfilter probes as a bottleneck for some of the TPC-DS queries, resulting in L1 data cache thrashing. > This happens because the huge bitset in the bloom filter doesn't fit in any level of cache, and the hash bits corresponding to a single key map to different segments of the bitset that are spread out. This can result in K-1 memory accesses (K being the number of hash functions) in the worst case for every probed key, because of locality misses in the L1 cache. > Ran a JMH microbenchmark to verify the same.
Following is the JMH perf profile for bloom filter probing:
> {code}
> Perf stats:
> --------------------------------------------------------------------------
>      5101.935637      task-clock (msec)         #    0.461 CPUs utilized
>              346      context-switches          #    0.068 K/sec
>              336      cpu-migrations            #    0.066 K/sec
>            6,207      page-faults               #    0.001 M/sec
>   10,016,486,301      cycles                    #    1.963 GHz                     (26.90%)
>    5,751,692,176      stalled-cycles-frontend   #   57.42% frontend cycles idle    (27.05%)
>                       stalled-cycles-backend
>   14,359,914,397      instructions              #    1.43  insns per cycle
>                                                 #    0.40  stalled cycles per insn (33.78%)
>    2,200,632,861      branches                  #  431.333 M/sec                   (33.84%)
>        1,162,860      branch-misses             #    0.05% of all branches         (33.97%)
>    1,025,992,254      L1-dcache-loads           #  201.099 M/sec                   (26.56%)
>      432,663,098      L1-dcache-load-misses     #   42.17% of all L1-dcache hits   (14.49%)
>      331,383,297      LLC-loads                 #   64.952 M/sec                   (14.47%)
>          203,524      LLC-load-misses           #    0.06% of all LL-cache hits    (21.67%)
>                       L1-icache-loads
>        1,633,821      L1-icache-load-misses     #    0.320 M/sec                   (28.85%)
>      950,368,796      dTLB-loads                #  186.276 M/sec                   (28.61%)
>      246,813,393      dTLB-load-misses          #   25.97% of all dTLB cache hits  (14.53%)
>           25,451      iTLB-loads                #    0.005 M/sec                   (14.48%)
>           35,415      iTLB-load-misses          #  139.15% of all iTLB cache hits  (21.73%)
>                       L1-dcache-prefetches
>          175,958      L1-dcache-prefetch-misses #    0.034 M/sec                   (28.94%)
>
>     11.064783140 seconds time elapsed
> {code}
> This shows a 42.17% L1 data cache miss rate. This jira is to use a cache-efficient bloom filter for semijoin probing. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
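The cache-thrashing mechanism described above, K scattered bitset accesses per probed key, is exactly what a blocked ("cache-efficient") Bloom filter avoids. The following is a minimal illustrative sketch, not Hive's actual patch: all K bits for a key are confined to one 64-byte (512-bit) block, so a probe touches at most one cache line. Class and method names here are made up for the example.

```java
/**
 * Sketch of a cache-line-blocked Bloom filter (illustrative, not Hive's
 * implementation). A classic Bloom filter scatters a key's k bits across
 * the whole bitset, costing up to k cache-line loads per probe; here all
 * k bits land in a single 512-bit block, i.e. one cache line.
 */
class BlockedBloomFilter {
  private static final int LONGS_PER_BLOCK = 8; // 8 x 64 bits = 512 bits = 64 bytes
  private final long[] bits;
  private final int numBlocks;
  private final int k;

  BlockedBloomFilter(int numBlocks, int k) {
    this.numBlocks = numBlocks;
    this.k = k;
    this.bits = new long[numBlocks * LONGS_PER_BLOCK];
  }

  // splitmix64-style finalizer to spread the raw key's bits
  private static long mix(long h) {
    h ^= h >>> 33; h *= 0xff51afd7ed558ccdL;
    h ^= h >>> 33; h *= 0xc4ceb9fe1a85ec53L;
    h ^= h >>> 33; return h;
  }

  void add(long key) {
    long h = mix(key);
    int base = (int) Long.remainderUnsigned(h, numBlocks) * LONGS_PER_BLOCK;
    int h1 = (int) h, h2 = (int) (h >>> 32) | 1; // double hashing inside the block
    for (int i = 0; i < k; i++) {
      int bit = (h1 + i * h2) & 511;             // position within the 512-bit block
      bits[base + (bit >>> 6)] |= 1L << bit;     // Java shifts are mod 64
    }
  }

  boolean mightContain(long key) {
    long h = mix(key);
    int base = (int) Long.remainderUnsigned(h, numBlocks) * LONGS_PER_BLOCK;
    int h1 = (int) h, h2 = (int) (h >>> 32) | 1;
    for (int i = 0; i < k; i++) {
      int bit = (h1 + i * h2) & 511;
      if ((bits[base + (bit >>> 6)] & (1L << bit)) == 0) return false;
    }
    return true; // maybe present: false positives possible, never false negatives
  }

  public static void main(String[] args) {
    BlockedBloomFilter f = new BlockedBloomFilter(1024, 4);
    for (long i = 0; i < 10_000; i++) f.add(i * 31);
    System.out.println(f.mightContain(31) + " " + f.mightContain(62));
  }
}
```

The trade-off is a slightly higher false-positive rate for the same bit budget, since bits cluster per block, in exchange for one cache-line access per probe instead of up to K.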
[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats
[ https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112228#comment-16112228 ] Hive QA commented on HIVE-16811:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12880138/HIVE-16811.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 80 failed/errored test(s), 11139 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_filter] (batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_groupby] (batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_select] (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_table] (batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_annotate_stats_groupby] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnStatsUpdateForStatsOptimizer_2] (batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] (batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_47] (batchId=28)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[udaf_collect_set_2] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_use_op_stats] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] (batchId=99)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[hybridgrace_hashjoin_1] (batchId=99)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[multi_count_distinct] (batchId=99)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[tez-tag] (batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query11] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query15] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query17] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query18] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query19] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query21] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query24] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query25] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query29] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query30] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query31] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query32] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query34] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query35] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query37] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query40] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query44] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query45] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query46] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query47] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query48] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query4] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query50] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query53] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query54] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query57] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query58] (batchId=236)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query61] (batchId=236)
[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-16998: Fix Version/s: 3.0.0 > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark > Reporter: Sahil Takiar > Assignee: Janaki Lahorani > Labels: TODOC3.0 > Fix For: 3.0.0 > > Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, HIVE16998.4.patch, HIVE16998.5.patch > > > HoS DPP will split a given operator tree in two under the following conditions: it has detected that the query can benefit from DPP, and the filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join involves a complex operator tree - e.g. the query {{select count(*) from srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart)}} will require running the subquery twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't suffer from this drawback. Thus, it would be nice to have a config key that just enables DPP on HoS for map-joins. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17115) MetaStoreUtils.getDeserializer doesn't catch the java.lang.ClassNotFoundException
[ https://issues.apache.org/jira/browse/HIVE-17115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112216#comment-16112216 ] Erik.fang commented on HIVE-17115: -- OK, I will upload a test soon.
> MetaStoreUtils.getDeserializer doesn't catch the java.lang.ClassNotFoundException
>
> Key: HIVE-17115
> URL: https://issues.apache.org/jira/browse/HIVE-17115
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 1.2.1
> Reporter: Erik.fang
> Assignee: Erik.fang
> Attachments: HIVE-17115.1.patch, HIVE-17115.patch
>
> Suppose we create a table with a custom SerDe, then call HiveMetaStoreClient.getSchema(String db, String tableName) to extract the metadata from the HiveMetaStore service. The thrift client hangs, with an exception such as the following in the HiveMetaStore service's log:
> {code:java}
> Exception in thread "pool-5-thread-129" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Bytes
>   at org.apache.hadoop.hive.hbase.HBaseSerDe.parseColumnsMapping(HBaseSerDe.java:184)
>   at org.apache.hadoop.hive.hbase.HBaseSerDeParameters.<init>(HBaseSerDeParameters.java:73)
>   at org.apache.hadoop.hive.hbase.HBaseSerDe.initialize(HBaseSerDe.java:117)
>   at org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:53)
>   at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
>   at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:401)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_fields_with_environment_context(HiveMetaStore.java:3556)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_schema_with_environment_context(HiveMetaStore.java:3636)
>   at sun.reflect.GeneratedMethodAccessor104.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at com.sun.proxy.$Proxy4.get_schema_with_environment_context(Unknown Source)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema_with_environment_context.getResult(ThriftHiveMetastore.java:9146)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema_with_environment_context.getResult(ThriftHiveMetastore.java:9130)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:551)
>   at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:546)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:546)
>   at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.util.Bytes
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> {code}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17115) MetaStoreUtils.getDeserializer doesn't catch the java.lang.ClassNotFoundException
[ https://issues.apache.org/jira/browse/HIVE-17115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112213#comment-16112213 ] Daniel Dai commented on HIVE-17115: --- Sorry for the delay. I am fine with the change. It is much better for the metastore to catch the Throwable and send it back to the client than to silently eat the exception in the metastore. cc [~thejas]. For a test, you can inherit an existing SerDe (e.g., RegexSerDe) and manually throw a NoClassDefFoundError in initialize.
> MetaStoreUtils.getDeserializer doesn't catch the java.lang.ClassNotFoundException
>
> Key: HIVE-17115
> URL: https://issues.apache.org/jira/browse/HIVE-17115
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 1.2.1
> Reporter: Erik.fang
> Assignee: Erik.fang
> Attachments: HIVE-17115.1.patch, HIVE-17115.patch
>
> Suppose we create a table with a custom SerDe, then call HiveMetaStoreClient.getSchema(String db, String tableName) to extract the metadata from the HiveMetaStore service. The thrift client hangs, with an exception such as the following in the HiveMetaStore service's log:
> {code:java}
> Exception in thread "pool-5-thread-129" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Bytes
>   at org.apache.hadoop.hive.hbase.HBaseSerDe.parseColumnsMapping(HBaseSerDe.java:184)
>   at org.apache.hadoop.hive.hbase.HBaseSerDeParameters.<init>(HBaseSerDeParameters.java:73)
>   at org.apache.hadoop.hive.hbase.HBaseSerDe.initialize(HBaseSerDe.java:117)
>   at org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:53)
>   at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
>   at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:401)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_fields_with_environment_context(HiveMetaStore.java:3556)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_schema_with_environment_context(HiveMetaStore.java:3636)
>   at sun.reflect.GeneratedMethodAccessor104.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at com.sun.proxy.$Proxy4.get_schema_with_environment_context(Unknown Source)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema_with_environment_context.getResult(ThriftHiveMetastore.java:9146)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema_with_environment_context.getResult(ThriftHiveMetastore.java:9130)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:551)
>   at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:546)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:546)
>   at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.util.Bytes
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> {code}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
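The comment above proposes catching Throwable rather than Exception, and the stack trace shows why: NoClassDefFoundError extends Error, not Exception, so a catch (Exception) block never sees it and the worker thread dies without replying to the client. A minimal stand-alone demonstration (the class and method names here are invented for the example, not Hive's code):

```java
/**
 * Demonstrates why catch (Exception) is not enough for the failure above:
 * NoClassDefFoundError is an Error, so it sails past catch (Exception),
 * while catch (Throwable) lets the caller report it.
 */
class ErrorVsExceptionDemo {
  // Stand-in for a SerDe initialize() that hits a missing dependency.
  static void initializeSerDe() {
    throw new NoClassDefFoundError("org/apache/hadoop/hbase/util/Bytes");
  }

  // The NoClassDefFoundError escapes this method: catch (Exception) misses it.
  static String initCatchingException() {
    try { initializeSerDe(); return "ok"; }
    catch (Exception e) { return "caught"; }
  }

  // Catching Throwable captures Errors too, so a response can be sent back.
  static String initCatchingThrowable() {
    try { initializeSerDe(); return "ok"; }
    catch (Throwable t) { return "caught " + t.getClass().getSimpleName(); }
  }

  public static void main(String[] args) {
    System.out.println(initCatchingThrowable());
    try {
      initCatchingException();
    } catch (Error e) {
      System.out.println("escaped catch(Exception): " + e.getClass().getSimpleName());
    }
  }
}
```

This also matches the suggested test strategy above: subclass an existing SerDe and throw a NoClassDefFoundError from initialize, then verify the error reaches the client instead of hanging it.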
[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112210#comment-16112210 ] Lefty Leverenz commented on HIVE-16998: --- Doc note: This adds *hive.spark.dynamic.partition.pruning.map.join.only* to HiveConf.java, so it needs to be documented in the wiki. * [Configuration Properties -- Spark | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Spark] Added a TODOC3.0 label. > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark > Reporter: Sahil Takiar > Assignee: Janaki Lahorani > Labels: TODOC3.0 > Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, HIVE16998.4.patch, HIVE16998.5.patch > > > HoS DPP will split a given operator tree in two under the following conditions: it has detected that the query can benefit from DPP, and the filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join involves a complex operator tree - e.g. the query {{select count(*) from srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart)}} will require running the subquery twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't suffer from this drawback. Thus, it would be nice to have a config key that just enables DPP on HoS for map-joins. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
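The property name *hive.spark.dynamic.partition.pruning.map.join.only* is confirmed above; the decision it introduces can be sketched as a pure function. This is an illustration only, not Hive's SplitOpTreeForDPP code, and the method name is invented:

```java
/**
 * Illustrative sketch of the gate that the new flag adds: with
 * map.join.only set, DPP is applied only when the consuming join is a
 * map-join, the one case where the operator tree is not split in two
 * and the non-partitioned side's subquery is not run twice.
 */
class DppGate {
  static boolean shouldApplyDpp(boolean dppEnabled, boolean mapJoinOnly, boolean targetIsMapJoin) {
    if (!dppEnabled) {
      return false;                       // DPP globally off
    }
    return !mapJoinOnly || targetIsMapJoin; // flag on: only prune for map-joins
  }

  public static void main(String[] args) {
    // flag on: common join skipped, map-join still pruned
    System.out.println(shouldApplyDpp(true, true, false));
    System.out.println(shouldApplyDpp(true, true, true));
  }
}
```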
[jira] [Resolved] (HIVE-16714) make Task Dependency on Repl Load more intuitive
[ https://issues.apache.org/jira/browse/HIVE-16714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek resolved HIVE-16714. Resolution: Not A Problem > make Task Dependency on Repl Load more intuitive > > > Key: HIVE-16714 > URL: https://issues.apache.org/jira/browse/HIVE-16714 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > *Primary warehouse* > Create table a (name string, id int); > Create table b as select name, id from a; > Repl dump default; > *Replica warehouse* > Repl load replica from ‘[location]’; > *Query Plan Generated* > DDL0 => Copy a => move a > DDL0 => DDL Create a => move a > DDL0 => Copy b => move b > DDL0 => DDL Create b => move b > *Move to Query Plan :* > DDL0 => Copy a => move a => DDL Create a > DDL0 => Copy b => move b => DDL Create b -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112207#comment-16112207 ] Lefty Leverenz commented on HIVE-16998: --- [~stakiar], please set the fix version for this jira to 3.0.0. Thanks. > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark > Reporter: Sahil Takiar > Assignee: Janaki Lahorani > Labels: TODOC3.0 > Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, HIVE16998.4.patch, HIVE16998.5.patch > > > HoS DPP will split a given operator tree in two under the following conditions: it has detected that the query can benefit from DPP, and the filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join involves a complex operator tree - e.g. the query {{select count(*) from srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart)}} will require running the subquery twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't suffer from this drawback. Thus, it would be nice to have a config key that just enables DPP on HoS for map-joins. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16714) make Task Dependency on Repl Load more intuitive
[ https://issues.apache.org/jira/browse/HIVE-16714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112208#comment-16112208 ] anishek commented on HIVE-16714: Not required, since the major rework done as part of HIVE-16896 corrected this. > make Task Dependency on Repl Load more intuitive > > > Key: HIVE-16714 > URL: https://issues.apache.org/jira/browse/HIVE-16714 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 > Affects Versions: 3.0.0 > Reporter: anishek > Assignee: anishek > Fix For: 3.0.0 > > > *Primary warehouse* > Create table a (name string, id int); > Create table b as select name, id from a; > Repl dump default; > *Replica warehouse* > Repl load replica from ‘[location]’; > *Query Plan Generated* > DDL0 => Copy a => move a > DDL0 => DDL Create a => move a > DDL0 => Copy b => move b > DDL0 => DDL Create b => move b > *Move to Query Plan :* > DDL0 => Copy a => move a => DDL Create a > DDL0 => Copy b => move b => DDL Create b -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-16998: -- Labels: TODOC3.0 (was: ) > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark > Reporter: Sahil Takiar > Assignee: Janaki Lahorani > Labels: TODOC3.0 > Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, HIVE16998.4.patch, HIVE16998.5.patch > > > HoS DPP will split a given operator tree in two under the following conditions: it has detected that the query can benefit from DPP, and the filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join involves a complex operator tree - e.g. the query {{select count(*) from srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart)}} will require running the subquery twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't suffer from this drawback. Thus, it would be nice to have a config key that just enables DPP on HoS for map-joins. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17072) Make the parallelized timeout configurable in BeeLine tests
[ https://issues.apache.org/jira/browse/HIVE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112196#comment-16112196 ] Lefty Leverenz commented on HIVE-17072: --- The doc looks good, thanks Marta. I added version information and a link to this jira. Removed the TODOC3.0 label.
> Make the parallelized timeout configurable in BeeLine tests
> ---
>
> Key: HIVE-17072
> URL: https://issues.apache.org/jira/browse/HIVE-17072
> Project: Hive
> Issue Type: Improvement
> Components: Testing Infrastructure
> Reporter: Marta Kuczora
> Assignee: Marta Kuczora
> Priority: Minor
> Fix For: 3.0.0
> Attachments: HIVE-17072.1.patch, HIVE-17072.2.patch
>
> When running the BeeLine tests in parallel, the timeout is hardcoded in Parallelized.java:
> {noformat}
> @Override
> public void finished() {
>   executor.shutdown();
>   try {
>     executor.awaitTermination(10, TimeUnit.MINUTES);
>   } catch (InterruptedException exc) {
>     throw new RuntimeException(exc);
>   }
> }
> {noformat}
> It would be better to make it configurable. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
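One way the hardcoded 10-minute awaitTermination could be made configurable is via a system property with the old value as the default. This is only a sketch: the property name "test.beeline.parallel.timeout.minutes" is hypothetical, not necessarily what the committed patch uses.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of replacing awaitTermination(10, MINUTES) with a configurable
 * timeout. The system property name is assumed for illustration; the
 * default of 10 minutes preserves the original behavior.
 */
class ConfigurableShutdown {
  static boolean shutdownAndWait(ExecutorService executor) throws InterruptedException {
    // Long.getLong reads the property, falling back to 10 when unset.
    long minutes = Long.getLong("test.beeline.parallel.timeout.minutes", 10L);
    executor.shutdown();
    // Returns true if all tasks finished before the timeout elapsed.
    return executor.awaitTermination(minutes, TimeUnit.MINUTES);
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    pool.submit(() -> { });
    System.out.println(shutdownAndWait(pool));
  }
}
```

Running with {{-Dtest.beeline.parallel.timeout.minutes=30}} would then extend the wait without a code change.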
[jira] [Updated] (HIVE-17072) Make the parallelized timeout configurable in BeeLine tests
[ https://issues.apache.org/jira/browse/HIVE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-17072: -- Labels: (was: TODOC3.0)
> Make the parallelized timeout configurable in BeeLine tests
> ---
>
> Key: HIVE-17072
> URL: https://issues.apache.org/jira/browse/HIVE-17072
> Project: Hive
> Issue Type: Improvement
> Components: Testing Infrastructure
> Reporter: Marta Kuczora
> Assignee: Marta Kuczora
> Priority: Minor
> Fix For: 3.0.0
> Attachments: HIVE-17072.1.patch, HIVE-17072.2.patch
>
> When running the BeeLine tests in parallel, the timeout is hardcoded in Parallelized.java:
> {noformat}
> @Override
> public void finished() {
>   executor.shutdown();
>   try {
>     executor.awaitTermination(10, TimeUnit.MINUTES);
>   } catch (InterruptedException exc) {
>     throw new RuntimeException(exc);
>   }
> }
> {noformat}
> It would be better to make it configurable. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112186#comment-16112186 ] Matt McCline edited comment on HIVE-12369 at 8/3/17 5:01 AM: - Yes, I think you should continue reviewing. The path that is implemented is One Long Key and groupByMode == HASH. There are UNDONEs for *subsequent* JIRAs that will later add Aggregation of non-Long data types, Fixed Length Keys / Variable Length Keys, and the other groupByModes. They will also later add Grouping Sets and Empty Aggregation (i.e. GroupBy on a key that has no aggregations, which does duplicate key elimination). was (Author: mmccline): Yes, I think you continue reviewing. The path that is implemented is One Long Key and groupByMode == HASH. There are UNDONEs for *subsequent* JIRAs that later adds Aggregation of non-Long data types, Fixed Length Keys / Variable Length Keys, and the other groupByModes. And later adds Grouping Sets, Empty Aggregation (i.e. GroupBy on key that has no aggregations that does duplicate key elimination), too. > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive > Reporter: Matt McCline > Assignee: Matt McCline > Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, HIVE-12369.05.patch, HIVE-12369.06.patch > > > Implement Native Vector GroupBy using fast hash table technology developed for Native Vector MapJoin, etc. > Patch is currently limited to a single Long key, aggregation on Long columns, no more than 31 columns.
> 3 new classes are introduced that store the count in the slot table and don't allocate hash elements:
> {noformat}
> COUNT(column)  VectorGroupByHashOneLongKeyCountColumnOperator
> COUNT(key)     VectorGroupByHashOneLongKeyCountKeyOperator
> COUNT(*)       VectorGroupByHashOneLongKeyCountStarOperator
> {noformat}
> And a new class that aggregates a single Long key:
> {noformat}
> VectorGroupByHashOneLongKeyOperator
> {noformat}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
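The "count stored in the slot table" idea mentioned in the description can be sketched with a simple open-addressed table of parallel arrays: key and count live inline in the slot, so COUNT aggregation allocates no per-entry objects. This is an assumption-laden illustration, not Hive's VectorGroupByHashOneLongKey* code; capacity is fixed and there is no rehashing, for brevity.

```java
/**
 * Sketch of a one-long-key COUNT hash table with counts held in the slot
 * table itself: no per-entry allocation, linear probing within flat arrays.
 * Fixed power-of-two capacity, no resize (keep the load factor low).
 */
class OneLongKeyCountTable {
  private final long[] keys;
  private final long[] counts;
  private final boolean[] used;
  private final int mask;

  OneLongKeyCountTable(int capacityPow2) {
    keys = new long[capacityPow2];
    counts = new long[capacityPow2];
    used = new boolean[capacityPow2];
    mask = capacityPow2 - 1;
  }

  void increment(long key) {
    int slot = (int) (mix(key) & mask);
    while (used[slot] && keys[slot] != key) {
      slot = (slot + 1) & mask;           // linear probe to next slot
    }
    used[slot] = true;
    keys[slot] = key;
    counts[slot]++;                       // the count lives in the slot table
  }

  long count(long key) {
    int slot = (int) (mix(key) & mask);
    while (used[slot]) {
      if (keys[slot] == key) return counts[slot];
      slot = (slot + 1) & mask;
    }
    return 0;                             // never seen
  }

  private static long mix(long h) {       // cheap 64-bit mixer
    h ^= h >>> 33; h *= 0xff51afd7ed558ccdL;
    h ^= h >>> 33;
    return h;
  }
}
```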
[jira] [Commented] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112186#comment-16112186 ] Matt McCline commented on HIVE-12369: - Yes, I think you should continue reviewing. The path that is implemented is One Long Key and groupByMode == HASH. There are UNDONEs for *subsequent* JIRAs that will later add Aggregation of non-Long data types, Fixed Length Keys / Variable Length Keys, and the other groupByModes. They will also later add Grouping Sets and Empty Aggregation (i.e. GroupBy on a key that has no aggregations, which does duplicate key elimination). > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive > Reporter: Matt McCline > Assignee: Matt McCline > Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, HIVE-12369.05.patch, HIVE-12369.06.patch > > > Implement Native Vector GroupBy using fast hash table technology developed for Native Vector MapJoin, etc. > Patch is currently limited to a single Long key, aggregation on Long columns, no more than 31 columns.
> 3 new classes are introduced that store the count in the slot table and don't allocate hash elements:
> {noformat}
> COUNT(column)  VectorGroupByHashOneLongKeyCountColumnOperator
> COUNT(key)     VectorGroupByHashOneLongKeyCountKeyOperator
> COUNT(*)       VectorGroupByHashOneLongKeyCountStarOperator
> {noformat}
> And a new class that aggregates a single Long key:
> {noformat}
> VectorGroupByHashOneLongKeyOperator
> {noformat}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17234) Remove HBase metastore from master
[ https://issues.apache.org/jira/browse/HIVE-17234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112185#comment-16112185 ] Hive QA commented on HIVE-17234: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12880133/HIVE-17234.patch {color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10961 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] (batchId=56) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) org.apache.hive.hcatalog.pig.TestRCFileHCatStorer.testWriteTimestamp (batchId=182) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteSmallint (batchId=182) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6240/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6240/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6240/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12880133 - PreCommit-HIVE-Build > Remove HBase metastore from master > -- > > Key: HIVE-17234 > URL: https://issues.apache.org/jira/browse/HIVE-17234 > Project: Hive > Issue Type: Task > Components: HBase Metastore >Affects Versions: 3.0.0 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-17234.patch > > > No new development has been done on the HBase metastore in at least a year, > and to my knowledge no one is using it (nor is it even in a state to be fully > usable). Given the lack of interest in continuing to develop it, we should > remove it rather than leave dead code hanging around and extra tests taking > up time in test runs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17171) Remove old javadoc versions
[ https://issues.apache.org/jira/browse/HIVE-17171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112176#comment-16112176 ] Lefty Leverenz commented on HIVE-17171: --- Owen, three links to archived javadocs are broken: * [Hive 2.0.1 Javadocs | https://svn.apache.org/repos/infra/websites/production/hive/content/javadocs/r2.0.1/api/index.html?p=1015623] * [Hive 1.1.1 Javadocs | https://svn.apache.org/repos/infra/websites/production/hive/content/javadocs/r1.1.1/api/index.html?p=1015623] * [Hive 1.0.1 Javadocs | https://svn.apache.org/repos/infra/websites/production/hive/content/javadocs/r1.0.1/api/index.html?p=1015623] But the link to 0.13.1 is okay: * [Hive 0.13.1 Javadocs | https://svn.apache.org/repos/infra/websites/production/hive/content/javadocs/r0.13.1/api/index.html?p=1015623] Also, the Nexus link opens the hive-storage-api artifact, and finding the hive artifact isn't easy for a newbie. Glitches aside, the changes look good. Thanks. > Remove old javadoc versions > --- > > Key: HIVE-17171 > URL: https://issues.apache.org/jira/browse/HIVE-17171 > Project: Hive > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Owen O'Malley > > We currently have a lot of old javadoc versions. I'd propose that we keep the > following versions: > * r1.2.2 > * r2.1.1 > * r2.2.0 > (Note that 2.3.0 was not checked in to the site.) In particular, I'd suggest > we remove: > * hcat-r0.5.0 > * r0.10.0 > * r0.11.0 > * r0.12.0 > * r0.13.1 > * r1.0.1 > * r1.1.1 > * r2.0.1 > Any concerns? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
[ https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated HIVE-15794: --- Status: Patch Available (was: In Progress) > Support get hdfsEncryptionShim if FileSystem is ViewFileSystem > -- > > Key: HIVE-15794 > URL: https://issues.apache.org/jira/browse/HIVE-15794 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.2.0, 1.1.0, 1.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang > Attachments: HIVE-15794.1.patch, HIVE-15794.2.patch > > > *SQL*: > {code:sql} > hive> create table table2 as select * from table1; > hive> show create table table2; > OK > CREATE TABLE `table2`( > `id` string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION > 'viewfs://cluster4/user/hive/warehouse/table2' > TBLPROPERTIES ( > 'transient_lastDdlTime'='1486050317') > {code} > *LOG*: > {noformat} > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > {noformat} > Can’t get hdfsEncryptionShim if {{FileSystem}} is > [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html], > we should support it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
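The gist of the improvement is that the encryption shim was only built when the filesystem scheme was "hdfs", so viewfs:// warehouse paths fell through to the "Could not get hdfsEncryptionShim" message. A hedged sketch of the changed decision (the helper name is illustrative, not Hive's actual code; a real fix would also resolve the viewfs mount to its backing HDFS, e.g. via FileSystem.resolvePath, before constructing the shim):

```java
import java.net.URI;

// Illustrative only: accept viewfs in addition to hdfs, since a viewfs
// mount can resolve to an encrypted HDFS path underneath.
class EncryptionShimCheck {
    static boolean supportsEncryptionShim(URI fsUri) {
        String scheme = fsUri.getScheme();
        // Previously only "hdfs" qualified; viewfs mounts were rejected.
        return "hdfs".equals(scheme) || "viewfs".equals(scheme);
    }
}
```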
[jira] [Updated] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
[ https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated HIVE-15794: --- Status: In Progress (was: Patch Available) > Support get hdfsEncryptionShim if FileSystem is ViewFileSystem > -- > > Key: HIVE-15794 > URL: https://issues.apache.org/jira/browse/HIVE-15794 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.2.0, 1.1.0, 1.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang > Attachments: HIVE-15794.1.patch, HIVE-15794.2.patch > > > *SQL*: > {code:sql} > hive> create table table2 as select * from table1; > hive> show create table table2; > OK > CREATE TABLE `table2`( > `id` string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION > 'viewfs://cluster4/user/hive/warehouse/table2' > TBLPROPERTIES ( > 'transient_lastDdlTime'='1486050317') > {code} > *LOG*: > {noformat} > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > {noformat} > Can’t get hdfsEncryptionShim if {{FileSystem}} is > [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html], > we should support it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15705) Event replication for constraints
[ https://issues.apache.org/jira/browse/HIVE-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-15705: -- Attachment: HIVE-15705.5.patch > Event replication for constraints > - > > Key: HIVE-15705 > URL: https://issues.apache.org/jira/browse/HIVE-15705 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-15705.1.patch, HIVE-15705.2.patch, > HIVE-15705.3.patch, HIVE-15705.4.patch, HIVE-15705.5.patch > > > Make event replication for primary key and foreign key work. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
[ https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated HIVE-15794: --- Attachment: HIVE-15794.2.patch > Support get hdfsEncryptionShim if FileSystem is ViewFileSystem > -- > > Key: HIVE-15794 > URL: https://issues.apache.org/jira/browse/HIVE-15794 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 1.2.0, 1.1.0, 2.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang > Attachments: HIVE-15794.1.patch, HIVE-15794.2.patch > > > *SQL*: > {code:sql} > hive> create table table2 as select * from table1; > hive> show create table table2; > OK > CREATE TABLE `table2`( > `id` string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION > 'viewfs://cluster4/user/hive/warehouse/table2' > TBLPROPERTIES ( > 'transient_lastDdlTime'='1486050317') > {code} > *LOG*: > {noformat} > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > {noformat} > Can’t get hdfsEncryptionShim if {{FileSystem}} is > [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html], > we should support it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-15705) Event replication for constraints
[ https://issues.apache.org/jira/browse/HIVE-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112164#comment-16112164 ] Daniel Dai commented on HIVE-15705: --- Uploaded a new patch to address most of the comments, including the issue with the general constraint implementation. Some things are not addressed/unclear: 7, 8, 10: We plan to remove the HBaseStore code, so I didn't include HBaseStore-only issues. 12. Changed AddxxxHandler, but in DropConstraintHandler we only have the constraint name, not the constraint object, in the message. I still get db/table from the msg in DropConstraintHandler. I checked that those are valid during load. 15. I would leave table rename for constraints to another ticket. It is more involved and not appropriate to piggyback on this patch. > Event replication for constraints > - > > Key: HIVE-15705 > URL: https://issues.apache.org/jira/browse/HIVE-15705 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-15705.1.patch, HIVE-15705.2.patch, > HIVE-15705.3.patch, HIVE-15705.4.patch > > > Make event replication for primary key and foreign key work. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
[ https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated HIVE-15794: --- Attachment: (was: HIVE-15794.1.patch) > Support get hdfsEncryptionShim if FileSystem is ViewFileSystem > -- > > Key: HIVE-15794 > URL: https://issues.apache.org/jira/browse/HIVE-15794 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 1.2.0, 1.1.0, 2.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang > Attachments: HIVE-15794.1.patch > > > *SQL*: > {code:sql} > hive> create table table2 as select * from table1; > hive> show create table table2; > OK > CREATE TABLE `table2`( > `id` string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION > 'viewfs://cluster4/user/hive/warehouse/table2' > TBLPROPERTIES ( > 'transient_lastDdlTime'='1486050317') > {code} > *LOG*: > {noformat} > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > {noformat} > Can’t get hdfsEncryptionShim if {{FileSystem}} is > [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html], > we should support it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-15705) Event replication for constraints
[ https://issues.apache.org/jira/browse/HIVE-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112136#comment-16112136 ] ASF GitHub Bot commented on HIVE-15705: --- GitHub user daijyc opened a pull request: https://github.com/apache/hive/pull/219 HIVE-15705: Event replication for constraints You can merge this pull request into a Git repository by running: $ git pull https://github.com/daijyc/hive HIVE-15705 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/219.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #219 commit ee3bff683f35b0ecbc1e721faa511b6cac7b8189 Author: Daniel Dai Date: 2017-08-03T03:47:01Z HIVE-15705: Event replication for constraints > Event replication for constraints > - > > Key: HIVE-15705 > URL: https://issues.apache.org/jira/browse/HIVE-15705 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-15705.1.patch, HIVE-15705.2.patch, > HIVE-15705.3.patch, HIVE-15705.4.patch > > > Make event replication for primary key and foreign key work. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
[ https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated HIVE-15794: --- Attachment: HIVE-15794.1.patch > Support get hdfsEncryptionShim if FileSystem is ViewFileSystem > -- > > Key: HIVE-15794 > URL: https://issues.apache.org/jira/browse/HIVE-15794 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 1.2.0, 1.1.0, 2.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang > Attachments: HIVE-15794.1.patch, HIVE-15794.1.patch > > > *SQL*: > {code:sql} > hive> create table table2 as select * from table1; > hive> show create table table2; > OK > CREATE TABLE `table2`( > `id` string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION > 'viewfs://cluster4/user/hive/warehouse/table2' > TBLPROPERTIES ( > 'transient_lastDdlTime'='1486050317') > {code} > *LOG*: > {noformat} > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > {noformat} > Can’t get hdfsEncryptionShim if {{FileSystem}} is > [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html], > we should support it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
[ https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated HIVE-15794: --- Status: Patch Available (was: Open) > Support get hdfsEncryptionShim if FileSystem is ViewFileSystem > -- > > Key: HIVE-15794 > URL: https://issues.apache.org/jira/browse/HIVE-15794 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.2.0, 1.1.0, 1.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang > Attachments: HIVE-15794.1.patch, HIVE-15794.1.patch > > > *SQL*: > {code:sql} > hive> create table table2 as select * from table1; > hive> show create table table2; > OK > CREATE TABLE `table2`( > `id` string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION > 'viewfs://cluster4/user/hive/warehouse/table2' > TBLPROPERTIES ( > 'transient_lastDdlTime'='1486050317') > {code} > *LOG*: > {noformat} > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > {noformat} > Can’t get hdfsEncryptionShim if {{FileSystem}} is > [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html], > we should support it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
[ https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated HIVE-15794: --- Status: Open (was: Patch Available) > Support get hdfsEncryptionShim if FileSystem is ViewFileSystem > -- > > Key: HIVE-15794 > URL: https://issues.apache.org/jira/browse/HIVE-15794 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.2.0, 1.1.0, 1.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang > Attachments: HIVE-15794.1.patch > > > *SQL*: > {code:sql} > hive> create table table2 as select * from table1; > hive> show create table table2; > OK > CREATE TABLE `table2`( > `id` string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION > 'viewfs://cluster4/user/hive/warehouse/table2' > TBLPROPERTIES ( > 'transient_lastDdlTime'='1486050317') > {code} > *LOG*: > {noformat} > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > {noformat} > Can’t get hdfsEncryptionShim if {{FileSystem}} is > [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html], > we should support it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17213) HoS: file merging doesn't work for union all
[ https://issues.apache.org/jira/browse/HIVE-17213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112135#comment-16112135 ] Hive QA commented on HIVE-17213: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12880115/HIVE-17213.5.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 11127 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] (batchId=56) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_union_merge] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=178) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=178) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=178) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testConnection (batchId=241) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValidNeg (batchId=241) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeTokenAuth (batchId=241) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testProxyAuth (batchId=241) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testTokenAuth (batchId=241) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6239/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6239/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6239/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase 
Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12880115 - PreCommit-HIVE-Build > HoS: file merging doesn't work for union all > > > Key: HIVE-17213 > URL: https://issues.apache.org/jira/browse/HIVE-17213 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-17213.0.patch, HIVE-17213.1.patch, > HIVE-17213.2.patch, HIVE-17213.3.patch, HIVE-17213.4.patch, HIVE-17213.5.patch > > > HoS file merging doesn't work properly since it doesn't set linked file sinks > properly which is used to generate move tasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
[ https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112131#comment-16112131 ] Lefty Leverenz commented on HIVE-15794: --- [~q79969786], to retest a patch you have to resubmit it -- either use the Cancel Patch button at the top of the page and add it again or else rename the patch "HIVE-15794.2.patch" and submit it as if it were a new patch. Then testing will be run automatically. > Support get hdfsEncryptionShim if FileSystem is ViewFileSystem > -- > > Key: HIVE-15794 > URL: https://issues.apache.org/jira/browse/HIVE-15794 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 1.2.0, 1.1.0, 2.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang > Attachments: HIVE-15794.1.patch > > > *SQL*: > {code:sql} > hive> create table table2 as select * from table1; > hive> show create table table2; > OK > CREATE TABLE `table2`( > `id` string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION > 'viewfs://cluster4/user/hive/warehouse/table2' > TBLPROPERTIES ( > 'transient_lastDdlTime'='1486050317') > {code} > *LOG*: > {noformat} > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > {noformat} > Can’t get hdfsEncryptionShim if {{FileSystem}} is > [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html], > we should support it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17220) Bloomfilter probing in semijoin reduction is thrashing L1 dcache
[ https://issues.apache.org/jira/browse/HIVE-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17220: - Attachment: HIVE-17220.3.patch Addressed [~gopalv]'s review comments. Also fixed test failures. > Bloomfilter probing in semijoin reduction is thrashing L1 dcache > > > Key: HIVE-17220 > URL: https://issues.apache.org/jira/browse/HIVE-17220 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17220.1.patch, HIVE-17220.2.patch, > HIVE-17220.3.patch, HIVE-17220.WIP.patch > > > [~gopalv] observed perf profiles showing bloomfilter probes as bottleneck for > some of the TPC-DS queries and resulted L1 data cache thrashing. > This is because of the huge bitset in bloom filter that doesn't fit in any > levels of cache, also the hash bits corresponding to a single key map to > different segments of bitset which are spread out. This can result in K-1 > memory access (K being number of hash functions) in worst case for every key > that gets probed because of locality miss in L1 cache. > Ran a JMH microbenchmark to verify the same. 
Following is the JMH perf > profile for bloom filter probing > {code} > Perf stats: > -- >5101.935637 task-clock (msec) #0.461 CPUs utilized >346 context-switches #0.068 K/sec >336 cpu-migrations#0.066 K/sec > 6,207 page-faults #0.001 M/sec > 10,016,486,301 cycles#1.963 GHz > (26.90%) > 5,751,692,176 stalled-cycles-frontend # 57.42% frontend cycles > idle (27.05%) > stalled-cycles-backend > 14,359,914,397 instructions #1.43 insns per cycle > #0.40 stalled cycles > per insn (33.78%) > 2,200,632,861 branches # 431.333 M/sec > (33.84%) > 1,162,860 branch-misses #0.05% of all branches > (33.97%) > 1,025,992,254 L1-dcache-loads # 201.099 M/sec > (26.56%) >432,663,098 L1-dcache-load-misses # 42.17% of all L1-dcache > hits(14.49%) >331,383,297 LLC-loads # 64.952 M/sec > (14.47%) >203,524 LLC-load-misses #0.06% of all LL-cache > hits (21.67%) > L1-icache-loads > 1,633,821 L1-icache-load-misses #0.320 M/sec > (28.85%) >950,368,796 dTLB-loads# 186.276 M/sec > (28.61%) >246,813,393 dTLB-load-misses # 25.97% of all dTLB > cache hits (14.53%) > 25,451 iTLB-loads#0.005 M/sec > (14.48%) > 35,415 iTLB-load-misses # 139.15% of all iTLB > cache hits (21.73%) > L1-dcache-prefetches >175,958 L1-dcache-prefetch-misses #0.034 M/sec > (28.94%) > 11.064783140 seconds time elapsed > {code} > This shows 42.17% of L1 data cache misses. > This jira is to use cache efficient bloom filter for semijoin probing. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
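The "cache efficient bloom filter" the issue aims for is commonly a blocked Bloom filter: all K probe bits for a key are confined to a single 64-byte block (one cache line), so a probe touches at most one line instead of up to K scattered ones. Below is a minimal, hypothetical sketch of that layout; it is not Hive's actual implementation, and the class and parameter names are illustrative.

```java
// Sketch of a cache-line-blocked Bloom filter: one 512-bit block per key,
// chosen by the hash, with all K bits set/tested inside that block.
class BlockedBloom {
    private static final int LONGS_PER_BLOCK = 8;   // 8 * 8 bytes = one 64-byte cache line
    private final long[] bits;
    private final int numBlocks;
    private final int k;                            // number of hash functions

    BlockedBloom(int numBlocks, int k) {
        this.numBlocks = numBlocks;
        this.k = k;
        this.bits = new long[numBlocks * LONGS_PER_BLOCK];
    }

    private static long mix(long h) {               // 64-bit finalizer (Murmur3-style)
        h ^= h >>> 33; h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33; h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33; return h;
    }

    void add(long key) {
        long h = mix(key);
        int base = (int) Long.remainderUnsigned(h, numBlocks) * LONGS_PER_BLOCK;
        for (int i = 0; i < k; i++) {
            int bit = (int) ((h >>> (i * 9)) & 511);        // 512 bits per block
            bits[base + (bit >>> 6)] |= 1L << (bit & 63);
        }
    }

    boolean mightContain(long key) {
        long h = mix(key);
        int base = (int) Long.remainderUnsigned(h, numBlocks) * LONGS_PER_BLOCK;
        for (int i = 0; i < k; i++) {
            int bit = (int) ((h >>> (i * 9)) & 511);
            if ((bits[base + (bit >>> 6)] & (1L << (bit & 63))) == 0) return false;
        }
        return true;                                // possibly present (may be a false positive)
    }
}
```

The trade-off is a slightly higher false-positive rate for the same total size, in exchange for one cache-line access per probe.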
[jira] [Updated] (HIVE-17235) Add ORC Decimal64 Serialization/Deserialization
[ https://issues.apache.org/jira/browse/HIVE-17235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-17235: Attachment: HIVE-17235.03.patch > Add ORC Decimal64 Serialization/Deserialization > --- > > Key: HIVE-17235 > URL: https://issues.apache.org/jira/browse/HIVE-17235 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-17235.03.patch > > > The storage-api changes for ORC-209. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112093#comment-16112093 ] Sergey Shelukhin commented on HIVE-12369: - Hmm. Reviewed most of page one. Should I be reviewing this? It looks like lots of stuff is UNDONE (not implemented?). I can also review stuff now and then the diff of the diffs, but I wonder if it makes sense, i.e. whether the upcoming changes are going to be reasonable in scope for that. > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, > HIVE-12369.05.patch, HIVE-12369.06.patch > > > Implement Native Vector GroupBy using fast hash table technology developed > for Native Vector MapJoin, etc. > Patch is currently limited to a single Long key, aggregation on Long columns, > no more than 31 columns. > 3 new classes introduces that stored the count in the slot table and don't > allocate hash elements: > {noformat} > COUNT(column) VectorGroupByHashOneLongKeyCountColumnOperator > COUNT(key) VectorGroupByHashOneLongKeyCountKeyOperator > COUNT(*) VectorGroupByHashOneLongKeyCountStarOperator > {noformat} > And a new class that aggregates a single Long key: > {noformat} > VectorGroupByHashOneLongKeyOperator > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
[ https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112090#comment-16112090 ] Yuming Wang commented on HIVE-15794: retest this please. > Support get hdfsEncryptionShim if FileSystem is ViewFileSystem > -- > > Key: HIVE-15794 > URL: https://issues.apache.org/jira/browse/HIVE-15794 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 1.2.0, 1.1.0, 2.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang > Attachments: HIVE-15794.1.patch > > > *SQL*: > {code:sql} > hive> create table table2 as select * from table1; > hive> show create table table2; > OK > CREATE TABLE `table2`( > `id` string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION > 'viewfs://cluster4/user/hive/warehouse/table2' > TBLPROPERTIES ( > 'transient_lastDdlTime'='1486050317') > {code} > *LOG*: > {noformat} > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] > session.SessionState: Could not get hdfsEncryptionShim, it is only applicable > to hdfs filesystem. > {noformat} > Can’t get hdfsEncryptionShim if {{FileSystem}} is > [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html], > we should support it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17237) HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters
[ https://issues.apache.org/jira/browse/HIVE-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112087#comment-16112087 ] Hive QA commented on HIVE-17237: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12880139/HIVE-17237.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11138 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=241) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=241) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] (batchId=56) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=236) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=236) org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress (batchId=223) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6238/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6238/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6238/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing 
org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12880139 - PreCommit-HIVE-Build > HMS wastes 26.4% of memory due to dup strings in > metastore.api.Partition.parameters > --- > > Key: HIVE-17237 > URL: https://issues.apache.org/jira/browse/HIVE-17237 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HIVE-17237.01.patch > > > I've analyzed a heap dump from a production Hive installation using jxray > (www.jxray.com) It turns out that there are a lot of duplicate strings in > memory, that waste 26.4% of the heap. Most of them come from HashMaps > referenced by org.apache.hadoop.hive.metastore.api.Partition.parameters. > Below is the relevant section of the jxray report. > Looking at Partition.java, I see that in the past somebody has already added > code to intern keys and values in the parameters table when it's first set > up. However, when more key-value pairs are added, they are not interned, and > that probably explains the reason for all these duplicate strings. Also when > a Partition instance is deserialized, no interning of parameters is currently > done. > {code} > 6. DUPLICATE STRINGS > Total strings: 3,273,557 Unique strings: 460,390 Duplicate values: 110,232 > Overhead: 3,220,458K (26.4%) > > === > 7. REFERENCE CHAINS FOR DUPLICATE STRINGS > 2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing > arrays: > 39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", > 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of > "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length > 3560]" > ... 
and 419200 more strings, of which 36376 are unique > Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", > 28 of "2", 21 of "0" > <-- {j.u.HashMap}.values <-- > org.apache.hadoop.hive.metastore.api.Partition.parameters <-- > {j.u.ArrayList} <-- > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success > <-- Java Local > (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result) > [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots > 463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing > arrays: > 7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of > "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980" > ... and 84009 more strings, of which 34065 are unique > Also contains one-char strings: 42 of "7", 31 of "6", 20 of
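The interning gap described in the report above can be sketched as follows. This is a minimal illustration, not the actual patch: the class and method names (`ParameterInterner`, `putInterned`, `internAll`) are hypothetical stand-ins for whatever the fix in Partition.java actually does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the fix described above: intern parameter keys and
// values not only when the map is first built, but on every subsequent put
// and after deserialization, so duplicate values share one backing String.
public class ParameterInterner {

    public static void putInterned(Map<String, String> parameters,
                                   String key, String value) {
        parameters.put(key.intern(), value == null ? null : value.intern());
    }

    // Re-intern an already-populated map, e.g. right after deserialization.
    public static void internAll(Map<String, String> parameters) {
        Map<String, String> copy = new HashMap<>(parameters);
        parameters.clear();
        for (Map.Entry<String, String> e : copy.entrySet()) {
            putInterned(parameters, e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        // new String(...) forces distinct backing arrays, as after deserialization.
        putInterned(params, new String("numRows"), new String("-1"));
        putInterned(params, new String("rawDataSize"), new String("-1"));
        // Both "-1" values now reference the same interned String.
        System.out.println(params.get("numRows") == params.get("rawDataSize"));
    }
}
```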
[jira] [Commented] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112077#comment-16112077 ] Sergey Shelukhin commented on HIVE-12369: - I started reviewing this... it will take some time, probably over a couple of days. Will publish the review in parts. > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch, > HIVE-12369.05.patch, HIVE-12369.06.patch > > > Implement Native Vector GroupBy using the fast hash table technology developed > for Native Vector MapJoin, etc. > The patch is currently limited to a single Long key, aggregation on Long columns, > and no more than 31 columns. > 3 new classes are introduced that store the count in the slot table and don't > allocate hash elements: > {noformat} > COUNT(column) VectorGroupByHashOneLongKeyCountColumnOperator > COUNT(key) VectorGroupByHashOneLongKeyCountKeyOperator > COUNT(*) VectorGroupByHashOneLongKeyCountStarOperator > {noformat} > And a new class that aggregates a single Long key: > {noformat} > VectorGroupByHashOneLongKeyOperator > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
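The "count stored in the slot table" idea above can be illustrated with a minimal open-addressing table keyed on a single long. This is a hypothetical sketch under the stated constraint (single Long key), not the patch's code: the class name, hash constant, and fixed capacity are illustrative assumptions, and resizing and full-table handling are omitted.

```java
// Minimal, hypothetical sketch of the idea described above: an open-addressing
// hash table keyed on a single long, storing the COUNT aggregate directly in a
// parallel slot array so that no per-key aggregation object is allocated.
public class LongKeyCountTable {
    private final long[] keys;
    private final long[] counts;   // the aggregate lives in the slot table itself
    private final boolean[] used;

    public LongKeyCountTable(int capacity) {
        keys = new long[capacity];
        counts = new long[capacity];
        used = new boolean[capacity];
    }

    private int startSlot(long key) {
        // Fibonacci-style mixing; the constant is an arbitrary illustrative choice.
        return (int) ((key * 0x9E3779B97F4A7C15L >>> 33) % keys.length);
    }

    public void add(long key) {
        int slot = startSlot(key);
        while (used[slot] && keys[slot] != key) {
            slot = (slot + 1) % keys.length;   // linear probing
        }
        used[slot] = true;
        keys[slot] = key;
        counts[slot]++;
    }

    public long count(long key) {
        int slot = startSlot(key);
        while (used[slot]) {
            if (keys[slot] == key) return counts[slot];
            slot = (slot + 1) % keys.length;
        }
        return 0;
    }
}
```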
[jira] [Commented] (HIVE-17213) HoS: file merging doesn't work for union all
[ https://issues.apache.org/jira/browse/HIVE-17213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112025#comment-16112025 ] Hive QA commented on HIVE-17213: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12880115/HIVE-17213.5.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11127 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=240) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] (batchId=56) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_union_merge] (batchId=169) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=178) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=178) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=178) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6237/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6237/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6237/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12880115 - PreCommit-HIVE-Build > HoS: file merging doesn't work for union all > > > Key: HIVE-17213 > URL: https://issues.apache.org/jira/browse/HIVE-17213 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-17213.0.patch, HIVE-17213.1.patch, > HIVE-17213.2.patch, HIVE-17213.3.patch, HIVE-17213.4.patch, HIVE-17213.5.patch > > > HoS file merging doesn't work properly since it doesn't set linked file sinks > properly which is used to generate move tasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats
[ https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-16811: --- Status: Open (was: Patch Available) > Estimate statistics in absence of stats > --- > > Key: HIVE-16811 > URL: https://issues.apache.org/jira/browse/HIVE-16811 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch, > HIVE-16811.3.patch, HIVE-16811.4.patch > > > Currently join ordering completely bails out in the absence of statistics, and > this can lead to bad joins such as cross joins. > e.g. the following select query will produce a cross join. > {code:sql} > create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, > S_NATIONKEY INT, > S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING); > CREATE TABLE lineitem (L_ORDERKEY INT, > L_PARTKEY INT, > L_SUPPKEY INT, > L_LINENUMBER INT, > L_QUANTITY DOUBLE, > L_EXTENDEDPRICE DOUBLE, > L_DISCOUNT DOUBLE, > L_TAX DOUBLE, > L_RETURNFLAG STRING, > L_LINESTATUS STRING, > l_shipdate STRING, > L_COMMITDATE STRING, > L_RECEIPTDATE STRING, > L_SHIPINSTRUCT STRING, > L_SHIPMODE STRING, > L_COMMENT STRING) partitioned by (dl int) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '|'; > CREATE TABLE part( > p_partkey INT, > p_name STRING, > p_mfgr STRING, > p_brand STRING, > p_type STRING, > p_size INT, > p_container STRING, > p_retailprice DOUBLE, > p_comment STRING > ); > explain select count(1) from part,supplier,lineitem where p_partkey = > l_partkey and s_suppkey = l_suppkey; > {code} > Estimating stats will prevent the join ordering algorithm from bailing out and help it > come up with a join order at least better than a cross join. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
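One way the estimation described above could work is a size-based fallback when no statistics were collected. This is a hypothetical sketch, not the patch's actual logic: the class name and the average-row-width constant are illustrative assumptions.

```java
// Hypothetical sketch of a fallback statistics estimate: when a table has no
// collected statistics, derive a rough row count from its raw data size and an
// assumed average row width, instead of bailing out of join ordering entirely.
public class StatsEstimator {
    // Illustrative assumption; a real implementation would likely make this
    // configurable or derive it from the schema.
    private static final long ASSUMED_AVG_ROW_BYTES = 100;

    public static long estimateRowCount(long rawDataSizeBytes, long collectedRowCount) {
        if (collectedRowCount > 0) {
            return collectedRowCount;   // trust real stats when present
        }
        // Fall back to a size-based guess; never return 0, which would make
        // every join order look equally cheap to the optimizer.
        return Math.max(1, rawDataSizeBytes / ASSUMED_AVG_ROW_BYTES);
    }
}
```

With such a floor in place, even a crude estimate gives the join-ordering algorithm enough signal to prefer keyed joins over a cross join.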
[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats
[ https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-16811: --- Status: Patch Available (was: Open) > Estimate statistics in absence of stats > --- > > Key: HIVE-16811 > URL: https://issues.apache.org/jira/browse/HIVE-16811 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch, > HIVE-16811.3.patch, HIVE-16811.4.patch > > > Currently join ordering completely bails out in the absence of statistics, and > this can lead to bad joins such as cross joins. > e.g. the following select query will produce a cross join. > {code:sql} > create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, > S_NATIONKEY INT, > S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING); > CREATE TABLE lineitem (L_ORDERKEY INT, > L_PARTKEY INT, > L_SUPPKEY INT, > L_LINENUMBER INT, > L_QUANTITY DOUBLE, > L_EXTENDEDPRICE DOUBLE, > L_DISCOUNT DOUBLE, > L_TAX DOUBLE, > L_RETURNFLAG STRING, > L_LINESTATUS STRING, > l_shipdate STRING, > L_COMMITDATE STRING, > L_RECEIPTDATE STRING, > L_SHIPINSTRUCT STRING, > L_SHIPMODE STRING, > L_COMMENT STRING) partitioned by (dl int) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '|'; > CREATE TABLE part( > p_partkey INT, > p_name STRING, > p_mfgr STRING, > p_brand STRING, > p_type STRING, > p_size INT, > p_container STRING, > p_retailprice DOUBLE, > p_comment STRING > ); > explain select count(1) from part,supplier,lineitem where p_partkey = > l_partkey and s_suppkey = l_suppkey; > {code} > Estimating stats will prevent the join ordering algorithm from bailing out and help it > come up with a join order at least better than a cross join. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17237) HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters
[ https://issues.apache.org/jira/browse/HIVE-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Misha Dmitriev updated HIVE-17237: -- Status: Patch Available (was: Open) > HMS wastes 26.4% of memory due to dup strings in > metastore.api.Partition.parameters > --- > > Key: HIVE-17237 > URL: https://issues.apache.org/jira/browse/HIVE-17237 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HIVE-17237.01.patch > > > I've analyzed a heap dump from a production Hive installation using jxray > (www.jxray.com) It turns out that there are a lot of duplicate strings in > memory, that waste 26.4% of the heap. Most of them come from HashMaps > referenced by org.apache.hadoop.hive.metastore.api.Partition.parameters. > Below is the relevant section of the jxray report. > Looking at Partition.java, I see that in the past somebody has already added > code to intern keys and values in the parameters table when it's first set > up. However, when more key-value pairs are added, they are not interned, and > that probably explains the reason for all these duplicate strings. Also when > a Partition instance is deserialized, no interning of parameters is currently > done. > {code} > 6. DUPLICATE STRINGS > Total strings: 3,273,557 Unique strings: 460,390 Duplicate values: 110,232 > Overhead: 3,220,458K (26.4%) > > === > 7. REFERENCE CHAINS FOR DUPLICATE STRINGS > 2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing > arrays: > 39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", > 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of > "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length > 3560]" > ... 
and 419200 more strings, of which 36376 are unique > Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", > 28 of "2", 21 of "0" > <-- {j.u.HashMap}.values <-- > org.apache.hadoop.hive.metastore.api.Partition.parameters <-- > {j.u.ArrayList} <-- > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success > <-- Java Local > (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result) > [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots > 463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing > arrays: > 7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of > "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980" > ... and 84009 more strings, of which 34065 are unique > Also contains one-char strings: 42 of "7", 31 of "6", 20 of "4", 8 of "5", 5 > of "2", 3 of "0" > <-- {j.u.HashMap}.values <-- > org.apache.hadoop.hive.metastore.api.Partition.parameters <-- > {j.u.TreeMap}.values <-- Java Local (j.u.TreeMap) [@6f084afa0,@73aac9e68] > 233,384K (1.9%), 64601 dup strings (27295 unique), 64601 dup backing arrays: > 4472 of "true", 4173 of "-1", 3798 of "1", 3591 of "8", 813 of "174528", 684 > of "10" ... and 44568 more strings, of which 27285 are unique > Also contains one-char strings: 305 of "7", 301 of "0", 277 of "4", 146 of > "6", 29 of "2", 23 of "5", 19 of "9", 2 of "3" > <-- {j.u.HashMap}.values <-- > org.apache.hadoop.hive.metastore.api.Partition.parameters <-- > {j.u.ArrayList} <-- Java Local (j.u.ArrayList) > [@4f4cfbd10,@536122408,@726616778] > ... 
> 52,916K (0.4%), 597058 dup strings (16 unique), 597058 dup backing arrays: > <-- {j.u.HashMap}.keys <-- > org.apache.hadoop.hive.metastore.api.Partition.parameters <-- > {j.u.ArrayList} <-- > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success > <-- Java Local > (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result) > [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17237) HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters
[ https://issues.apache.org/jira/browse/HIVE-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Misha Dmitriev updated HIVE-17237: -- Attachment: HIVE-17237.01.patch > HMS wastes 26.4% of memory due to dup strings in > metastore.api.Partition.parameters > --- > > Key: HIVE-17237 > URL: https://issues.apache.org/jira/browse/HIVE-17237 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HIVE-17237.01.patch > > > I've analyzed a heap dump from a production Hive installation using jxray > (www.jxray.com) It turns out that there are a lot of duplicate strings in > memory, that waste 26.4% of the heap. Most of them come from HashMaps > referenced by org.apache.hadoop.hive.metastore.api.Partition.parameters. > Below is the relevant section of the jxray report. > Looking at Partition.java, I see that in the past somebody has already added > code to intern keys and values in the parameters table when it's first set > up. However, when more key-value pairs are added, they are not interned, and > that probably explains the reason for all these duplicate strings. Also when > a Partition instance is deserialized, no interning of parameters is currently > done. > {code} > 6. DUPLICATE STRINGS > Total strings: 3,273,557 Unique strings: 460,390 Duplicate values: 110,232 > Overhead: 3,220,458K (26.4%) > > === > 7. REFERENCE CHAINS FOR DUPLICATE STRINGS > 2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing > arrays: > 39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", > 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of > "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length > 3560]" > ... 
and 419200 more strings, of which 36376 are unique > Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", > 28 of "2", 21 of "0" > <-- {j.u.HashMap}.values <-- > org.apache.hadoop.hive.metastore.api.Partition.parameters <-- > {j.u.ArrayList} <-- > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success > <-- Java Local > (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result) > [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots > 463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing > arrays: > 7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of > "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980" > ... and 84009 more strings, of which 34065 are unique > Also contains one-char strings: 42 of "7", 31 of "6", 20 of "4", 8 of "5", 5 > of "2", 3 of "0" > <-- {j.u.HashMap}.values <-- > org.apache.hadoop.hive.metastore.api.Partition.parameters <-- > {j.u.TreeMap}.values <-- Java Local (j.u.TreeMap) [@6f084afa0,@73aac9e68] > 233,384K (1.9%), 64601 dup strings (27295 unique), 64601 dup backing arrays: > 4472 of "true", 4173 of "-1", 3798 of "1", 3591 of "8", 813 of "174528", 684 > of "10" ... and 44568 more strings, of which 27285 are unique > Also contains one-char strings: 305 of "7", 301 of "0", 277 of "4", 146 of > "6", 29 of "2", 23 of "5", 19 of "9", 2 of "3" > <-- {j.u.HashMap}.values <-- > org.apache.hadoop.hive.metastore.api.Partition.parameters <-- > {j.u.ArrayList} <-- Java Local (j.u.ArrayList) > [@4f4cfbd10,@536122408,@726616778] > ... 
> 52,916K (0.4%), 597058 dup strings (16 unique), 597058 dup backing arrays: > <-- {j.u.HashMap}.keys <-- > org.apache.hadoop.hive.metastore.api.Partition.parameters <-- > {j.u.ArrayList} <-- > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success > <-- Java Local > (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result) > [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats
[ https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-16811: --- Attachment: HIVE-16811.4.patch > Estimate statistics in absence of stats > --- > > Key: HIVE-16811 > URL: https://issues.apache.org/jira/browse/HIVE-16811 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch, > HIVE-16811.3.patch, HIVE-16811.4.patch > > > Currently join ordering completely bails out in the absence of statistics, and > this can lead to bad joins such as cross joins. > e.g. the following select query will produce a cross join. > {code:sql} > create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, > S_NATIONKEY INT, > S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING); > CREATE TABLE lineitem (L_ORDERKEY INT, > L_PARTKEY INT, > L_SUPPKEY INT, > L_LINENUMBER INT, > L_QUANTITY DOUBLE, > L_EXTENDEDPRICE DOUBLE, > L_DISCOUNT DOUBLE, > L_TAX DOUBLE, > L_RETURNFLAG STRING, > L_LINESTATUS STRING, > l_shipdate STRING, > L_COMMITDATE STRING, > L_RECEIPTDATE STRING, > L_SHIPINSTRUCT STRING, > L_SHIPMODE STRING, > L_COMMENT STRING) partitioned by (dl int) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '|'; > CREATE TABLE part( > p_partkey INT, > p_name STRING, > p_mfgr STRING, > p_brand STRING, > p_type STRING, > p_size INT, > p_container STRING, > p_retailprice DOUBLE, > p_comment STRING > ); > explain select count(1) from part,supplier,lineitem where p_partkey = > l_partkey and s_suppkey = l_suppkey; > {code} > Estimating stats will prevent the join ordering algorithm from bailing out and help it > come up with a join order at least better than a cross join. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17237) HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters
[ https://issues.apache.org/jira/browse/HIVE-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Misha Dmitriev reassigned HIVE-17237: - > HMS wastes 26.4% of memory due to dup strings in > metastore.api.Partition.parameters > --- > > Key: HIVE-17237 > URL: https://issues.apache.org/jira/browse/HIVE-17237 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > > I've analyzed a heap dump from a production Hive installation using jxray > (www.jxray.com) It turns out that there are a lot of duplicate strings in > memory, that waste 26.4% of the heap. Most of them come from HashMaps > referenced by org.apache.hadoop.hive.metastore.api.Partition.parameters. > Below is the relevant section of the jxray report. > Looking at Partition.java, I see that in the past somebody has already added > code to intern keys and values in the parameters table when it's first set > up. However, when more key-value pairs are added, they are not interned, and > that probably explains the reason for all these duplicate strings. Also when > a Partition instance is deserialized, no interning of parameters is currently > done. > {code} > 6. DUPLICATE STRINGS > Total strings: 3,273,557 Unique strings: 460,390 Duplicate values: 110,232 > Overhead: 3,220,458K (26.4%) > > === > 7. REFERENCE CHAINS FOR DUPLICATE STRINGS > 2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing > arrays: > 39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", > 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of > "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length > 3560]" > ... 
and 419200 more strings, of which 36376 are unique > Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", > 28 of "2", 21 of "0" > <-- {j.u.HashMap}.values <-- > org.apache.hadoop.hive.metastore.api.Partition.parameters <-- > {j.u.ArrayList} <-- > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success > <-- Java Local > (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result) > [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots > 463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing > arrays: > 7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of > "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980" > ... and 84009 more strings, of which 34065 are unique > Also contains one-char strings: 42 of "7", 31 of "6", 20 of "4", 8 of "5", 5 > of "2", 3 of "0" > <-- {j.u.HashMap}.values <-- > org.apache.hadoop.hive.metastore.api.Partition.parameters <-- > {j.u.TreeMap}.values <-- Java Local (j.u.TreeMap) [@6f084afa0,@73aac9e68] > 233,384K (1.9%), 64601 dup strings (27295 unique), 64601 dup backing arrays: > 4472 of "true", 4173 of "-1", 3798 of "1", 3591 of "8", 813 of "174528", 684 > of "10" ... and 44568 more strings, of which 27285 are unique > Also contains one-char strings: 305 of "7", 301 of "0", 277 of "4", 146 of > "6", 29 of "2", 23 of "5", 19 of "9", 2 of "3" > <-- {j.u.HashMap}.values <-- > org.apache.hadoop.hive.metastore.api.Partition.parameters <-- > {j.u.ArrayList} <-- Java Local (j.u.ArrayList) > [@4f4cfbd10,@536122408,@726616778] > ... 
> 52,916K (0.4%), 597058 dup strings (16 unique), 597058 dup backing arrays: > <-- {j.u.HashMap}.keys <-- > org.apache.hadoop.hive.metastore.api.Partition.parameters <-- > {j.u.ArrayList} <-- > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success > <-- Java Local > (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result) > [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17234) Remove HBase metastore from master
[ https://issues.apache.org/jira/browse/HIVE-17234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-17234: -- Status: Patch Available (was: Open) > Remove HBase metastore from master > -- > > Key: HIVE-17234 > URL: https://issues.apache.org/jira/browse/HIVE-17234 > Project: Hive > Issue Type: Task > Components: HBase Metastore >Affects Versions: 3.0.0 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-17234.patch > > > No new development has been done on the HBase metastore in at least a year, > and to my knowledge no one is using it (nor is it even in a state to be fully > usable). Given the lack of interest in continuing to develop it, we should > remove it rather than leave dead code hanging around and extra tests taking > up time in test runs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17234) Remove HBase metastore from master
[ https://issues.apache.org/jira/browse/HIVE-17234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-17234: -- Attachment: HIVE-17234.patch This patch removes all of the unused parts of HBase metastore. The aggregate stats work is kept, and moves into directories that match the already changed packages. Two methods that were still used from HBaseUtils move into MetaStoreUtils. FileMetadata remains in the hbase package, since I believe Sergey wants to use it sometime in the future. > Remove HBase metastore from master > -- > > Key: HIVE-17234 > URL: https://issues.apache.org/jira/browse/HIVE-17234 > Project: Hive > Issue Type: Task > Components: HBase Metastore >Affects Versions: 3.0.0 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-17234.patch > > > No new development has been done on the HBase metastore in at least a year, > and to my knowledge no one is using it (nor is it even in a state to be fully > usable). Given the lack of interest in continuing to develop it, we should > remove it rather than leave dead code hanging around and extra tests taking > up time in test runs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17234) Remove HBase metastore from master
[ https://issues.apache.org/jira/browse/HIVE-17234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111981#comment-16111981 ] ASF GitHub Bot commented on HIVE-17234: --- GitHub user alanfgates opened a pull request: https://github.com/apache/hive/pull/218 HIVE-17234 This patch removes all of the unused parts of HBase metastore. The aggregate stats work is kept, and moves into directories that match the already changed packages. Two methods that were still used from HBaseUtils move into MetaStoreUtils. FileMetadata remains in the hbase package, since I believe Sergey wants to use it sometime in the future. You can merge this pull request into a Git repository by running: $ git pull https://github.com/alanfgates/hive hive17234 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/218.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #218 commit 6afb3e1335e3541b6775a58d00b383cc36024d18 Author: Alan Gates Date: 2017-08-03T00:12:09Z HIVE-17234 Remove HBase metastore from master commit fbce5296c95f38bb56ff2ba3a670f02533ed6661 Author: Alan Gates Date: 2017-08-03T00:32:59Z Removed one more file I missed previously. > Remove HBase metastore from master > -- > > Key: HIVE-17234 > URL: https://issues.apache.org/jira/browse/HIVE-17234 > Project: Hive > Issue Type: Task > Components: HBase Metastore >Affects Versions: 3.0.0 >Reporter: Alan Gates >Assignee: Alan Gates > > No new development has been done on the HBase metastore in at least a year, > and to my knowledge no one is using it (nor is it even in a state to be fully > usable). Given the lack of interest in continuing to develop it, we should > remove it rather than leave dead code hanging around and extra tests taking > up time in test runs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17230) Timestamp format different in HiveCLI and Beeline
[ https://issues.apache.org/jira/browse/HIVE-17230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111975#comment-16111975 ] Aihua Xu commented on HIVE-17230: - The output from the beeline version is more accurate to me. I feel it makes sense to make that change. > Timestamp format different in HiveCLI and Beeline > - > > Key: HIVE-17230 > URL: https://issues.apache.org/jira/browse/HIVE-17230 > Project: Hive > Issue Type: Bug > Components: Beeline, CLI >Reporter: Peter Vary >Assignee: Peter Vary > > The issue can be reproduced with the following commands: > {code} > create table timestamp_test(t timestamp); > insert into table timestamp_test values('2000-01-01 01:00:00'); > select * from timestamp_test; > {code} > The timestamp is displayed without nanoseconds in HiveCLI: > {code} > 2000-01-01 01:00:00 > {code} > When the exact same timestamp is displayed in BeeLine it displays: > {code} > 2000-01-01 01:00:00.0 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
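The trailing `.0` in the BeeLine output matches what `java.sql.Timestamp.toString()` produces: it always prints a fractional-seconds field, at least `.0`, so the HiveCLI side presumably formats the value through a different path. A small self-contained check of this behavior:

```java
import java.sql.Timestamp;

// Demonstrates the formatting difference discussed above:
// java.sql.Timestamp.toString() always includes fractional seconds,
// printing ".0" even when the nanosecond field is zero.
public class TimestampFormatDemo {
    public static void main(String[] args) {
        Timestamp t = Timestamp.valueOf("2000-01-01 01:00:00");
        System.out.println(t.toString());   // 2000-01-01 01:00:00.0
    }
}
```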
[jira] [Updated] (HIVE-17236) Add support for wild card LOCATION for external table
[ https://issues.apache.org/jira/browse/HIVE-17236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nirav patel updated HIVE-17236: --- Description: Hive currently doesn't support wild card in path for external table. I think it should considering mapreduce framework supports it and it's a common requirement. Following should work CREATE EXTERNAL TABLE testTable (val map) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/user/mycomp/customers/\*/departments/partition/\*/sales.tsv'; was: Hive currently doesn't support wild card in path for external table. I think it should considering mapreduce framework supports it and it's a common requirement. Following should work CREATE EXTERNAL TABLE testTable (val map ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/user/mycomp/customers/\\*/departments/partition/\*/sales.tsv'; > Add support for wild card LOCATION for external table > - > > Key: HIVE-17236 > URL: https://issues.apache.org/jira/browse/HIVE-17236 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 1.2.2 >Reporter: nirav patel > > Hive currently doesn't support wild card in path for external table. I think > it should considering mapreduce framework supports it and it's a common > requirement. > Following should work > CREATE EXTERNAL TABLE testTable (val map ) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '\t' > LOCATION '/user/mycomp/customers/\*/departments/partition/\*/sales.tsv'; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17236) Add support for wild card LOCATION for external table
[ https://issues.apache.org/jira/browse/HIVE-17236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nirav patel updated HIVE-17236: --- Description: Hive currently doesn't support wild card in path for external table. I think it should considering mapreduce framework supports it and it's a common requirement. Following should work CREATE EXTERNAL TABLE testTable (val map) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/user/mycomp/customers/*/departments/partition/*/sales.tsv'; was: Hive currently doesn't support wild car in path for external table. I think it should considering mapreduce framework supports it and it's a common requirement. Following should work CREATE EXTERNAL TABLE testTable (val map ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/user/mycomp/customers/*/departments/partition/*/sales.tsv'; > Add support for wild card LOCATION for external table > - > > Key: HIVE-17236 > URL: https://issues.apache.org/jira/browse/HIVE-17236 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 1.2.2 >Reporter: nirav patel > > Hive currently doesn't support wild card in path for external table. I think > it should considering mapreduce framework supports it and it's a common > requirement. > Following should work > CREATE EXTERNAL TABLE testTable (val map ) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '\t' > LOCATION '/user/mycomp/customers/*/departments/partition/*/sales.tsv'; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17236) Add support for wild card LOCATION for external table
[ https://issues.apache.org/jira/browse/HIVE-17236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nirav patel updated HIVE-17236: --- Description: Hive currently doesn't support wild card in path for external table. I think it should considering mapreduce framework supports it and it's a common requirement. Following should work CREATE EXTERNAL TABLE testTable (val map) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/user/mycomp/customers/\\*/departments/partition/\*/sales.tsv'; was: Hive currently doesn't support wild card in path for external table. I think it should considering mapreduce framework supports it and it's a common requirement. Following should work CREATE EXTERNAL TABLE testTable (val map ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/user/mycomp/customers/*/departments/partition/*/sales.tsv'; > Add support for wild card LOCATION for external table > - > > Key: HIVE-17236 > URL: https://issues.apache.org/jira/browse/HIVE-17236 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 1.2.2 >Reporter: nirav patel > > Hive currently doesn't support wild card in path for external table. I think > it should considering mapreduce framework supports it and it's a common > requirement. > Following should work > CREATE EXTERNAL TABLE testTable (val map ) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '\t' > LOCATION '/user/mycomp/customers/\\*/departments/partition/\*/sales.tsv'; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
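The requested wildcard semantics can be illustrated with the JDK's glob matcher; Hadoop's `FileSystem.globStatus`, which `FileInputFormat` uses to expand input paths, accepts similar glob patterns. This is only an illustration of the pattern semantics, not Hive code; the paths are the example ones from the issue.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Illustrates the wildcard LOCATION semantics requested above using JDK globs:
// a single '*' matches exactly one path segment and does not cross '/'.
public class LocationGlobDemo {
    public static void main(String[] args) {
        PathMatcher m = FileSystems.getDefault().getPathMatcher(
            "glob:/user/mycomp/customers/*/departments/partition/*/sales.tsv");

        // One segment per '*': matches.
        System.out.println(m.matches(
            Paths.get("/user/mycomp/customers/c1/departments/partition/p7/sales.tsv")));

        // Extra intermediate segment: '*' does not span directories, so no match.
        System.out.println(m.matches(
            Paths.get("/user/mycomp/customers/c1/extra/departments/partition/p7/sales.tsv")));
    }
}
```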
[jira] [Updated] (HIVE-17217) SMB Join : Assert if paths are different in TezGroupedSplit in KeyValueInputMerger
[ https://issues.apache.org/jira/browse/HIVE-17217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-17217: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to master > SMB Join : Assert if paths are different in TezGroupedSplit in > KeyValueInputMerger > -- > > Key: HIVE-17217 > URL: https://issues.apache.org/jira/browse/HIVE-17217 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Fix For: 3.0.0 > > Attachments: HIVE-17217.1.patch, HIVE-17217.2.patch > > > In KeyValueInputMerger, a TezGroupedSplit may contain more than 1 splits. > However, the splits should all belong to same path. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17235) Add ORC Decimal64 Serialization/Deserialization
[ https://issues.apache.org/jira/browse/HIVE-17235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline reassigned HIVE-17235: --- > Add ORC Decimal64 Serialization/Deserialization > --- > > Key: HIVE-17235 > URL: https://issues.apache.org/jira/browse/HIVE-17235 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > > The storage-api changes for ORC-209. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17213) HoS: file merging doesn't work for union all
[ https://issues.apache.org/jira/browse/HIVE-17213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111946#comment-16111946 ] Xuefu Zhang commented on HIVE-17213: +1 pending on tests. > HoS: file merging doesn't work for union all > > > Key: HIVE-17213 > URL: https://issues.apache.org/jira/browse/HIVE-17213 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-17213.0.patch, HIVE-17213.1.patch, > HIVE-17213.2.patch, HIVE-17213.3.patch, HIVE-17213.4.patch, HIVE-17213.5.patch > > > HoS file merging doesn't work properly since it doesn't set linked file sinks > properly which is used to generate move tasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-14343) HiveDriverRunHookContext's command is null in HS2 mode
[ https://issues.apache.org/jira/browse/HIVE-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111932#comment-16111932 ] Peng Cheng commented on HIVE-14343: --- Apparently this bug also affects the Hive 1.2.1 branch: https://stackoverflow.com/questions/45450066/why-hivedriverrunhook-cannot-read-any-command-when-it-is-submitted-is-it-a-bug Would you like me to open a new ticket? > HiveDriverRunHookContext's command is null in HS2 mode > -- > > Key: HIVE-14343 > URL: https://issues.apache.org/jira/browse/HIVE-14343 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.2.0 >Reporter: Chao Sun >Assignee: Chao Sun > Fix For: 2.3.0 > > Attachments: HIVE-14343.0.patch, HIVE-14343.1.patch > > > Looking at the {{Driver#runInternal(String command, boolean > alreadyCompiled)}}: > {code} > HiveDriverRunHookContext hookContext = new > HiveDriverRunHookContextImpl(conf, command); > // Get all the driver run hooks and pre-execute them. > List driverRunHooks; > {code} > The context is initialized with the {{command}} passed in to the method. > However, this command is always null if {{alreadyCompiled}} is true, which is > the case for HS2 mode. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
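The null-command behavior described in HIVE-14343 can be modeled with a minimal sketch. The class and method names below are illustrative stand-ins, not Hive's actual internals: the point is only that when the query was already compiled, the caller passes {{null}} for {{command}}, so the hook context ends up empty unless it falls back to the query string the driver stored at compile time.

```java
// Hypothetical model of the HIVE-14343 bug; names are illustrative only.
class DriverRunHookContext {
    private final String command;
    DriverRunHookContext(String command) { this.command = command; }
    String getCommand() { return command; }
}

class Driver {
    private String queryString; // remembered during compilation

    void compile(String command) { this.queryString = command; }

    // Buggy shape: in HS2 mode runInternal() is invoked with command == null
    // (alreadyCompiled == true), so the hook context captures null.
    DriverRunHookContext buggyContext(String command) {
        return new DriverRunHookContext(command);
    }

    // Fixed shape: fall back to the stored query string when command is null.
    DriverRunHookContext fixedContext(String command) {
        return new DriverRunHookContext(command != null ? command : queryString);
    }
}

public class Hive14343Sketch {
    public static void main(String[] args) {
        Driver d = new Driver();
        d.compile("SELECT 1");
        System.out.println(d.buggyContext(null).getCommand()); // null
        System.out.println(d.fixedContext(null).getCommand()); // SELECT 1
    }
}
```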
[jira] [Commented] (HIVE-13989) Extended ACLs are not handled according to specification
[ https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111879#comment-16111879 ] Chris Drome commented on HIVE-13989: [~vgumashta], I checked the behavior of hadoop-2.7 and hadoop-2.8, which matches what you describe about zeroing out the 'other' permissions. My intention was to let HDFS create and manage the child directories where possible. However, the reason for this patch was because early versions of ACL support in hadoop combined with the original treatment of ACLs in hive/hcat were generating incorrect results. Let me revisit the patch and submit a new version. > Extended ACLs are not handled according to specification > > > Key: HIVE-13989 > URL: https://issues.apache.org/jira/browse/HIVE-13989 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1, 2.0.0 >Reporter: Chris Drome >Assignee: Chris Drome > Attachments: HIVE-13989.1-branch-1.patch, HIVE-13989.1.patch, > HIVE-13989-branch-1.patch, HIVE-13989-branch-2.2.patch, > HIVE-13989-branch-2.2.patch, HIVE-13989-branch-2.2.patch > > > Hive takes two approaches to working with extended ACLs depending on whether > data is being produced via a Hive query or HCatalog APIs. A Hive query will > run an FsShell command to recursively set the extended ACLs for a directory > sub-tree. HCatalog APIs will attempt to build up the directory sub-tree > programmatically and runs some code to set the ACLs to match the parent > directory. > Some incorrect assumptions were made when implementing the extended ACLs > support. Refer to https://issues.apache.org/jira/browse/HDFS-4685 for the > design documents of extended ACLs in HDFS. These documents model the > implementation after the POSIX implementation on Linux, which can be found at > http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html. 
> The code for setting extended ACLs via HCatalog APIs is found in > HdfsUtils.java: > {code} > if (aclEnabled) { > aclStatus = sourceStatus.getAclStatus(); > if (aclStatus != null) { > LOG.trace(aclStatus.toString()); > aclEntries = aclStatus.getEntries(); > removeBaseAclEntries(aclEntries); > //the ACL api's also expect the tradition user/group/other permission > in the form of ACL > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.USER, > sourcePerm.getUserAction())); > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.GROUP, > sourcePerm.getGroupAction())); > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.OTHER, > sourcePerm.getOtherAction())); > } > } > {code} > We found that DEFAULT extended ACL rules were not being inherited properly by > the directory sub-tree, so the above code is incomplete because it > effectively drops the DEFAULT rules. The second problem is with the call to > {{sourcePerm.getGroupAction()}}, which is incorrect in the case of extended > ACLs. When extended ACLs are used the GROUP permission is replaced with the > extended ACL mask. So the above code will apply the wrong permissions to the > GROUP. Instead the correct GROUP permissions now need to be pulled from the > AclEntry as returned by {{getAclStatus().getEntries()}}. See the > implementation of the new method {{getDefaultAclEntries}} for details. > Similar issues exist with the HCatalog API. None of the API accounts for > setting extended ACLs on the directory sub-tree. The changes to the HCatalog > API allow the extended ACLs to be passed into the required methods similar to > how basic permissions are passed in. When building the directory sub-tree the > extended ACLs of the table directory are inherited by all sub-directories, > including the DEFAULT rules. 
> Replicating the problem: > Create a table to write data into (I will use acl_test as the destination and > words_text as the source) and set the ACLs as follows: > {noformat} > $ hdfs dfs -setfacl -m > default:user::rwx,default:group::r-x,default:mask::rwx,default:user:hdfs:rwx,group::r-x,user:hdfs:rwx > /user/cdrome/hive/acl_test > $ hdfs dfs -ls -d /user/cdrome/hive/acl_test > drwxrwx---+ - cdrome hdfs 0 2016-07-13 20:36 > /user/cdrome/hive/acl_test > $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test > # file: /user/cdrome/hive/acl_test > # owner: cdrome > # group: hdfs > user::rwx > user:hdfs:rwx > group::r-x > mask::rwx > other::--- > default:user::rwx > default:user:hdfs:rwx > default:group::r-x > default:mask::rwx > default:other::--- > {noformat} > Note that the basic GROUP permission is set to {{rwx}} after setting the > ACLs. The ACLs explicitly set the DEFAULT rules and a rule specifically for > the {{hdfs}} user. > Run the following query to populate the
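The mask problem HIVE-13989 describes ("when extended ACLs are used the GROUP permission is replaced with the extended ACL mask") can be sketched with a simplified model. This is deliberately NOT Hadoop's {{AclEntry}} API; the map-based entries and the rwx-as-int encoding are illustrative assumptions used to show why copying the classic group bits copies the mask instead of the real group entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of POSIX-style extended ACLs (illustrative, not Hadoop's
// AclEntry API). Permissions encoded as bits: r=4, w=2, x=1; 7=rwx, 5=r-x.
public class AclMaskSketch {

    // When an extended ACL is present, the classic group bits surface the
    // mask entry, so the GROUP permission to copy must come from "group::".
    static int groupActionToCopy(Map<String, Integer> acl) {
        return acl.containsKey("mask::") ? acl.get("group::")
                                         : acl.get("classicGroupBits");
    }

    public static void main(String[] args) {
        // Mirrors the getfacl output quoted in the issue.
        Map<String, Integer> acl = new LinkedHashMap<>();
        acl.put("user::", 7);      // rwx
        acl.put("user:hdfs:", 7);  // rwx
        acl.put("group::", 5);     // r-x  <- the real GROUP entry
        acl.put("mask::", 7);      // rwx  <- also shown as the classic group bits
        acl.put("other::", 0);     // ---

        // sourcePerm.getGroupAction() in the quoted code effectively reads the
        // mask (7, rwx); the correct GROUP permission is the group:: entry (5, r-x).
        System.out.println(acl.get("mask::"));      // 7
        System.out.println(groupActionToCopy(acl)); // 5
    }
}
```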
[jira] [Updated] (HIVE-17222) Llap: Iotrace throws java.lang.UnsupportedOperationException with IncompleteCb
[ https://issues.apache.org/jira/browse/HIVE-17222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-17222: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Thanks [~sershe]. Committed to master. > Llap: Iotrace throws java.lang.UnsupportedOperationException with > IncompleteCb > --- > > Key: HIVE-17222 > URL: https://issues.apache.org/jira/browse/HIVE-17222 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-17222.1.patch > > > branch: hive master > Running Q76 at 1 TB generates the following exception. > {noformat} > Caused by: java.io.IOException: java.lang.UnsupportedOperationException > at > org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.rethrowErrorIfAny(LlapRecordReader.java:349) > at > org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.nextCvb(LlapRecordReader.java:304) > at > org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.next(LlapRecordReader.java:244) > at > org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.next(LlapRecordReader.java:67) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) > ... 
23 more > Caused by: java.lang.UnsupportedOperationException > at > org.apache.hadoop.hive.common.io.DiskRange.getData(DiskRange.java:86) > at > org.apache.hadoop.hive.ql.io.orc.encoded.IoTrace.logRange(IoTrace.java:304) > at > org.apache.hadoop.hive.ql.io.orc.encoded.IoTrace.logRanges(IoTrace.java:291) > at > org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:328) > at > org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:426) > at > org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:250) > at > org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:247) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) > at > org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:247) > at > org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:96) > ... 6 more > {noformat} > When {{IncompleteCb}} is encountered, it ends up throwing this error. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration
[ https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17160: -- Attachment: HIVE-17160.2.patch > Adding kerberos Authorization to the Druid hive integration > --- > > Key: HIVE-17160 > URL: https://issues.apache.org/jira/browse/HIVE-17160 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17160.2.patch, HIVE-17160.patch > > > This goal of this feature is to allow hive querying a secured druid cluster > using kerberos credentials. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration
[ https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17160: -- Status: Patch Available (was: In Progress) > Adding kerberos Authorization to the Druid hive integration > --- > > Key: HIVE-17160 > URL: https://issues.apache.org/jira/browse/HIVE-17160 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17160.2.patch, HIVE-17160.patch > > > This goal of this feature is to allow hive querying a secured druid cluster > using kerberos credentials. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration
[ https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17160: -- Attachment: (was: HIVE-17160.2.patch) > Adding kerberos Authorization to the Druid hive integration > --- > > Key: HIVE-17160 > URL: https://issues.apache.org/jira/browse/HIVE-17160 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17160.2.patch, HIVE-17160.patch > > > This goal of this feature is to allow hive querying a secured druid cluster > using kerberos credentials. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration
[ https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17160: -- Status: Open (was: Patch Available) > Adding kerberos Authorization to the Druid hive integration > --- > > Key: HIVE-17160 > URL: https://issues.apache.org/jira/browse/HIVE-17160 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17160.2.patch, HIVE-17160.patch > > > This goal of this feature is to allow hive querying a secured druid cluster > using kerberos credentials. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration
[ https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17160: -- Status: Patch Available (was: Open) > Adding kerberos Authorization to the Druid hive integration > --- > > Key: HIVE-17160 > URL: https://issues.apache.org/jira/browse/HIVE-17160 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17160.2.patch, HIVE-17160.patch > > > This goal of this feature is to allow hive querying a secured druid cluster > using kerberos credentials. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration
[ https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17160: -- Status: In Progress (was: Patch Available) > Adding kerberos Authorization to the Druid hive integration > --- > > Key: HIVE-17160 > URL: https://issues.apache.org/jira/browse/HIVE-17160 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17160.2.patch, HIVE-17160.patch > > > This goal of this feature is to allow hive querying a secured druid cluster > using kerberos credentials. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17213) HoS: file merging doesn't work for union all
[ https://issues.apache.org/jira/browse/HIVE-17213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-17213: Attachment: HIVE-17213.5.patch Patch v4 used INSERT OVERWRITE DIRECTORY, which didn't work for the mini SparkOnYarn test. It seems that currently all the qfiles in {{spark.only.query.files}} are by default run using {{MiniSparkOnYarnCliDriver}}, which causes a problem in this case since we want them to run using {{TestSparkCliDriver}}. Patch v5 further divides the existing property {{spark.only.query.files}} into two: - {{spark.only.query.files}}: contains all qfiles that are Spark only AND should only be tested using {{TestSparkCliDriver}}, - {{miniSparkOnYarn.only.query.files}}: contains all qfiles that are Spark only AND should only be tested using {{MiniSparkOnYarnCliDriver}}. > HoS: file merging doesn't work for union all > > > Key: HIVE-17213 > URL: https://issues.apache.org/jira/browse/HIVE-17213 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-17213.0.patch, HIVE-17213.1.patch, > HIVE-17213.2.patch, HIVE-17213.3.patch, HIVE-17213.4.patch, HIVE-17213.5.patch > > > HoS file merging doesn't work properly since it doesn't set linked file sinks > properly which is used to generate move tasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17226) Use strong hashing as security improvement
[ https://issues.apache.org/jira/browse/HIVE-17226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Li updated HIVE-17226: -- Status: Patch Available (was: Open) > Use strong hashing as security improvement > -- > > Key: HIVE-17226 > URL: https://issues.apache.org/jira/browse/HIVE-17226 > Project: Hive > Issue Type: Improvement > Components: Security >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-17226.1.patch > > > There have been 2 places identified where weak hashing needs to be replaced > by SHA256. > 1. CookieSigner.java uses MessageDigest.getInstance("SHA"). Mostly SHA is > mapped to SHA-1, which is not secure enough according to today's standards. > We should use SHA-256 instead. > 2. GenericUDFMaskHash.java uses DigestUtils.md5Hex. MD5 is considered weak > and should be replaced by DigestUtils.sha256Hex. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17226) Use strong hashing as security improvement
[ https://issues.apache.org/jira/browse/HIVE-17226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Li updated HIVE-17226: -- Attachment: HIVE-17226.1.patch > Use strong hashing as security improvement > -- > > Key: HIVE-17226 > URL: https://issues.apache.org/jira/browse/HIVE-17226 > Project: Hive > Issue Type: Improvement > Components: Security >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-17226.1.patch > > > There have been 2 places identified where weak hashing needs to be replaced > by SHA256. > 1. CookieSigner.java uses MessageDigest.getInstance("SHA"). Mostly SHA is > mapped to SHA-1, which is not secure enough according to today's standards. > We should use SHA-256 instead. > 2. GenericUDFMaskHash.java uses DigestUtils.md5Hex. MD5 is considered weak > and should be replaced by DigestUtils.sha256Hex. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
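The SHA-1-to-SHA-256 change proposed in HIVE-17226 can be sketched with the JDK's {{MessageDigest}} alone; commons-codec's {{DigestUtils.sha256Hex}} is essentially a convenience wrapper over the same primitive, and the helper below reimplements that shape for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha256Sketch {
    // Roughly what DigestUtils.sha256Hex does: hash the input, hex-encode it.
    static String sha256Hex(String input) throws NoSuchAlgorithmException {
        // "SHA-256", not "SHA" (which most providers map to SHA-1).
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Well-known SHA-256 test vector for "abc".
        System.out.println(sha256Hex("abc"));
        // ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
    }
}
```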
[jira] [Commented] (HIVE-17232) "No match found" Compactor finds a bucket file thinking it's a directory
[ https://issues.apache.org/jira/browse/HIVE-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111781#comment-16111781 ] Eugene Koifman commented on HIVE-17232: --- In the case of this test at least it's due to compaction being invoked via txnHandler.compact(new CompactionRequest("default", Table.ACIDTBLPART.name(), CompactionType.MAJOR)); but the target table is partitioned. Thus in CompactorMR.run() AcidUtils.Directory dir = AcidUtils.getAcidState(new Path(sd.getLocation()), conf, txns, false, true); ends up finding the delta.../bucket.. files but thinks they are "original" because they are not at the level where they are expected. Compacting all partitions of a partitioned table in one command is not supported. If compaction is invoked via Alter Table (as it should be), DDLTask.compact() will check that a partition spec is supplied if the table is partitioned. todo: add sanity checks to getAcidState() so that it raises an error if it finds an unexpected directory layout. > "No match found" Compactor finds a bucket file thinking it's a directory > -- > > Key: HIVE-17232 > URL: https://issues.apache.org/jira/browse/HIVE-17232 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > > {noformat} > 2017-08-02T12:38:11,996 WARN [main] compactor.CompactorMR: Found a > non-bucket file that we thought matched the bucket pattern!
> file:/Users/ekoifman/dev/hiv\ > erwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands2-1501702264311/warehouse/acidtblpart/p=1/delta_013_013_/bucket_1 > Matcher=java\ > .util.regex.Matcher[pattern=^[0-9]{6} region=0,12 lastmatch=] > 2017-08-02T12:38:11,996 INFO [main] mapreduce.JobSubmitter: Cleaning up the > staging area > file:/tmp/hadoop/mapred/staging/ekoifman1723152463/.staging/job_lo\ > cal1723152463_0183 > 2017-08-02T12:38:11,997 ERROR [main] compactor.Worker: Caught exception while > trying to compact > id:1,dbname:default,tableName:ACIDTBLPART,partName:null,stat\ > e:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0. > Marking failed to avoid repeated failures, java.lang.IllegalStateException: > \ > No match found > at java.util.regex.Matcher.group(Matcher.java:536) > at java.util.regex.Matcher.group(Matcher.java:496) > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.addFileToMap(CompactorMR.java:577) > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.getSplits(CompactorMR.java:549) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:330) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570) > at java.security.AccessController.doPrivileged(Native Method) > at 
javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807) > at > org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570) > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561) > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:320) > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:275) > at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:166) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.runWorker(TestTxnCommands2.java:1138) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.updateDeletePartitioned(TestTxnCommands2.java:894) > {noformat} > the stack trace points to 1st runWorker() in updateDeletePartitioned() though > the test run was TestTxnCommands2WithSplitUpdateAndVectorization -- This message was sent by Atlassian JIRA (v6.4.14#64029)
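The {{IllegalStateException: No match found}} at the top of the trace is simply what {{java.util.regex.Matcher.group()}} throws when the pattern never matched, as happens when the compactor's digit pattern is applied to a path component it cannot match. A minimal reproduction (the input string here is illustrative, modeled on the bucket file name in the log):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NoMatchSketch {
    public static void main(String[] args) {
        // Same pattern as in the log (Matcher[pattern=^[0-9]{6} ...]) applied
        // to a name it cannot match.
        Matcher m = Pattern.compile("^[0-9]{6}").matcher("bucket_00001");
        // group() without a prior successful find()/matches() throws
        // IllegalStateException("No match found") - the failure in the trace.
        try {
            m.group();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // No match found
        }
    }
}
```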
[jira] [Comment Edited] (HIVE-17232) "No match found" Compactor finds a bucket file thinking it's a directory
[ https://issues.apache.org/jira/browse/HIVE-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111693#comment-16111693 ] Eugene Koifman edited comment on HIVE-17232 at 8/2/17 9:40 PM: --- The loop in CompactorInputFormat.getSplits() is expecting delta/base or original file but ends up seeing a bucket file in an acid delta dir was (Author: ekoifman): The loop in CompactorInputSplit.getSplits() is expecting delta/base or original file but ends up seeing a bucket file in an acid delta dir > "No match found" Compactor finds a bucket file thinking it's a directory > -- > > Key: HIVE-17232 > URL: https://issues.apache.org/jira/browse/HIVE-17232 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > > {noformat} > 2017-08-02T12:38:11,996 WARN [main] compactor.CompactorMR: Found a > non-bucket file that we thought matched the bucket pattern! > file:/Users/ekoifman/dev/hiv\ > erwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands2-1501702264311/warehouse/acidtblpart/p=1/delta_013_013_/bucket_1 > Matcher=java\ > .util.regex.Matcher[pattern=^[0-9]{6} region=0,12 lastmatch=] > 2017-08-02T12:38:11,996 INFO [main] mapreduce.JobSubmitter: Cleaning up the > staging area > file:/tmp/hadoop/mapred/staging/ekoifman1723152463/.staging/job_lo\ > cal1723152463_0183 > 2017-08-02T12:38:11,997 ERROR [main] compactor.Worker: Caught exception while > trying to compact > id:1,dbname:default,tableName:ACIDTBLPART,partName:null,stat\ > e:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0.
> Marking failed to avoid repeated failures, java.lang.IllegalStateException: > \ > No match found > at java.util.regex.Matcher.group(Matcher.java:536) > at java.util.regex.Matcher.group(Matcher.java:496) > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.addFileToMap(CompactorMR.java:577) > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.getSplits(CompactorMR.java:549) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:330) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807) > at > org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570) > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561) > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:320) > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:275) > at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:166) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.runWorker(TestTxnCommands2.java:1138) > at > 
org.apache.hadoop.hive.ql.TestTxnCommands2.updateDeletePartitioned(TestTxnCommands2.java:894) > {noformat} > the stack trace points to 1st runWorker() in updateDeletePartitioned() though > the test run was TestTxnCommands2WithSplitUpdateAndVectorization -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17226) Use strong hashing as security improvement
[ https://issues.apache.org/jira/browse/HIVE-17226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111770#comment-16111770 ] Tao Li commented on HIVE-17226: --- [~asherman] I think changing the hash function for GenericUDFMaskHash should not cause compatibility issues, since there is no expectation/assumption that the masking result has to be the same. But we should include this change in the release notes so users are aware of it. > Use strong hashing as security improvement > -- > > Key: HIVE-17226 > URL: https://issues.apache.org/jira/browse/HIVE-17226 > Project: Hive > Issue Type: Improvement > Components: Security >Reporter: Tao Li >Assignee: Tao Li > > There have been 2 places identified where weak hashing needs to be replaced > by SHA256. > 1. CookieSigner.java uses MessageDigest.getInstance("SHA"). Mostly SHA is > mapped to SHA-1, which is not secure enough according to today's standards. > We should use SHA-256 instead. > 2. GenericUDFMaskHash.java uses DigestUtils.md5Hex. MD5 is considered weak > and should be replaced by DigestUtils.sha256Hex. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17226) Use strong hashing as security improvement
[ https://issues.apache.org/jira/browse/HIVE-17226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111758#comment-16111758 ] Andrew Sherman commented on HIVE-17226: --- Hi [~taoli-hwx] I have no idea, sorry. That's why I'm asking, in the hope of learning something. > Use strong hashing as security improvement > -- > > Key: HIVE-17226 > URL: https://issues.apache.org/jira/browse/HIVE-17226 > Project: Hive > Issue Type: Improvement > Components: Security >Reporter: Tao Li >Assignee: Tao Li > > There have been 2 places identified where weak hashing needs to be replaced > by SHA256. > 1. CookieSigner.java uses MessageDigest.getInstance("SHA"). Mostly SHA is > mapped to SHA-1, which is not secure enough according to today's standards. > We should use SHA-256 instead. > 2. GenericUDFMaskHash.java uses DigestUtils.md5Hex. MD5 is considered weak > and should be replaced by DigestUtils.sha256Hex. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16820) TezTask may not shut down correctly before submit
[ https://issues.apache.org/jira/browse/HIVE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111755#comment-16111755 ] Sergey Shelukhin commented on HIVE-16820: - Yeah, what I was saying is that we don't need the same bugfix for that task cause there's no implementation. It probably does need an implementation (without bugs like this). > TezTask may not shut down correctly before submit > - > > Key: HIVE-16820 > URL: https://issues.apache.org/jira/browse/HIVE-16820 > Project: Hive > Issue Type: Bug >Reporter: Visakh Nair >Assignee: Sergey Shelukhin > Fix For: 3.0.0 > > Attachments: HIVE-16820.01.patch, HIVE-16820.patch > > > The query will run and only fail at the very end when the driver checks its > own shutdown flag. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks
[ https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111753#comment-16111753 ] Mithun Radhakrishnan commented on HIVE-15686: - For the record, this patch is only for {{branch-2}} and {{branch-2.2}}. > Partitions on Remote HDFS break encryption-zone checks > -- > > Key: HIVE-15686 > URL: https://issues.apache.org/jira/browse/HIVE-15686 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-15686.branch-2.patch > > > This is in relation to HIVE-13243, which fixes encryption-zone checks for > external tables. > Unfortunately, this is still borked for partitions with remote HDFS paths. > The code fails as follows: > {noformat} > 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer > (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during > processing of message. > java.lang.IllegalArgumentException: Wrong FS: > hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, > expected: hdfs://local-cluster-n1.myth.net:8020 > at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985) > at > org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262) > at > org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974) > at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:483) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) > at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > I have a really simple fix. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
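The "Wrong FS" failure above can be reproduced outside Hadoop: the metastore's encryption shim holds an {{HdfsAdmin}} bound to the local cluster's URI, and {{FileSystem.checkPath()}} rejects any path whose scheme or authority points elsewhere. A minimal sketch of the check (plain {{java.net.URI}}; {{WrongFsDemo}} and {{belongsTo}} are hypothetical names for illustration, not part of the actual patch, which would instead resolve the FileSystem from the partition path before probing the encryption zone):

```java
import java.net.URI;

// Mirrors the spirit of FileSystem.checkPath(): a handle built for the local
// cluster only accepts paths with the same scheme and authority. A remote
// partition path fails this check, producing the IllegalArgumentException
// seen in the stack trace above.
public class WrongFsDemo {
    static boolean belongsTo(URI fsUri, URI pathUri) {
        return fsUri.getScheme().equalsIgnoreCase(pathUri.getScheme())
            && fsUri.getAuthority().equalsIgnoreCase(pathUri.getAuthority());
    }

    public static void main(String[] args) {
        URI localFs = URI.create("hdfs://local-cluster-n1.myth.net:8020");
        URI remotePart = URI.create(
            "hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120");
        // false: the metastore must resolve a FileSystem from the path itself.
        System.out.println(belongsTo(localFs, remotePart));
    }
}
```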
[jira] [Assigned] (HIVE-17234) Remove HBase metastore from master
[ https://issues.apache.org/jira/browse/HIVE-17234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned HIVE-17234: - > Remove HBase metastore from master > -- > > Key: HIVE-17234 > URL: https://issues.apache.org/jira/browse/HIVE-17234 > Project: Hive > Issue Type: Task > Components: HBase Metastore >Affects Versions: 3.0.0 >Reporter: Alan Gates >Assignee: Alan Gates > > No new development has been done on the HBase metastore in at least a year, > and to my knowledge no one is using it (nor is it even in a state to be fully > usable). Given the lack of interest in continuing to develop it, we should > remove it rather than leave dead code hanging around and extra tests taking > up time in test runs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration
[ https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111738#comment-16111738 ] slim bouguerra commented on HIVE-17160: --- Addressed the comments and uploaded a new patch. > Adding kerberos Authorization to the Druid hive integration > --- > > Key: HIVE-17160 > URL: https://issues.apache.org/jira/browse/HIVE-17160 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17160.2.patch, HIVE-17160.patch > > > The goal of this feature is to allow Hive to query a secured Druid cluster > using Kerberos credentials. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17233) Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.
[ https://issues.apache.org/jira/browse/HIVE-17233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17233: Status: Patch Available (was: Open) > Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs. > > > Key: HIVE-17233 > URL: https://issues.apache.org/jira/browse/HIVE-17233 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17233.1.patch > > > This has to do with {{HIVE-15575}}. {{TezCompiler}} seems to set > {{mapred.input.dir.recursive}} to {{true}}. This is acceptable for Hive jobs, > since this allows Hive to consume its peculiar {{UNION ALL}} output, where > the output of each relation is stored in a separate sub-directory of the > output-dir. > For such output to be readable through HCatalog (via Pig/HCatLoader), > {{mapred.input.dir.recursive}} should be set from {{HCatInputFormat}} as > well. Otherwise, one gets zero records for that input. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
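The zero-records symptom above comes down to flat vs. recursive directory listing. A self-contained illustration using plain {{java.nio}} (not the Hadoop API; {{RecursiveListingDemo}} is a hypothetical name for this sketch): Tez {{UNION ALL}} writes each branch into its own sub-directory, so a non-recursive listing of the output directory sees only directories and no data files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Demonstrates why input recursion matters for UNION ALL output: the data
// files live one level below the output directory, so a flat listing finds
// nothing, while a recursive walk finds every branch's files.
public class RecursiveListingDemo {
    public static List<Path> listFiles(Path dir, boolean recursive) throws IOException {
        try (Stream<Path> s = recursive ? Files.walk(dir) : Files.list(dir)) {
            return s.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("union-out");
        Files.createDirectories(out.resolve("1"));
        Files.createDirectories(out.resolve("2"));
        Files.write(out.resolve("1/part-0"), "a".getBytes());
        Files.write(out.resolve("2/part-0"), "b".getBytes());
        System.out.println(listFiles(out, false).size()); // 0: flat listing misses the data
        System.out.println(listFiles(out, true).size());  // 2: recursion finds both branches
    }
}
```

Setting {{mapred.input.dir.recursive}} to {{true}} in {{HCatInputFormat}}'s job configuration is the Hadoop-side equivalent of the recursive case.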
[jira] [Updated] (HIVE-17233) Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.
[ https://issues.apache.org/jira/browse/HIVE-17233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17233: Attachment: HIVE-17233.1.patch The proposed fix.

[jira] [Commented] (HIVE-17226) Use strong hashing as security improvement
[ https://issues.apache.org/jira/browse/HIVE-17226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111703#comment-16111703 ] Tao Li commented on HIVE-17226: --- [~asherman] Regarding the CookieSigner, I don't see an incompatibility issue. Regarding the UDF hash, do you have any concerns related to compatibility? > Use strong hashing as security improvement > -- > > Key: HIVE-17226 > URL: https://issues.apache.org/jira/browse/HIVE-17226 > Project: Hive > Issue Type: Improvement > Components: Security >Reporter: Tao Li >Assignee: Tao Li > > There have been 2 places identified where weak hashing needs to be replaced > by SHA256. > 1. CookieSigner.java uses MessageDigest.getInstance("SHA"). Mostly SHA is > mapped to SHA-1, which is not secure enough according to today's standards. > We should use SHA-256 instead. > 2. GenericUDFMaskHash.java uses DigestUtils.md5Hex. MD5 is considered weak > and should be replaced by DigestUtils.sha256Hex. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
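A minimal sketch of change (1): requesting {{"SHA-256"}} explicitly instead of the ambiguous {{"SHA"}} alias (usually mapped to SHA-1). The class and hex helper below are illustrative only; the actual patch would touch {{CookieSigner}} (via {{MessageDigest}}) and {{GenericUDFMaskHash}} (via {{DigestUtils.sha256Hex}}).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Requests the SHA-256 algorithm by its standard name, rather than the
// provider-dependent "SHA" alias. Output is a 64-character hex digest.
public class StrongHashDemo {
    public static String sha256Hex(String input) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256"); // was "SHA"
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(input.getBytes(StandardCharsets.UTF_8))) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sha256Hex("cookie-payload").length()); // 64 hex chars = 256 bits
    }
}
```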
[jira] [Commented] (HIVE-17232) "No match found" Compactor finds a bucket file thinking it's a directory
[ https://issues.apache.org/jira/browse/HIVE-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111693#comment-16111693 ] Eugene Koifman commented on HIVE-17232: --- The loop in CompactorInputFormat.getSplits() expects a delta/base directory or an original file, but ends up seeing a bucket file inside an ACID delta directory > "No match found" Compactor finds a bucket file thinking it's a directory > -- > > Key: HIVE-17232 > URL: https://issues.apache.org/jira/browse/HIVE-17232 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > > {noformat} > 2017-08-02T12:38:11,996 WARN [main] compactor.CompactorMR: Found a > non-bucket file that we thought matched the bucket pattern! > file:/Users/ekoifman/dev/hiv\ > erwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands2-1501702264311/warehouse/acidtblpart/p=1/delta_013_013_/bucket_1 > Matcher=java\ > .util.regex.Matcher[pattern=^[0-9]{6} region=0,12 lastmatch=] > 2017-08-02T12:38:11,996 INFO [main] mapreduce.JobSubmitter: Cleaning up the > staging area > file:/tmp/hadoop/mapred/staging/ekoifman1723152463/.staging/job_lo\ > cal1723152463_0183 > 2017-08-02T12:38:11,997 ERROR [main] compactor.Worker: Caught exception while > trying to compact > id:1,dbname:default,tableName:ACIDTBLPART,partName:null,stat\ > e:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0. 
> Marking failed to avoid repeated failures, java.lang.IllegalStateException: > \ > No match found > at java.util.regex.Matcher.group(Matcher.java:536) > at java.util.regex.Matcher.group(Matcher.java:496) > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.addFileToMap(CompactorMR.java:577) > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.getSplits(CompactorMR.java:549) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:330) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807) > at > org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570) > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561) > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:320) > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:275) > at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:166) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.runWorker(TestTxnCommands2.java:1138) > at > 
org.apache.hadoop.hive.ql.TestTxnCommands2.updateDeletePartitioned(TestTxnCommands2.java:894) > {noformat} > the stack trace points to 1st runWorker() in updateDeletePartitioned() though > the test run was TestTxnCommands2WithSplitUpdateAndVectorization -- This message was sent by Atlassian JIRA (v6.4.14#64029)
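The failure mode above can be reproduced in isolation: {{Matcher.group()}} throws {{IllegalStateException("No match found")}} whenever it is called without a prior successful {{find()}} or {{matches()}}. A minimal sketch ({{BucketPatternDemo}} and {{looksLikeBucket}} are hypothetical names, using the {{^[0-9]{6}}} pattern from the log):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Reproduces the compactor's crash: calling group() on a Matcher whose
// pattern never matched throws IllegalStateException rather than returning
// null, so the match must be guarded with find() before group() is used.
public class BucketPatternDemo {
    static final Pattern BUCKET_DIGITS = Pattern.compile("^[0-9]{6}");

    public static boolean looksLikeBucket(String name) {
        return BUCKET_DIGITS.matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeBucket("bucket_1")); // false: name is not six digits
        System.out.println(looksLikeBucket("000001"));   // true
        try {
            BUCKET_DIGITS.matcher("bucket_1").group();   // no find(): throws
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());          // "No match found"
        }
    }
}
```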
[jira] [Assigned] (HIVE-17232) "No match found" Compactor finds a bucket file thinking it's a directory
[ https://issues.apache.org/jira/browse/HIVE-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-17232: - Assignee: Eugene Koifman
[jira] [Assigned] (HIVE-17233) Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.
[ https://issues.apache.org/jira/browse/HIVE-17233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan reassigned HIVE-17233: --- Assignee: Mithun Radhakrishnan
[jira] [Updated] (HIVE-17232) "No match found" Compactor finds a bucket file thinking it's a directory
[ https://issues.apache.org/jira/browse/HIVE-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17232: -- Summary: "No match found" Compactor finds a bucket file thinking it's a directory (was: No match found Compactor finds a bucket file thinking it's a directory)
[jira] [Updated] (HIVE-17172) add ordering checks to DiskRangeList
[ https://issues.apache.org/jira/browse/HIVE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17172: Fix Version/s: 2.4.0 3.0.0 > add ordering checks to DiskRangeList > > > Key: HIVE-17172 > URL: https://issues.apache.org/jira/browse/HIVE-17172 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 3.0.0, 2.4.0 > > Attachments: HIVE-17172.01.patch, HIVE-17172.02.patch, > HIVE-17172.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17172) add ordering checks to DiskRangeList
[ https://issues.apache.org/jira/browse/HIVE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17172: Resolution: Fixed Status: Resolved (was: Patch Available) Committed to master and branch-2. Thanks for the reviews!
[jira] [Updated] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks
[ https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-15686: Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks
[ https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-15686: Attachment: HIVE-15686.branch-2.patch
[jira] [Updated] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks
[ https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-15686: Assignee: Mithun Radhakrishnan Affects Version/s: (was: 1.2.1) 2.2.0 Status: Open (was: Patch Available) Attaching patch for {{branch-2}}.
[jira] [Updated] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks
[ https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-15686: Attachment: (was: HADOOP-14015.1.patch) > Partitions on Remote HDFS break encryption-zone checks > -- > > Key: HIVE-15686 > URL: https://issues.apache.org/jira/browse/HIVE-15686 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 >Reporter: Mithun Radhakrishnan > > This is in relation to HIVE-13243, which fixes encryption-zone checks for > external tables. > Unfortunately, this is still borked for partitions with remote HDFS paths. > The code fails as follows: > {noformat} > 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer > (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during > processing of message. > java.lang.IllegalArgumentException: Wrong FS: > hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, > expected: hdfs://local-cluster-n1.myth.net:8020 > at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985) > at > org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262) > at > org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974) > at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) > at 
com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > I have a really simple fix. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17213) HoS: file merging doesn't work for union all
[ https://issues.apache.org/jira/browse/HIVE-17213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111656#comment-16111656 ] Hive QA commented on HIVE-17213: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12880072/HIVE-17213.4.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11138 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat6] (batchId=7) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_union_merge] (batchId=169) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=236) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=236) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6236/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6236/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6236/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12880072 - PreCommit-HIVE-Build > HoS: file merging doesn't work for union all > > > Key: HIVE-17213 > URL: https://issues.apache.org/jira/browse/HIVE-17213 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-17213.0.patch, HIVE-17213.1.patch, > HIVE-17213.2.patch, HIVE-17213.3.patch, HIVE-17213.4.patch > > > HoS file merging doesn't work properly, since it doesn't set the linked file sinks > correctly, which are used to generate move tasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16820) TezTask may not shut down correctly before submit
[ https://issues.apache.org/jira/browse/HIVE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111648#comment-16111648 ] Mithun Radhakrishnan commented on HIVE-16820: - bq. MergeFileTask doesn't appear to implement shutdown at all. :] Ah, but doesn't it need one? It is conceivable that a user might cancel a query between the TezTask and the MergeFileTask, (or simply interrupt an {{ALTER TABLE CONCATENATE}}). In that case, the merge will run through to the end, in spite of cancellation. I wonder if there isn't value in applying the HIVE-12556 + HIVE-16820 treatment for MergeFileTask as well. > TezTask may not shut down correctly before submit > - > > Key: HIVE-16820 > URL: https://issues.apache.org/jira/browse/HIVE-16820 > Project: Hive > Issue Type: Bug >Reporter: Visakh Nair >Assignee: Sergey Shelukhin > Fix For: 3.0.0 > > Attachments: HIVE-16820.01.patch, HIVE-16820.patch > > > The query will run and only fail at the very end when the driver checks its > own shutdown flag. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
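For context, the cooperative-cancellation pattern being discussed looks roughly like the following. This is a hypothetical sketch, not the actual TezTask/MergeFileTask code: a task checks a shutdown flag between units of work, so a cancel issued before or during {{execute()}} actually stops processing instead of letting the task run through to the end:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CancellableTask {
    private final AtomicBoolean shutdown = new AtomicBoolean(false);
    private int unitsDone = 0;

    /** Called from another thread when the user cancels the query. */
    public void shutdown() {
        shutdown.set(true);
    }

    /** Processes up to maxUnits units of work, stopping early if cancelled. */
    public int execute(int maxUnits) {
        for (int i = 0; i < maxUnits; i++) {
            if (shutdown.get()) {
                break; // without this check, the work runs to completion despite the cancel
            }
            unitsDone++; // stand-in for e.g. merging one file
        }
        return unitsDone;
    }
}
```

A {{shutdown()}} that lands before {{execute()}} then causes {{execute()}} to do no work at all, which is the gap the comment above points out for MergeFileTask.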
[jira] [Commented] (HIVE-14786) Beeline displays binary column data as string instead of byte array
[ https://issues.apache.org/jira/browse/HIVE-14786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111550#comment-16111550 ] Hive QA commented on HIVE-14786: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12880064/HIVE-14786.01.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11139 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=236) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=236) org.apache.hive.beeline.TestBufferedRows.testNormalizeWidths (batchId=177) org.apache.hive.beeline.TestTableOutputFormat.testPrint (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6235/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6235/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6235/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12880064 - PreCommit-HIVE-Build > Beeline displays binary column data as string instead of byte array > --- > > Key: HIVE-14786 > URL: https://issues.apache.org/jira/browse/HIVE-14786 > Project: Hive > Issue Type: Improvement > Components: Beeline >Affects Versions: 1.2.1 >Reporter: Ram Mettu >Assignee: Barna Zsombor Klara >Priority: Minor > Attachments: HIVE-14786.01.patch > > > In Beeline, doing a SELECT binaryColName FROM tableName; results in the data > being displayed as a string (which looks corrupted due to unprintable characters). > Instead, Beeline should display binary columns as a byte array. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17225) HoS DPP pruning sink ops can target parallel work objects
[ https://issues.apache.org/jira/browse/HIVE-17225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Janaki Lahorani reassigned HIVE-17225: -- Assignee: Janaki Lahorani (was: Sahil Takiar) > HoS DPP pruning sink ops can target parallel work objects > - > > Key: HIVE-17225 > URL: https://issues.apache.org/jira/browse/HIVE-17225 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: 3.0.0 >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > > Setup: > {code:sql} > SET hive.spark.dynamic.partition.pruning=true; > SET hive.strict.checks.cartesian.product=false; > SET hive.auto.convert.join=true; > CREATE TABLE partitioned_table1 (col int) PARTITIONED BY (part_col int); > CREATE TABLE regular_table1 (col int); > CREATE TABLE regular_table2 (col int); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 1); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 2); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 3); > INSERT INTO table regular_table1 VALUES (1), (2), (3), (4), (5), (6); > INSERT INTO table regular_table2 VALUES (1), (2), (3), (4), (5), (6); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 1) VALUES (1); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 2) VALUES (2); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 3) VALUES (3); > SELECT * > FROM partitioned_table1, >regular_table1 rt1, >regular_table2 rt2 > WHERE rt1.col = partitioned_table1.part_col >AND rt2.col = partitioned_table1.part_col; > {code} > Exception: > {code} > 2017-08-01T13:27:47,483 ERROR [b0d354a8-4cdb-4ba9-acec-27d14926aaf4 main] > ql.Driver: FAILED: Execution Error, return code 3 from > org.apache.hadoop.hive.ql.exec.spark.SparkTask. 
java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.FileNotFoundException: File > file:/Users/stakiar/Documents/idea/apache-hive/itests/qtest-spark/target/tmp/scratchdir/stakiar/b0d354a8-4cdb-4ba9-acec-27d14926aaf4/hive_2017-08-01_13-27-45_553_1088589686371686526-1/-mr-10004/3/5 > does not exist > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:408) > at > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:498) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:246) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:246) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:82) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:246) 
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:82) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at
[jira] [Commented] (HIVE-17208) Repl dump should pass in db/table information to authorization API
[ https://issues.apache.org/jira/browse/HIVE-17208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111523#comment-16111523 ] Thejas M Nair commented on HIVE-17208: -- +1 > Repl dump should pass in db/table information to authorization API > -- > > Key: HIVE-17208 > URL: https://issues.apache.org/jira/browse/HIVE-17208 > Project: Hive > Issue Type: Bug > Components: Authorization >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-17208.1.patch, HIVE-17208.2.patch, > HIVE-17208.3.patch > > > "repl dump" does not provide db/table information. That is necessary for > authorization replication in ranger. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17229) HiveMetastore HMSHandler locks during initialization, even though its static variable threadPool is not null
[ https://issues.apache.org/jira/browse/HIVE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan reassigned HIVE-17229: --- Assignee: Zac Zhou (was: Mithun Radhakrishnan) > HiveMetastore HMSHandler locks during initialization, even though its static > variable threadPool is not null > > > Key: HIVE-17229 > URL: https://issues.apache.org/jira/browse/HIVE-17229 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Zac Zhou >Assignee: Zac Zhou > Attachments: HIVE-17229.2.patch, HIVE-17229.patch > > > A thread pool has been used to accelerate the add-partitions operation since > [HIVE-13901|https://issues.apache.org/jira/browse/HIVE-13901]. > However, HMSHandler still takes a lock on every initialization, even > though its static variable threadPool is not null. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
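The improvement described here is the standard double-checked locking idiom: take the lock only when the static pool is still null, so steady-state callers skip synchronization entirely. A hedged sketch with illustrative names (not the actual HMSHandler code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolHolder {
    // volatile ensures a fully-constructed pool is visible to all threads
    private static volatile ExecutorService threadPool;

    public static ExecutorService getPool(int size) {
        ExecutorService pool = threadPool;
        if (pool == null) {                  // fast path: no lock once initialized
            synchronized (PoolHolder.class) {
                pool = threadPool;
                if (pool == null) {          // re-check under the lock
                    pool = Executors.newFixedThreadPool(size);
                    threadPool = pool;
                }
            }
        }
        return pool;
    }
}
```

The {{volatile}} qualifier is essential; without it the fast path may observe a partially published pool under the Java Memory Model.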
[jira] [Updated] (HIVE-17229) HiveMetastore HMSHandler locks during initialization, even though its static variable threadPool is not null
[ https://issues.apache.org/jira/browse/HIVE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17229: Status: Patch Available (was: Open) > HiveMetastore HMSHandler locks during initialization, even though its static > variable threadPool is not null > > > Key: HIVE-17229 > URL: https://issues.apache.org/jira/browse/HIVE-17229 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Zac Zhou >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17229.2.patch, HIVE-17229.patch > > > A thread pool is used to accelerate adding partitions operation, since > [HIVE-13901|https://issues.apache.org/jira/browse/HIVE-13901]. > However, HMSHandler needs a lock during initialization every time, even > though its static variable threadPool is not null -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17229) HiveMetastore HMSHandler locks during initialization, even though its static variable threadPool is not null
[ https://issues.apache.org/jira/browse/HIVE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17229: Status: Open (was: Patch Available) > HiveMetastore HMSHandler locks during initialization, even though its static > variable threadPool is not null > > > Key: HIVE-17229 > URL: https://issues.apache.org/jira/browse/HIVE-17229 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Zac Zhou >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17229.2.patch, HIVE-17229.patch > > > A thread pool is used to accelerate adding partitions operation, since > [HIVE-13901|https://issues.apache.org/jira/browse/HIVE-13901]. > However, HMSHandler needs a lock during initialization every time, even > though its static variable threadPool is not null -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17229) HiveMetastore HMSHandler locks during initialization, even though its static variable threadPool is not null
[ https://issues.apache.org/jira/browse/HIVE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17229: Attachment: HIVE-17229.2.patch Attempting to fix the patch, to have the tests run. > HiveMetastore HMSHandler locks during initialization, even though its static > variable threadPool is not null > > > Key: HIVE-17229 > URL: https://issues.apache.org/jira/browse/HIVE-17229 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Zac Zhou >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17229.2.patch, HIVE-17229.patch > > > A thread pool is used to accelerate adding partitions operation, since > [HIVE-13901|https://issues.apache.org/jira/browse/HIVE-13901]. > However, HMSHandler needs a lock during initialization every time, even > though its static variable threadPool is not null -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17229) HiveMetastore HMSHandler locks during initialization, even though its static variable threadPool is not null
[ https://issues.apache.org/jira/browse/HIVE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan reassigned HIVE-17229: --- Assignee: Mithun Radhakrishnan (was: Zac Zhou) > HiveMetastore HMSHandler locks during initialization, even though its static > variable threadPool is not null > > > Key: HIVE-17229 > URL: https://issues.apache.org/jira/browse/HIVE-17229 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Zac Zhou >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17229.patch > > > A thread pool is used to accelerate adding partitions operation, since > [HIVE-13901|https://issues.apache.org/jira/browse/HIVE-13901]. > However, HMSHandler needs a lock during initialization every time, even > though its static variable threadPool is not null -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17208) Repl dump should pass in db/table information to authorization API
[ https://issues.apache.org/jira/browse/HIVE-17208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-17208: -- Attachment: HIVE-17208.3.patch repl_dump_requires_admin/repl_load_requires_admin test failures are related. Attach a new patch. > Repl dump should pass in db/table information to authorization API > -- > > Key: HIVE-17208 > URL: https://issues.apache.org/jira/browse/HIVE-17208 > Project: Hive > Issue Type: Bug > Components: Authorization >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-17208.1.patch, HIVE-17208.2.patch, > HIVE-17208.3.patch > > > "repl dump" does not provide db/table information. That is necessary for > authorization replication in ranger. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration
[ https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17160: -- Attachment: HIVE-17160.2.patch > Adding kerberos Authorization to the Druid hive integration > --- > > Key: HIVE-17160 > URL: https://issues.apache.org/jira/browse/HIVE-17160 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17160.2.patch, HIVE-17160.patch > > > This goal of this feature is to allow hive querying a secured druid cluster > using kerberos credentials. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().
[ https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17188: Resolution: Fixed Fix Version/s: 2.2.0 3.0.0 Status: Resolved (was: Patch Available) Committed to {{master}}, {{branch-2}}, and {{branch-2.2}}, as advised by [~owen.omalley]. Thank you, [~cdrome], for the patch. Thanks, [~vihangk1], for the review. > ObjectStore runs out of memory for large batches of addPartitions(). > > > Key: HIVE-17188 > URL: https://issues.apache.org/jira/browse/HIVE-17188 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.2.0 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome > Fix For: 3.0.0, 2.2.0 > > Attachments: HIVE-17188.1.patch > > > For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} > runs out of memory. Flushing the {{PersistenceManager}} alleviates the > problem. > Note: The problem being addressed here isn't so much the size of the > hundreds of Partition objects as the cruft that builds up in the > PersistenceManager, in the JDO layer, as confirmed through memory-profiling. > (Raising this on behalf of [~cdrome] and [~thiruvel].) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
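The fix pattern described (flushing the PersistenceManager periodically during a large batch so pending JDO state stays bounded) can be sketched generically. Since {{javax.jdo}} isn't available in a standalone snippet, this illustrative stand-in invokes a flush callback every {{flushInterval}} items; in the real code the callback's role is played by {{pm.flush()}} inside the add-partitions loop. All names here are hypothetical:

```java
import java.util.List;
import java.util.function.Consumer;

public class BatchFlusher<T> {
    private final int flushInterval;
    private final Consumer<List<T>> flush; // stand-in for persist-then-PersistenceManager.flush()

    public BatchFlusher(int flushInterval, Consumer<List<T>> flush) {
        this.flushInterval = flushInterval;
        this.flush = flush;
    }

    /** Persists items in chunks, flushing after each chunk so pending state stays bounded. */
    public int persistAll(List<T> items) {
        int flushes = 0;
        for (int i = 0; i < items.size(); i += flushInterval) {
            List<T> chunk = items.subList(i, Math.min(i + flushInterval, items.size()));
            flush.accept(chunk); // bounds memory: at most flushInterval items are pending
            flushes++;
        }
        return flushes;
    }
}
```

The trade-off is extra round-trip overhead per flush against a hard cap on accumulated state, which is what makes hundreds of partitions survivable.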
[jira] [Commented] (HIVE-17228) Bump tez version to 0.9.0
[ https://issues.apache.org/jira/browse/HIVE-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111430#comment-16111430 ] Gunther Hagleitner commented on HIVE-17228: --- +1 > Bump tez version to 0.9.0 > - > > Key: HIVE-17228 > URL: https://issues.apache.org/jira/browse/HIVE-17228 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-17228.1.patch, HIVE-17228.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16820) TezTask may not shut down correctly before submit
[ https://issues.apache.org/jira/browse/HIVE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111429#comment-16111429 ] Sergey Shelukhin commented on HIVE-16820: - MergeFileTask doesn't appear to implement shutdown at all. So, execute is safe from interference :) > TezTask may not shut down correctly before submit > - > > Key: HIVE-16820 > URL: https://issues.apache.org/jira/browse/HIVE-16820 > Project: Hive > Issue Type: Bug >Reporter: Visakh Nair >Assignee: Sergey Shelukhin > Fix For: 3.0.0 > > Attachments: HIVE-16820.01.patch, HIVE-16820.patch > > > The query will run and only fail at the very end when the driver checks its > own shutdown flag. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17228) Bump tez version to 0.9.0
[ https://issues.apache.org/jira/browse/HIVE-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111428#comment-16111428 ] Zhiyuan Yang commented on HIVE-17228: - Test failures are unrelated. Please help review, [~hagleitn], [~sseth] > Bump tez version to 0.9.0 > - > > Key: HIVE-17228 > URL: https://issues.apache.org/jira/browse/HIVE-17228 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-17228.1.patch, HIVE-17228.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)