[jira] [Created] (HIVE-10831) HiveQL Parse error in 1.1.1
Zoltán Szatmári created HIVE-10831:
-----------------------------------
Summary: HiveQL Parse error in 1.1.1
Key: HIVE-10831
URL: https://issues.apache.org/jira/browse/HIVE-10831
Project: Hive
Issue Type: Bug
Components: Hive, HiveServer2
Affects Versions: 1.1.1
Environment: CentOS 6.4, Apache Hadoop 2.7 and Hive 1.1.1, based on the following binaries:
- https://archive.apache.org/dist/hive/hive-1.1.1/apache-hive-1.1.1-bin.tar.gz
- http://www.eu.apache.org/dist/hadoop/common/hadoop-2.7.0/hadoop-2.7.0.tar.gz
Reporter: Zoltán Szatmári

A "create table ... stored as textfile" query fails with an AssertionError while the query text is being parsed. Without the "stored as" clause it works. The same query is fine in 1.0.0, 1.0.1, 1.1.0 and 1.2.0 (with exactly the same configuration), but fails in 1.1.1. We tried both the Hive CLI and beeline; almost the same stack trace appears in the Hive CLI and in the HiveServer2 log (when using beeline). Interestingly, the Hive CLI also crashes.

{noformat}
hive> CREATE TABLE r3 (a1 DOUBLE , a2 DOUBLE) stored as textfile;
Exception in thread "main" java.lang.AssertionError: Unknown token: [@-1,0:0='TOK_FILEFORMAT_GENERIC',679,0:-1]
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:10895)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10103)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
bash-4.1#
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 34447: HIVE-10761 : Create codahale-based metrics system for Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34447/ ---
(Updated May 27, 2015, 6:25 p.m.)

Review request for hive, Chao Sun, Jimmy Xiang, and Xuefu Zhang.

Changes
---
Rebase the patch.

Bugs: HIVE-10761
https://issues.apache.org/jira/browse/HIVE-10761

Repository: hive-git

Description
---
See the JIRA for the motivation. Summary: there is an existing metrics system that uses a custom model and is hooked up to JMX reporting; a codahale-based metrics system is desirable for a standard model and standard reporting.

This adds a codahale-based metrics system to HiveServer2 and HiveMetastore. The Metrics implementation is now internally pluggable, and the existing Metrics system can be re-enabled by configuration if desired, for backward compatibility.

The following metrics are supported by the Metrics system:
1. JVMPauseMonitor (used to call Hadoop's internal implementation, now forked off to integrate with the Metrics system)
2. HMS API calls
3. Standard JVM metrics (only for the new implementation, as it comes free with codahale).

The following metrics reporting options are supported by the new system (exposed via configuration):
1. JMX
2. CONSOLE
3. JSON_FILE (a periodic file of metrics that gets overwritten).

A goal is to add a webserver that exposes the JSON metrics, but this is deferred to a later implementation.
Diffs (updated) - common/pom.xml a615c1e common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/Metrics.java 01c9d1d common/src/java/org/apache/hadoop/hive/common/metrics/MetricsLegacy.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/common/Metrics.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/common/MetricsFactory.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/Metrics.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/MetricsReporting.java PRE-CREATION common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 49b8f97 common/src/test/org/apache/hadoop/hive/common/metrics/TestMetrics.java e85d3f8 common/src/test/org/apache/hadoop/hive/common/metrics/TestMetricsLegacy.java PRE-CREATION common/src/test/org/apache/hadoop/hive/common/metrics/metrics2/TestMetrics.java PRE-CREATION itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestMetaStoreMetrics.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java d81c856 pom.xml b21d894 service/src/java/org/apache/hive/service/server/HiveServer2.java 58e8e49 shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java 6d8166c shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 19324b8 shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 5a6bc44 Diff: https://reviews.apache.org/r/34447/diff/ Testing --- New unit test added. Manually tested. Thanks, Szehon Ho
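The pluggable design described above (a Metrics facade plus a factory that picks the implementation by configuration) can be sketched with JDK-only classes. This is an illustrative stand-in, not the actual interfaces from the patch: the real types live under common/src/java/org/apache/hadoop/hive/common/metrics/ and the real backing implementation is codahale, not a map of counters.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical facade; method names here are illustrative, not Hive's.
interface Metrics {
  void incrementCounter(String name);
  long getCounter(String name);
}

// JDK-only stand-in for the codahale-backed implementation.
class SimpleMetrics implements Metrics {
  private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();
  public void incrementCounter(String name) {
    counters.computeIfAbsent(name, k -> new AtomicLong()).incrementAndGet();
  }
  public long getCounter(String name) {
    AtomicLong c = counters.get(name);
    return c == null ? 0 : c.get();
  }
}

// Factory that selects an implementation by configuration, mirroring the
// patch's ability to re-enable the legacy Metrics system.
class MetricsFactory {
  static Metrics init(String impl) {
    // only one stand-in implementation here; the real factory would also
    // know how to construct the legacy implementation
    return new SimpleMetrics();
  }
}

public class MetricsSketch {
  public static void main(String[] args) {
    Metrics m = MetricsFactory.init("codahale");
    m.incrementCounter("hms_api_calls");
    m.incrementCounter("hms_api_calls");
    System.out.println(m.getCounter("hms_api_calls")); // prints 2
  }
}
```

The point of the facade is that HiveServer2 and the metastore code against the interface only, so the reporting backend (JMX, console, JSON file) stays a configuration decision.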
[jira] [Created] (HIVE-10834) Support First_value()/last_value() over x preceding and y preceding windowing
Aihua Xu created HIVE-10834:
-----------------------------------
Summary: Support First_value()/last_value() over x preceding and y preceding windowing
Key: HIVE-10834
URL: https://issues.apache.org/jira/browse/HIVE-10834
Project: Hive
Issue Type: Sub-task
Components: PTF-Windowing
Reporter: Aihua Xu
Assignee: Aihua Xu

Currently the following query
{noformat}
select ts, f, first_value(f) over (partition by ts order by t rows between 2 preceding and 1 preceding) from over10k limit 100;
{noformat}
throws an exception:
{noformat}
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:2013-03-01 09:11:58.703071,reducesinkkey1:-3},value:{_col3:0.83}}
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:449)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:2013-03-01 09:11:58.703071,reducesinkkey1:-3},value:{_col3:0.83}}
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
	... 3 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Internal Error: cannot generate all output rows for a Partition
	at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:519)
	at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
	at org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:114)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
{noformat}
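For reference, the semantics the query above asks for can be sketched in plain Java: with "rows between 2 preceding and 1 preceding", the window for row i covers rows [i-2, i-1] of the ordered partition, so first_value is null for the first row (empty window) and otherwise the value at max(0, i-2). This is only an illustrative sketch of the intended windowing semantics, not Hive's WindowingTableFunction code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FirstValueWindow {
  // first_value(v) over (rows between `lead` preceding and `lag` preceding),
  // assuming the partition is already ordered; null marks an empty window.
  static List<Double> firstValue(List<Double> vals, int lead, int lag) {
    List<Double> out = new ArrayList<>();
    for (int i = 0; i < vals.size(); i++) {
      int start = i - lead;   // first row of the window
      int end = i - lag;      // last row of the window
      if (end < 0) {
        out.add(null);        // window lies entirely before the partition
      } else {
        out.add(vals.get(Math.max(start, 0)));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Double> vals = Arrays.asList(1.0, 2.0, 3.0, 4.0);
    // rows between 2 preceding and 1 preceding
    System.out.println(firstValue(vals, 2, 1)); // [null, 1.0, 1.0, 2.0]
  }
}
```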
Re: normalizing spark tarball dependency in Hive build
It's possible to publish binaries to central. For example, the YourKit redistributable is published this way: http://search.maven.org/#browse|928812221

On 15/5/26, 21:35, Xuefu Zhang xzh...@cloudera.com wrote:
We thought of that, but unfortunately this is a binary which isn't published anywhere in the public maven repositories. That's why we hosted it at cloudfront. I think this is a general problem for any binaries required by tests. We are open to suggestions though. Thanks, Xuefu

On Tue, May 26, 2015 at 1:35 PM, Sergey Shelukhin ser...@hortonworks.com wrote:
Hi. I was trying to build Hive on a slow connection (or I could have had no connection, for that matter), and pulling http://d3jw87u4immizc.cloudfront.net/spark-tarball/spark-1.3.0-bin-hadoop2-without-hive.tgz was taking forever (I ctrl-c-ed it eventually). On a good note, it did appear to respect "-o" on the rebuild attempt (either that, or whatever remained from the canceled download sufficed for the "mvn install -o ..." build that followed). Is it possible to get this dependency via some more conventional means, like maven?
Review Request 34726: HIVE-10533
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34726/ --- Review request for hive and Ashutosh Chauhan. Bugs: HIVE-10533 https://issues.apache.org/jira/browse/HIVE-10533 Repository: hive-git Description --- CBO (Calcite Return Path): Join to MultiJoin support for outer joins Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java f4e7c45242cd7e714148da281a08fbf90552d720 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveMultiJoin.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveInsertExchange4JoinRule.java 30db8fd75a716442b1ae3c3e9c2e42b36d4fea9f ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinToMultiJoinRule.java 532d7d3b56377946f6a9ad883d7b7dbf1325a8c7 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverter.java efc254297df51756e555fb75d015a49b0ae11a71 Diff: https://reviews.apache.org/r/34726/diff/ Testing --- Thanks, Jesús Camacho Rodríguez
[jira] [Created] (HIVE-10835) Concurrency issues in JDBC driver
Chaoyu Tang created HIVE-10835:
-----------------------------------
Summary: Concurrency issues in JDBC driver
Key: HIVE-10835
URL: https://issues.apache.org/jira/browse/HIVE-10835
Project: Hive
Issue Type: Bug
Components: JDBC
Affects Versions: 1.2.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang

The JDBC specification states that "Each Connection object can create multiple Statement objects that may be used concurrently by the program", but that does not work in the current Hive JDBC driver. In addition, race conditions exist between DatabaseMetaData, Statement and ResultSet, because they all make RPC calls to HS2 over the same Thrift transport within a connection. So we need a connection-level lock to serialize all these RPC calls in a connection.
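A minimal sketch of the fix this JIRA describes: one lock object per connection, taken around every RPC that uses the shared transport, so concurrent Statement objects on the same Connection serialize their wire traffic. All class names below are made up for illustration; the real patch touches HiveConnection, HiveStatement, and friends:

```java
// Hypothetical connection-level lock serializing Thrift RPCs; the names
// deliberately do not match the actual Hive JDBC classes.
class Connection {
  final Object transportLock = new Object();
  int callsInFlight = 0;
  int maxConcurrent = 0;

  // stands in for one Thrift round trip over the shared transport
  void rpc() throws InterruptedException {
    synchronized (transportLock) {
      callsInFlight++;
      maxConcurrent = Math.max(maxConcurrent, callsInFlight);
      Thread.sleep(1);          // simulate wire time
      callsInFlight--;
    }
  }
}

// a statement that issues many RPCs over its parent connection
class Statement implements Runnable {
  private final Connection conn;
  Statement(Connection conn) { this.conn = conn; }
  public void run() {
    try {
      for (int i = 0; i < 10; i++) conn.rpc();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}

public class LockSketch {
  public static void main(String[] args) throws Exception {
    Connection conn = new Connection();
    Thread t1 = new Thread(new Statement(conn));
    Thread t2 = new Thread(new Statement(conn));
    t1.start(); t2.start();
    t1.join(); t2.join();
    // with the lock, at most one RPC is on the transport at a time
    System.out.println(conn.maxConcurrent); // prints 1
  }
}
```

Without the `synchronized` block, two statements could interleave reads and writes on one Thrift transport and corrupt each other's responses, which is the race the JIRA describes.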
Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34666/#review85230 ---

This is a big patch, for a big feature, and it's hard to review offline, so here I commented only on the things that are obvious. For better understanding, I think an in-person review would be more effective.

ql/if/queryplan.thrift
https://reviews.apache.org/r/34666/#comment136752
I'm not sure if it matters, but it's probably better if we add it as the last one.

ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
https://reviews.apache.org/r/34666/#comment136753
Did you make any changes in this file? If not, let's leave it as it is.

ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java
https://reviews.apache.org/r/34666/#comment136942
The file descriptor needs to be closed in a finally block. In addition, closing `in` is not sufficient: `in` might still be null even though fs.open(fstatus.getPath()) has already returned an open stream.

ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java
https://reviews.apache.org/r/34666/#comment136943
Any chance that an op might be visited multiple times?

ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
https://reviews.apache.org/r/34666/#comment136946
Could numThread be <= 0?

ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
https://reviews.apache.org/r/34666/#comment136948
What's this change about?

ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out
https://reviews.apache.org/r/34666/#comment136976
Why are the stats gone?

- Xuefu Zhang

On May 26, 2015, 4:28 p.m., Chao Sun wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34666/ ---
(Updated May 26, 2015, 4:28 p.m.)

Review request for hive, chengxiang li and Xuefu Zhang.

Bugs: HIVE-9152
https://issues.apache.org/jira/browse/HIVE-9152

Repository: hive-git

Description
---
Tez implemented dynamic partition pruning in HIVE-7826.
This is a nice optimization and we should implement the same in HOS. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc itests/src/test/resources/testconfiguration.properties 2a5f7e3 metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 4cc54e8 ql/if/queryplan.thrift c8dfa35 ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java e18f935 ql/src/gen/thrift/gen-php/Types.php 7121ed4 ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 8e56263 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70
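The stream-handling issue raised in the SparkDynamicPartitionPruner comment (close in a finally block, and handle the case where the wrapper stream is still null while the underlying stream returned by fs.open(...) is already open) can be sketched as follows. `Raw` stands in for the stream from fs.open and `Wrapper` for the deserializer built on top of it; both are hypothetical stand-ins:

```java
import java.io.Closeable;
import java.io.IOException;

public class CloseSketch {
  // stands in for the stream returned by fs.open(fstatus.getPath())
  static class Raw implements Closeable {
    boolean closed = false;
    public void close() { closed = true; }
  }

  // wrapper whose constructor may fail after the raw stream is already open
  static class Wrapper implements Closeable {
    private final Raw raw;
    Wrapper(Raw raw, boolean fail) throws IOException {
      this.raw = raw;
      if (fail) throw new IOException("bad header");
    }
    public void close() { raw.close(); }
  }

  static Raw read(boolean fail) {
    Raw raw = new Raw();
    Wrapper in = null;
    try {
      in = new Wrapper(raw, fail);
      // ... deserialize from `in` ...
    } catch (IOException e) {
      // fall through to cleanup
    } finally {
      if (in != null) {
        try { in.close(); } catch (Exception ignore) {}
      } else {
        raw.close();   // `in` never got assigned, close the raw stream directly
      }
    }
    return raw;
  }

  public static void main(String[] args) {
    // the raw stream is closed whether or not the wrapper construction failed
    System.out.println(read(true).closed);   // prints true
    System.out.println(read(false).closed);  // prints true
  }
}
```

Closing only `in` would leak the raw stream on the failure path, which is exactly the reviewer's point.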
Review Request 34727: HIVE-10835: Concurrency issues in JDBC driver
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34727/ ---

Review request for hive, Szehon Ho, Thejas Nair, and Xuefu Zhang.

Bugs: HIVE-10835
https://issues.apache.org/jira/browse/HIVE-10835

Repository: hive-git

Description
---
There exist race conditions between DatabaseMetaData, Statement and ResultSet when they make RPC calls to HS2 over the same Thrift transport within the same connection. The patch adds a connection-level lock to serialize the RPC calls within a single connection.

Diffs
- jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java 1b2891b
- jdbc/src/java/org/apache/hive/jdbc/HiveDatabaseMetaData.java 13e42b5
- jdbc/src/java/org/apache/hive/jdbc/HivePreparedStatement.java 8a0671f
- jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java e93795a
- jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 6b3d05c

Diff: https://reviews.apache.org/r/34727/diff/

Testing
---
Some multi-threaded tests.

Thanks,
Chaoyu Tang
Hive-0.14 - Build # 967 - Failure
Changes for Build #967 No tests ran. The Apache Jenkins build system has built Hive-0.14 (build #967) Status: Failure Check console output at https://builds.apache.org/job/Hive-0.14/967/ to view the results.
Re: Review Request 34696: HIVE-686 add UDF substring_index
On May 27, 2015, 4:42 a.m., Swarnim Kulkarni wrote:
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSubstringIndex.java, line 45
https://reviews.apache.org/r/34696/diff/1/?file=972489#file972489line45
Worth mentioning in your example what the expected output would look like?

Alexander Pivovarov wrote:
Not sure I got the issue...
desc output:
hive> desc function extended substring_index;
OK
...
Example: SELECT substring_index('www.apache.org', '.', 2);
'www.apache'
actual select:
hive> SELECT substring_index('www.apache.org', '.', 2);
OK
www.apache

Swarnim Kulkarni wrote:
My point was just: why not also include a sample result that users could expect to see after this command is executed? It might improve the readability a bit.

It's included. The result is 'www.apache', right after the \n symbol.

- Alexander

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34696/#review85318 ---

On May 27, 2015, 3:35 a.m., Alexander Pivovarov wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34696/ ---
(Updated May 27, 2015, 3:35 a.m.)

Review request for hive, Hao Cheng, Jason Dere, namit jain, and Thejas Nair.

Bugs: HIVE-686
https://issues.apache.org/jira/browse/HIVE-686

Repository: hive-git

Description
---
HIVE-686 add UDF substring_index

Diffs
- ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 94a3b1787e2b3571eb7a8102c28f7334ae3fa829
- ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSubstringIndex.java PRE-CREATION
- ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSubstringIndex.java PRE-CREATION
- ql/src/test/queries/clientpositive/udf_substring_index.q PRE-CREATION
- ql/src/test/results/clientpositive/show_functions.q.out 16820ca887320da13a42bebe0876f29eec373c8f
- ql/src/test/results/clientpositive/udf_substring_index.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/34696/diff/

Testing
---

Thanks,
Alexander Pivovarov
Review Request 34713: Invalidate basic stats for insert queries if autogather=false
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34713/ --- Review request for hive and Gopal V. Bugs: HIVE-10807 https://issues.apache.org/jira/browse/HIVE-10807 Repository: hive-git Description --- Invalidate basic stats for insert queries if autogather=false Diffs - ql/src/java/org/apache/hadoop/hive/ql/QueryProperties.java e8f7fba ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java 2a8167a ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java e5b9c2b ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java acd9bf5 ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 14a7e9c ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 7f355e5 ql/src/test/queries/clientpositive/insert_into1.q f19506a ql/src/test/results/clientnegative/stats_partialscan_autogether.q.out 321ebe5 ql/src/test/results/clientpositive/auto_join_filters.q.out a6720d9 ql/src/test/results/clientpositive/auto_join_nulls.q.out 4416f3e ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 5114038 ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out e6e7ef3 ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out b2e782f ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out 210f1ab ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out a307b13 ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out f4ceee7 ql/src/test/results/clientpositive/auto_sortmerge_join_5.q.out 3c2951a ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out e1f3888 ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 38ecdbe ql/src/test/results/clientpositive/bucket_map_join_1.q.out 42e6a3f ql/src/test/results/clientpositive/bucket_map_join_2.q.out af73309 ql/src/test/results/clientpositive/bucket_map_join_spark1.q.out 870ecdd ql/src/test/results/clientpositive/bucket_map_join_spark2.q.out 33f5c46 ql/src/test/results/clientpositive/bucket_map_join_spark3.q.out 
067d1ff ql/src/test/results/clientpositive/bucketcontext_1.q.out 77bfcf9 ql/src/test/results/clientpositive/bucketcontext_2.q.out a9db13d ql/src/test/results/clientpositive/bucketcontext_3.q.out 9ba3e0c ql/src/test/results/clientpositive/bucketcontext_4.q.out a2b37a8 ql/src/test/results/clientpositive/bucketcontext_5.q.out 3ee1f0e ql/src/test/results/clientpositive/bucketcontext_6.q.out d2304fa ql/src/test/results/clientpositive/bucketcontext_7.q.out 1a105ed ql/src/test/results/clientpositive/bucketcontext_8.q.out 138e415 ql/src/test/results/clientpositive/bucketmapjoin1.q.out 471ff73 ql/src/test/results/clientpositive/bucketmapjoin10.q.out b0e849d ql/src/test/results/clientpositive/bucketmapjoin11.q.out 4263cab ql/src/test/results/clientpositive/bucketmapjoin12.q.out bcd7394 ql/src/test/results/clientpositive/bucketmapjoin2.q.out a8d9e9d ql/src/test/results/clientpositive/bucketmapjoin3.q.out c759f05 ql/src/test/results/clientpositive/bucketmapjoin4.q.out f61500c ql/src/test/results/clientpositive/bucketmapjoin5.q.out 0cb2825 ql/src/test/results/clientpositive/bucketmapjoin7.q.out 667a9db ql/src/test/results/clientpositive/bucketmapjoin8.q.out 252b377 ql/src/test/results/clientpositive/bucketmapjoin9.q.out 5e28dc3 ql/src/test/results/clientpositive/bucketmapjoin_negative.q.out 6ae127d ql/src/test/results/clientpositive/bucketmapjoin_negative2.q.out 4c9f54a ql/src/test/results/clientpositive/bucketmapjoin_negative3.q.out 9a0bfc4 ql/src/test/results/clientpositive/columnstats_partlvl.q.out e0c4cfe ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 19283bb ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out 7c91248 ql/src/test/results/clientpositive/encrypted/encryption_insert_partition_dynamic.q.out 939e206 ql/src/test/results/clientpositive/encrypted/encryption_insert_partition_static.q.out fd7932e ql/src/test/results/clientpositive/encrypted/encryption_join_unencrypted_tbl.q.out 9b6f750 ql/src/test/results/clientpositive/groupby_sort_6.q.out 
0169430 ql/src/test/results/clientpositive/insert_into1.q.out 9e5f3bb ql/src/test/results/clientpositive/join_filters.q.out 4f112bd ql/src/test/results/clientpositive/join_nulls.q.out 46e0170 ql/src/test/results/clientpositive/list_bucket_dml_8.q.java1.7.out a9522e0 ql/src/test/results/clientpositive/parquet_serde.q.out e753180 ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_2.q.out 3ee2e0f ql/src/test/results/clientpositive/skewjoin_union_remove_1.q.out 1f21877 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_1.q.out 09d2692 ql/src/test/results/clientpositive/spark/auto_sortmerge_join_12.q.out a70b161
[jira] [Created] (HIVE-10832) ColumnStatsTask failure when processing large amount of partitions
Chao Sun created HIVE-10832:
-----------------------------------
Summary: ColumnStatsTask failure when processing large amount of partitions
Key: HIVE-10832
URL: https://issues.apache.org/jira/browse/HIVE-10832
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 1.1.0
Reporter: Chao Sun

We are trying to populate column stats for a TPC-DS 4TB dataset, and every time we try to do:
{code}
analyze table catalog_sales partition(cs_sold_date_sk) compute statistics for columns;
{code}
it ends up with the following failure:
{noformat}
2015-05-26 12:14:53,128 WARN org.apache.hadoop.hive.metastore.RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_aggr_stats_for(ThriftHiveMetastore.java:2974)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_aggr_stats_for(ThriftHiveMetastore.java:2961)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.setPartitionColumnStatistics(HiveMetaStoreClient.java:1376)
	at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:91)
	at com.sun.proxy.$Proxy10.setPartitionColumnStatistics(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.setPartitionColumnStatistics(Hive.java:2921)
	at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistPartitionStats(ColumnStatsTask.java:349)
	at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(...)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1042)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:145)
	at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
	at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
	... 35 more
{noformat}
We didn't see this issue with a smaller number of partitions, so it seems ColumnStatsTask has a scalability issue.
Re: Caching metastore objects
Great, that is perfect (I think :)). The only thing it appears to be missing is the ability to chain multiple listeners together, but that would be a relatively simple patch. Thanks for pointing me to it!

From: Ashutosh Chauhan hashut...@apache.org
To: dev@hive.apache.org
Date: 05/27/2015 01:25 AM
Subject: Re: Caching metastore objects

Siva / Scott, such a framework exists in some form: https://issues.apache.org/jira/browse/HIVE-2038 To make it even more generic there was a proposal: https://issues.apache.org/jira/browse/HIVE-2147 But there was resistance from the community. Maybe now the community is ready for it :) Ashutosh

On Tue, May 26, 2015 at 10:12 PM, Sivaramakrishnan Narayanan tarb...@gmail.com wrote:
Thanks for the replies.
@Ashutosh - thanks for the pointer! Yes, I was running the 0.11 metastore. Let me try with the 0.13 metastore! Maybe my woes will be gone. If they aren't, I'll continue working along these lines.
@Alan - agreed. Caching MTables seems like a better approach if 0.13 metastore perf is not as good as I'd like.
@Scott - a pluggable hook for metastore calls would be super useful. If you want to generate events for client-side actions, I suppose you could just implement a dynamic proxy class over the metastore client class which does whatever you need it to. A similar technique could work on the server side; I believe there is already a RetryingMetaStoreClient proxy class in place.

On Wed, May 27, 2015 at 7:32 AM, Ashutosh Chauhan hashut...@apache.org wrote:
Are you running pre-0.12, or with hive.metastore.try.direct.sql = false? Work done on https://issues.apache.org/jira/browse/HIVE-4051 should alleviate some of your problems.

On Mon, May 25, 2015 at 8:19 PM, Sivaramakrishnan Narayanan tarb...@gmail.com wrote:
Apologies if this has been discussed in the past; my searches did not pull up any relevant threads. If there are better solutions available out of the box, please let me know!
Problem statement
--
We have a setup where a single metastoredb is used by Hive, Presto and SparkSQL. In addition, there are thousands of Hive queries submitted in batch form from multiple machines. Oftentimes the metastoredb ends up being remote (in a different AWS region, etc.) and round-trip latency is high. We've seen single thrift calls getting translated into lots of small SQL calls by DataNucleus, and the round-trip latency ends up killing performance. Furthermore, any of these systems may create or modify a Hive table, and this should be reflected in the other systems. For example, I may create a table in Hive and query it using Presto, or vice versa. In our setup, there may be multiple thrift metastore servers pointing to the same metastore db.

Investigation
---
Basically, we've been looking at caching to solve this problem (will come to invalidation in a bit). I looked briefly at DN's support for caching; these two parameters seem to be switched off by default:
METASTORE_CACHE_LEVEL2(datanucleus.cache.level2, false),
METASTORE_CACHE_LEVEL2_TYPE(datanucleus.cache.level2.type, none),
Furthermore, my reading of http://www.datanucleus.org/products/datanucleus/jdo/cache.html suggests that there is no sophistication in invalidation: it seems only time-based invalidation is supported, and it can't work across multiple PMFs (and therefore across multiple thrift metastore servers).

Solution Outline
---
- Every table / partition will have an additional property called 'version'
- Any call that modifies a table or partition will bump up the version of that table / partition
- Guava-based cache of thrift objects that come from metastore calls
- We fire a single SQL query matching versions before returning from cache
- It is conceivable to have a mode wherein invalidation based on version happens in a background thread (for higher performance, lower fidelity)
- Not proposing any locking (not shooting for world peace here :) )
- We could extend the HiveMetaStore class or create a new server altogether

Is this something that would be interesting to the community? Is this problem already solved, and should I spend my time watching GoT instead?

Thanks
Siva
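A minimal sketch of the version-check idea above, with hypothetical names (MetastoreCache, VersionedTable, VersionLookup) of my own invention, and a plain ConcurrentHashMap standing in for the proposed Guava cache so the example stays dependency-free:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative sketch only: cache thrift objects keyed by table name, and
// validate the cached entry's version against the store before serving it.
public class MetastoreCache {
    public static class VersionedTable {
        public final String name;
        public final long version;
        public final Object thriftObject; // stands in for the real Table thrift struct
        public VersionedTable(String name, long version, Object thriftObject) {
            this.name = name;
            this.version = version;
            this.thriftObject = thriftObject;
        }
    }

    // In the real proposal this would be a single SQL query against metastoredb.
    public interface VersionLookup {
        long currentVersion(String tableName);
    }

    private final Map<String, VersionedTable> cache = new ConcurrentHashMap<>();

    public Object get(String tableName, VersionLookup store,
                      Function<String, VersionedTable> loader) {
        VersionedTable cached = cache.get(tableName);
        if (cached != null && cached.version == store.currentVersion(tableName)) {
            // one cheap version check instead of many small SQL calls
            return cached.thriftObject;
        }
        VersionedTable fresh = loader.apply(tableName); // full reload on miss or stale version
        cache.put(tableName, fresh);
        return fresh.thriftObject;
    }
}
```

Because every modifying call bumps the version, a server that caches an old copy detects staleness with one round trip, which is the point of the design: the fidelity of invalidation is bounded by the version column, not by time.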
Re: [VOTE] Stable releases from branch-1 and experimental releases from master
+1 for all the reasons outlined.

On Tue, May 26, 2015 at 6:13 PM, Thejas Nair thejas.n...@gmail.com wrote:
+1
- This is great for users who want to take longer to upgrade from hadoop-1 and care mainly for bug fixes and incremental features, rather than radical new features.
- The ability to release initial 2.x releases marked as alpha/beta also helps to get users to try it out, and also lets them choose what is right for them.
- This also lets developers focus on major new features without the burden of maintaining hadoop-1 compatibility.

On Tue, May 26, 2015 at 11:41 AM, Alan Gates alanfga...@gmail.com wrote:
We have discussed this for several weeks now. Some concerns have been raised which I have tried to address. I think it is time to vote on it as our release plan. To be specific, I propose:
- Hive makes a branch-1 from the current master. This would be used for 1.3 and future 1.x releases. This branch would not deprecate existing functionality. Any new features in this branch would also need to be put on master. An upgrade path for users will be maintained from one 1.x release to the next, as well as from the latest 1.x release to the latest 2.x release.
- Going forward, releases numbered 2.x will be made from master. The purpose of these releases will be to enable users to get access to new features being developed in Hive and allow developers to get feedback. It is expected that for a while these releases will not be production ready, and they will be clearly so labeled. Some legacy features, such as Hadoop 1 and MapReduce, will no longer be supported on master.
- Any critical bug fixes (security, incorrect results, crashes) fixed in master will also be ported to branch-1 for at least a year. This time period may be extended in the future based on the stability and adoption of 2.x releases.
Based on Hive's bylaws, this release plan vote will be open for 3 days and all active committers have binding votes. Here's my +1. Alan.
-- Nothing better than when appreciated for hard work. -Mark
Re: Review Request 34696: HIVE-686 add UDF substring_index
On May 27, 2015, 4:42 a.m., Swarnim Kulkarni wrote:
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSubstringIndex.java, line 45 https://reviews.apache.org/r/34696/diff/1/?file=972489#file972489line45
Worth mentioning in your example what the expected output would look like?

Alexander Pivovarov wrote: Not sure I got the issue...
-- desc output
hive> desc function extended substring_index;
OK ... Example: SELECT substring_index('www.apache.org', '.', 2); 'www.apache'
-- actual select
hive> SELECT substring_index('www.apache.org', '.', 2);
OK
www.apache

My point was just: why not also include a sample result of what users could expect to see after this command is executed? It might improve the readability a bit. - Swarnim

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34696/#review85318 ---

On May 27, 2015, 3:35 a.m., Alexander Pivovarov wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34696/ --- (Updated May 27, 2015, 3:35 a.m.)
Review request for hive, Hao Cheng, Jason Dere, namit jain, and Thejas Nair.
Bugs: HIVE-686 https://issues.apache.org/jira/browse/HIVE-686
Repository: hive-git
Description --- HIVE-686 add UDF substring_index
Diffs -
ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 94a3b1787e2b3571eb7a8102c28f7334ae3fa829
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSubstringIndex.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSubstringIndex.java PRE-CREATION
ql/src/test/queries/clientpositive/udf_substring_index.q PRE-CREATION
ql/src/test/results/clientpositive/show_functions.q.out 16820ca887320da13a42bebe0876f29eec373c8f
ql/src/test/results/clientpositive/udf_substring_index.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/34696/diff/
Testing ---
Thanks, Alexander Pivovarov
[jira] [Created] (HIVE-10833) RowResolver looks mangled with CBO
Eugene Koifman created HIVE-10833: - Summary: RowResolver looks mangled with CBO Key: HIVE-10833 URL: https://issues.apache.org/jira/browse/HIVE-10833 Project: Hive Issue Type: Bug Affects Versions: 1.3.0 Reporter: Eugene Koifman Assignee: Laljo John Pullokkaran

While working on HIVE-10828 I noticed that the internal state of RowResolver looks odd when CBO is enabled. Consider the script below.
{noformat}
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.cbo.enable=false;
drop table if exists acid_partitioned;
create table acid_partitioned (a int, c string) partitioned by (p int) clustered by (a) into 1 buckets;
insert into acid_partitioned partition (p) (a,p) values(1,1);
{noformat}
With CBO on, if you put a break point in {noformat}SemanticAnalyzer.genSelectPlan(String dest, ASTNode selExprList, QB qb, Operator<?> input, Operator<?> inputForSelectStar, boolean outerLV){noformat} at the line _selectStar = selectStar && exprList.getChildCount() == posn + 1;_ (currently 3865) and examine the _out_rwsch.rslvMap_ variable, it looks like {noformat}{null={values__tmp__table__1.tmp_values_col1=_col0: string, values__tmp__table__1.tmp_values_col2=_col1: string}}{noformat} With CBO disabled, the same _out_rwsch.rslvMap_ looks like {noformat}{values__tmp__table__1={tmp_values_col1=_col0: string, tmp_values_col2=_col1: string}}{noformat} The _out_rwsch.invRslvMap_ also differs in the same way. It seems that the version you get with CBO off is the correct one, since _insert into acid_partitioned partition (p) (a,p) values(1,1)_ is rewritten to _insert into acid_partitioned partition (p) (a,p) select * from values__tmp__table__1_ CC [~ashutoshc] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 34716: HIVE-10826 Support min()/max() functions over x preceding and y preceding windowing
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34716/ --- Review request for hive. Repository: hive-git Description --- HIVE-10826 Support min()/max() functions over x preceding and y preceding windowing Diffs - ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMax.java 6b7808aa6e1104a0acff3bc0fe89fc92bb200803 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMin.java d931d52d0235fcd19571d317715f8a6663aeb49c ql/src/test/queries/clientpositive/windowing_windowspec2.q d85cea987462e4c15129334aa4aed9263ef8cc01 ql/src/test/results/clientpositive/windowing_windowspec2.q.out bf916398b2d7b0198713623d23d27c2a76551bcb Diff: https://reviews.apache.org/r/34716/diff/ Testing --- Thanks, Aihua Xu
[jira] [Created] (HIVE-10836) Beeline OutOfMemoryError due to large history
Patrick McAnneny created HIVE-10836: --- Summary: Beeline OutOfMemoryError due to large history Key: HIVE-10836 URL: https://issues.apache.org/jira/browse/HIVE-10836 Project: Hive Issue Type: Bug Environment: Hive 1.1.0 on RHEL with Cloudera (cdh5.4.0) Reporter: Patrick McAnneny Attempting to run beeline via commandline fails with the error below due to large commands in the ~/.beeline/history file. Not sure if the problem also exists with many lines in the history or just big lines. I had a few lines in my history file with over 1 million characters each. Deleting said lines from the history file resolved the issue. Beeline version 1.1.0-cdh5.4.0 by Apache Hive Exception in thread main java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2367) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535) at java.lang.StringBuffer.append(StringBuffer.java:322) at java.io.BufferedReader.readLine(BufferedReader.java:363) at java.io.BufferedReader.readLine(BufferedReader.java:382) at jline.console.history.FileHistory.load(FileHistory.java:69) at jline.console.history.FileHistory.load(FileHistory.java:61) at org.apache.hive.beeline.BeeLine.getConsoleReader(BeeLine.java:869) at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:766) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) -- This 
message was sent by Atlassian JIRA (v6.3.4#6332)
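Until the underlying jline loading issue is fixed, the workaround described above (deleting the oversized lines) can be scripted. A hedged Java sketch, assuming the default ~/.beeline/history location; the 100,000-character cutoff is an arbitrary illustration of mine, not a Beeline setting:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Drop oversized entries from the Beeline history file so that jline's
// FileHistory.load() no longer has to buffer multi-megabyte lines.
public class TrimBeelineHistory {
    // Keep only lines at or under maxLineLength characters.
    public static List<String> trim(List<String> lines, int maxLineLength) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (line.length() <= maxLineLength) {
                kept.add(line);
            }
        }
        return kept;
    }

    // Rewrite the history file in place without the huge entries.
    public static void rewrite(Path historyFile, int maxLineLength) throws IOException {
        Files.write(historyFile, trim(Files.readAllLines(historyFile), maxLineLength));
    }
}
```

Usage would be something like `TrimBeelineHistory.rewrite(Paths.get(System.getProperty("user.home"), ".beeline", "history"), 100_000)`; a proper fix would cap or rotate the history inside Beeline itself.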
Re: Review Request 34576: Bucketized Table feature fails in some cases
On May 24, 2015, 1:50 a.m., Xuefu Zhang wrote:
ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java, line 226 https://reviews.apache.org/r/34576/diff/2/?file=971006#file971006line226
The warning is proper, but I think the wording should say 'might', because the source data might already be bucketed and match the target, in which case there is no problem.

The Load command doesn't exercise bucketing. IMO 'will not' is correct. - John

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34576/#review85081 ---

On May 23, 2015, 5:47 p.m., pengcheng xiong wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34576/ --- (Updated May 23, 2015, 5:47 p.m.)
Review request for hive and John Pullokkaran.
Repository: hive-git
Description --- Bucketized Table feature fails in some cases. If the destination is bucketed on the same key, and the actual data in the src is not bucketed (because the data got loaded using LOAD DATA LOCAL INPATH), then the data won't be bucketed while writing to the destination.
Example --
CREATE TABLE P1(key STRING, val STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE P1;
-- perform an insert to make sure there are 2 files
INSERT OVERWRITE TABLE P1 select key, val from P1;
This is not a regression; this has never worked. It only got discovered due to Hadoop2 changes. In Hadoop1, in local mode, the number of reducers is always 1, regardless of what is requested by the app. Hadoop2 now honors the number-of-reducers setting in local mode (by spawning threads). The long-term solution seems to be to prevent load data for bucketed tables.
Diffs - ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java e53933e ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 1a9b42b ql/src/test/results/clientnegative/bucket_mapjoin_mismatch1.q.out 623c2e8 ql/src/test/results/clientnegative/bucket_mapjoin_wrong_table_metadata_1.q.out f4522d2 ql/src/test/results/clientnegative/bucket_mapjoin_wrong_table_metadata_2.q.out 9aa9b5d ql/src/test/results/clientnegative/exim_11_nonpart_noncompat_sorting.q.out 9220c8e ql/src/test/results/clientpositive/auto_join32.q.out bfc8be8 ql/src/test/results/clientpositive/auto_join_filters.q.out a6720d9 ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 383defd ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out e6e7ef3 ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out e9fb705 ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out c089419 ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out 6e443fa ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out feaea04 ql/src/test/results/clientpositive/auto_sortmerge_join_5.q.out f64ecf0 ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out e89f548 ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 44c037f ql/src/test/results/clientpositive/bucket_map_join_1.q.out d778203 ql/src/test/results/clientpositive/bucket_map_join_2.q.out aef77aa ql/src/test/results/clientpositive/bucket_map_join_spark1.q.out 870ecdd ql/src/test/results/clientpositive/bucket_map_join_spark2.q.out 33f5c46 ql/src/test/results/clientpositive/bucket_map_join_spark3.q.out 067d1ff ql/src/test/results/clientpositive/bucketcontext_1.q.out 77bfcf9 ql/src/test/results/clientpositive/bucketcontext_2.q.out a9db13d ql/src/test/results/clientpositive/bucketcontext_3.q.out 9ba3e0c ql/src/test/results/clientpositive/bucketcontext_4.q.out a2b37a8 ql/src/test/results/clientpositive/bucketcontext_5.q.out 3ee1f0e ql/src/test/results/clientpositive/bucketcontext_6.q.out d2304fa 
ql/src/test/results/clientpositive/bucketcontext_7.q.out 1a105ed ql/src/test/results/clientpositive/bucketcontext_8.q.out 138e415 ql/src/test/results/clientpositive/bucketizedhiveinputformat_auto.q.out 215efdd ql/src/test/results/clientpositive/bucketmapjoin1.q.out 72f2a07 ql/src/test/results/clientpositive/bucketmapjoin10.q.out b0e849d ql/src/test/results/clientpositive/bucketmapjoin11.q.out 4263cab ql/src/test/results/clientpositive/bucketmapjoin12.q.out bcd7394 ql/src/test/results/clientpositive/bucketmapjoin2.q.out a8d9e9d ql/src/test/results/clientpositive/bucketmapjoin3.q.out c759f05
Re: Review Request 34696: HIVE-686 add UDF substring_index
On May 27, 2015, 4:42 a.m., Swarnim Kulkarni wrote:
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSubstringIndex.java, line 45 https://reviews.apache.org/r/34696/diff/1/?file=972489#file972489line45
Worth mentioning in your example what the expected output would look like?

Alexander Pivovarov wrote: Not sure I got the issue...
-- desc output
hive> desc function extended substring_index;
OK ... Example: SELECT substring_index('www.apache.org', '.', 2); 'www.apache'
-- actual select
hive> SELECT substring_index('www.apache.org', '.', 2);
OK
www.apache

Swarnim Kulkarni wrote: My point was just: why not also include a sample result of what users could expect to see after this command is executed? It might improve the readability a bit.

Alexander Pivovarov wrote: It's included. The result is 'www.apache', right after the \n symbol.

Ah ok. Sorry, missed that. - Swarnim

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34696/#review85318 ---

On May 27, 2015, 3:35 a.m., Alexander Pivovarov wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34696/ --- (Updated May 27, 2015, 3:35 a.m.)
Review request for hive, Hao Cheng, Jason Dere, namit jain, and Thejas Nair.
Bugs: HIVE-686 https://issues.apache.org/jira/browse/HIVE-686 Repository: hive-git Description --- HIVE-686 add UDF substring_index Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 94a3b1787e2b3571eb7a8102c28f7334ae3fa829 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSubstringIndex.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSubstringIndex.java PRE-CREATION ql/src/test/queries/clientpositive/udf_substring_index.q PRE-CREATION ql/src/test/results/clientpositive/show_functions.q.out 16820ca887320da13a42bebe0876f29eec373c8f ql/src/test/results/clientpositive/udf_substring_index.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34696/diff/ Testing --- Thanks, Alexander Pivovarov
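For readers unfamiliar with the function, the MySQL-style semantics that substring_index follows can be sketched in plain Java. This is an illustrative reimplementation of the documented behavior, not the actual GenericUDFSubstringIndex code:

```java
// substring_index(str, delim, count):
//   count > 0: everything before the count-th occurrence of delim from the left
//   count < 0: everything after the count-th occurrence of delim from the right
//   count = 0: empty string; fewer than |count| delimiters: the whole string
public class SubstringIndex {
    public static String substringIndex(String str, String delim, int count) {
        if (str == null || delim == null) {
            return null;
        }
        if (delim.isEmpty() || count == 0) {
            return "";
        }
        if (count > 0) {
            int idx = -1;
            for (int i = 0; i < count; i++) {
                idx = str.indexOf(delim, idx + 1);
                if (idx < 0) {
                    return str; // fewer than count delimiters
                }
            }
            return str.substring(0, idx);
        } else {
            int idx = str.length();
            for (int i = 0; i < -count; i++) {
                idx = str.lastIndexOf(delim, idx - 1);
                if (idx < 0) {
                    return str;
                }
            }
            return str.substring(idx + delim.length());
        }
    }
}
```

With this sketch, `substringIndex("www.apache.org", ".", 2)` yields `"www.apache"`, matching the desc output quoted in the thread, and `substringIndex("www.apache.org", ".", -2)` yields `"apache.org"`.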
Re: Review Request 34576: Bucketized Table feature fails in some cases
On May 24, 2015, 2:03 a.m., Xuefu Zhang wrote: Have you thought of what happens if the client is not interactive, such as JDBC or thrift?

pengcheng xiong wrote: I am sorry, we have not thought about that yet. We admit that the patch will not cover the case where the client is not interactive. Do you have any good ideas that you can share with us? Do you think logging this, besides printing a warning msg, is good enough? Thanks.

Xuefu Zhang wrote: There are all kinds of issues with data loading into bucketed tables. While advanced users might be able to load data correctly, I think that's really rare. The data in a bucketed table needs to be generated by Hive. Therefore, I think we should disable insert into and load data into|overwrite for a bucketed table. We should also disallow external tables for the same reason. To allow advanced users to achieve what they used to do, we can have a flag, such as hive.enforce.strict.bucketing, which defaults to true. Those users can proceed by turning it off. Another option for insert into would be supporting appending new data, such as proposed in HIVE-3244.

Gopal V wrote: Why would you disable insert into bucketed tables? How else would ACID work?

Xuefu Zhang wrote: Yeah, but I guess we were talking about things outside the context of ACID. Even before ACID, a user could do insert into a bucketed table, which can be very harmful.

This patch only addresses the load path, which I think we all agree is a problem. - John

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34576/#review85082 ---

On May 23, 2015, 5:47 p.m., pengcheng xiong wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34576/ --- (Updated May 23, 2015, 5:47 p.m.)
Review request for hive and John Pullokkaran.
Repository: hive-git
Description --- Bucketized Table feature fails in some cases.
If the destination is bucketed on the same key, and the actual data in the src is not bucketed (because the data got loaded using LOAD DATA LOCAL INPATH), then the data won't be bucketed while writing to the destination.
Example --
CREATE TABLE P1(key STRING, val STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE P1;
-- perform an insert to make sure there are 2 files
INSERT OVERWRITE TABLE P1 select key, val from P1;
This is not a regression; this has never worked. It only got discovered due to Hadoop2 changes. In Hadoop1, in local mode, the number of reducers is always 1, regardless of what is requested by the app. Hadoop2 now honors the number-of-reducers setting in local mode (by spawning threads). The long-term solution seems to be to prevent load data for bucketed tables.
Diffs -
ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java e53933e
ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 1a9b42b
ql/src/test/results/clientnegative/bucket_mapjoin_mismatch1.q.out 623c2e8
ql/src/test/results/clientnegative/bucket_mapjoin_wrong_table_metadata_1.q.out f4522d2
ql/src/test/results/clientnegative/bucket_mapjoin_wrong_table_metadata_2.q.out 9aa9b5d
ql/src/test/results/clientnegative/exim_11_nonpart_noncompat_sorting.q.out 9220c8e
ql/src/test/results/clientpositive/auto_join32.q.out bfc8be8
ql/src/test/results/clientpositive/auto_join_filters.q.out a6720d9
ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 383defd
ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out e6e7ef3
ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out e9fb705
ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out c089419
ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out 6e443fa
ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out feaea04
ql/src/test/results/clientpositive/auto_sortmerge_join_5.q.out f64ecf0
ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out e89f548 ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 44c037f ql/src/test/results/clientpositive/bucket_map_join_1.q.out d778203 ql/src/test/results/clientpositive/bucket_map_join_2.q.out aef77aa ql/src/test/results/clientpositive/bucket_map_join_spark1.q.out 870ecdd ql/src/test/results/clientpositive/bucket_map_join_spark2.q.out 33f5c46 ql/src/test/results/clientpositive/bucket_map_join_spark3.q.out 067d1ff
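The invariant the thread keeps coming back to is that rows of a CLUSTERED BY (key) table must land in the file chosen by hashing the clustering key: INSERT ... SELECT enforces it through reducers, while LOAD DATA merely copies files, so the on-disk layout need not satisfy it. A sketch of the invariant only; Hive computes bucket numbers from its own ObjectInspector-based hash, so String.hashCode() below is a stand-in, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Assign each key to bucket hash(key) mod numBuckets, one list per bucket,
// mirroring the "one file per bucket" layout an INSERT produces.
public class BucketAssigner {
    public static int bucketFor(String key, int numBuckets) {
        // mask the sign bit so the modulo result is non-negative
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static List<List<String>> bucketize(List<String> keys, int numBuckets) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String k : keys) {
            buckets.get(bucketFor(k, numBuckets)).add(k);
        }
        return buckets;
    }
}
```

A file copied in by LOAD DATA has no reason to satisfy `bucketFor` for every row, which is exactly why downstream bucket map joins misbehave on such tables.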
[jira] [Created] (HIVE-10837) Running large queries (inserts) fails and crashes hiveserver2
Patrick McAnneny created HIVE-10837: --- Summary: Running large queries (inserts) fails and crashes hiveserver2 Key: HIVE-10837 URL: https://issues.apache.org/jira/browse/HIVE-10837 Project: Hive Issue Type: Bug Environment: Hive 1.1.0 on RHEL with Cloudera (cdh5.4.0) Reporter: Patrick McAnneny Priority: Critical

When running a large insert statement through beeline or pyhs2, a thrift error is returned and hiveserver2 crashes. I ran into this with large insert statements; my initial failing query was around 6 million characters. After further testing, however, it seems the failure threshold is based on the number of inserted rows rather than the query's size in characters. My testing shows the failure threshold is between 199,000 and 230,000 inserted rows. The thrift error is as follows: Error: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe (state=08S01,code=0) Also note for anyone testing this issue: when testing different queries I ran into https://issues.apache.org/jira/browse/HIVE-10836 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 34447: HIVE-10761 : Create codahale-based metrics system for Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34447/#review85418 ---

common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java https://reviews.apache.org/r/34447/#comment136981 Maybe a more informational message?
common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java https://reviews.apache.org/r/34447/#comment136983 Should we check isStarted()?
common/src/java/org/apache/hadoop/hive/common/metrics/MetricsLegacy.java https://reviews.apache.org/r/34447/#comment137016 LegacyMetrics?
common/src/java/org/apache/hadoop/hive/common/metrics/common/MetricsFactory.java https://reviews.apache.org/r/34447/#comment136986 This should also be synchronized.
common/src/java/org/apache/hadoop/hive/common/metrics/common/MetricsFactory.java https://reviews.apache.org/r/34447/#comment137006 Should we call it deinit()?
common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/Metrics.java https://reviews.apache.org/r/34447/#comment137008 Could we rename the class so that we don't have to handle the duplicated class/interface names?
common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/Metrics.java https://reviews.apache.org/r/34447/#comment137010 Could we rename the class so that we don't have to handle the duplicated class/interface names?
common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/Metrics.java https://reviews.apache.org/r/34447/#comment137009 If the synchronized block covers the whole method, we might as well declare the whole method as synchronized.
common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/Metrics.java https://reviews.apache.org/r/34447/#comment137011 Same as above.
common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/Metrics.java https://reviews.apache.org/r/34447/#comment137013 Shouldn't this be private?
common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/Metrics.java https://reviews.apache.org/r/34447/#comment137012 I think fd needs to be closed properly in a finally block.
common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/Metrics.java https://reviews.apache.org/r/34447/#comment137014 I think checking initialized needs to be synchronized.
common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/Metrics.java https://reviews.apache.org/r/34447/#comment137015 Same as above.
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java https://reviews.apache.org/r/34447/#comment137007 Where do we call uninit(), or does it not matter? Same for HS2.

- Xuefu Zhang

On May 27, 2015, 6:25 p.m., Szehon Ho wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34447/ --- (Updated May 27, 2015, 6:25 p.m.)
Review request for hive, Chao Sun, Jimmy Xiang, and Xuefu Zhang.
Bugs: HIVE-10761 https://issues.apache.org/jira/browse/HIVE-10761
Repository: hive-git
Description --- See JIRA for the motivation. Summary: there is an existing metrics system that uses a custom model hooked up to JMX reporting; a codahale-based metrics system is desirable for a standard model and reporting. This adds a codahale-based metrics system to HiveServer2 and HiveMetastore. The metrics implementation is now internally pluggable, and the existing Metrics system can be re-enabled by configuration if desired for backward compatibility. The following metrics are supported by the Metrics system: 1. JVMPauseMonitor (used to call Hadoop's internal implementation, now forked off to integrate with the Metrics system) 2. HMS API calls 3. Standard JVM metrics (only for the new implementation, as it's free with codahale). The following metrics reporters are supported by the new system (configuration exposed): 1. JMX 2. CONSOLE 3. JSON_FILE (periodic file of metrics that gets overwritten).
A goal is to add a webserver that exposes the JSON metrics, but this is deferred to a later implementation.
Diffs -
common/pom.xml a615c1e
common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java PRE-CREATION
common/src/java/org/apache/hadoop/hive/common/metrics/Metrics.java 01c9d1d
common/src/java/org/apache/hadoop/hive/common/metrics/MetricsLegacy.java PRE-CREATION
common/src/java/org/apache/hadoop/hive/common/metrics/common/Metrics.java PRE-CREATION
common/src/java/org/apache/hadoop/hive/common/metrics/common/MetricsFactory.java PRE-CREATION
common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/Metrics.java PRE-CREATION
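The JvmPauseMonitor mentioned in the summary relies on a well-known trick (Hadoop's implementation, which this patch forks, uses the same approach): sleep for a fixed interval and treat any extra elapsed time as a JVM-wide pause (GC, swapping, etc.). A minimal self-contained sketch of that idea, without the metrics integration; all names here are illustrative, not the patch's actual code:

```java
// Detect JVM pauses by measuring how much longer than SLEEP_MS each
// sleep actually took.
public class PauseDetector {
    static final long SLEEP_MS = 500;

    // Pause detected in one measurement cycle, given the observed elapsed
    // time; factored out of the loop so it is testable in isolation.
    public static long extraTime(long elapsedMs) {
        return Math.max(0, elapsedMs - SLEEP_MS);
    }

    public static void monitor(long warnThresholdMs) throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            Thread.sleep(SLEEP_MS);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            long pause = extraTime(elapsedMs);
            if (pause > warnThresholdMs) {
                System.err.println("Detected JVM pause of ~" + pause + " ms");
                // a metrics-system version would increment a pause counter here
            }
        }
    }
}
```

The codahale integration then amounts to replacing the println with incrementing a counter registered in the MetricRegistry.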
Re: Review Request 34586: HIVE-10704
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34586/ --- (Updated May 27, 2015, 6:32 a.m.) Review request for hive. Repository: hive-git Description --- fix biggest small table selection when table sizes are 0 fallback to dividing memory equally if any tables have invalid size Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 536b92c5dd03abe9ff57bf64d87be0f3ef34aa7a Diff: https://reviews.apache.org/r/34586/diff/ Testing --- Thanks, Mostafa Mokhtar
Re: Review Request 34586: HIVE-10704
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34586/ --- (Updated May 27, 2015, 6:33 a.m.) Review request for hive. Repository: hive-git Description --- fix biggest small table selection when table sizes are 0 fallback to dividing memory equally if any tables have invalid size Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 536b92c Diff: https://reviews.apache.org/r/34586/diff/ Testing --- Thanks, Mostafa Mokhtar
Re: Review Request 34586: HIVE-10704
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34586/ --- (Updated May 27, 2015, 6:30 a.m.) Review request for hive. Repository: hive-git Description --- fix biggest small table selection when table sizes are 0 fallback to dividing memory equally if any tables have invalid size Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 536b92c5dd03abe9ff57bf64d87be0f3ef34aa7a Diff: https://reviews.apache.org/r/34586/diff/ Testing --- File Attachments (updated) HIVE-10704.3.patch https://reviews.apache.org/media/uploaded/files/2015/05/27/4a999c9c-1c3f-44dd-a321-a4157a067300__HIVE-10704.3.patch Thanks, Mostafa Mokhtar
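The two fixes in the description ("fix biggest small table selection when table sizes are 0" and "fallback to dividing memory equally if any tables have invalid size") can be sketched as pure functions. The names and the proportional-split detail below are my illustration of the description, not the actual HashTableLoader code:

```java
import java.util.Arrays;

// Assign hash-table memory across small tables: proportional to reported
// sizes when they are valid, equal split when any size is invalid (<= 0).
public class SmallTableMemory {
    public static long[] assignMemory(long[] tableSizes, long totalMemory) {
        long sum = 0;
        boolean invalid = false;
        for (long s : tableSizes) {
            if (s <= 0) {
                invalid = true; // a zero/negative size makes ratios meaningless
                break;
            }
            sum += s;
        }
        long[] shares = new long[tableSizes.length];
        if (invalid) {
            Arrays.fill(shares, totalMemory / tableSizes.length); // equal fallback
        } else {
            for (int i = 0; i < tableSizes.length; i++) {
                shares[i] = totalMemory * tableSizes[i] / sum; // proportional split
            }
        }
        return shares;
    }

    // With all sizes valid, the biggest small table is simply the max by size.
    public static int biggestSmallTable(long[] tableSizes) {
        int best = 0;
        for (int i = 1; i < tableSizes.length; i++) {
            if (tableSizes[i] > tableSizes[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

The point of the fix is the `invalid` branch: before it, a table with a reported size of 0 could both win the "biggest small table" selection and starve the others of memory.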
RE: Build hive failure on ubuntu 15.04 with oracle java 1.8
This is a known issue. https://issues.apache.org/jira/browse/HIVE-10674

From: yu20...@hotmail.com To: dev@hive.apache.org Subject: Build hive failure on ubuntu 15.04 with oracle java 1.8 Date: Tue, 26 May 2015 11:58:45 +0800

Hi guys, I tried to build hive 1.2.0 on ubuntu 15.04 with oracle Java 1.8. Then I encountered the following problem. What should I do to fix this issue?

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
Running org.apache.hadoop.hive.metastore.TestMetastoreExpr
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.719 sec - in org.apache.hadoop.hive.metastore.TestMetastoreExpr

Results :

Failed tests:
TestExecDriver.testMapRedPlan1:513-executePlan:487 expected:true but was:false
TestExecDriver.testMapRedPlan2:522-executePlan:487 expected:true but was:false
TestExecDriver.testMapRedPlan3:531-executePlan:487 expected:true but was:false
TestExecDriver.testMapRedPlan4:540-executePlan:487 expected:true but was:false
TestExecDriver.testMapRedPlan5:549-executePlan:487 expected:true but was:false
TestExecDriver.testMapRedPlan6:558-executePlan:487 expected:true but was:false
TestExecDriver.testMapPlan1:496-executePlan:487 expected:true but was:false
TestExecDriver.testMapPlan2:504-executePlan:487 expected:true but was:false
TestSessionState.testReloadExistingAuxJars2:234 Could not find SessionStateTest.jar.v1
TestSessionState.testReloadAuxJars2:191 Could not find SessionStateTest.jar.v1
TestSessionState.testReloadExistingAuxJars2:234 Could not find SessionStateTest.jar.v1
TestSessionState.testReloadAuxJars2:191 Could not find SessionStateTest.jar.v1

Tests run: 3545, Failures: 12, Errors: 0, Skipped: 1
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.16:test (default-test) on project hive-exec: There are test failures.
[ERROR]
[ERROR] Please refer to /home/hadoop/apache-hive-1.2.0-src/ql/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.16:test (default-test) on project hive-exec: There are test failures.
Please refer to /home/hadoop/apache-hive-1.2.0-src/ql/target/surefire-reports for the individual test results.
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.apache.maven.plugin.MojoFailureException: There are test failures.

Thanks,
Jared
Review Request 34754: NumberFormatException while running analyze table partition compute statics query
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34754/ --- Review request for hive and pengcheng xiong. Bugs: HIVE-10840 https://issues.apache.org/jira/browse/HIVE-10840 Repository: hive-git Description --- NumberFormatException while running analyze table partition compute statics query Diffs - itests/src/test/resources/testconfiguration.properties ae03283 ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java ad481bc ql/src/test/queries/clientpositive/stats_only_null.q a91022c ql/src/test/results/clientpositive/tez/stats_only_null.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34754/diff/ Testing --- Modified existing test to increase its coverage. Thanks, Ashutosh Chauhan
Re: Review Request 34455: HIVE-10550 Dynamic RDD caching optimization for HoS.[Spark Branch]
On May 27, 2015, 10:13 p.m., Xuefu Zhang wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 2062 https://reviews.apache.org/r/34455/diff/3/?file=972428#file972428line2062 Sorry for pointing this out late. I'm not certain if it's a good idea to expose these two configurations. Also, this introduces a change of behavior. For now, can we get rid of them and change the persistency level back to MEM+DISK? We can come back to revisit this later on. At this moment, I don't feel confident to make the call. chengxiang li wrote: Persisting to MEM + DISK may hurt performance in certain cases; I think at least we should have a switch to enable/disable this optimization. Xuefu Zhang wrote: Agreed. However, before we find out more about in what cases this helps or hurts, I think it's better we keep the existing behavior. This doesn't prevent us from adding a flag later on. chengxiang li wrote: OK, I will remove these configurations from the patch temporarily; we can discuss later when we have more knowledge about it. Please feel free to create a follow-up JIRA to do more research. We can try different data sizes and persistency levels to see the result. At that time, we can decide if it makes sense to introduce configurations. Thanks. - Xuefu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34455/#review85451 --- On May 28, 2015, 3:30 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34455/ --- (Updated May 28, 2015, 3:30 a.m.) Review request for hive, Chao Sun, Jimmy Xiang, and Xuefu Zhang. 
Bugs: HIVE-10550 https://issues.apache.org/jira/browse/HIVE-10550 Repository: hive-git Description --- see jira description Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CacheTran.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java 2170243 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java e60dfac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java ee5c78a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 3f240f5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c Diff: https://reviews.apache.org/r/34455/diff/ Testing --- Thanks, chengxiang li
Re: Big Lock in Driver.compileInternal
Hi. As luck would have it, we are currently looking at this issue :) I have a small patch up at https://issues.apache.org/jira/browse/HIVE-4239; I tested it a bit with a unit test and some manual cluster testing. Would you be willing to test it on your setup? On 15/5/25, 20:54, Loudongfeng loudongf...@huawei.com wrote: Hi, all, I notice that there is a big lock in org.apache.hadoop.hive.ql.Driver. The following is a piece of code from Apache Hive 1.2.0: private static final Object compileMonitor = new Object(); private int compileInternal(String command) { int ret; synchronized (compileMonitor) { ret = compile(command); } ... } This means HQLs submitted concurrently from the client side will be compiled one by one on the HiveServer side, which causes problems when the compile phase is slow. My question is: what does this lock protect, and is it possible to remove it? Best Regards, Nemon
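The serialization Nemon describes can be reproduced with a minimal sketch. The class and helper names below are hypothetical stand-ins, not Hive's actual Driver code, but the locking pattern is the same: one static monitor forces every concurrent caller through compilation one at a time.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal illustration (not Hive code) of the Driver.compileInternal pattern:
// a single static monitor serializes compilation across all client sessions.
public class CompileLockDemo {
    private static final Object compileMonitor = new Object();
    private static final AtomicInteger inCompile = new AtomicInteger();
    private static final AtomicInteger maxObserved = new AtomicInteger();

    // Stand-in for Driver.compile(command); records how many threads are
    // inside the critical section at the same moment.
    static int compile(String command) {
        int now = inCompile.incrementAndGet();
        maxObserved.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(5); // pretend compilation takes a while
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        inCompile.decrementAndGet();
        return 0;
    }

    static int compileInternal(String command) {
        synchronized (compileMonitor) { // the "big lock"
            return compile(command);
        }
    }

    static int maxConcurrentCompilations() {
        return maxObserved.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] clients = new Thread[8];
        for (int i = 0; i < clients.length; i++) {
            clients[i] = new Thread(() -> compileInternal("SELECT 1"));
            clients[i].start();
        }
        for (Thread t : clients) t.join();
        // Despite 8 concurrent clients, at most one thread compiles at a time.
        System.out.println("max concurrent compilations: " + maxConcurrentCompilations());
    }
}
```

Because every caller contends on the same static monitor, the observed concurrency is always 1, so eight clients pay roughly eight times the single-compile latency.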
Re: Review Request 34455: HIVE-10550 Dynamic RDD caching optimization for HoS.[Spark Branch]
On May 27, 2015, 10:13 p.m., Xuefu Zhang wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 2062 https://reviews.apache.org/r/34455/diff/3/?file=972428#file972428line2062 Sorry for pointing this out late. I'm not certain if it's a good idea to expose these two configurations. Also, this introduces a change of behavior. For now, can we get rid of them and change the persistency level back to MEM+DISK? We can come back to revisit this later on. At this moment, I don't feel confident to make the call. chengxiang li wrote: Persisting to MEM + DISK may hurt performance in certain cases; I think at least we should have a switch to enable/disable this optimization. Agreed. However, before we find out more about in what cases this helps or hurts, I think it's better we keep the existing behavior. This doesn't prevent us from adding a flag later on. - Xuefu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34455/#review85451 --- On May 27, 2015, 1:50 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34455/ --- (Updated May 27, 2015, 1:50 a.m.) Review request for hive, Chao Sun, Jimmy Xiang, and Xuefu Zhang. 
Bugs: HIVE-10550 https://issues.apache.org/jira/browse/HIVE-10550 Repository: hive-git Description --- see jira description Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CacheTran.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java 2170243 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java e60dfac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java ee5c78a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 3f240f5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkRddCachingResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java bb5dd79 Diff: https://reviews.apache.org/r/34455/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 34455: HIVE-10550 Dynamic RDD caching optimization for HoS.[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34455/ --- (Updated May 28, 2015, 3:30 a.m.) Review request for hive, Chao Sun, Jimmy Xiang, and Xuefu Zhang. Changes --- remove configs, and move common parent match logic in SparkPlanGenerator directly. Bugs: HIVE-10550 https://issues.apache.org/jira/browse/HIVE-10550 Repository: hive-git Description --- see jira description Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CacheTran.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java 2170243 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java e60dfac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java ee5c78a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 3f240f5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c Diff: https://reviews.apache.org/r/34455/diff/ Testing --- Thanks, chengxiang li
Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34757/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-10844 https://issues.apache.org/jira/browse/HIVE-10844 Repository: hive-git Description --- Some Hive queries (like TPCDS Q39) may share the same subquery, which is translated into separate but equivalent Works in SparkWork; combining these equivalent Works into a single one would help them benefit from the subsequent dynamic RDD caching optimization. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 Diff: https://reviews.apache.org/r/34757/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 34455: HIVE-10550 Dynamic RDD caching optimization for HoS.[Spark Branch]
On May 27, 2015, 10:13 p.m., Xuefu Zhang wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 2062 https://reviews.apache.org/r/34455/diff/3/?file=972428#file972428line2062 Sorry for pointing this out late. I'm not certain if it's a good idea to expose these two configurations. Also, this introduces a change of behavior. For now, can we get rid of them and change the persistency level back to MEM+DISK? We can come back to revisit this later on. At this moment, I don't feel confident to make the call. Persisting to MEM + DISK may hurt performance in certain cases; I think at least we should have a switch to enable/disable this optimization. - chengxiang --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34455/#review85451 --- On May 27, 2015, 1:50 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34455/ --- (Updated May 27, 2015, 1:50 a.m.) Review request for hive, Chao Sun, Jimmy Xiang, and Xuefu Zhang. 
Bugs: HIVE-10550 https://issues.apache.org/jira/browse/HIVE-10550 Repository: hive-git Description --- see jira description Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CacheTran.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java 2170243 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java e60dfac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java ee5c78a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 3f240f5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkRddCachingResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java bb5dd79 Diff: https://reviews.apache.org/r/34455/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 34248: HIVE-10684 Fix the unit test failures for HIVE-7553 after HIVE-10674 removed the binary jar files
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34248/ --- (Updated May 28, 2015, 2:31 a.m.) Review request for hive and Sushanth Sowmyan. Bugs: HIVE-10684 https://issues.apache.org/jira/browse/HIVE-10684 Repository: hive-git Description --- Remove binaries from source and fix the failed cases Diffs (updated) - ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java 45ba07e ql/src/test/resources/RefreshedJarClassV1.txt PRE-CREATION ql/src/test/resources/RefreshedJarClassV2.txt PRE-CREATION Diff: https://reviews.apache.org/r/34248/diff/ Testing --- UT passed Thanks, cheng xu
Re: Review Request 34455: HIVE-10550 Dynamic RDD caching optimization for HoS.[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34455/#review85509 --- Ship it! Ship It! - Xuefu Zhang On May 28, 2015, 3:30 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34455/ --- (Updated May 28, 2015, 3:30 a.m.) Review request for hive, Chao Sun, Jimmy Xiang, and Xuefu Zhang. Bugs: HIVE-10550 https://issues.apache.org/jira/browse/HIVE-10550 Repository: hive-git Description --- see jira description Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CacheTran.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java 2170243 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java e60dfac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java ee5c78a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 3f240f5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c Diff: https://reviews.apache.org/r/34455/diff/ Testing --- Thanks, chengxiang li
Re: How to debug hive unit test in eclipse ?
This is great -- thanks Bob! Would you be willing to contribute it to the Hive wiki, or at least allow us to link to it from the Testing Docs overview https://cwiki.apache.org/confluence/display/Hive/TestingDocs? -- Lefty On Mon, May 25, 2015 at 12:02 PM, Bob Freitas bob.e.frei...@gmail.com wrote: Hi Jeff, I recently needed to figure out how to do unit testing of Hive scripts, and it turned out to be something of an adventure. I had done some previous work in this area but things have changed with MR2 and YARN, gee go figure... What I ended up doing was going through the Hive source code to figure out how the dev team was doing the testing. To help out people who come after me, I put together an article and github repo http://www.lopakalogic.com/articles/hadoop-articles/hive-testing/ With this I was able to step through my script, the Hadoop code, the Hive code, it was pretty cool! Hope it helps!
[GitHub] hive pull request: Hive 10843
GitHub user thejasmn opened a pull request: https://github.com/apache/hive/pull/39 Hive 10843 You can merge this pull request into a Git repository by running: $ git pull https://github.com/thejasmn/hive HIVE-10843 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/39.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #39 commit 9a99f25a0acc1d5b8e611fead6f5dffa985176e8 Author: Thejas Nair the...@hortonworks.com Date: 2015-05-27T18:12:52Z show tables now passes the current db name commit 574e3da1220500d1548d4b2431883db8a7da6028 Author: Thejas Nair the...@hortonworks.com Date: 2015-05-28T00:35:21Z add db info in describe db command --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] hive pull request: HIVE-10843
GitHub user thejasmn opened a pull request: https://github.com/apache/hive/pull/40 HIVE-10843 You can merge this pull request into a Git repository by running: $ git pull https://github.com/thejasmn/hive HIVE-10843 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/40.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #40 commit 9a99f25a0acc1d5b8e611fead6f5dffa985176e8 Author: Thejas Nair the...@hortonworks.com Date: 2015-05-27T18:12:52Z show tables now passes the current db name commit 574e3da1220500d1548d4b2431883db8a7da6028 Author: Thejas Nair the...@hortonworks.com Date: 2015-05-28T00:35:21Z add db info in describe db command --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Re: Review Request 34447: HIVE-10761 : Create codahale-based metrics system for Hive
On May 27, 2015, 9:29 p.m., Xuefu Zhang wrote: common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/Metrics.java, line 141 https://reviews.apache.org/r/34447/diff/3/?file=972974#file972974line141 If the synchronized block is for the whole method, we might just as well declare the whole method as synchronized. Szehon Ho wrote: In this context, I think object synchronization makes more sense than synchronizing on the class (synchronized method). I think they are equivalent. A synchronized method is synchronizing on this. It will be on the class if the method is static. - Xuefu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34447/#review85418 --- On May 28, 2015, 2:11 a.m., Szehon Ho wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34447/ --- (Updated May 28, 2015, 2:11 a.m.) Review request for hive, Chao Sun, Jimmy Xiang, and Xuefu Zhang. Bugs: HIVE-10761 https://issues.apache.org/jira/browse/HIVE-10761 Repository: hive-git Description --- See JIRA for the motivation. Summary: There is an existing metrics system that uses a custom model and is hooked up to JMX reporting; a codahale-based metrics system is desirable for a standard model and reporting. This adds a codahale-based metrics system to HiveServer2 and HiveMetastore. The metrics implementation is now internally pluggable, and the existing metrics system can be re-enabled by configuration if desired for backward compatibility. The following metrics are supported: 1. JVMPauseMonitor (used to call Hadoop's internal implementation, now forked off to integrate with the metrics system) 2. HMS API calls 3. Standard JVM metrics (only for the new implementation, as it's free with codahale). The following metrics reporting options are supported by the new system (configuration exposed): 1. JMX 2. CONSOLE 3. JSON_FILE (a periodic file of metrics that gets overwritten). 
A goal is to add a webserver that exposes the JSON metrics, but this is deferred to a later implementation. Diffs - common/pom.xml a615c1e common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/LegacyMetrics.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/Metrics.java 01c9d1d common/src/java/org/apache/hadoop/hive/common/metrics/common/Metrics.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/common/MetricsFactory.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/CodahaleMetrics.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/MetricsReporting.java PRE-CREATION common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 49b8f97 common/src/test/org/apache/hadoop/hive/common/metrics/TestLegacyMetrics.java PRE-CREATION common/src/test/org/apache/hadoop/hive/common/metrics/TestMetrics.java e85d3f8 common/src/test/org/apache/hadoop/hive/common/metrics/metrics2/TestCodahaleMetrics.java PRE-CREATION itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestMetaStoreMetrics.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java d81c856 pom.xml b21d894 service/src/java/org/apache/hive/service/server/HiveServer2.java 58e8e49 shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java 6d8166c shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 19324b8 shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 5a6bc44 Diff: https://reviews.apache.org/r/34447/diff/ Testing --- New unit test added. Manually tested. Thanks, Szehon Ho
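Xuefu's point in the review above — that an instance synchronized method is equivalent to wrapping its body in synchronized (this), while a static synchronized method locks the Class object — can be sketched as follows. The class and method names are hypothetical, for illustration only, not from the Hive patch:

```java
// Illustrates the Java monitor equivalences discussed in the review:
// an instance synchronized method locks `this`; a static one locks the Class.
public class SyncEquivalence {
    private int count;

    // These two methods are equivalent: both acquire the monitor of `this`.
    public synchronized void incrementA() {
        count++;
    }

    public void incrementB() {
        synchronized (this) {
            count++;
        }
    }

    // A static synchronized method instead locks SyncEquivalence.class,
    // so it does NOT mutually exclude the instance methods above.
    public static synchronized void staticWork() {
        // equivalent to: synchronized (SyncEquivalence.class) { ... }
    }

    public int get() {
        return count;
    }

    public static void main(String[] args) {
        SyncEquivalence s = new SyncEquivalence();
        s.incrementA();
        s.incrementB();
        System.out.println(s.get()); // prints 2
    }
}
```

So for a non-static method, "object synchronization" and declaring the method synchronized pick the same monitor; the class-level monitor only comes into play when the method is static.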
[jira] [Created] (HIVE-10844) Combine equivalent Works for HoS[Spark Branch]
Chengxiang Li created HIVE-10844: Summary: Combine equivalent Works for HoS[Spark Branch] Key: HIVE-10844 URL: https://issues.apache.org/jira/browse/HIVE-10844 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Some Hive queries (like [TPCDS Q39|https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query39.sql]) may share the same subquery, which is translated into separate but equivalent Works in SparkWork; combining these equivalent Works into a single one would help them benefit from the subsequent dynamic RDD caching optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Revise docs for Hive indexing
Will Hive indexing ever be fixed? If not, should we remove the doc I cobbled together (Indexing https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Indexing) or just revise it? And should the design doc be moved from the Completed section to Incomplete (Hive Design Docs https://cwiki.apache.org/confluence/display/Hive/DesignDocs)? What about bitmap indexes, do they work (Bitmap Indexes https://cwiki.apache.org/confluence/display/Hive/IndexDev+Bitmap -- HIVE-1803 https://issues.apache.org/jira/browse/HIVE-1803)? -- Lefty
[jira] [Created] (HIVE-10842) LLAP: DAGs get stuck in yet another way
Sergey Shelukhin created HIVE-10842: --- Summary: LLAP: DAGs get stuck in yet another way Key: HIVE-10842 URL: https://issues.apache.org/jira/browse/HIVE-10842 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Looks exactly like HIVE-10744. Last comment there has internal app IDs. Logs upon request. 6 (number of slots) tasks from a machine are stuck. jstack for target daemon sayeth:
{noformat}
Found one Java-level deadlock:
=============================
"IPC Server handler 4 on 15001":
  waiting to lock Monitor@0x7f3cb0005cb8 (Object@0x8cc3ce98, a java/lang/Object),
  which is held by "Wait-Queue-Scheduler-0"
"Wait-Queue-Scheduler-0":
  waiting to lock Monitor@0x7f3cb0004d98 (Object@0x9234cf58, a org/apache/hadoop/hive/llap/daemon/impl/QueryInfo$FinishableStateTracker),
  which is held by "IPC Server handler 4 on 15001"
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
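The jstack output above shows a classic lock-ordering deadlock: one thread holds lock A and waits for B while another holds B and waits for A. The standard fix is to acquire the locks in a single fixed order everywhere, which makes a wait cycle impossible. A minimal sketch of that fix (illustrative only — hypothetical lock names, not the LLAP daemon code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Two threads, two locks. Acquiring them in the SAME order on every path
// (the standard cure for the deadlock jstack reported) guarantees progress.
public class LockOrderDemo {
    private static final Object handlerLock = new Object();   // cf. the IPC handler's lock
    private static final Object schedulerLock = new Object(); // cf. the wait-queue scheduler's lock
    static final AtomicInteger ops = new AtomicInteger();

    static void task() {
        synchronized (handlerLock) {       // fixed global order: handlerLock first...
            synchronized (schedulerLock) { // ...then schedulerLock, so no cycle can form
                ops.incrementAndGet();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> { for (int i = 0; i < 10_000; i++) task(); });
        Thread b = new Thread(() -> { for (int i = 0; i < 10_000; i++) task(); });
        a.start(); b.start();
        a.join(); b.join();
        System.out.println("completed " + ops.get() + " operations without deadlock");
    }
}
```

If one of the two paths instead took schedulerLock first, the two threads could each grab one lock and wait forever on the other — exactly the cycle the jstack report describes.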
[GitHub] hive pull request: Hive 10843
Github user thejasmn closed the pull request at: https://github.com/apache/hive/pull/39 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (HIVE-10843) desc database and show tables commands don't pass db to HiveAuthorizer check
Thejas M Nair created HIVE-10843: Summary: desc database and show tables commands don't pass db to HiveAuthorizer check Key: HIVE-10843 URL: https://issues.apache.org/jira/browse/HIVE-10843 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair 'show tables' and 'describe database' command should pass the database information for the command to HiveAuthorizer . This is needed for any auditing the hive authorizer might implement, or any authorization check it might decide to do based on the given database name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 34455: HIVE-10550 Dynamic RDD caching optimization for HoS.[Spark Branch]
On May 27, 2015, 10:13 p.m., Xuefu Zhang wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 2062 https://reviews.apache.org/r/34455/diff/3/?file=972428#file972428line2062 Sorry for pointing this out late. I'm not certain if it's a good idea to expose these two configurations. Also, this introduces a change of behavior. For now, can we get rid of them and change the persistency level back to MEM+DISK? We can come back to revisit this later on. At this moment, I don't feel confident to make the call. chengxiang li wrote: Persisting to MEM + DISK may hurt performance in certain cases; I think at least we should have a switch to enable/disable this optimization. Xuefu Zhang wrote: Agreed. However, before we find out more about in what cases this helps or hurts, I think it's better we keep the existing behavior. This doesn't prevent us from adding a flag later on. OK, I will remove these configurations from the patch temporarily; we can discuss later when we have more knowledge about it. - chengxiang --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34455/#review85451 --- On May 27, 2015, 1:50 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34455/ --- (Updated May 27, 2015, 1:50 a.m.) Review request for hive, Chao Sun, Jimmy Xiang, and Xuefu Zhang. 
Bugs: HIVE-10550 https://issues.apache.org/jira/browse/HIVE-10550 Repository: hive-git Description --- see jira description Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CacheTran.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java 2170243 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java e60dfac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java ee5c78a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 3f240f5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkRddCachingResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java bb5dd79 Diff: https://reviews.apache.org/r/34455/diff/ Testing --- Thanks, chengxiang li
Review Request 34752: Beeline-CLI: Implement CLI source command using Beeline functionality
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34752/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-10821 https://issues.apache.org/jira/browse/HIVE-10821 Repository: hive-git Description --- Add source command support for CLI using beeline Diffs - beeline/src/java/org/apache/hive/beeline/BeeLine.java 4a82635 beeline/src/test/org/apache/hive/beeline/cli/TestHiveCli.java cc0b598 Diff: https://reviews.apache.org/r/34752/diff/ Testing --- Newly created UT passed Thanks, cheng xu
Re: Review Request 34447: HIVE-10761 : Create codahale-based metrics system for Hive
On May 27, 2015, 9:29 p.m., Xuefu Zhang wrote: common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/Metrics.java, line 141 https://reviews.apache.org/r/34447/diff/3/?file=972974#file972974line141 If the synchronized block is for the whole method, we might just as well declare the whole method as synchronized. In this context, I think object synchronization makes more sense than synchronizing on the class (synchronized method). - Szehon --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34447/#review85418 --- On May 28, 2015, 2:11 a.m., Szehon Ho wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34447/ --- (Updated May 28, 2015, 2:11 a.m.) Review request for hive, Chao Sun, Jimmy Xiang, and Xuefu Zhang. Bugs: HIVE-10761 https://issues.apache.org/jira/browse/HIVE-10761 Repository: hive-git Description --- See JIRA for the motivation. Summary: There is an existing metrics system that uses a custom model and is hooked up to JMX reporting; a codahale-based metrics system is desirable for a standard model and reporting. This adds a codahale-based metrics system to HiveServer2 and HiveMetastore. The metrics implementation is now internally pluggable, and the existing metrics system can be re-enabled by configuration if desired for backward compatibility. The following metrics are supported: 1. JVMPauseMonitor (used to call Hadoop's internal implementation, now forked off to integrate with the metrics system) 2. HMS API calls 3. Standard JVM metrics (only for the new implementation, as it's free with codahale). The following metrics reporting options are supported by the new system (configuration exposed): 1. JMX 2. CONSOLE 3. JSON_FILE (a periodic file of metrics that gets overwritten). A goal is to add a webserver that exposes the JSON metrics, but this is deferred to a later implementation. 
Diffs - common/pom.xml a615c1e common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/LegacyMetrics.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/Metrics.java 01c9d1d common/src/java/org/apache/hadoop/hive/common/metrics/common/Metrics.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/common/MetricsFactory.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/CodahaleMetrics.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/MetricsReporting.java PRE-CREATION common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 49b8f97 common/src/test/org/apache/hadoop/hive/common/metrics/TestLegacyMetrics.java PRE-CREATION common/src/test/org/apache/hadoop/hive/common/metrics/TestMetrics.java e85d3f8 common/src/test/org/apache/hadoop/hive/common/metrics/metrics2/TestCodahaleMetrics.java PRE-CREATION itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestMetaStoreMetrics.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java d81c856 pom.xml b21d894 service/src/java/org/apache/hive/service/server/HiveServer2.java 58e8e49 shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java 6d8166c shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 19324b8 shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 5a6bc44 Diff: https://reviews.apache.org/r/34447/diff/ Testing --- New unit test added. Manually tested. Thanks, Szehon Ho
Re: Review Request 34447: HIVE-10761 : Create codahale-based metrics system for Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34447/ --- (Updated May 28, 2015, 2:11 a.m.) Review request for hive, Chao Sun, Jimmy Xiang, and Xuefu Zhang. Changes --- Address review comments. Bugs: HIVE-10761 https://issues.apache.org/jira/browse/HIVE-10761 Repository: hive-git Description --- See JIRA for the motivation. Summary: There is an existing metrics system that uses a custom model and is hooked up to JMX reporting; a codahale-based metrics system is desirable for a standard model and reporting. This adds a codahale-based metrics system to HiveServer2 and HiveMetastore. The metrics implementation is now internally pluggable, and the existing metrics system can be re-enabled by configuration if desired for backward compatibility. The following metrics are supported: 1. JVMPauseMonitor (used to call Hadoop's internal implementation, now forked off to integrate with the metrics system) 2. HMS API calls 3. Standard JVM metrics (only for the new implementation, as it's free with codahale). The following metrics reporting options are supported by the new system (configuration exposed): 1. JMX 2. CONSOLE 3. JSON_FILE (a periodic file of metrics that gets overwritten). A goal is to add a webserver that exposes the JSON metrics, but this is deferred to a later implementation. 
Diffs (updated) - common/pom.xml a615c1e common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/LegacyMetrics.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/Metrics.java 01c9d1d common/src/java/org/apache/hadoop/hive/common/metrics/common/Metrics.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/common/MetricsFactory.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/CodahaleMetrics.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/MetricsReporting.java PRE-CREATION common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 49b8f97 common/src/test/org/apache/hadoop/hive/common/metrics/TestLegacyMetrics.java PRE-CREATION common/src/test/org/apache/hadoop/hive/common/metrics/TestMetrics.java e85d3f8 common/src/test/org/apache/hadoop/hive/common/metrics/metrics2/TestCodahaleMetrics.java PRE-CREATION itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestMetaStoreMetrics.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java d81c856 pom.xml b21d894 service/src/java/org/apache/hive/service/server/HiveServer2.java 58e8e49 shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java 6d8166c shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 19324b8 shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 5a6bc44 Diff: https://reviews.apache.org/r/34447/diff/ Testing --- New unit test added. Manually tested. Thanks, Szehon Ho
JIRA: sort attachments by date
Is there any way to change the default for JIRA attachments to Sort By Date instead of Sort By Name? Manage Attachments doesn't have anything useful. -- Lefty
[jira] [Created] (HIVE-10841) [WHERE col is not null] does not work for large queries
Alexander Pivovarov created HIVE-10841: -- Summary: [WHERE col is not null] does not work for large queries Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Alexander Pivovarov The result of the following SELECT query is 3 rows, but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive: 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run the SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10 122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10 122 {code} 3. a workaround is to move acct.brn is not null into the join condition {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid and acct.brn is not null WHERE L.id = 4436; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [VOTE] Stable releases from branch-1 and experimental releases from master
+1 -- Lefty

On Wed, May 27, 2015 at 3:21 PM, Alexander Pivovarov apivova...@gmail.com wrote:

+1

On May 27, 2015 10:45 AM, Vikram Dixit K vikram.di...@gmail.com wrote:

+1 for all the reasons outlined.

On Tue, May 26, 2015 at 6:13 PM, Thejas Nair thejas.n...@gmail.com wrote:

+1
- This is great for users who want to take longer to upgrade from hadoop-1 and care mainly about bug fixes and incremental features rather than radical new features.
- The ability to mark initial 2.x releases as alpha/beta also helps get users to try them out, and lets them choose what is right for them.
- This also lets developers focus on major new features without the burden of maintaining hadoop-1 compatibility.

On Tue, May 26, 2015 at 11:41 AM, Alan Gates alanfga...@gmail.com wrote:

We have discussed this for several weeks now. Some concerns have been raised, which I have tried to address. I think it is time to vote on it as our release plan. To be specific, I propose:

- Hive makes a branch-1 from the current master. This would be used for 1.3 and future 1.x releases. This branch would not deprecate existing functionality. Any new features in this branch would also need to be put on master. An upgrade path for users will be maintained from one 1.x release to the next, as well as from the latest 1.x release to the latest 2.x release.
- Going forward, releases numbered 2.x will be made from master. The purpose of these releases is to give users access to new features being developed in Hive and to let developers get feedback. It is expected that for a while these releases will not be production ready and will be clearly labeled as such. Some legacy features, such as Hadoop 1 and MapReduce, will no longer be supported on master.
- Any critical bug fixes (security, incorrect results, crashes) fixed in master will also be ported to branch-1 for at least a year. This time period may be extended in the future based on the stability and adoption of 2.x releases.
Based on Hive's bylaws, this release plan vote will be open for 3 days, and all active committers have binding votes. Here's my +1. Alan. -- Nothing better than when appreciated for hard work. -Mark
Re: [VOTE] Stable releases from branch-1 and experimental releases from master
+1

On May 27, 2015 10:45 AM, Vikram Dixit K vikram.di...@gmail.com wrote: +1 for all the reasons outlined.
[jira] [Created] (HIVE-10839) TestHCatLoaderEncryption.* tests fail on Windows because of path-related issues
Hari Sankar Sivarama Subramaniyan created HIVE-10839:
--
Summary: TestHCatLoaderEncryption.* tests fail on Windows because of path-related issues
Key: HIVE-10839
URL: https://issues.apache.org/jira/browse/HIVE-10839
Project: Hive
Issue Type: Bug
Components: Tests
Environment: Windows OS
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan

I get the following errors when trying to run the org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.* tests on Windows:

{code}
Encryption key created: 'key_128'
(1,Encryption Processor Helper Failed:Pathname /D:/w/hv/hcatalog/hcatalog-pig-adapter/target/tmp/org.apache.hive.hcatalog.pig.TestHCatLoader-1432579852919/warehouse/encryptedTable from D:/w/hv/hcatalog/hcatalog-pig-adapter/target/tmp/org.apache.hive.hcatalog.pig.TestHCatLoader-1432579852919/warehouse/encryptedTable is not a valid DFS filename.,null)
Encryption key deleted: 'key_128'
{code}

{code}
Error Message
Could not fully delete D:\w\hv\hcatalog\hcatalog-pig-adapter\target\tmp\dfs\name1
Stacktrace
java.io.IOException: Could not fully delete D:\w\hv\hcatalog\hcatalog-pig-adapter\target\tmp\dfs\name1
 at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:940)
 at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:811)
 at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:742)
 at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:612)
 at org.apache.hadoop.hive.shims.Hadoop23Shims.getMiniDfs(Hadoop23Shims.java:523)
 at org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.initEncryptionShim(TestHCatLoaderEncryption.java:242)
 at org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.setup(TestHCatLoaderEncryption.java:190)
{code}
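The first failure suggests the test builds a DFS path by prefixing a Windows absolute path, drive letter and all, with a slash, producing the `/D:/...` form that Hadoop rejects. As a rough illustration of the kind of normalization needed (a hypothetical helper for a sketch only, not the actual Hive fix, which would likely go through Hadoop's Path/URI handling):

```python
def to_test_uri(path):
    # Hypothetical helper: turn a Windows path such as "D:\\w\\hv\\target\\tmp"
    # or the broken "/D:/w/hv/target/tmp" form into a file: URI, avoiding the
    # bare "/D:/..." pathname that fails the "valid DFS filename" check.
    normalized = path.replace("\\", "/").lstrip("/")
    return "file:///" + normalized

print(to_test_uri("D:\\w\\hv\\hcatalog\\hcatalog-pig-adapter\\target\\tmp"))
# file:///D:/w/hv/hcatalog/hcatalog-pig-adapter/target/tmp
```

On POSIX systems the same helper is a no-op apart from the `file:///` prefix, which is why path bugs like this tend to surface only on Windows builds.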
Re: Review Request 34455: HIVE-10550 Dynamic RDD caching optimization for HoS.[Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34455/#review85451
---

common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
https://reviews.apache.org/r/34455/#comment137023

Sorry for pointing this out late. I'm not certain it's a good idea to expose these two configurations, and this also introduces a change of behavior. For now, can we get rid of them and change the persistency level back to MEM+DISK? We can come back and revisit this later; at this moment I don't feel confident enough to make the call.

- Xuefu Zhang

On May 27, 2015, 1:50 a.m., chengxiang li wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34455/
---

(Updated May 27, 2015, 1:50 a.m.)

Review request for hive, Chao Sun, Jimmy Xiang, and Xuefu Zhang.

Bugs: HIVE-10550
https://issues.apache.org/jira/browse/HIVE-10550

Repository: hive-git

Description
-------

see jira description

Diffs
-----

common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CacheTran.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java 2170243
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java e60dfac
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java ee5c78a
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 3f240f5
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkRddCachingResolver.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70
ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java bb5dd79

Diff: https://reviews.apache.org/r/34455/diff/

Testing
-------

Thanks,

chengxiang li
[jira] [Created] (HIVE-10838) Allow the Hive metastore client to use a different hostname when the host has multiple hostnames and security is enabled
HeeSoo Kim created HIVE-10838:
--
Summary: Allow the Hive metastore client to use a different hostname when the host has multiple hostnames and security is enabled
Key: HIVE-10838
URL: https://issues.apache.org/jira/browse/HIVE-10838
Project: Hive
Issue Type: Task
Reporter: HeeSoo Kim
Assignee: HeeSoo Kim

Currently, if a Hive metastore client (e.g. HS2, Oozie) tries to connect to the Hive metastore when security is enabled, the client can fail with an error like the following:

{code}
2015-05-21 23:17:59,554 ERROR metadata.Hive (Hive.java:getDelegationToken(2638)) - MetaException(message:Unauthorized connection for super-user: hiveserver/hiveserver-dpci.s3s.altiscale@test.altiscale.com from IP 10.250.16.43)
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result.read(ThriftHiveMetastore.java)
 at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_delegation_token(ThriftHiveMetastore.java:3293)
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_delegation_token(ThriftHiveMetastore.java:3279)
 at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDelegationToken(HiveMetaStoreClient.java:1559)
{code}

This happens when the client's default IP address differs from the hostname in the client's Kerberos principal and the client host has multiple IP addresses. We need to set the bind address, based on the hostname in the Kerberos principal, when the Hive metastore client connects to the Hive metastore.
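The proposed fix keys the client's bind address off the hostname embedded in the Kerberos principal. A minimal sketch of that idea (hypothetical helper names; the actual patch would live in the Thrift client setup, not in Python):

```python
def host_from_principal(principal):
    # Extract the instance (hostname) component of a Kerberos service
    # principal of the form "service/host@REALM"; return None if the
    # principal has no instance part (e.g. "user@REALM").
    name = principal.split("@", 1)[0]
    parts = name.split("/", 1)
    return parts[1] if len(parts) == 2 else None

host = host_from_principal(
    "hiveserver/hiveserver-dpci.s3s.altiscale@test.altiscale.com")
print(host)  # hiveserver-dpci.s3s.altiscale

# With that hostname in hand, the client would bind its outgoing socket to
# the matching local address instead of the default interface, e.g.:
#   bind_ip = socket.gethostbyname(host)
#   sock = socket.create_connection((metastore_host, 9083),
#                                   source_address=(bind_ip, 0))
```

Binding the source address this way makes the connection's source IP match the principal's hostname, so the metastore's reverse lookup agrees with the Kerberos identity on multi-homed hosts.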
[jira] [Created] (HIVE-10840) NumberFormatException while running analyze table partition compute statistics query
Ashutosh Chauhan created HIVE-10840:
---
Summary: NumberFormatException while running analyze table partition compute statistics query
Key: HIVE-10840
URL: https://issues.apache.org/jira/browse/HIVE-10840
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 1.2.0
Reporter: Jagruti Varia
Assignee: Ashutosh Chauhan