[jira] [Assigned] (HIVE-7541) Support union all on Spark
[ https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Na Yang reassigned HIVE-7541: - Assignee: Na Yang Support union all on Spark -- Key: HIVE-7541 URL: https://issues.apache.org/jira/browse/HIVE-7541 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Na Yang For the union all operator, we will use Spark's union transformation. Refer to the design doc on the wiki for more information. -- This message was sent by Atlassian JIRA (v6.2#6252)
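For readers unfamiliar with the transformation mentioned above, here is a minimal, self-contained sketch of Spark's union in the Java API (the class and variable names are illustrative, not Hive's actual plan code):
{code:java}
import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class UnionAllSketch {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext("local", "union-all-sketch");
    // Two inputs standing in for the two branches of a UNION ALL query.
    JavaRDD<String> left = sc.parallelize(Arrays.asList("a", "b"));
    JavaRDD<String> right = sc.parallelize(Arrays.asList("b", "c"));
    // union() concatenates partitions and keeps duplicates, which matches
    // UNION ALL semantics; a plain SQL UNION would need an extra distinct step.
    JavaRDD<String> unioned = left.union(right);
    System.out.println(unioned.collect()); // [a, b, b, c]
    sc.stop();
  }
}
{code}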
[jira] [Commented] (HIVE-7436) Load Spark configuration into Hive driver
[ https://issues.apache.org/jira/browse/HIVE-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14078962#comment-14078962 ] Lefty Leverenz commented on HIVE-7436: -- bq. Is HADOOP_CLASSPATH documented anywhere for Hive? Grepping the Hive wiki reveals three docs that mention HADOOP_CLASSPATH, but none for Hive: * [HCatalog InputOutput -- Running MapReduce with HCatalog (see first example) | https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-RunningMapReducewithHCatalog] * [Install WebHCat -- Hadoop Distributed Cache (see templeton.override.jars, which is the last config in the section) | https://cwiki.apache.org/confluence/display/Hive/WebHCat+InstallWebHCat#WebHCatInstallWebHCat-HadoopDistributedCache] * [WebHCat Configuration -- Configuration Variables (see templeton.override.jars, which is 5th in the table) | https://cwiki.apache.org/confluence/display/Hive/WebHCat+Configure#WebHCatConfigure-ConfigurationVariables] Load Spark configuration into Hive driver - Key: HIVE-7436 URL: https://issues.apache.org/jira/browse/HIVE-7436 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-7436-Spark.1.patch, HIVE-7436-Spark.2.patch, HIVE-7436-Spark.3.patch Load Spark configuration into the Hive driver. There are 3 ways to set up Spark configurations: # Java properties. # Properties in the Spark configuration file (spark-defaults.conf). # The Hive configuration file (hive-site.xml). Sources later in this list take priority and overwrite earlier ones for the same property name. Please refer to [http://spark.apache.org/docs/latest/configuration.html] for all configurable properties of Spark. You can configure Spark in Hive in the following ways: # Through the Spark configuration file. #* Create spark-defaults.conf and place it in the /etc/spark/conf configuration directory, with properties in Java properties format. #* Set the $SPARK_CONF_DIR environment variable to the location of spark-defaults.conf: export SPARK_CONF_DIR=/etc/spark/conf #* Add $SPARK_CONF_DIR to the $HADOOP_CLASSPATH environment variable: export HADOOP_CLASSPATH=$SPARK_CONF_DIR:$HADOOP_CLASSPATH # Through the Hive configuration file. #* Edit hive-site.xml in the Hive conf directory and set the same properties in XML format. Hive driver default Spark properties: ||name||default value||description|| |spark.master|local|Spark master URL.| |spark.app.name|Hive on Spark|Default Spark application name.| NO PRECOMMIT TESTS. This is for spark-branch only. -- This message was sent by Atlassian JIRA (v6.2#6252)
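As a concrete illustration of the hive-site.xml option described above, the entries might look like the following sketch. The two property names come from the defaults table in the issue; any other spark.* property from the Spark configuration page could be set the same way.
{code:xml}
<!-- Illustrative hive-site.xml entries for Hive on Spark. -->
<property>
  <name>spark.master</name>
  <value>local</value>
</property>
<property>
  <name>spark.app.name</name>
  <value>Hive on Spark</value>
</property>
{code}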
[jira] [Commented] (HIVE-7497) Fix some default values in HiveConf
[ https://issues.apache.org/jira/browse/HIVE-7497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14078973#comment-14078973 ] Lefty Leverenz commented on HIVE-7497: -- Good, that makes sense. Thanks [~dongc]. (I'd fix your env smiley but it's fun -- let the parenthesis remain open.) Fix some default values in HiveConf --- Key: HIVE-7497 URL: https://issues.apache.org/jira/browse/HIVE-7497 Project: Hive Issue Type: Task Reporter: Brock Noland Assignee: Dong Chen Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7497.1.patch, HIVE-7497.patch HIVE-5160 resolves an env variable at runtime by calling System.getenv(). If the variable is not defined when you run the build, null is returned and the path is not placed in hive-default.template. However, if it is defined, it will populate hive-default.template with a path that differs based on the user running the build. We should use $\{system:HIVE_CONF_DIR\} instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7029) Vectorize ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14078977#comment-14078977 ] Hive QA commented on HIVE-7029: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12658529/HIVE-7029.7.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5835 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/97/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/97/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-97/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12658529 Vectorize ReduceWork Key: HIVE-7029 URL: https://issues.apache.org/jira/browse/HIVE-7029 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, HIVE-7029.4.patch, HIVE-7029.5.patch, HIVE-7029.6.patch, HIVE-7029.7.patch This will enable vectorization team to independently work on vectorization on reduce side even before vectorized shuffle is ready. NOTE: Tez only (i.e. TezTask only) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7553) avoid the scheduling maintenance window for every jar change
Ferdinand Xu created HIVE-7553: -- Summary: avoid the scheduling maintenance window for every jar change Key: HIVE-7553 URL: https://issues.apache.org/jira/browse/HIVE-7553 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Ferdinand Xu Assignee: Ferdinand Xu When a user needs to refresh an existing jar or add a new one to HS2, HS2 must be restarted. As HS2 is a service exposed to clients, this requires scheduling a maintenance window for every jar change. It would be great if we could avoid that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7436) Load Spark configuration into Hive driver
[ https://issues.apache.org/jira/browse/HIVE-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079007#comment-14079007 ] Chengxiang Li commented on HIVE-7436: - [~xuefuz] HADOOP_CONF_DIR is added to HADOOP_CLASSPATH in hadoop-config.sh, as is HIVE_CONF_DIR in hive-config.sh. If we only load the Spark configuration file from the classpath, there are 2 choices: # Export SPARK_CONF_DIR and add it to HADOOP_CLASSPATH manually. # Commit a patch which adds SPARK_CONF_DIR to HADOOP_CLASSPATH in the Hive scripts (such as hive-config.sh), then export SPARK_CONF_DIR. My concern about supporting loading the Spark configuration file from SPARK_CONF_DIR at the implementation level is: # HADOOP/HIVE/HIVE on TEZ actually only load configuration files from the classpath. # It may introduce more complexity, e.g. what should we do if different Spark configuration files are available in both SPARK_CONF_DIR and HADOOP_CLASSPATH? The way Hive on Tez is configured is similar to the current Hive on Spark approach. [Hive on Tez Configuration|http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.2/bk_installing_manually_book/content/rpm-chap-tez_configure_tez.html] Load Spark configuration into Hive driver - Key: HIVE-7436 URL: https://issues.apache.org/jira/browse/HIVE-7436 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-7436-Spark.1.patch, HIVE-7436-Spark.2.patch, HIVE-7436-Spark.3.patch Load Spark configuration into the Hive driver. There are 3 ways to set up Spark configurations: # Java properties. # Properties in the Spark configuration file (spark-defaults.conf). # The Hive configuration file (hive-site.xml). Sources later in this list take priority and overwrite earlier ones for the same property name. Please refer to [http://spark.apache.org/docs/latest/configuration.html] for all configurable properties of Spark. You can configure Spark in Hive in the following ways: # Through the Spark configuration file. #* Create spark-defaults.conf and place it in the /etc/spark/conf configuration directory, with properties in Java properties format. #* Set the $SPARK_CONF_DIR environment variable to the location of spark-defaults.conf: export SPARK_CONF_DIR=/etc/spark/conf #* Add $SPARK_CONF_DIR to the $HADOOP_CLASSPATH environment variable: export HADOOP_CLASSPATH=$SPARK_CONF_DIR:$HADOOP_CLASSPATH # Through the Hive configuration file. #* Edit hive-site.xml in the Hive conf directory and set the same properties in XML format. Hive driver default Spark properties: ||name||default value||description|| |spark.master|local|Spark master URL.| |spark.app.name|Hive on Spark|Default Spark application name.| NO PRECOMMIT TESTS. This is for spark-branch only. -- This message was sent by Atlassian JIRA (v6.2#6252)
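To make the classpath-based loading being discussed concrete, the sketch below shows how a driver process could pick up spark-defaults.conf once its directory (e.g. $SPARK_CONF_DIR) is on the classpath. This illustrates the mechanism only; the class name is hypothetical and this is not the actual patch:
{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class SparkConfFromClasspath {
  public static Properties load() throws IOException {
    Properties props = new Properties();
    // The file is found because its directory is on the classpath,
    // mirroring how Hadoop/Hive locate their *-site.xml files.
    try (InputStream in = Thread.currentThread().getContextClassLoader()
        .getResourceAsStream("spark-defaults.conf")) {
      if (in != null) {
        props.load(in); // spark-defaults.conf uses Java properties format
      }
    }
    return props;
  }
}
{code}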
[jira] [Commented] (HIVE-7519) Refactor QTestUtil to remove its duplication with QFileClient for qtest setup and teardown
[ https://issues.apache.org/jira/browse/HIVE-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079025#comment-14079025 ] Hive QA commented on HIVE-7519: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12658531/HIVE-7519.1.patch {color:red}ERROR:{color} -1 due to 31 failed/errored test(s), 5838 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_role_grant2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_print_header org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_non_string_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_part_project org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_bucketmapjoin1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_context org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_parquet org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_timestamp_funcs org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.ql.parse.TestParse.testParse_case_sensitivity org.apache.hadoop.hive.ql.parse.TestParse.testParse_input5 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testsequencefile org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testxpath org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testxpath2 org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample2 org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample3 org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample4 org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample5 org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample6 org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample7 org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/98/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/98/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-98/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 31 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12658531 Refactor QTestUtil to remove its duplication with QFileClient for qtest setup and teardown --- Key: HIVE-7519 URL: https://issues.apache.org/jira/browse/HIVE-7519 Project: Hive Issue Type: Improvement Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Attachments: HIVE-7519.1.patch, HIVE-7519.patch QTestUtil hard codes creation and dropping of source tables for qtests. QFileClient does the same thing but in a better way, uses q_test_init.sql and q_test_cleanup.sql scripts. As QTestUtil is growing quite large it makes sense to refactor it to use QFileClient's approach. This will also remove duplication of code addressing same purpose. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7544) Changes related to TEZ-1288 (FastTezSerialization)
[ https://issues.apache.org/jira/browse/HIVE-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-7544: --- Attachment: HIVE-7544.1.patch Changes related to TEZ-1288 (FastTezSerialization) -- Key: HIVE-7544 URL: https://issues.apache.org/jira/browse/HIVE-7544 Project: Hive Issue Type: Sub-task Components: Tez Affects Versions: 0.14.0 Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: HIVE-7544.1.patch Add ability to make use of TezBytesWritableSerialization. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-4934) ntile function has to be the last thing in the select list
[ https://issues.apache.org/jira/browse/HIVE-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Francke reassigned HIVE-4934: -- Assignee: Lars Francke ntile function has to be the last thing in the select list -- Key: HIVE-4934 URL: https://issues.apache.org/jira/browse/HIVE-4934 Project: Hive Issue Type: Bug Reporter: Lars Francke Assignee: Lars Francke Priority: Minor {code} CREATE TABLE test (foo INT); SELECT ntile(10), foo OVER (PARTITION BY foo) FROM test; FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: Only COMPLETE mode supported for NTile function SELECT foo, ntile(10) OVER (PARTITION BY foo) FROM test; ...works... {code} I'm not sure if that is a bug or necessary. Either way the error message is not helpful as it's not documented anywhere what {{COMPLETE}} mode is. A cursory glance at the code didn't help me either. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-4934) ntile function has to be the last thing in the select list
[ https://issues.apache.org/jira/browse/HIVE-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Francke resolved HIVE-4934. Resolution: Fixed This was a misunderstanding on my part. I'll add a sentence to the documentation to clear this up for others. ntile function has to be the last thing in the select list -- Key: HIVE-4934 URL: https://issues.apache.org/jira/browse/HIVE-4934 Project: Hive Issue Type: Bug Reporter: Lars Francke Assignee: Lars Francke Priority: Minor {code} CREATE TABLE test (foo INT); SELECT ntile(10), foo OVER (PARTITION BY foo) FROM test; FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: Only COMPLETE mode supported for NTile function SELECT foo, ntile(10) OVER (PARTITION BY foo) FROM test; ...works... {code} I'm not sure if that is a bug or necessary. Either way the error message is not helpful as it's not documented anywhere what {{COMPLETE}} mode is. A cursory glance at the code didn't help me either. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-4934) Improve documentation of OVER clause
[ https://issues.apache.org/jira/browse/HIVE-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Francke updated HIVE-4934: --- Description: {code} CREATE TABLE test (foo INT); SELECT ntile(10), foo OVER (PARTITION BY foo) FROM test; FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: Only COMPLETE mode supported for NTile function SELECT foo, ntile(10) OVER (PARTITION BY foo) FROM test; ...works... {code} I'm not sure if that is a bug or necessary. Either way the error message is not helpful as it's not documented anywhere what {{COMPLETE}} mode is. A cursory glance at the code didn't help me either. Edit: It is not a bug, it wasn't clear to me that the OVER clause only applies to the directly preceding function. was: {code} CREATE TABLE test (foo INT); SELECT ntile(10), foo OVER (PARTITION BY foo) FROM test; FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: Only COMPLETE mode supported for NTile function SELECT foo, ntile(10) OVER (PARTITION BY foo) FROM test; ...works... {code} I'm not sure if that is a bug or necessary. Either way the error message is not helpful as it's not documented anywhere what {{COMPLETE}} mode is. A cursory glance at the code didn't help me either. Improve documentation of OVER clause Key: HIVE-4934 URL: https://issues.apache.org/jira/browse/HIVE-4934 Project: Hive Issue Type: Bug Reporter: Lars Francke Assignee: Lars Francke Priority: Minor {code} CREATE TABLE test (foo INT); SELECT ntile(10), foo OVER (PARTITION BY foo) FROM test; FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: Only COMPLETE mode supported for NTile function SELECT foo, ntile(10) OVER (PARTITION BY foo) FROM test; ...works... {code} I'm not sure if that is a bug or necessary. Either way the error message is not helpful as it's not documented anywhere what {{COMPLETE}} mode is. A cursory glance at the code didn't help me either. Edit: It is not a bug, it wasn't clear to me that the OVER clause only applies to the directly preceding function. -- This message was sent by Atlassian JIRA (v6.2#6252)
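To spell out the rule added to the description above: an OVER clause binds only to the function immediately preceding it, so each windowed expression needs its own OVER clause. A sketch reusing the test table from the issue (the rank() column is an illustrative addition):
{code:sql}
SELECT foo,
       ntile(10) OVER (PARTITION BY foo) AS n,
       rank() OVER (PARTITION BY foo ORDER BY foo) AS r
FROM test;
{code}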
[jira] [Updated] (HIVE-4934) Improve documentation of OVER clause
[ https://issues.apache.org/jira/browse/HIVE-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Francke updated HIVE-4934: --- Summary: Improve documentation of OVER clause (was: ntile function has to be the last thing in the select list) Improve documentation of OVER clause Key: HIVE-4934 URL: https://issues.apache.org/jira/browse/HIVE-4934 Project: Hive Issue Type: Bug Reporter: Lars Francke Assignee: Lars Francke Priority: Minor {code} CREATE TABLE test (foo INT); SELECT ntile(10), foo OVER (PARTITION BY foo) FROM test; FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: Only COMPLETE mode supported for NTile function SELECT foo, ntile(10) OVER (PARTITION BY foo) FROM test; ...works... {code} I'm not sure if that is a bug or necessary. Either way the error message is not helpful as it's not documented anywhere what {{COMPLETE}} mode is. A cursory glance at the code didn't help me either. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 23799: HIVE-7390: refactor csv output format with in RFC mode and add one more option to support formatting as the csv format in hive cli
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23799/ --- (Updated July 30, 2014, 8:30 a.m.) Review request for hive. Changes --- 1. use hadoop.io.utils to close stream 2. change integrated test due to code changes 3. add quotedCsv format instead of option according to discussion 4. add one constructor parameter to specify the status of quoted Bugs: HIVE-7390 https://issues.apache.org/jira/browse/HIVE-7390 Repository: hive-git Description --- HIVE-7390: refactor csv output format with in RFC mode and add one more option to support formatting as the csv format in hive cli Diffs (updated) - beeline/pom.xml 6ec1d1aff3f35c097aa6054aae84faf2d63854f1 beeline/src/java/org/apache/hive/beeline/BeeLine.java 528a98e29c23421f9352bdf7c5edd3a9fae0e3ea beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 7853c3f38f3c3fb9ae0b9939c714f1dc940ba053 beeline/src/main/resources/BeeLine.properties 390d062b8dc52dfa790c7351f3db44c1e0dd7e37 itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java bd97aff5959fd9040fc0f0a1f6b782f2aa6f pom.xml b5a5697e6a3b689c2b244ba0338be541261eaa3d Diff: https://reviews.apache.org/r/23799/diff/ Testing --- Thanks, cheng xu
[jira] [Commented] (HIVE-7432) Remove deprecated Avro's Schema.parse usages
[ https://issues.apache.org/jira/browse/HIVE-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079067#comment-14079067 ] Hive QA commented on HIVE-7432: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12658545/HIVE-7432.patch {color:red}ERROR:{color} -1 due to 66 failed/errored test(s), 5838 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_change_schema org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_decimal org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_decimal_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_evolved_schemas org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_joins org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_joins_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_nullable_fields org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_sanity_test org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_schema_evolution_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_schema_literal org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_serde org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeArrays org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeBytes org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeEnums org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeFixed org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeMapWithNullablePrimitiveValues org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeMapsWithPrimitiveKeys org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableEnums org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeRecords org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeUnions org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeVoidType org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyCaching org.apache.hadoop.hive.serde2.avro.TestAvroObjectInspectorGenerator.convertsNullableEnum org.apache.hadoop.hive.serde2.avro.TestAvroObjectInspectorGenerator.objectInspectorsAreCached org.apache.hadoop.hive.serde2.avro.TestAvroSerde.initializeDoesNotReuseSchemasFromConf org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils.determineSchemaCanReadSchemaFromHDFS org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils.getTypeFromNullableTypePositiveCase org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils.isNullableTypeAcceptsNullableUnions org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils.noneOptionWorksForSpecifyingSchemas org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeArraysWithNullableComplexElements org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeArraysWithNullablePrimitiveElements 
org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeBooleans org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeBytes org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeDecimals org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeDoubles org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeEnums org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeFixed org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeFloats org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeInts org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeListOfDecimals org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeLists org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeMapOfDecimals org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeMaps org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeMapsWithNullableComplexValues org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeMapsWithNullablePrimitiveValues org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeNullableBytes org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeNullableDecimals org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeNullableEnums org.apache.hadoop.hive.serde2.avro.TestAvroSerializer.canSerializeNullableFixed
[jira] [Commented] (HIVE-7509) Fast stripe level merging for ORC
[ https://issues.apache.org/jira/browse/HIVE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079068#comment-14079068 ] Hive QA commented on HIVE-7509: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12658568/HIVE-7509.4.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/100/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/100/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-100/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-100/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerializer.java' Reverted 'serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java' Reverted 'serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java' Reverted 'serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerde.java' Reverted 'serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java' Reverted 'serde/src/test/org/apache/hadoop/hive/serde2/avro/TestThatEvolvedSchemasActAsWeWant.java' Reverted 'serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java' Reverted 'serde/src/test/org/apache/hadoop/hive/serde2/avro/TestGenericAvroRecordWritable.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaResolutionProblem.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1614583. At revision 1614583. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12658568 Fast stripe level merging for ORC
[jira] [Updated] (HIVE-7390) Make quote character optional and configurable in BeeLine CSV/TSV output
[ https://issues.apache.org/jira/browse/HIVE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-7390: --- Attachment: HIVE-7390.4.patch code changes according to the discussion Make quote character optional and configurable in BeeLine CSV/TSV output Key: HIVE-7390 URL: https://issues.apache.org/jira/browse/HIVE-7390 Project: Hive Issue Type: New Feature Components: Clients Affects Versions: 0.13.1 Reporter: Jim Halfpenny Assignee: Ferdinand Xu Attachments: HIVE-7390.1.patch, HIVE-7390.2.patch, HIVE-7390.3.patch, HIVE-7390.4.patch, HIVE-7390.patch Currently when either the CSV or TSV output formats are used in beeline each column is wrapped in single quotes. Quote wrapping of columns should be optional and the user should be able to choose the character used to wrap the columns. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-4933) Document how aliases work with the OVER clause
[ https://issues.apache.org/jira/browse/HIVE-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Francke updated HIVE-4933: --- Summary: Document how aliases work with the OVER clause (was: Can't use alias directly before OVER clause) Document how aliases work with the OVER clause -- Key: HIVE-4933 URL: https://issues.apache.org/jira/browse/HIVE-4933 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Lars Francke Priority: Minor {code} CREATE TABLE test (foo INT); hive SELECT SUM(foo) AS bar OVER (PARTITION BY foo) FROM test; MismatchedTokenException(175!=110) at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617) at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1424) at org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:35998) at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:33974) at org.apache.hadoop.hive.ql.parse.HiveParser.regular_body(HiveParser.java:33882) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatement(HiveParser.java:33389) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:33169) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1284) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:983) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:352) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:995) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1038) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) FAILED: ParseException line 1:20 mismatched input 'OVER' expecting FROM near 'bar' in from clause{code} The same happens without the {{AS}} but it works when leaving out the alias entirely. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4933) Document how aliases work with the OVER clause
[ https://issues.apache.org/jira/browse/HIVE-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079077#comment-14079077 ] Lars Francke commented on HIVE-4933: The proper usage turns out to be {code:sql} SELECT SUM(foo) OVER (PARTITION BY foo) AS bar FROM test; {code} I have added documentation to the Wiki for this. Document how aliases work with the OVER clause -- Key: HIVE-4933 URL: https://issues.apache.org/jira/browse/HIVE-4933 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Lars Francke Assignee: Lars Francke Priority: Minor {code} CREATE TABLE test (foo INT); hive SELECT SUM(foo) AS bar OVER (PARTITION BY foo) FROM test; MismatchedTokenException(175!=110) at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617) at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1424) at org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:35998) at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:33974) at org.apache.hadoop.hive.ql.parse.HiveParser.regular_body(HiveParser.java:33882) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatement(HiveParser.java:33389) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:33169) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1284) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:983) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:352) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:995) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1038) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) FAILED: ParseException line 1:20 mismatched input 'OVER' expecting FROM near 'bar' in from clause{code} The same happens without the {{AS}} but it works when leaving out the alias entirely. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-4933) Document how aliases work with the OVER clause
[ https://issues.apache.org/jira/browse/HIVE-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Francke reassigned HIVE-4933: -- Assignee: Lars Francke Document how aliases work with the OVER clause -- Key: HIVE-4933 URL: https://issues.apache.org/jira/browse/HIVE-4933 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Lars Francke Assignee: Lars Francke Priority: Minor {code} CREATE TABLE test (foo INT); hive SELECT SUM(foo) AS bar OVER (PARTITION BY foo) FROM test; MismatchedTokenException(175!=110) at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617) at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1424) at org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:35998) at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:33974) at org.apache.hadoop.hive.ql.parse.HiveParser.regular_body(HiveParser.java:33882) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatement(HiveParser.java:33389) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:33169) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1284) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:983) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:352) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:995) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1038) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) FAILED: ParseException line 1:20 mismatched input 'OVER' expecting FROM near 'bar' in from clause{code} The same happens without the {{AS}} but it works when leaving out the alias entirely. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7327) Refactoring: make Hive map side data processing reusable
[ https://issues.apache.org/jira/browse/HIVE-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-7327. --- Resolution: Won't Fix Closing as Won't Fix. Will reopen if the need comes back. Refactoring: make Hive map side data processing reusable Key: HIVE-7327 URL: https://issues.apache.org/jira/browse/HIVE-7327 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang ExecMapper is Hive's mapper implementation for MapReduce. Table rows are read by the MR framework and processed by the ExecMapper.map() method, which invokes Hive's map-side operator tree starting from MapOperator. This task is to extract the map-side data processing offered by the operator tree so that it can be used by other execution engines such as Spark. This is purely refactoring the existing code. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7328) Refactoring: make Hive reduce side data processing reusable
[ https://issues.apache.org/jira/browse/HIVE-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-7328. --- Resolution: Won't Fix Closing as Won't Fix. Will reopen if the need comes back. Refactoring: make Hive reduce side data processing reusable --- Key: HIVE-7328 URL: https://issues.apache.org/jira/browse/HIVE-7328 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang ExecReducer is Hive's reducer implementation for MapReduce. Table rows are shuffled by the MR framework to ExecReducer and further processed by the ExecReducer.reduce() method, which invokes Hive's reduce-side operator tree. This task is to extract the reduce-side data processing offered by the operator tree so that it can be reused by other execution engines such as Spark. This is purely refactoring the existing code. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7552) Collect spark job statistic through spark metrics[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7552: Description: MR/Tez use counters to collect job statistics, while Spark does not use accumulators to do the same thing. Instead, Spark stores task metrics in TaskMetrics objects and sends them back to the scheduler. We could get Spark job statistics by combining all TaskMetrics in a SparkListener. NO PRECOMMIT TESTS. This is for spark-branch only. was: MR/Tez use counters to collect job statistics, while Spark has a configurable metrics system based on the Coda Hale Metrics Library. We could collect Spark job statistics through the Spark metrics system on the Hive driver side. NO PRECOMMIT TESTS. This is for spark-branch only. Collect spark job statistic through spark metrics[Spark Branch] --- Key: HIVE-7552 URL: https://issues.apache.org/jira/browse/HIVE-7552 Project: Hive Issue Type: New Feature Components: Spark Reporter: Chengxiang Li MR/Tez use counters to collect job statistics, while Spark does not use accumulators to do the same thing. Instead, Spark stores task metrics in TaskMetrics objects and sends them back to the scheduler. We could get Spark job statistics by combining all TaskMetrics in a SparkListener. NO PRECOMMIT TESTS. This is for spark-branch only. -- This message was sent by Atlassian JIRA (v6.2#6252)
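A rough sketch of the approach described above follows. It assumes a Java-friendly listener base class (later Spark releases ship org.apache.spark.JavaSparkListener for exactly this purpose); the class name and the choice of metrics are illustrative, and the null check guards tasks that finish without metrics:
{code:java}
import java.util.concurrent.atomic.AtomicLong;
import org.apache.spark.JavaSparkListener;
import org.apache.spark.scheduler.SparkListenerTaskEnd;

// Hypothetical statistics collector: registered on the driver via
// SparkContext.addSparkListener(), it combines per-task TaskMetrics.
public class JobStatsListener extends JavaSparkListener {
  private final AtomicLong totalRunTimeMs = new AtomicLong();
  private final AtomicLong totalResultBytes = new AtomicLong();

  @Override
  public void onTaskEnd(SparkListenerTaskEnd taskEnd) {
    if (taskEnd.taskMetrics() != null) {
      totalRunTimeMs.addAndGet(taskEnd.taskMetrics().executorRunTime());
      totalResultBytes.addAndGet(taskEnd.taskMetrics().resultSize());
    }
  }

  public long getTotalRunTimeMs() { return totalRunTimeMs.get(); }
  public long getTotalResultBytes() { return totalResultBytes.get(); }
}
{code}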
[jira] [Commented] (HIVE-7436) Load Spark configuration into Hive driver
[ https://issues.apache.org/jira/browse/HIVE-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079131#comment-14079131 ] Xuefu Zhang commented on HIVE-7436: --- [~chengxiang li], I guess expecting spark-defaults.conf on the Hadoop classpath is fine for now, though we might need to revisit and brainstorm this again later. Note that we don't have to follow exactly what Tez did on every aspect, but I agree it can serve as a good reference point, giving users a similar experience. Load Spark configuration into Hive driver - Key: HIVE-7436 URL: https://issues.apache.org/jira/browse/HIVE-7436 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-7436-Spark.1.patch, HIVE-7436-Spark.2.patch, HIVE-7436-Spark.3.patch Load Spark configuration into the Hive driver. There are 3 ways to set up Spark configurations: # Java properties. # Properties in the Spark configuration file (spark-defaults.conf). # The Hive configuration file (hive-site.xml). Sources later in this list take priority and overwrite earlier ones for the same property name. Please refer to [http://spark.apache.org/docs/latest/configuration.html] for all configurable properties of Spark. You can configure Spark in Hive in the following ways: # Through the Spark configuration file. #* Create spark-defaults.conf and place it in the /etc/spark/conf configuration directory, with properties in Java properties format. #* Set the $SPARK_CONF_DIR environment variable to the location of spark-defaults.conf: export SPARK_CONF_DIR=/etc/spark/conf #* Add $SPARK_CONF_DIR to the $HADOOP_CLASSPATH environment variable: export HADOOP_CLASSPATH=$SPARK_CONF_DIR:$HADOOP_CLASSPATH # Through the Hive configuration file. #* Edit hive-site.xml in the Hive conf directory and set the same properties in XML format. Hive driver default Spark properties: ||name||default value||description|| |spark.master|local|Spark master URL.| |spark.app.name|Hive on Spark|Default Spark application name.| NO PRECOMMIT TESTS. This is for spark-branch only. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7436) Load Spark configuration into Hive driver
[ https://issues.apache.org/jira/browse/HIVE-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079150#comment-14079150 ] Xuefu Zhang commented on HIVE-7436: --- One more question: where did you see that tez-site.xml is read from the classpath by Hive -- in the code, or in documentation somewhere? I wasn't able to find it in either. Load Spark configuration into Hive driver - Key: HIVE-7436 URL: https://issues.apache.org/jira/browse/HIVE-7436 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-7436-Spark.1.patch, HIVE-7436-Spark.2.patch, HIVE-7436-Spark.3.patch Load Spark configuration into the Hive driver. There are 3 ways to set up Spark configurations: # Java properties. # Properties in the Spark configuration file (spark-defaults.conf). # The Hive configuration file (hive-site.xml). Sources later in this list take priority and overwrite earlier ones for the same property name. Please refer to [http://spark.apache.org/docs/latest/configuration.html] for all configurable properties of Spark. You can configure Spark in Hive in the following ways: # Through the Spark configuration file. #* Create spark-defaults.conf and place it in the /etc/spark/conf configuration directory, with properties in Java properties format. #* Set the $SPARK_CONF_DIR environment variable to the location of spark-defaults.conf: export SPARK_CONF_DIR=/etc/spark/conf #* Add $SPARK_CONF_DIR to the $HADOOP_CLASSPATH environment variable: export HADOOP_CLASSPATH=$SPARK_CONF_DIR:$HADOOP_CLASSPATH # Through the Hive configuration file. #* Edit hive-site.xml in the Hive conf directory and set the same properties in XML format. Hive driver default Spark properties: ||name||default value||description|| |spark.master|local|Spark master URL.| |spark.app.name|Hive on Spark|Default Spark application name.| NO PRECOMMIT TESTS. This is for spark-branch only. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 23799: HIVE-7390: refactor csv output format with in RFC mode and add one more option to support formatting as the csv format in hive cli
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23799/#review49091 --- In general this feels a bit awkward. I think better CSV/TSV support is a good idea but quotedCsv seems misleading as the old csv and tsv now quote as well if the separator is contained in the column value. beeline/src/java/org/apache/hive/beeline/BeeLine.java https://reviews.apache.org/r/23799/#comment85924 Missing space here and next line beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java https://reviews.apache.org/r/23799/#comment85920 remove this and call to getSeparator, can just be separator. beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java https://reviews.apache.org/r/23799/#comment85915 Can be converted to a variable arity function (e.g. String... vals) beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java https://reviews.apache.org/r/23799/#comment85916 Rename to writer? beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java https://reviews.apache.org/r/23799/#comment85917 Same as above: Can be converted to variable arity method beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java https://reviews.apache.org/r/23799/#comment85918 ...variable arity beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java https://reviews.apache.org/r/23799/#comment85919 Remove this and probably replace the call to isSingleQuoted with just singleQuoted, no need to go through a simple getter beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java https://reviews.apache.org/r/23799/#comment85923 Missing spaces around the else beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java https://reviews.apache.org/r/23799/#comment85922 I'd either remove the getter and setters entirely or they need changing so that things are properly updated when separator/singleQuoted/csvPreference are changed. Example: Someone passes in a CsvPreference with a different separator than the one set in here. I think part of this patch needs to be the removal of all these simple (getter/)setters. If you don't want that then you need some verification logic that things make sense. beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java https://reviews.apache.org/r/23799/#comment85921 This is not a getter but a setter. - Lars Francke On July 30, 2014, 8:30 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23799/ --- (Updated July 30, 2014, 8:30 a.m.) Review request for hive. Bugs: HIVE-7390 https://issues.apache.org/jira/browse/HIVE-7390 Repository: hive-git Description --- HIVE-7390: refactor csv output format with in RFC mode and add one more option to support formatting as the csv format in hive cli Diffs - beeline/pom.xml 6ec1d1aff3f35c097aa6054aae84faf2d63854f1 beeline/src/java/org/apache/hive/beeline/BeeLine.java 528a98e29c23421f9352bdf7c5edd3a9fae0e3ea beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 7853c3f38f3c3fb9ae0b9939c714f1dc940ba053 beeline/src/main/resources/BeeLine.properties 390d062b8dc52dfa790c7351f3db44c1e0dd7e37 itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java bd97aff5959fd9040fc0f0a1f6b782f2aa6f pom.xml b5a5697e6a3b689c2b244ba0338be541261eaa3d Diff: https://reviews.apache.org/r/23799/diff/ Testing --- Thanks, cheng xu
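For the "variable arity" suggestions in the review above, the requested change is the standard Java varargs refactor sketched below (generic names for illustration; the actual methods live in SeparatedValuesOutputFormat):
{code:java}
public class VarargsSketch {
  // Before: String join(String[] vals, char sep) forces callers to build an
  // explicit array, e.g. join(new String[] {"a", "b"}, ',').
  // After: a variable-arity parameter accepts either an array or a plain
  // argument list, so call sites become join(',', "a", "b").
  static String join(char separator, String... vals) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < vals.length; i++) {
      if (i > 0) {
        sb.append(separator);
      }
      sb.append(vals[i]);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(join(',', "a", "b", "c")); // prints a,b,c
  }
}
{code}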
[jira] [Commented] (HIVE-7532) allow disabling direct sql per query with external metastore
[ https://issues.apache.org/jira/browse/HIVE-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079180#comment-14079180 ] Hive QA commented on HIVE-7532: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12658566/HIVE-7532.2.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5823 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/101/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/101/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-101/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12658566 allow disabling direct sql per query with external metastore Key: HIVE-7532 URL: https://issues.apache.org/jira/browse/HIVE-7532 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Navis Attachments: HIVE-7532.1.patch.txt, HIVE-7532.2.patch.txt Currently with external metastore, direct sql can only be disabled via metastore config globally. Perhaps it makes sense to have the ability to propagate the setting per query from client to override the metastore setting, e.g. if one particular query causes it to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7390) Make quote character optional and configurable in BeeLine CSV/TSV output
[ https://issues.apache.org/jira/browse/HIVE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079216#comment-14079216 ] Lars Francke commented on HIVE-7390: As noted in my review, I'm not too sure about adding another format, especially if it's called quotedCSV, because that implies that the others aren't using quoting -- but they actually are when needed. The old way sometimes produces invalid CSV (when quoting or delimiter chars exist in the data) so I think it's a good idea to fix this (and super-csv seems to solve that). I'm not sure if preserving the old functionality is worth anything. And if you do, then maybe deprecate it and name it `deprecatedCSV` or something like that. I'd be in favor of two options instead (similar to what was suggested originally): * Delimiter * Quoting character Maybe even a third: quoting mode. I'm in favor of always adding quotes as it makes parsing easier (no need to check for quoted/unquoted columns etc.). If not adding that, I'd vote in favor of changing the current quoting mode to the AlwaysQuote mode. Make quote character optional and configurable in BeeLine CSV/TSV output Key: HIVE-7390 URL: https://issues.apache.org/jira/browse/HIVE-7390 Project: Hive Issue Type: New Feature Components: Clients Affects Versions: 0.13.1 Reporter: Jim Halfpenny Assignee: Ferdinand Xu Attachments: HIVE-7390.1.patch, HIVE-7390.2.patch, HIVE-7390.3.patch, HIVE-7390.4.patch, HIVE-7390.patch Currently, when either the CSV or TSV output format is used in beeline, each column is wrapped in single quotes. Quote wrapping of columns should be optional and the user should be able to choose the character used to wrap the columns. -- This message was sent by Atlassian JIRA (v6.2#6252)
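For reference, the always-quote behavior argued for above is available in super-csv 2.x through a quote mode on the writer preferences. A minimal sketch, assuming the super-csv API the patch already builds on (the class name is illustrative):
{code:java}
import java.io.IOException;
import java.io.StringWriter;
import org.supercsv.io.CsvListWriter;
import org.supercsv.prefs.CsvPreference;
import org.supercsv.quote.AlwaysQuoteMode;

public class QuoteModeSketch {
  public static void main(String[] args) throws IOException {
    StringWriter out = new StringWriter();
    // AlwaysQuoteMode wraps every column in the quote character, so parsers
    // never have to probe for quoted vs. unquoted fields.
    CsvPreference prefs = new CsvPreference.Builder('"', ',', "\n")
        .useQuoteMode(new AlwaysQuoteMode())
        .build();
    CsvListWriter writer = new CsvListWriter(out, prefs);
    writer.write("plain", "with,comma", "with\"quote");
    writer.close();
    System.out.print(out); // "plain","with,comma","with""quote"
  }
}
{code}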
[jira] [Commented] (HIVE-7547) Add ipAddress and userName to ExecHook
[ https://issues.apache.org/jira/browse/HIVE-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079292#comment-14079292 ] Hive QA commented on HIVE-7547: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12658571/HIVE-7547.2.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5825 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/102/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/102/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-102/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12658571 Add ipAddress and userName to ExecHook -- Key: HIVE-7547 URL: https://issues.apache.org/jira/browse/HIVE-7547 Project: Hive Issue Type: New Feature Components: Diagnosability Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7547.2.patch, HIVE-7547.patch Auditing tools should be able to know about the ipAddress and userName of the user executing operations. These could be made available through the Hive execution-hooks. -- This message was sent by Atlassian JIRA (v6.2#6252)
hive udf cannot recognize generic method
Hi there, I am writing a Hive UDF. The input could be string, int, double, etc., and the return type is based on the input type. I was trying to use a generic method, but Hive does not seem to recognize it. Here is the piece of code I have as an example: public <T> T evaluate(final T s, final String column_name, final int bitmap) throws Exception { if (s instanceof Double) return (T) new Double(-1.0); else if (s instanceof Integer) return (T) new Integer(-1); ….. } Does anyone know if Hive supports generic methods? Or do I have to override the evaluate method for each type of input? Thanks Dan
[jira] [Commented] (HIVE-6437) DefaultHiveAuthorizationProvider should not initialize a new HiveConf
[ https://issues.apache.org/jira/browse/HIVE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079334#comment-14079334 ] Hive QA commented on HIVE-6437: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12658572/HIVE-6437.6.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5838 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/103/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/103/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-103/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12658572 DefaultHiveAuthorizationProvider should not initialize a new HiveConf - Key: HIVE-6437 URL: https://issues.apache.org/jira/browse/HIVE-6437 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.13.0 Reporter: Harsh J Assignee: Navis Priority: Trivial Attachments: HIVE-6437.1.patch.txt, HIVE-6437.2.patch.txt, HIVE-6437.3.patch.txt, HIVE-6437.4.patch.txt, HIVE-6437.5.patch.txt, HIVE-6437.6.patch.txt During a HS2 connection, every SessionState initializes a new DefaultHiveAuthorizationProvider object (on stock configs). In turn, DefaultHiveAuthorizationProvider carries a {{new HiveConf(…)}} that may prove too expensive and unnecessary, since SessionState itself passes in a fully applied HiveConf in the first place. -- This message was sent by Atlassian JIRA (v6.2#6252)
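To make the fix direction implied by the description concrete, a hedged sketch of what reusing the passed-in conf could look like (the setConf method and field names here are illustrative, not the actual HIVE-6437 patch):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.conf.HiveConf;

// Illustrative sketch: avoid constructing a fresh HiveConf per connection
// by reusing the fully applied conf that SessionState already provides.
public void setConf(Configuration conf) {
  this.conf = (conf instanceof HiveConf)
      ? (HiveConf) conf                 // reuse; no re-read of hive-site.xml
      : new HiveConf(conf, getClass()); // fall back only when necessary
}
{code}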
[jira] [Created] (HIVE-7554) Parquet Hive should resolve column names in case insensitive manner
Brock Noland created HIVE-7554: -- Summary: Parquet Hive should resolve column names in case insensitive manner Key: HIVE-7554 URL: https://issues.apache.org/jira/browse/HIVE-7554 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7554) Parquet Hive should resolve column names in case insensitive manner
[ https://issues.apache.org/jira/browse/HIVE-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7554: --- Attachment: HIVE-7554.patch Parquet Hive should resolve column names in case insensitive manner --- Key: HIVE-7554 URL: https://issues.apache.org/jira/browse/HIVE-7554 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7554.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7554) Parquet Hive should resolve column names in case insensitive manner
[ https://issues.apache.org/jira/browse/HIVE-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079398#comment-14079398 ] Brock Noland commented on HIVE-7554: Patch cleans up ws. Parquet Hive should resolve column names in case insensitive manner --- Key: HIVE-7554 URL: https://issues.apache.org/jira/browse/HIVE-7554 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7554.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7446) Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables
[ https://issues.apache.org/jira/browse/HIVE-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079418#comment-14079418 ] Hive QA commented on HIVE-7446: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12658576/HIVE-7446.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5840 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/104/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/104/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-104/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12658576 Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables -- Key: HIVE-7446 URL: https://issues.apache.org/jira/browse/HIVE-7446 Project: Hive Issue Type: New Feature Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Attachments: HIVE-7446.patch HIVE-6806 adds native support for creating hive table stored as Avro. It would be good to add support to ALTER TABLE .. ADD COLUMN to Avro backed tables. -- This message was sent by Atlassian JIRA (v6.2#6252)
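For context, the statement this feature targets is the existing ADD COLUMNS DDL; a small usage sketch (table and column names are made up, and it is HIVE-7446 that extends this to Avro-backed tables):
{code}
ALTER TABLE avro_backed_table ADD COLUMNS (new_col STRING COMMENT 'added after creation');
{code}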
[jira] [Commented] (HIVE-6437) DefaultHiveAuthorizationProvider should not initialize a new HiveConf
[ https://issues.apache.org/jira/browse/HIVE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079544#comment-14079544 ] Thejas M Nair commented on HIVE-6437: - [~navis] The latest patch also has this change in SQLStdHiveAccessController.java to make the admin role comparison case sensitive. But role names are not case sensitive in sql std auth mode (also documented in the wiki). {code} -if (!HiveMetaStore.ADMIN.equalsIgnoreCase(role.getRoleName())) { +if (!HiveMetaStore.ADMIN.equals(role.getRoleName())) { {code} DefaultHiveAuthorizationProvider should not initialize a new HiveConf - Key: HIVE-6437 URL: https://issues.apache.org/jira/browse/HIVE-6437 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.13.0 Reporter: Harsh J Assignee: Navis Priority: Trivial Attachments: HIVE-6437.1.patch.txt, HIVE-6437.2.patch.txt, HIVE-6437.3.patch.txt, HIVE-6437.4.patch.txt, HIVE-6437.5.patch.txt, HIVE-6437.6.patch.txt During a HS2 connection, every SessionState initializes a new DefaultHiveAuthorizationProvider object (on stock configs). In turn, DefaultHiveAuthorizationProvider carries a {{new HiveConf(…)}} that may prove too expensive and unnecessary, since SessionState itself passes in a fully applied HiveConf in the first place. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7545) Tableau connecting with MapR ODBC driver cannot get more than 43 columns
[ https://issues.apache.org/jira/browse/HIVE-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata krishnan Sowrirajan resolved HIVE-7545. --- Resolution: Invalid Tableau connecting with MapR ODBC driver cannot get more than 43 columns Key: HIVE-7545 URL: https://issues.apache.org/jira/browse/HIVE-7545 Project: Hive Issue Type: Bug Environment: Tableau connecting using MapR ODBC driver - Windows Reporter: Venkata krishnan Sowrirajan Fix For: 0.13.1 Hive table with 170 columns and 1 million rows. When I queried all 170 columns of the table from Tableau using the MapR ODBC driver, it could not query more than 43 columns. Beyond that it gives an error saying [MapR][HiveODBC] (35) Error from Hive: error code: '10007' error message: 'Error while compiling statement: FAILED: SemanticException [Error 10007]: Ambiguous column reference c_43'. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7549) Code cleanup of Task.java and HiveInputFormat.java
[ https://issues.apache.org/jira/browse/HIVE-7549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079564#comment-14079564 ] Hive QA commented on HIVE-7549: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12658575/HIVE-7549.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5838 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/105/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/105/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-105/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12658575 Code cleanup of Task.java and HiveInputFormat.java -- Key: HIVE-7549 URL: https://issues.apache.org/jira/browse/HIVE-7549 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Attachments: HIVE-7549.patch While working on Hive + Spark I noticed some ugly code which I've seen before but neglected. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7509) Fast stripe level merging for ORC
[ https://issues.apache.org/jira/browse/HIVE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7509: - Attachment: HIVE-7509.5.patch Thanks [~leftylev] for your comments. I fixed them in the .5 patch. Fast stripe level merging for ORC - Key: HIVE-7509 URL: https://issues.apache.org/jira/browse/HIVE-7509 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Attachments: HIVE-7509.1.patch, HIVE-7509.2.patch, HIVE-7509.3.patch, HIVE-7509.4.patch, HIVE-7509.5.patch Similar to HIVE-1950, add support for fast stripe-level merging of ORC files through the CONCATENATE command and conditional merge task. This fast merging is ideal for merging many small ORC files into a larger file without decompressing and decoding the data of the small ORC files. -- This message was sent by Atlassian JIRA (v6.2#6252)
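The command the fast merge path hooks into is the existing concatenate DDL; a small usage sketch (table and partition names are made up):
{code}
-- Merge the many small files of one partition in place; with HIVE-7509 this
-- happens at ORC stripe level, without decompressing/decoding row data.
ALTER TABLE orc_events PARTITION (ds = '2014-07-30') CONCATENATE;
{code}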
[jira] [Commented] (HIVE-6437) DefaultHiveAuthorizationProvider should not initialize a new HiveConf
[ https://issues.apache.org/jira/browse/HIVE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079594#comment-14079594 ] Thejas M Nair commented on HIVE-6437: - Can you also please update the reviewboard with the new patch? DefaultHiveAuthorizationProvider should not initialize a new HiveConf - Key: HIVE-6437 URL: https://issues.apache.org/jira/browse/HIVE-6437 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.13.0 Reporter: Harsh J Assignee: Navis Priority: Trivial Attachments: HIVE-6437.1.patch.txt, HIVE-6437.2.patch.txt, HIVE-6437.3.patch.txt, HIVE-6437.4.patch.txt, HIVE-6437.5.patch.txt, HIVE-6437.6.patch.txt During a HS2 connection, every SessionState initializes a new DefaultHiveAuthorizationProvider object (on stock configs). In turn, DefaultHiveAuthorizationProvider carries a {{new HiveConf(…)}} that may prove too expensive and unnecessary, since SessionState itself passes in a fully applied HiveConf in the first place. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 23953: HIVE-7519: Refactor QTestUtil to remove its duplication with QFileClient for qtest setup and teardown
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23953/#review49132 --- Ship it! Looks good to me, pending test fixes. - Szehon Ho On July 29, 2014, 11:46 p.m., Ashish Singh wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23953/ --- (Updated July 29, 2014, 11:46 p.m.) Review request for hive. Bugs: HIVE-7519 https://issues.apache.org/jira/browse/HIVE-7519 Repository: hive-git Description --- HIVE-7519: Refactor QTestUtil to remove its duplication with QFileClient for qtest setup and teardown Diffs - ant/src/org/apache/hadoop/hive/ant/QTestGenTask.java 33f227fe6eb0ea6df936775f02e4339ed496f6ad data/conf/hive-site.xml fe8080addcadac4d52868866457dd038ea8d3d91 data/conf/tez/hive-site.xml 0c99bb6914bd26de26cef77cf29cf37f070098dc data/scripts/q_test_cleanup.sql 31bd7205d85916ea352f715f2fd1462efc788208 data/scripts/q_test_init.sql 12afdf391132e3fdd219aaa581e1f2e210d6dee2 hbase-handler/src/test/templates/TestHBaseCliDriver.vm 01d596aa6591ddccff016436c7f31324b3896d00 hbase-handler/src/test/templates/TestHBaseNegativeCliDriver.vm 45c73389cb26d0d461080cc146c5d74aee199c4e itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestLocationQueries.java 9edd7f30ff91bf7e01a2f52699192994fe0829f5 itests/qtest/pom.xml 249956fc170c0cef2b8f98454fa952c498b9e29e itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseQTestUtil.java 96a0de2829c2ec065b7835b12c4932d1278f9a84 itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 2fefa067791bd74412c0b4efb697dc0d8bb03cd7 ql/src/test/templates/TestCliDriver.vm 4776c75c16329c7d3f6f1a032eef192d553cc3cc ql/src/test/templates/TestCompareCliDriver.vm f6f43b847fdd4039328632ef70d841fce9006d6d ql/src/test/templates/TestNegativeCliDriver.vm 991d5ac1b2fde66dbe60b39c853916577449b1a4 ql/src/test/templates/TestParse.vm c476536940dc3a48000bf4e60e0b551ec7904d63 ql/src/test/templates/TestParseNegative.vm f62f17e4df5c1439d3787fc5c361804121bfcaf1 Diff: https://reviews.apache.org/r/23953/diff/ Testing --- qTests. Thanks, Ashish Singh
[jira] [Commented] (HIVE-7519) Refactor QTestUtil to remove its duplication with QFileClient for qtest setup and teardown
[ https://issues.apache.org/jira/browse/HIVE-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079606#comment-14079606 ] Szehon Ho commented on HIVE-7519: - +1, pending tests. This is good code cleanup. Refactor QTestUtil to remove its duplication with QFileClient for qtest setup and teardown --- Key: HIVE-7519 URL: https://issues.apache.org/jira/browse/HIVE-7519 Project: Hive Issue Type: Improvement Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Attachments: HIVE-7519.1.patch, HIVE-7519.patch QTestUtil hard codes creation and dropping of source tables for qtests. QFileClient does the same thing but in a better way, uses q_test_init.sql and q_test_cleanup.sql scripts. As QTestUtil is growing quite large it makes sense to refactor it to use QFileClient's approach. This will also remove duplication of code addressing same purpose. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7550) Extend cached evaluation to multiple expressions
[ https://issues.apache.org/jira/browse/HIVE-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079706#comment-14079706 ] Hive QA commented on HIVE-7550: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12658580/HIVE-7550.1.patch.txt {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5838 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table_udfs org.apache.hadoop.hive.cli.TestCompareCliDriver.testCompareCliDriver_vectorized_math_funcs org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/106/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/106/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-106/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12658580 Extend cached evaluation to multiple expressions Key: HIVE-7550 URL: https://issues.apache.org/jira/browse/HIVE-7550 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7550.1.patch.txt Currently, hive.cache.expr.evaluation caches per expression. But cache context might be shared for multiple expressions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7555) inner join is being resolved as cartesian product
J. Tipan Verella created HIVE-7555: -- Summary: inner join is being resolved as cartesian product Key: HIVE-7555 URL: https://issues.apache.org/jira/browse/HIVE-7555 Project: Hive Issue Type: Bug Environment: CentOS Reporter: J. Tipan Verella I believe this is a bug, because I do not seem to be able to find a way around the following stackoverflow question, http://stackoverflow.com/questions/25020190/hive-query-returns-cartesian-product-instead-of-inner-join The issue is as follows (repeated from SO for convenience). This is the type of query I am sending to Hive: SELECT BigTable.nicefield, LargeTable.* FROM LargeTable INNER JOIN BigTable ON ( LargeTable.joinfield1of4 = BigTable.joinfield1of4 AND LargeTable.joinfield2of4 = BigTable.joinfield2of4 ) WHERE LargeTable.joinfield3of4=20140726 AND LargeTable.joinfield4of4=15 AND BigTable.joinfield3of4=20140726 AND BigTable.joinfield4of4=15 AND LargeTable.filterfiled1of2=123456 AND LargeTable.filterfiled2of2=98765 AND LargeTable.joinfield2of4=12 AND LargeTable.joinfield1of4='iwanttolikehive' It returns `2418025` rows. The issue is that SELECT * FROM LargeTable WHERE joinfield3of4=20140726 AND joinfield4of4=15 AND filterfiled1of2=123456 AND filterfiled2of2=98765 AND joinfield2of4=12 AND joinfield1of4='iwanttolikehive' returns `1555` rows, and so does: SELECT * FROM BigTable WHERE joinfield3of4=20140726 AND joinfield4of4=15 AND joinfield2of4=12 AND joinfield1of4='iwanttolikehive' Note that **1555^2 = 2418025**. Feel free to discard this issue if it is not a bug, but please provide a solution on SO. Thank you. -- This message was sent by Atlassian JIRA (v6.2#6252)
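Worth noting: the WHERE clause above pins both join columns (joinfield1of4 and joinfield2of4) to single constant values, so every one of the 1555 rows on each side matches every row on the other side, and 1555^2 rows is exactly what an inner join should return. A quick diagnostic sketch along these lines (column names taken from the report above):
{code}
-- If this returns cnt > 1 for a key, an inner join on that key will
-- legitimately multiply rows rather than match them one-to-one.
SELECT joinfield1of4, joinfield2of4, COUNT(*) AS cnt
FROM LargeTable
WHERE joinfield3of4 = 20140726 AND joinfield4of4 = 15
  AND filterfiled1of2 = 123456 AND filterfiled2of2 = 98765
GROUP BY joinfield1of4, joinfield2of4
HAVING COUNT(*) > 1;
{code}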
Re: hive udf cannot recognize generic method
Sounds like you are using the older-style UDF class. In that case, yes, you would have to override evaluate() for each type of input. You could also try extending the GenericUDF class - that would allow you to do it in a single method, though it may be a bit more complicated (you can look at the Hive code for some examples). On Jul 30, 2014, at 7:43 AM, Dan Fan d...@appnexus.com wrote: Hi there, I am writing a Hive UDF. The input could be string, int, double, etc., and the return type is based on the input type. I was trying to use a generic method, but Hive does not seem to recognize it. Here is the piece of code I have as an example: public <T> T evaluate(final T s, final String column_name, final int bitmap) throws Exception { if (s instanceof Double) return (T) new Double(-1.0); else if (s instanceof Integer) return (T) new Integer(-1); ….. } Does anyone know if Hive supports generic methods? Or do I have to override the evaluate method for each type of input? Thanks Dan
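A minimal sketch of the GenericUDF approach suggested above, assuming the UDF should return -1 in the same primitive type as its first argument (the class name and the -1 logic are illustrative; initialize/evaluate/getDisplayString are the real GenericUDF methods):
{code}
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class GenericUDFMinusOne extends GenericUDF {
  private PrimitiveObjectInspector inputOI;

  @Override
  public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
    if (args.length < 1 || !(args[0] instanceof PrimitiveObjectInspector)) {
      throw new UDFArgumentException("first argument must be a primitive type");
    }
    inputOI = (PrimitiveObjectInspector) args[0];
    // Declare the output as the same primitive type, in Java representation.
    return PrimitiveObjectInspectorFactory.getPrimitiveJavaObjectInspector(
        inputOI.getPrimitiveCategory());
  }

  @Override
  public Object evaluate(DeferredObject[] args) throws HiveException {
    if (args[0].get() == null) {
      return null;
    }
    // One method handles all input types by branching on the inspector.
    switch (inputOI.getPrimitiveCategory()) {
      case DOUBLE: return Double.valueOf(-1.0);
      case INT:    return Integer.valueOf(-1);
      default:     throw new HiveException("unsupported type: "
                       + inputOI.getPrimitiveCategory());
    }
  }

  @Override
  public String getDisplayString(String[] children) {
    return "minus_one(" + children[0] + ")";
  }
}
{code}
Hive resolves the types through the object inspectors set up in initialize(), so a single evaluate() covers all input types and no per-type overloads are needed.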
[jira] [Updated] (HIVE-7549) Code cleanup of Task.java and HiveInputFormat.java
[ https://issues.apache.org/jira/browse/HIVE-7549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7549: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Thank you for the review Ashutosh! I have committed this to trunk. Code cleanup of Task.java and HiveInputFormat.java -- Key: HIVE-7549 URL: https://issues.apache.org/jira/browse/HIVE-7549 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7549.patch While working on Hive + Spark I noticed some ugly code which I've seen before but neglected. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7390) Make quote character optional and configurable in BeeLine CSV/TSV output
[ https://issues.apache.org/jira/browse/HIVE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079776#comment-14079776 ] Szehon Ho commented on HIVE-7390: - Thanks for the details, I was just reading the earlier comments and wrongly assumed that the two valid CSV options are double quotes and no quotes at all. You're right that normal quote mode still means quotes sometimes, so my proposed naming didn't make sense, sorry about that Ferdinand. So we should: # Fix the current CSV to conform by using super-csv (like the patch I originally looked at in HIVE-7434). No debate on that. # See what CSV options (if any) we are going to expose I'd still try to keep it simple if possible. Can we expose quote mode only (always, normal)? I'm not sure if delimiter and quote character would add that much value, but I'm not a heavy CSV user. Thoughts? Make quote character optional and configurable in BeeLine CSV/TSV output Key: HIVE-7390 URL: https://issues.apache.org/jira/browse/HIVE-7390 Project: Hive Issue Type: New Feature Components: Clients Affects Versions: 0.13.1 Reporter: Jim Halfpenny Assignee: Ferdinand Xu Attachments: HIVE-7390.1.patch, HIVE-7390.2.patch, HIVE-7390.3.patch, HIVE-7390.4.patch, HIVE-7390.patch Currently when either the CSV or TSV output formats are used in beeline each column is wrapped in single quotes. Quote wrapping of columns should be optional and the user should be able to choose the character used to wrap the columns. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24084: HIVE-7547 - Add ipAddress and userName to ExecHook
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24084/#review49141 --- itests/hive-minikdc/src/test/java/org/apache/hive/minikdc/TestHs2HooksWithMiniKdc.java https://reviews.apache.org/r/24084/#comment85986 if the hook does not run these two NPE. Let's have an assertion first for not null ql/src/java/org/apache/hadoop/hive/ql/Driver.java https://reviews.apache.org/r/24084/#comment85987 Let's put this in javadoc format service/src/java/org/apache/hive/service/cli/CLIService.java https://reviews.apache.org/r/24084/#comment85988 If this should not happen, should we throw these? - Brock Noland On July 30, 2014, 2:13 a.m., Szehon Ho wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24084/ --- (Updated July 30, 2014, 2:13 a.m.) Review request for hive. Bugs: HIVE-7547 https://issues.apache.org/jira/browse/HIVE-7547 Repository: hive-git Description --- Passing the ipAddress and userName (already calculated in ThriftCLIService for other purposes) through several layers down to the hooks. Diffs - itests/hive-minikdc/src/test/java/org/apache/hive/minikdc/TestHs2HooksWithMiniKdc.java PRE-CREATION itests/hive-unit/src/test/java/org/apache/hadoop/hive/hooks/TestHs2Hooks.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/Driver.java e512199 ql/src/java/org/apache/hadoop/hive/ql/hooks/HookContext.java b11cb86 service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java de54ca1 service/src/java/org/apache/hive/service/cli/session/HiveSession.java 9785e95 service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 5c87bcb Diff: https://reviews.apache.org/r/24084/diff/ Testing --- Added tests in both kerberos and non-kerberos mode. Thanks, Szehon Ho
Re: Why does SMB join generate hash table locally, even if input tables are large?
+hive-users On Tue, Jul 29, 2014 at 1:56 PM, Pala M Muthaia mchett...@rocketfuelinc.com wrote: Hi, I am testing SMB join for 2 large tables. The tables are bucketed and sorted on the join column. I notice that even though the table is large, Hive attempts to generate a hash table for the 'small' table locally, similar to a map join. Since the table is large in my case, the client runs out of memory and the query fails. I am using Hive 0.12 with the following settings: set hive.optimize.bucketmapjoin=true; set hive.optimize.bucketmapjoin.sortedmerge=true; set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; My test query does a simple join and a select, no subqueries/nested queries etc. I understand why a (bucket) map join requires hash table generation, but why is that included for an SMB join? Shouldn't an SMB join just spin up one mapper for each bucket and perform a sort-merge join directly in the mapper? Thanks, pala
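One hedged possibility, assuming map-join auto conversion is what triggers the local hash-table stage: steer the planner toward the automatic sort-merge conversion instead. The names below are real Hive settings, but the right combination is version-dependent, so treat this as a sketch:
{code}
-- Prefer automatic sort-merge join conversion (added in Hive 0.11):
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
-- If map-join conversion keeps building local hash tables, disabling it
-- isolates the SMB path:
set hive.auto.convert.join=false;
{code}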
[jira] [Created] (HIVE-7556) Fix code style, license header, tabs, etc. [Spark Branch]
Xuefu Zhang created HIVE-7556: - Summary: Fix code style, license header, tabs, etc. [Spark Branch] Key: HIVE-7556 URL: https://issues.apache.org/jira/browse/HIVE-7556 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-7556.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7556) Fix code style, license header, tabs, etc. [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7556: -- Attachment: HIVE-7556.patch Fix code style, license header, tabs, etc. [Spark Branch] - Key: HIVE-7556 URL: https://issues.apache.org/jira/browse/HIVE-7556 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-7556.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7556) Fix code style, license header, tabs, etc. [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7556: -- Status: Patch Available (was: Open) Fix code style, license header, tabs, etc. [Spark Branch] - Key: HIVE-7556 URL: https://issues.apache.org/jira/browse/HIVE-7556 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-7556.patch NO PRECOMMIT TESTS. This is for spark branch only. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7556) Fix code style, license header, tabs, etc. [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7556: -- Description: NO PRECOMMIT TESTS. This is for spark branch only. Fix code style, license header, tabs, etc. [Spark Branch] - Key: HIVE-7556 URL: https://issues.apache.org/jira/browse/HIVE-7556 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-7556.patch NO PRECOMMIT TESTS. This is for spark branch only. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7556) Fix code style, license header, tabs, etc. [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7556: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Patch committed to spark branch. Fix code style, license header, tabs, etc. [Spark Branch] - Key: HIVE-7556 URL: https://issues.apache.org/jira/browse/HIVE-7556 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: spark-branch Attachments: HIVE-7556.patch NO PRECOMMIT TESTS. This is for spark branch only. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7029) Vectorize ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079821#comment-14079821 ] Matt McCline commented on HIVE-7029: Temporarily turned off the (Tez) dynpart_sort_opt_vectorization.q test. Created https://issues.apache.org/jira/browse/HIVE-7557 to cover that issue. Vectorize ReduceWork Key: HIVE-7029 URL: https://issues.apache.org/jira/browse/HIVE-7029 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, HIVE-7029.4.patch, HIVE-7029.5.patch, HIVE-7029.6.patch, HIVE-7029.7.patch This will enable the vectorization team to independently work on vectorization on the reduce side even before vectorized shuffle is ready. NOTE: Tez only (i.e. TezTask only) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7557) When reduce is vectorized, dynpart_sort_opt_vectorization.q under Tez fails
Matt McCline created HIVE-7557: -- Summary: When reduce is vectorized, dynpart_sort_opt_vectorization.q under Tez fails Key: HIVE-7557 URL: https://issues.apache.org/jira/browse/HIVE-7557 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Rajesh Balamohan Turned off dynpart_sort_opt_vectorization.q (Tez) since it fails when reduce is vectorized to get HIVE-7029 checked in. Stack trace: {code} Container released by application, AttemptID:attempt_1406747677386_0003_2_00_00_2 Info:Error: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) [Error getting row data with exception java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:168) at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:159) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processVectors(ReduceRecordProcessor.java:481) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processRows(ReduceRecordProcessor.java:371) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307) at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:562) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:394) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:551) ] at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:188) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307) at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:562) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:394) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:551) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) [Error getting row data with exception java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:168) at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:159) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processVectors(ReduceRecordProcessor.java:481) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processRows(ReduceRecordProcessor.java:371) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165) at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307) at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:562) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:394) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:551) ] at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processRows(ReduceRecordProcessor.java:382) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165) ... 6 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) [Error getting row data with exception java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector cannot be cast to
[jira] [Updated] (HIVE-7029) Vectorize ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7029: --- Status: In Progress (was: Patch Available) Vectorize ReduceWork Key: HIVE-7029 URL: https://issues.apache.org/jira/browse/HIVE-7029 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, HIVE-7029.4.patch, HIVE-7029.5.patch, HIVE-7029.6.patch, HIVE-7029.7.patch, HIVE-7029.8.patch This will enable the vectorization team to independently work on vectorization on the reduce side even before vectorized shuffle is ready. NOTE: Tez only (i.e. TezTask only) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7029) Vectorize ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7029: --- Attachment: HIVE-7029.8.patch Vectorize ReduceWork Key: HIVE-7029 URL: https://issues.apache.org/jira/browse/HIVE-7029 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, HIVE-7029.4.patch, HIVE-7029.5.patch, HIVE-7029.6.patch, HIVE-7029.7.patch, HIVE-7029.8.patch This will enable the vectorization team to independently work on vectorization on the reduce side even before vectorized shuffle is ready. NOTE: Tez only (i.e. TezTask only) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7029) Vectorize ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7029: --- Status: Patch Available (was: In Progress) Vectorize ReduceWork Key: HIVE-7029 URL: https://issues.apache.org/jira/browse/HIVE-7029 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, HIVE-7029.4.patch, HIVE-7029.5.patch, HIVE-7029.6.patch, HIVE-7029.7.patch, HIVE-7029.8.patch This will enable the vectorization team to independently work on vectorization on the reduce side even before vectorized shuffle is ready. NOTE: Tez only (i.e. TezTask only) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7526) Research to use groupby transformation to replace Hive existing partitionByKey and SparkCollector combination
[ https://issues.apache.org/jira/browse/HIVE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-7526: --- Attachment: HIVE-7526.3.patch An attempt to fix the last patch by moving the groupBy op to ShuffleTran. Also, since SparkTran::transform may now have input/output value types other than BytesWritable, we need to make it generic as well. Also added a CompTran class, which is basically a composition of transformations. It offers better type compatibility than ChainedTran. This is NOT the perfect solution, and may be subject to further change. Research to use groupby transformation to replace Hive existing partitionByKey and SparkCollector combination - Key: HIVE-7526 URL: https://issues.apache.org/jira/browse/HIVE-7526 Project: Hive Issue Type: Task Components: Spark Reporter: Xuefu Zhang Assignee: Chao Attachments: HIVE-7526.2.patch, HIVE-7526.3.patch, HIVE-7526.patch Currently SparkClient shuffles data by calling partitionByKey(). This transformation outputs <key, value> tuples. However, Hive's ExecMapper expects <key, iterator<value>> tuples, and Spark's groupByKey() seems to output this directly. Thus, using groupByKey, we may be able to avoid Hive's own key clustering mechanism (in HiveReduceFunction). This research is to try that out. -- This message was sent by Atlassian JIRA (v6.2#6252)
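To make the generic-transform point concrete, a hypothetical shape for the interface (SparkTran and CompTran are names from the patch, but the signatures below are guesses for illustration only):
{code}
import org.apache.spark.api.java.JavaPairRDD;

// Hypothetical sketch: key/value types become free parameters instead of
// being fixed to BytesWritable.
abstract class SparkTran<KI, VI, KO, VO> {
  abstract JavaPairRDD<KO, VO> transform(JavaPairRDD<KI, VI> input);

  // Composition lines the types up end to end, which is what a
  // CompTran-style class can check that an untyped chain cannot.
  <KO2, VO2> SparkTran<KI, VI, KO2, VO2> then(final SparkTran<KO, VO, KO2, VO2> next) {
    final SparkTran<KI, VI, KO, VO> self = this;
    return new SparkTran<KI, VI, KO2, VO2>() {
      @Override
      JavaPairRDD<KO2, VO2> transform(JavaPairRDD<KI, VI> input) {
        return next.transform(self.transform(input));
      }
    };
  }
}
{code}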
[jira] [Commented] (HIVE-7348) Beeline could not parse ; separated queries provided with -e option
[ https://issues.apache.org/jira/browse/HIVE-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079827#comment-14079827 ] Hive QA commented on HIVE-7348: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12658582/HIVE-7348.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 5838 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hive.beeline.TestBeeLineWithArgs.testBeelineHiveConfVariable org.apache.hive.beeline.TestBeeLineWithArgs.testBeelineHiveVariable org.apache.hive.beeline.TestBeeLineWithArgs.testBeelineMultiHiveVariable org.apache.hive.beeline.TestBeeLineWithArgs.testNullDefault org.apache.hive.beeline.TestBeeLineWithArgs.testNullEmpty org.apache.hive.beeline.TestBeeLineWithArgs.testNullEmptyCmdArg org.apache.hive.beeline.TestBeeLineWithArgs.testNullNonEmpty org.apache.hive.beeline.TestSchemaTool.testSchemaInit org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade org.apache.hive.beeline.TestSchemaTool.testSchemaUpgradeDryRun org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/107/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/107/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-107/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12658582 Beeline could not parse ; separated queries provided with -e option --- Key: HIVE-7348 URL: https://issues.apache.org/jira/browse/HIVE-7348 Project: Hive Issue Type: Bug Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Attachments: HIVE-7348.patch Beeline could not parse ; separated queries provided with -e option. This works fine on hive cli. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7558) HCatLoader reuses credentials across jobs
Thiruvel Thirumoolan created HIVE-7558: -- Summary: HCatLoader reuses credentials across jobs Key: HIVE-7558 URL: https://issues.apache.org/jira/browse/HIVE-7558 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Thiruvel Thirumoolan Fix For: 0.14.0 HCatLoader reuses credentials of stage1 in stage2 for some of the pig queries. This causes stage-2 to fail, if stage-2 runs for more than 10 mins. Pig queries which loads data using HCatLoader, filters only by partition columns and does an order by will run into this problem. Exceptions will be very similar to the following: 2014-07-22 17:28:49,337 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate exception from backed error: AttemptID:attemptid Info:RemoteTrace: org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token tokenid for user) can't be found in cache at org.apache.hadoop.ipc.Client.call(Client.java:1095) at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:195) at $Proxy7.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:102) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:67) at $Proxy7.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1305) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:734) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:176) at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:51) at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:284) at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:282) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1300) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:281) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:51) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) at LocalTrace: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: token (HDFS_DELEGATION_TOKEN token tokenid for user) can't be found in cache at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217) at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147) at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:823) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:224) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46) at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:353) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1476) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1472) at java.security.AccessController.doPrivileged(Native Method) at
[jira] [Assigned] (HIVE-7558) HCatLoader reuses credentials across jobs
[ https://issues.apache.org/jira/browse/HIVE-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan reassigned HIVE-7558: -- Assignee: Thiruvel Thirumoolan HCatLoader reuses credentials across jobs - Key: HIVE-7558 URL: https://issues.apache.org/jira/browse/HIVE-7558 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Fix For: 0.14.0 HCatLoader reuses credentials of stage1 in stage2 for some of the pig queries. This causes stage-2 to fail, if stage-2 runs for more than 10 mins. Pig queries which loads data using HCatLoader, filters only by partition columns and does an order by will run into this problem. Exceptions will be very similar to the following: 2014-07-22 17:28:49,337 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate exception from backed error: AttemptID:attemptid Info:RemoteTrace: org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token tokenid for user) can't be found in cache at org.apache.hadoop.ipc.Client.call(Client.java:1095) at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:195) at $Proxy7.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:102) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:67) at $Proxy7.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1305) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:734) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:176) at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:51) at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:284) at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:282) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1300) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:281) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:51) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) at LocalTrace: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: token (HDFS_DELEGATION_TOKEN token tokenid for user) can't be found in cache at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217) at 
org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:823) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:224) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46) at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:353) at
[jira] [Updated] (HIVE-7558) HCatLoader reuses credentials across jobs
[ https://issues.apache.org/jira/browse/HIVE-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HIVE-7558: --- Attachment: HIVE-7558.patch Attaching patch. Do not copy job's credentials in HCatLoader's objects. HCatLoader reuses credentials across jobs - Key: HIVE-7558 URL: https://issues.apache.org/jira/browse/HIVE-7558 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Fix For: 0.14.0 Attachments: HIVE-7558.patch HCatLoader reuses credentials of stage1 in stage2 for some of the pig queries. This causes stage-2 to fail, if stage-2 runs for more than 10 mins. Pig queries which loads data using HCatLoader, filters only by partition columns and does an order by will run into this problem. Exceptions will be very similar to the following: 2014-07-22 17:28:49,337 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate exception from backed error: AttemptID:attemptid Info:RemoteTrace: org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token tokenid for user) can't be found in cache at org.apache.hadoop.ipc.Client.call(Client.java:1095) at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:195) at $Proxy7.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:102) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:67) at $Proxy7.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1305) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:734) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:176) at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:51) at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:284) at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:282) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1300) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:281) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:51) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) at LocalTrace: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: token (HDFS_DELEGATION_TOKEN token tokenid for user) can't be found in cache at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217) at 
org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:823) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:224) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46) at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57) at
[jira] [Commented] (HIVE-7547) Add ipAddress and userName to ExecHook
[ https://issues.apache.org/jira/browse/HIVE-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079912#comment-14079912 ] Thejas M Nair commented on HIVE-7547: - [~szehon] SessionState already provides username and IP address. (IP address part was added recently as part of HIVE-7416). I think SessionState is a good place to store and retrieve this session information. Add ipAddress and userName to ExecHook -- Key: HIVE-7547 URL: https://issues.apache.org/jira/browse/HIVE-7547 Project: Hive Issue Type: New Feature Components: Diagnosability Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7547.2.patch, HIVE-7547.patch Auditing tools should be able to know about the ipAddress and userName of the user executing operations. These could be made available through the Hive execution-hooks. -- This message was sent by Atlassian JIRA (v6.2#6252)
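A minimal sketch of what an auditing hook consuming this information might look like. The getUserName()/getIpAddress() accessors on HookContext are what this issue proposes to add, and SessionState.get() is the session-scoped alternative Thejas suggests; treat both as assumptions until the patch lands.

{code:java}
import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;
import org.apache.hadoop.hive.ql.session.SessionState;

// Sketch of an execution hook that records who ran what, and from where.
public class AuditHook implements ExecuteWithHookContext {
  @Override
  public void run(HookContext hookContext) throws Exception {
    String user = hookContext.getUserName();  // proposed by HIVE-7547
    String ip = hookContext.getIpAddress();   // proposed by HIVE-7547
    // Alternative: pull the same session information from SessionState,
    // as suggested in the comment above.
    SessionState ss = SessionState.get();
    System.out.println("audit: user=" + user + " ip=" + ip
        + " query=" + hookContext.getQueryPlan().getQueryString());
  }
}
{code}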
[jira] [Commented] (HIVE-7443) Fix HiveConnection to communicate with Kerberized Hive JDBC server and alternative JDKs
[ https://issues.apache.org/jira/browse/HIVE-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079933#comment-14079933 ] Hive QA commented on HIVE-7443: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12658595/HIVE-7443.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5838 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/108/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/108/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-108/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12658595 Fix HiveConnection to communicate with Kerberized Hive JDBC server and alternative JDKs --- Key: HIVE-7443 URL: https://issues.apache.org/jira/browse/HIVE-7443 Project: Hive Issue Type: Bug Components: JDBC, Security Affects Versions: 0.12.0, 0.13.1 Environment: Kerberos Run Hive server2 and client with IBM JDK7.1 Reporter: Yu Gao Assignee: Yu Gao Attachments: HIVE-7443.patch Hive Kerberos authentication has been enabled in my cluster. I ran kinit to initialize the current login user's ticket cache successfully, and then tried to use beeline to connect to Hive Server2, but failed. 
After I manually added some logging to catch the failure exception, this is what I got that caused the failure: beeline !connect jdbc:hive2://hiveserver.host:1/default;principal=hive/hiveserver.host@REALM.COM org.apache.hive.jdbc.HiveDriver scan complete in 2ms Connecting to jdbc:hive2://hiveserver.host:1/default;principal=hive/hiveserver.host@REALM.COM Enter password for jdbc:hive2://hiveserver.host:1/default;principal=hive/hiveserver.host@REALM.COM: 14/07/17 15:12:45 ERROR jdbc.HiveConnection: Failed to open client transport javax.security.sasl.SaslException: Failed to open client transport [Caused by java.io.IOException: Could not instantiate SASL transport] at org.apache.hive.service.auth.KerberosSaslHelper.getKerberosTransport(KerberosSaslHelper.java:78) at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:342) at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:200) at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:178) at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) at java.sql.DriverManager.getConnection(DriverManager.java:582) at java.sql.DriverManager.getConnection(DriverManager.java:198) at org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:145) at org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:186) at org.apache.hive.beeline.Commands.connect(Commands.java:959) at org.apache.hive.beeline.Commands.connect(Commands.java:880) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) at java.lang.reflect.Method.invoke(Method.java:619) at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:44) at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:801) at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:659) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:368) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:351) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55) at java.lang.reflect.Method.invoke(Method.java:619) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: java.io.IOException: Could not instantiate SASL transport at
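For reference, the failing beeline session above amounts to a plain JDBC connect with the server's Kerberos principal carried in the URL; a minimal sketch, assuming the client already holds a TGT from kinit (host, realm, and port 10000 are placeholders):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;

public class KerberosJdbcExample {
  public static void main(String[] args) throws Exception {
    // The principal=... part tells the driver which server principal to
    // authenticate against; no password is sent when Kerberos is used.
    String url = "jdbc:hive2://hiveserver.host:10000/default;"
        + "principal=hive/hiveserver.host@REALM.COM";
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(url)) {
      System.out.println("connected: " + !conn.isClosed());
    }
  }
}
{code}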
[jira] [Commented] (HIVE-6988) Hive changes for tez-0.5.x compatibility
[ https://issues.apache.org/jira/browse/HIVE-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079937#comment-14079937 ] Hive QA commented on HIVE-6988: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12658598/HIVE-6988.6.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/109/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/109/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-109/ Messages: {noformat} This message was trimmed, see log for full details As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:68:4: Decision can match input such as LPAREN KW_CASE KW_ARRAY using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:68:4: Decision can match input such as LPAREN KW_NOT SmallintLiteral using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:68:4: Decision can match input such as LPAREN KW_NOT TinyintLiteral using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:68:4: Decision can match input such as LPAREN LPAREN Number using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:115:5: Decision can match input such as KW_CLUSTER KW_BY LPAREN using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:127:5: Decision can match input such as KW_PARTITION KW_BY LPAREN using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:138:5: Decision can match input such as KW_DISTRIBUTE KW_BY LPAREN using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:149:5: Decision can match input such as KW_SORT KW_BY LPAREN using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:166:7: Decision can match input such as STAR using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:179:5: Decision can match input such as KW_UNIONTYPE using multiple alternatives: 5, 6 As a result, alternative(s) 6 were disabled for that input warning(200): IdentifiersParser.g:179:5: Decision can match input such as KW_STRUCT using multiple alternatives: 4, 6 As a result, alternative(s) 6 were disabled for that input warning(200): IdentifiersParser.g:179:5: Decision can match input such as KW_ARRAY using multiple alternatives: 2, 6 As a result, alternative(s) 6 were disabled for that input warning(200): IdentifiersParser.g:261:5: Decision can match input such as KW_DATE StringLiteral using multiple alternatives: 2, 3 As a result, alternative(s) 3 were disabled for that input warning(200): IdentifiersParser.g:261:5: Decision can match input such as KW_NULL using multiple alternatives: 1, 8 As a result, alternative(s) 8 were disabled for that input warning(200): IdentifiersParser.g:261:5: Decision can match input such as KW_FALSE using multiple 
alternatives: 3, 8 As a result, alternative(s) 8 were disabled for that input warning(200): IdentifiersParser.g:261:5: Decision can match input such as KW_TRUE using multiple alternatives: 3, 8 As a result, alternative(s) 8 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_INTO using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_LATERAL KW_VIEW using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_GROUP KW_BY using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_CLUSTER KW_BY using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as KW_BETWEEN KW_MAP LPAREN using multiple alternatives: 8, 9 As a result, alternative(s) 9 were disabled for that input warning(200):
[jira] [Commented] (HIVE-7526) Research to use groupby transformation to replace Hive existing partitionByKey and SparkCollector combination
[ https://issues.apache.org/jira/browse/HIVE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079952#comment-14079952 ] Brock Noland commented on HIVE-7526: Thank you [~csun]! May I ask you to upload the patch to https://reviews.apache.org and post the link here? Research to use groupby transformation to replace Hive existing partitionByKey and SparkCollector combination - Key: HIVE-7526 URL: https://issues.apache.org/jira/browse/HIVE-7526 Project: Hive Issue Type: Task Components: Spark Reporter: Xuefu Zhang Assignee: Chao Attachments: HIVE-7526.2.patch, HIVE-7526.3.patch, HIVE-7526.patch Currently SparkClient shuffles data by calling partitionByKey(). This transformation outputs <key, value> tuples. However, Hive's ExecMapper expects <key, iterator<value>> tuples, and Spark's groupByKey() seems to output this directly. Thus, by using groupByKey, we may be able to avoid Hive's own key clustering mechanism (in HiveReduceFunction). This research task is to try it out. -- This message was sent by Atlassian JIRA (v6.2#6252)
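A small standalone sketch of the output shape in question, assuming the Spark 1.x Java API used on the spark branch: groupByKey() on a pair RDD yields one (key, Iterable<value>) tuple per key, which matches the <key, iterator<value>> shape described above. Names and data are illustrative only.

{code:java}
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class GroupByKeyShape {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(
        new SparkConf().setMaster("local").setAppName("groupByKey shape"));

    // A pair RDD standing in for Hive's map-side output.
    JavaPairRDD<String, Integer> mapOutput = sc.parallelizePairs(Arrays.asList(
        new Tuple2<String, Integer>("k1", 1),
        new Tuple2<String, Integer>("k1", 2),
        new Tuple2<String, Integer>("k2", 3)));

    // groupByKey() clusters values per key during the shuffle itself,
    // producing the (key, iterable-of-values) shape the reduce side expects.
    JavaPairRDD<String, Iterable<Integer>> grouped = mapOutput.groupByKey();

    for (Tuple2<String, Iterable<Integer>> t : grouped.collect()) {
      System.out.println(t._1() + " -> " + t._2());
    }
    sc.stop();
  }
}
{code}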
[jira] [Commented] (HIVE-7545) Tableau connecting with MapR ODBC driver cannot get more than 43 columns
[ https://issues.apache.org/jira/browse/HIVE-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079955#comment-14079955 ] George Chow commented on HIVE-7545: --- Is it possible to include the query so that it can be reproduced with a similar table? The error message looks to originate from Hive (SQLOperations::prepare). Tableau connecting with MapR ODBC driver cannot get more than 43 columns Key: HIVE-7545 URL: https://issues.apache.org/jira/browse/HIVE-7545 Project: Hive Issue Type: Bug Environment: Tableau connecting using MapR ODBC driver - Windows Reporter: Venkata krishnan Sowrirajan Fix For: 0.13.1 A Hive table has 170 columns and 1 million rows. When querying all 170 columns of the table from Tableau using the MapR ODBC driver, no more than 43 columns can be queried. Beyond that it gives an error saying [MapR][HiveODBC] (35) Error from Hive: error code: '10007' error message: 'Error while compiling statement: FAILED: SemanticException [Error 10007]: Ambiguous column reference c_43'. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7559) Move configuration from SparkClient to HiveConf
Brock Noland created HIVE-7559: -- Summary: Move configuration from SparkClient to HiveConf Key: HIVE-7559 URL: https://issues.apache.org/jira/browse/HIVE-7559 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Priority: Minor The SparkClient class has some configuration keys and defaults. These should be moved to HiveConf. -- This message was sent by Atlassian JIRA (v6.2#6252)
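A sketch of the intended refactoring, under stated assumptions: the "before" constants mirror the POC pattern described above, and the SPARK_MASTER ConfVars entry is hypothetical, standing in for whatever entry the eventual patch adds to HiveConf.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.conf.HiveConf;

public class SparkMasterLookup {
  // Before (the POC pattern): key and default buried in SparkClient.
  private static final String SPARK_MASTER = "spark.master";
  private static final String SPARK_MASTER_DEFAULT = "local";

  static String fromClientConstants(Configuration conf) {
    return conf.get(SPARK_MASTER, SPARK_MASTER_DEFAULT);
  }

  // After: read through a HiveConf ConfVars entry, so the key, default, and
  // doc string live in one place. The SPARK_MASTER member used below is
  // hypothetical -- the actual enum entry would be added by the patch,
  // roughly as: SPARK_MASTER("spark.master", "local", "Spark master URL.")
  static String fromHiveConf(HiveConf conf) {
    return conf.getVar(HiveConf.ConfVars.SPARK_MASTER);
  }
}
{code}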
[jira] [Created] (HIVE-7560) Fix exception handling in POC code
Brock Noland created HIVE-7560: -- Summary: Fix exception handling in POC code Key: HIVE-7560 URL: https://issues.apache.org/jira/browse/HIVE-7560 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland The POC code just printed exceptions to stderr. We should either: 1) LOG at INFO/WARN/ERROR, or 2) rethrow (perhaps wrapped in a runtime exception) anything that is a fatal error -- This message was sent by Atlassian JIRA (v6.2#6252)
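A minimal sketch of the two options, using the commons-logging Log that Hive code already carries; the method names and messages are illustrative only.

{code:java}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class ErrorHandlingSketch {
  private static final Log LOG = LogFactory.getLog(ErrorHandlingSketch.class);

  // Option 1: the error is survivable -- log it at an appropriate level
  // instead of printing to stderr.
  static void recoverable(Exception e) {
    LOG.warn("Non-fatal problem while running Spark task", e);
  }

  // Option 2: the error is fatal -- rethrow, wrapped in an unchecked
  // exception, so callers cannot silently swallow it.
  static void fatal(Exception e) {
    throw new RuntimeException("Fatal error while running Spark task", e);
  }
}
{code}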
[jira] [Created] (HIVE-7561) Move from assert to Guava Preconditions.* in Hive on Spark
Brock Noland created HIVE-7561: -- Summary: Move from assert to Guava Preconditions.* in Hive on Spark Key: HIVE-7561 URL: https://issues.apache.org/jira/browse/HIVE-7561 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Hive uses the assert keyword all over the place. The problem is that assertions rarely take effect, since they must be explicitly enabled on the JVM. In the Spark code, e.g. GenSparkUtils, let's use Preconditions.*. -- This message was sent by Atlassian JIRA (v6.2#6252)
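For illustration, a before/after sketch of the proposed change; the method and argument names are hypothetical, but the Guava calls are the standard Preconditions API.

{code:java}
import com.google.common.base.Preconditions;

public class PreconditionsSketch {
  // With the assert keyword, this check silently disappears unless the JVM
  // runs with -ea:
  //   assert work != null : "work cannot be null";
  //
  // Preconditions.* always runs, regardless of JVM flags:
  static void process(Object work, int numTasks) {
    Preconditions.checkNotNull(work, "work cannot be null");
    Preconditions.checkArgument(numTasks > 0,
        "numTasks must be positive: %s", numTasks);
  }
}
{code}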
[jira] [Updated] (HIVE-7561) Move from assert to Guava Preconditions.* in Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7561: --- Labels: StarterProject (was: newbie) Move from assert to Guava Preconditions.* in Hive on Spark -- Key: HIVE-7561 URL: https://issues.apache.org/jira/browse/HIVE-7561 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Labels: StarterProject Hive uses the assert keyword all over the place. The problem is that assertions rarely take effect, since they must be explicitly enabled on the JVM. In the Spark code, e.g. GenSparkUtils, let's use Preconditions.*. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7560) Fix exception handling in POC code
[ https://issues.apache.org/jira/browse/HIVE-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7560: --- Labels: StarterProject (was: newbie) Fix exception handling in POC code -- Key: HIVE-7560 URL: https://issues.apache.org/jira/browse/HIVE-7560 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Labels: StarterProject The POC code just printed exceptions to stderr. We should either: 1) LOG at INFO/WARN/ERROR, or 2) rethrow (perhaps wrapped in a runtime exception) anything that is a fatal error -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7559) Move configuration from SparkClient to HiveConf
[ https://issues.apache.org/jira/browse/HIVE-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7559: --- Labels: StarterProject (was: newbie) Move configuration from SparkClient to HiveConf --- Key: HIVE-7559 URL: https://issues.apache.org/jira/browse/HIVE-7559 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Priority: Minor Labels: StarterProject The SparkClient class has some configuration keys and defaults. These should be moved to HiveConf. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7560) StarterProject: Fix exception handling in POC code
[ https://issues.apache.org/jira/browse/HIVE-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7560: --- Summary: StarterProject: Fix exception handling in POC code (was: Fix exception handling in POC code) StarterProject: Fix exception handling in POC code -- Key: HIVE-7560 URL: https://issues.apache.org/jira/browse/HIVE-7560 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Labels: StarterProject The POC code just printed exceptions to stderr. We should either: 1) LOG at INFO/WARN/ERROR, or 2) rethrow (perhaps wrapped in a runtime exception) anything that is a fatal error -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7561) StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7561: --- Summary: StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark (was: Move from assert to Guava Preconditions.* in Hive on Spark) StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark -- Key: HIVE-7561 URL: https://issues.apache.org/jira/browse/HIVE-7561 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Labels: StarterProject Hive uses the assert keyword all over the place. The problem is that assertions rarely take effect, since they must be explicitly enabled on the JVM. In the Spark code, e.g. GenSparkUtils, let's use Preconditions.*. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7559) StarterProject: Move configuration from SparkClient to HiveConf
[ https://issues.apache.org/jira/browse/HIVE-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7559: --- Summary: StarterProject: Move configuration from SparkClient to HiveConf (was: Move configuration from SparkClient to HiveConf) StarterProject: Move configuration from SparkClient to HiveConf --- Key: HIVE-7559 URL: https://issues.apache.org/jira/browse/HIVE-7559 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Priority: Minor Labels: StarterProject The SparkClient class has some configuration keys and defaults. These should be moved to HiveConf. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7503) Support Hive's multi-table insert query with Spark
[ https://issues.apache.org/jira/browse/HIVE-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reassigned HIVE-7503: - Assignee: Xuefu Zhang Support Hive's multi-table insert query with Spark -- Key: HIVE-7503 URL: https://issues.apache.org/jira/browse/HIVE-7503 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang For Hive's multi-insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be an MR job for each insert. When we achieve this with Spark, it would be nice if all the inserts could happen concurrently. It seems that this functionality isn't available in Spark. To make things worse, the source of the insert may be re-computed unless it's staged. Even then, the inserts will happen sequentially, hurting performance. This task is to find out what it takes in Spark to enable this without requiring staging of the source or sequential insertion. If this has to be solved in Hive, find an optimal way to do it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7503) Support Hive's multi-table insert query with Spark
[ https://issues.apache.org/jira/browse/HIVE-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080076#comment-14080076 ] Xuefu Zhang commented on HIVE-7503: --- Assigned to myself for initial research. Support Hive's multi-table insert query with Spark -- Key: HIVE-7503 URL: https://issues.apache.org/jira/browse/HIVE-7503 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang For Hive's multi-insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be an MR job for each insert. When we achieve this with Spark, it would be nice if all the inserts could happen concurrently. It seems that this functionality isn't available in Spark. To make things worse, the source of the insert may be re-computed unless it's staged. Even then, the inserts will happen sequentially, hurting performance. This task is to find out what it takes in Spark to enable this without requiring staging of the source or sequential insertion. If this has to be solved in Hive, find an optimal way to do it. -- This message was sent by Atlassian JIRA (v6.2#6252)
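A toy sketch of the Spark behavior described above, assuming the Spark 1.x Java API (paths and filters are placeholders): cache() keeps the shared source from being recomputed for each insert, but each save is still a separate Spark job and the jobs run sequentially when issued from a single thread.

{code:java}
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class MultiInsertSketch {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext("local", "multi-insert sketch");
    JavaRDD<String> source = sc.textFile("/tmp/source");  // placeholder path

    // Caching avoids recomputing the shared source once multiple actions
    // consume it -- the "staging" alternative mentioned in the issue.
    source.cache();

    // Each saveAsTextFile() is its own job; issued like this they run one
    // after the other, which is exactly the limitation being researched.
    JavaRDD<String> first = source.filter(new Function<String, Boolean>() {
      public Boolean call(String line) { return line.startsWith("a"); }
    });
    first.saveAsTextFile("/tmp/out1");                    // placeholder path

    JavaRDD<String> rest = source.filter(new Function<String, Boolean>() {
      public Boolean call(String line) { return !line.startsWith("a"); }
    });
    rest.saveAsTextFile("/tmp/out2");                     // placeholder path
    sc.stop();
  }
}
{code}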
[jira] [Reopened] (HIVE-7506) MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)
[ https://issues.apache.org/jira/browse/HIVE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner reopened HIVE-7506: -- MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table) -- Key: HIVE-7506 URL: https://issues.apache.org/jira/browse/HIVE-7506 Project: Hive Issue Type: New Feature Components: Database/Schema Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Critical Original Estimate: 252h Remaining Estimate: 252h Two motivations: (1) CBO depends heavily on the statistics of a column in a table (or a partition of a table). If we would like to test whether CBO chooses the best plan under different statistics, it would be time-consuming to load the whole table and create the statistics from the ground up. (2) As the database runs, the statistics of a column in a table (or a partition of a table) may change, so we need a mechanism to keep them in sync. We propose the following command to achieve that: ALTER TABLE table_name PARTITION partition_spec [COLUMN col_name] UPDATE STATISTICS col_statistics [COMMENT col_comment] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7488) pass column names being used for inputs to authorization api
[ https://issues.apache.org/jira/browse/HIVE-7488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080118#comment-14080118 ] Jason Dere commented on HIVE-7488: -- +1. Test failures not related? pass column names being used for inputs to authorization api Key: HIVE-7488 URL: https://issues.apache.org/jira/browse/HIVE-7488 Project: Hive Issue Type: Bug Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7488.1.patch, HIVE-7488.2.patch, HIVE-7488.3.patch.txt, HIVE-7488.4.patch, HIVE-7488.5.patch, HIVE-7488.6.patch HivePrivilegeObject in the authorization api has support for columns, but the columns being used are not being populated for non grant-revoke queries. This is for enabling any implementation of the api to use this column information for its authorization decisions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7506) MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)
[ https://issues.apache.org/jira/browse/HIVE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080125#comment-14080125 ] Gunther Hagleitner commented on HIVE-7506: -- [~damien.carol] I think the use for this is different from analyze. The ability to update certain stats without scanning any data or without hacking the backend db is useful in a number of cases. It helps (especially for CBO work) to set up unit tests quickly and verify both CBO and the stats subsystem. It also helps when experimenting with the system if you're just trying out hive/hadoop on a small cluster. Finally, it gives you a quick and clean way to fix things when something went wrong with respect to stats in your environment. MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table) -- Key: HIVE-7506 URL: https://issues.apache.org/jira/browse/HIVE-7506 Project: Hive Issue Type: New Feature Components: Database/Schema Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Original Estimate: 252h Remaining Estimate: 252h Two motivations: (1) CBO depends heavily on the statistics of a column in a table (or a partition of a table). If we would like to test whether CBO chooses the best plan under different statistics, it would be time-consuming to load the whole table and create the statistics from the ground up. (2) As the database runs, the statistics of a column in a table (or a partition of a table) may change, so we need a mechanism to keep them in sync. We propose the following command to achieve that: ALTER TABLE table_name PARTITION partition_spec [COLUMN col_name] UPDATE STATISTICS col_statistics [COMMENT col_comment] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7390) Make quote character optional and configurable in BeeLine CSV/TSV output
[ https://issues.apache.org/jira/browse/HIVE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080128#comment-14080128 ] Lars Francke commented on HIVE-7390: You summed it up nicely, thanks. The original intention of this issue was to make the quote character optional and configurable, so Jim must have had a use case for that. I can't think of a good one at the moment. I can, however, think of a good reason for a configurable delimiter. Comma, semicolon, or tab occur relatively frequently in data, but some other character (\001 or |) might not occur in the data, and being able to pick it as the delimiter makes parsing much simpler (just split on the delimiter instead of looking for quoted strings etc.). This is especially interesting when you then want to mount another table on that data in Hive or post-process it in any other simple way where you don't have access to a full-fledged CSV parsing library. So: picking the delimiter is often very helpful in avoiding a whole class of parsing issues and lets you just split on the delimiter. I think we can easily catch the most common issues with two changes: 1. Fix the current CSV and TSV formats. As you say: no debate on that. 2. Allow the delimiter to be specified and keep the normal quoting mode. That allows everyone who really understands their data to avoid quoting, and everyone else can get properly formatted CSVs for a full CSV parser. In the same vein I think that {{surroundingSpacesNeedQuotes}} should stay disabled. But as I said: this is kinda hijacking Jim's original issue... Make quote character optional and configurable in BeeLine CSV/TSV output Key: HIVE-7390 URL: https://issues.apache.org/jira/browse/HIVE-7390 Project: Hive Issue Type: New Feature Components: Clients Affects Versions: 0.13.1 Reporter: Jim Halfpenny Assignee: Ferdinand Xu Attachments: HIVE-7390.1.patch, HIVE-7390.2.patch, HIVE-7390.3.patch, HIVE-7390.4.patch, HIVE-7390.patch Currently, when either the CSV or TSV output format is used in beeline, each column is wrapped in single quotes. Quote wrapping of columns should be optional and the user should be able to choose the character used to wrap the columns. -- This message was sent by Atlassian JIRA (v6.2#6252)
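A tiny sketch of the consumer-side payoff Lars describes: if the delimiter (here \001) never occurs in the data, each output row can be split directly, with no quote handling or CSV parser. Pattern.quote() guards delimiters that happen to be regex metacharacters, such as '|'. The sample row is made up.

{code:java}
import java.util.regex.Pattern;

public class DelimiterSplitSketch {
  public static void main(String[] args) {
    // A delimiter chosen because it does not appear in the data.
    String delim = "\u0001";
    String row = "1\u0001Alice\u0001New York, NY"; // commas need no escaping
    // The -1 limit keeps trailing empty fields instead of dropping them.
    String[] fields = row.split(Pattern.quote(delim), -1);
    for (String f : fields) {
      System.out.println(f);
    }
  }
}
{code}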
[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join
[ https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7096: - Affects Version/s: tez-branch Support grouped splits in Tez partitioned broadcast join Key: HIVE-7096 URL: https://issues.apache.org/jira/browse/HIVE-7096 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Gunther Hagleitner Assignee: Vikram Dixit K Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, HIVE-7096.4.patch, HIVE-7096.tez.branch.patch Same checks for schema + deser + file format done in HiveSplitGenerator need to be done in the CustomPartitionVertex. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join
[ https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7096: - Attachment: HIVE-7096.4.patch This patch works with tez-0.5 only. Since only the tez branch has been upgraded to that version, it is applicable only to that Hive branch. Support grouped splits in Tez partitioned broadcast join Key: HIVE-7096 URL: https://issues.apache.org/jira/browse/HIVE-7096 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Gunther Hagleitner Assignee: Vikram Dixit K Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, HIVE-7096.4.patch, HIVE-7096.tez.branch.patch Same checks for schema + deser + file format done in HiveSplitGenerator need to be done in the CustomPartitionVertex. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join
[ https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7096: - Component/s: Tez Support grouped splits in Tez partitioned broadcast join Key: HIVE-7096 URL: https://issues.apache.org/jira/browse/HIVE-7096 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Gunther Hagleitner Assignee: Vikram Dixit K Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, HIVE-7096.4.patch, HIVE-7096.tez.branch.patch Same checks for schema + deser + file format done in HiveSplitGenerator need to be done in the CustomPartitionVertex. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join
[ https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7096: - Attachment: HIVE-7096.4.patch Support grouped splits in Tez partitioned broadcast join Key: HIVE-7096 URL: https://issues.apache.org/jira/browse/HIVE-7096 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Gunther Hagleitner Assignee: Vikram Dixit K Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, HIVE-7096.tez.branch.patch Same checks for schema + deser + file format done in HiveSplitGenerator need to be done in the CustomPartitionVertex. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join
[ https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7096: - Attachment: (was: HIVE-7096.4.patch) Support grouped splits in Tez partitioned broadcast join Key: HIVE-7096 URL: https://issues.apache.org/jira/browse/HIVE-7096 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Gunther Hagleitner Assignee: Vikram Dixit K Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, HIVE-7096.tez.branch.patch Same checks for schema + deser + file format done in HiveSplitGenerator need to be done in the CustomPartitionVertex. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join
[ https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7096: - Attachment: (was: HIVE-7096.4.patch) Support grouped splits in Tez partitioned broadcast join Key: HIVE-7096 URL: https://issues.apache.org/jira/browse/HIVE-7096 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Gunther Hagleitner Assignee: Vikram Dixit K Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, HIVE-7096.tez.branch.patch Same checks for schema + deser + file format done in HiveSplitGenerator need to be done in the CustomPartitionVertex. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join
[ https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7096: - Attachment: HIVE-7096.4.patch Support grouped splits in Tez partitioned broadcast join Key: HIVE-7096 URL: https://issues.apache.org/jira/browse/HIVE-7096 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Gunther Hagleitner Assignee: Vikram Dixit K Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, HIVE-7096.4.patch, HIVE-7096.tez.branch.patch Same checks for schema + deser + file format done in HiveSplitGenerator need to be done in the CustomPartitionVertex. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7509) Fast stripe level merging for ORC
[ https://issues.apache.org/jira/browse/HIVE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080178#comment-14080178 ] Hive QA commented on HIVE-7509: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12658680/HIVE-7509.5.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5842 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/110/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/110/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-110/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12658680 Fast stripe level merging for ORC - Key: HIVE-7509 URL: https://issues.apache.org/jira/browse/HIVE-7509 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Attachments: HIVE-7509.1.patch, HIVE-7509.2.patch, HIVE-7509.3.patch, HIVE-7509.4.patch, HIVE-7509.5.patch Similar to HIVE-1950, add support for fast stripe-level merging of ORC files through the CONCATENATE command and a conditional merge task. This fast merging is ideal for merging many small ORC files into a larger file without decompressing and decoding the data of the small ORC files. -- This message was sent by Atlassian JIRA (v6.2#6252)
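For context, the CONCATENATE command mentioned above is issued like any other DDL statement, e.g. over JDBC; a minimal sketch with a placeholder table name and connection URL:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class OrcConcatenateExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      // Merge the many small files of an ORC table into fewer, larger ones.
      // With this patch the merge happens at stripe level, without
      // decompressing and re-encoding the data.
      stmt.execute("ALTER TABLE orc_events CONCATENATE");
    }
  }
}
{code}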
[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join
[ https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7096: - Attachment: (was: HIVE-7096.4.patch) Support grouped splits in Tez partitioned broadcast join Key: HIVE-7096 URL: https://issues.apache.org/jira/browse/HIVE-7096 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Gunther Hagleitner Assignee: Vikram Dixit K Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, HIVE-7096.4.patch, HIVE-7096.tez.branch.patch Same checks for schema + deser + file format done in HiveSplitGenerator need to be done in the CustomPartitionVertex. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join
[ https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7096: - Attachment: HIVE-7096.4.patch Support grouped splits in Tez partitioned broadcast join Key: HIVE-7096 URL: https://issues.apache.org/jira/browse/HIVE-7096 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Gunther Hagleitner Assignee: Vikram Dixit K Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, HIVE-7096.4.patch, HIVE-7096.tez.branch.patch Same checks for schema + deser + file format done in HiveSplitGenerator need to be done in the CustomPartitionVertex. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7509) Fast stripe level merging for ORC
[ https://issues.apache.org/jira/browse/HIVE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080204#comment-14080204 ] Lefty Leverenz commented on HIVE-7509: -- Good doc fixes, thanks [~prasanth_j]. +1 for docs only. Fast stripe level merging for ORC - Key: HIVE-7509 URL: https://issues.apache.org/jira/browse/HIVE-7509 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Attachments: HIVE-7509.1.patch, HIVE-7509.2.patch, HIVE-7509.3.patch, HIVE-7509.4.patch, HIVE-7509.5.patch Similar to HIVE-1950, add support for fast stripe-level merging of ORC files through the CONCATENATE command and a conditional merge task. This fast merging is ideal for merging many small ORC files into a larger file without decompressing and decoding the data of the small ORC files. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24084: HIVE-7547 - Add ipAddress and userName to ExecHook
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24084/ --- (Updated July 30, 2014, 11:40 p.m.) Review request for hive. Changes --- Incorporating Brock's and Thejas's review comments. As Thejas pointed out, it turns out ipAddress is already stored in SessionState, so using that makes the code a lot cleaner. However, the ipAddress calculated in TSetIpAddressProcessor doesn't work in Kerberos mode, so I am fixing it so it's set in all modes. Bugs: HIVE-7547 https://issues.apache.org/jira/browse/HIVE-7547 Repository: hive-git Description --- Passing the ipAddress and userName (already calculated in ThriftCLIService for other purposes) through several layers down to the hooks. Diffs (updated) - itests/hive-minikdc/src/test/java/org/apache/hive/minikdc/TestHs2HooksWithMiniKdc.java PRE-CREATION itests/hive-unit/src/test/java/org/apache/hadoop/hive/hooks/TestHs2Hooks.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/Driver.java e512199 ql/src/java/org/apache/hadoop/hive/ql/hooks/HookContext.java b11cb86 service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 service/src/java/org/apache/hive/service/cli/session/HiveSession.java 9785e95 service/src/java/org/apache/hive/service/cli/session/SessionManager.java 816bea4 service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 5c87bcb Diff: https://reviews.apache.org/r/24084/diff/ Testing --- Added tests in both Kerberos and non-Kerberos modes. Thanks, Szehon Ho
[jira] [Updated] (HIVE-7547) Add ipAddress and userName to ExecHook
[ https://issues.apache.org/jira/browse/HIVE-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-7547: Attachment: HIVE-7547.3.patch Thanks Thejas for pointing that out. I refactored the code to use SessionState. The SessionState's ipAddress didn't seem to be set for Kerberos mode, so I'm also changing how it's being set so it works for all modes. Let me know if it's not right. Add ipAddress and userName to ExecHook -- Key: HIVE-7547 URL: https://issues.apache.org/jira/browse/HIVE-7547 Project: Hive Issue Type: New Feature Components: Diagnosability Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7547.2.patch, HIVE-7547.3.patch, HIVE-7547.patch Auditing tools should be able to know about the ipAddress and userName of the user executing operations. These could be made available through the Hive execution-hooks. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24084: HIVE-7547 - Add ipAddress and userName to ExecHook
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24084/ --- (Updated July 30, 2014, 11:46 p.m.) Review request for hive. Bugs: HIVE-7547 https://issues.apache.org/jira/browse/HIVE-7547 Repository: hive-git Description --- Passing the ipAddress and userName (already calculated in ThriftCLIService for other purposes) through several layers down to the hooks. Diffs (updated) - itests/hive-minikdc/src/test/java/org/apache/hive/minikdc/TestHs2HooksWithMiniKdc.java PRE-CREATION itests/hive-unit/src/test/java/org/apache/hadoop/hive/hooks/TestHs2Hooks.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/Driver.java e512199 ql/src/java/org/apache/hadoop/hive/ql/hooks/HookContext.java b11cb86 service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 service/src/java/org/apache/hive/service/cli/session/HiveSession.java 9785e95 service/src/java/org/apache/hive/service/cli/session/SessionManager.java 816bea4 service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 5c87bcb Diff: https://reviews.apache.org/r/24084/diff/ Testing --- Added tests in both kerberos and non-kerberos mode. Thanks, Szehon Ho
[jira] [Created] (HIVE-7562) Cleanup ExecReducer
Brock Noland created HIVE-7562: -- Summary: Cleanup ExecReducer Key: HIVE-7562 URL: https://issues.apache.org/jira/browse/HIVE-7562 Project: Hive Issue Type: Improvement Reporter: Brock Noland Attachments: HIVE-7562.patch ExecReducer declares member variables in arbitrary order and with inconsistent visibility. -- This message was sent by Atlassian JIRA (v6.2#6252)
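A small sketch of the convention such a cleanup typically aims for (not the actual ExecReducer; all names below are illustrative): constants first, then instance state, each with an explicit, deliberate access modifier.

{code:java}
// Illustrative layout only: constants grouped at the top, then instance
// fields, kept private unless wider access is genuinely needed.
public class TidyReducer {
  private static final long MAX_ROWS = 100;  // constants first

  private boolean abort;                     // then instance state, private
  private long rowCount;

  public boolean isAborted() {
    return abort;
  }

  public void incrRows() {
    rowCount++;
    if (rowCount > MAX_ROWS) {
      abort = true;
    }
  }
}
{code}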
[jira] [Updated] (HIVE-7547) Add ipAddress and userName to ExecHook
[ https://issues.apache.org/jira/browse/HIVE-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-7547: Attachment: HIVE-7547.4.patch Add ipAddress and userName to ExecHook -- Key: HIVE-7547 URL: https://issues.apache.org/jira/browse/HIVE-7547 Project: Hive Issue Type: New Feature Components: Diagnosability Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7547.2.patch, HIVE-7547.3.patch, HIVE-7547.4.patch, HIVE-7547.patch Auditing tools should be able to know about the ipAddress and userName of the user executing operations. These could be made available through the Hive execution-hooks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7562) Cleanup ExecReducer
[ https://issues.apache.org/jira/browse/HIVE-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7562: --- Attachment: HIVE-7562.patch Cleanup ExecReducer --- Key: HIVE-7562 URL: https://issues.apache.org/jira/browse/HIVE-7562 Project: Hive Issue Type: Improvement Reporter: Brock Noland Attachments: HIVE-7562.patch ExecReducer declares member variables in arbitrary order and with inconsistent visibility. -- This message was sent by Atlassian JIRA (v6.2#6252)