[jira] [Commented] (HIVE-8839) Support alter table .. add/replace columns cascade
[ https://issues.apache.org/jira/browse/HIVE-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217613#comment-14217613 ] Lefty Leverenz commented on HIVE-8839: -- Does this also need to be documented in the Alter Table section of the DDL doc? * [DDL -- AlterTable/Partition/Column | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/Partition/Column] * [DDL -- Add/Replace Columns | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Add/ReplaceColumns] By the way, CASCADE is only mentioned twice in the DDL doc, in Drop Database and Drop Partitions. But Drop Partitions says "For tables that are protected by NO DROP CASCADE, ..." although NO DROP CASCADE isn't documented anywhere for creating or altering tables. So it looks like more doc updates are needed. * [DDL -- Drop Database | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropDatabase] * [DDL -- Drop Partitions | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropPartitions] Support alter table .. add/replace columns cascade Key: HIVE-8839 URL: https://issues.apache.org/jira/browse/HIVE-8839 Project: Hive Issue Type: Improvement Components: SQL Environment: Reporter: Chaoyu Tang Assignee: Chaoyu Tang Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-8839.1.patch, HIVE-8839.2.patch, HIVE-8839.2.patch, HIVE-8839.patch We often run into issues like HIVE-6131 which are due to inconsistent column descriptors between table and partitions after alter table. HIVE-8441/HIVE-7971 provided the flexibility to alter the table at the partition level, but in most cases we need to change the table and partitions at the same time. In addition, alter table is usually required prior to alter table partition .., since querying table partition data also goes through the table.
Instead of doing that in two steps, here we provide a convenient DDL like alter table ... cascade to cascade table changes to partitions as well. The changes are limited to add/replace columns and changing a column's name, datatype, position, and comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
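The behavior described above can be sketched in HiveQL; the table, columns, and partition key here are hypothetical illustrations, but the CASCADE keyword matches the syntax discussed in this issue:

```sql
-- Hypothetical partitioned table for illustration.
CREATE TABLE sales (id INT, amount DOUBLE)
PARTITIONED BY (dt STRING);

-- Without CASCADE: only the table-level column descriptor changes;
-- existing partitions keep their old schema, which can lead to the
-- kind of inconsistency described in HIVE-6131.
ALTER TABLE sales ADD COLUMNS (region STRING);

-- With CASCADE: the change is propagated to the metadata of all
-- existing partitions as well, keeping table and partitions consistent.
ALTER TABLE sales ADD COLUMNS (region STRING) CASCADE;
```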
[jira] [Commented] (HIVE-8122) Make use of SearchArgument classes for Parquet SERDE
[ https://issues.apache.org/jira/browse/HIVE-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217619#comment-14217619 ] Lefty Leverenz commented on HIVE-8122: -- Doc question: This doesn't need any wiki documentation, does it? * [Parquet | https://cwiki.apache.org/confluence/display/Hive/Parquet] Make use of SearchArgument classes for Parquet SERDE Key: HIVE-8122 URL: https://issues.apache.org/jira/browse/HIVE-8122 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Fix For: 0.15.0 Attachments: HIVE-8122.1.patch, HIVE-8122.2.patch, HIVE-8122.3.patch, HIVE-8122.4.patch, HIVE-8122.patch ParquetSerde could be much cleaner if we used SearchArgument and associated classes like ORC does: https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.java
[jira] [Commented] (HIVE-8863) Cannot drop table with uppercase name after compute statistics for columns
[ https://issues.apache.org/jira/browse/HIVE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217631#comment-14217631 ] Prabhu Joseph commented on HIVE-8863: - Hi Juan, I reproduced the issue on my machine with Hive 0.12. I am using Derby for the metastore in server mode. Earlier, when I dropped the table I received a different exception. drop table Test; The error I got was: Caused by: MetaException(message:java.lang.RuntimeException: commitTransaction was called but openTransactionCalls = 0. This probably indicates that there are unbalanced calls to openTransaction/commitTransaction) After applying the HIVE-4996 patch, I receive the same exception you reported. drop table Test; NestedThrowablesStackTrace: java.sql.BatchUpdateException: DELETE on table 'TBLS' caused a violation of foreign key constraint 'TAB_COL_STATS_FK1' for key (19). But with or without the patch, drop table test; [small 't'] works, so there is definitely an issue. Cannot drop table with uppercase name after compute statistics for columns Key: HIVE-8863 URL: https://issues.apache.org/jira/browse/HIVE-8863 Project: Hive Issue Type: Bug Components: Metastore Reporter: Juan Yu Create a table with an uppercase name, Test, then run analyze table Test compute statistics for columns col1 After this, you cannot drop the table with drop table Test; You get: NestedThrowablesStackTrace: java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails (hive2.TAB_COL_STATS, CONSTRAINT TAB_COL_STATS_FK FOREIGN KEY (TBL_ID) REFERENCES TBLS (TBL_ID)) The workaround is to use the lowercase table name: drop table test;
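For reference, the reproduction steps from the report, written out as a sketch (the column type is an assumption; the report only names col1):

```sql
-- Mixed-case table name is what triggers the bug.
CREATE TABLE Test (col1 INT);

-- Stores column statistics; per the report, the stats rows end up
-- keyed in a way that breaks the later delete.
ANALYZE TABLE Test COMPUTE STATISTICS FOR COLUMNS col1;

-- Fails with the foreign key violation on TAB_COL_STATS_FK.
DROP TABLE Test;

-- Reported workaround: use the lowercase name.
DROP TABLE test;
```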
[jira] [Commented] (HIVE-8122) Make use of SearchArgument classes for Parquet SERDE
[ https://issues.apache.org/jira/browse/HIVE-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217657#comment-14217657 ] Ferdinand Xu commented on HIVE-8122: Hi [~leftylev], I don't think this jira needs any wiki documentation. Thanks! Make use of SearchArgument classes for Parquet SERDE Key: HIVE-8122 URL: https://issues.apache.org/jira/browse/HIVE-8122 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Fix For: 0.15.0 Attachments: HIVE-8122.1.patch, HIVE-8122.2.patch, HIVE-8122.3.patch, HIVE-8122.4.patch, HIVE-8122.patch ParquetSerde could be much cleaner if we used SearchArgument and associated classes like ORC does: https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.java
[jira] [Commented] (HIVE-8910) Refactoring of PassThroughOutputFormat
[ https://issues.apache.org/jira/browse/HIVE-8910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217659#comment-14217659 ] Hive QA commented on HIVE-8910: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12682367/HIVE-8910.1.patch.txt {color:red}ERROR:{color} -1 due to 54 failed/errored test(s), 6648 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_index
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database_drop
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fileformat_mix
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auth
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_empty
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_file_format
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_multiple
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_self_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_unused
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_update
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_compression
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_rc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_binary_search
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compression
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_creation
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_skewtable
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_stale
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_stale_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input45
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_indexes_edge_cases
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_indexes_syntax
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap_auto
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_create_insert_outputformat
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fileformat_void_output
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_index_compact_entry_limit
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_index_compact_size_limit
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1841/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1841/console Test logs:
Re: Review Request 27699: HIVE-8435
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27699/ --- (Updated Nov. 19, 2014, 11:51 a.m.) Review request for hive and Ashutosh Chauhan. Changes --- Only observed changes in results in infer_bucket_sort.q, multiMapJoin1.q, windowing.q, and in Tez mrr.q (change of order of results). The rest are changes in the plans. Ashutosh, can you check? Repository: hive-git Description (updated) --- HIVE-8435 Patch with the most conservative approach of project remover optimization. Diffs (updated) - accumulo-handler/src/test/results/positive/accumulo_queries.q.out 254eeaba4b8d633c63c706c0c74bb1165089 common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a8411c9edb2f2db84cf2540deb20133c36152103 hbase-handler/src/test/results/positive/hbase_queries.q.out b1e7936738b1121c14132909178646290ee8b4d5 ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java 95d2d76c80aa59b62e9464f704523d921302d401 ql/src/java/org/apache/hadoop/hive/ql/optimizer/IdentityProjectRemover.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 5be0e4540a6843c6b40cb5c22db6e90e1f0da922 ql/src/test/results/clientpositive/annotate_stats_groupby.q.out 718b43c6e0fc2c28981f8caf0f38c1360e69837d ql/src/test/results/clientpositive/auto_join0.q.out 9261ce02f3cfcfd9f048f15fe7357846bb386c31 ql/src/test/results/clientpositive/auto_join10.q.out 3d2bcc216dea80522002f149e5777a73ca52fe5b ql/src/test/results/clientpositive/auto_join11.q.out 8dbad6724475b71dc53d1198c77e36dfe752484e ql/src/test/results/clientpositive/auto_join12.q.out 037116c2c6994fe8bc7bccbb89950a13854cd9af ql/src/test/results/clientpositive/auto_join13.q.out 0cb9b4ffc460121584887f395eb1697bd53013c3 ql/src/test/results/clientpositive/auto_join16.q.out f96bae3590f5e26b059458650dd508b3dd4b1235 ql/src/test/results/clientpositive/auto_join18.q.out 0de3f2a2c8ca5646071fb852c838337b76aab9f9 ql/src/test/results/clientpositive/auto_join18_multi_distinct.q.out 
46559a746f51fa3ad516629220bcf0f31bef685a ql/src/test/results/clientpositive/auto_join24.q.out 1fa3e6ea54f809c529d4ec7b50d5d5191284939f ql/src/test/results/clientpositive/auto_join26.q.out d494d95785283b7083820d0defaadb351f783085 ql/src/test/results/clientpositive/auto_join27.q.out c16992f2bed4de9dd23dcfbe004825f37abbe56e ql/src/test/results/clientpositive/auto_join30.q.out 608ca22323e3b4f1900dd5077a7aecf54d8a8ca2 ql/src/test/results/clientpositive/auto_join31.q.out b0df20270ba3dbb9115c529c50aaca5d13d57a95 ql/src/test/results/clientpositive/auto_join32.q.out bc2d56c0199133e84efd213dff1538173f1686c7 ql/src/test/results/clientpositive/auto_smb_mapjoin_14.q.out 2583d9a50d4a07db50dca7f88c6db141c392a3b8 ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 5a7f174a52d60028f524a7aac14a9b326d060af8 ql/src/test/results/clientpositive/auto_sortmerge_join_10.q.out 7606dd2adcd43ca410e66e0c8f1799084fa4f39e ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out 8372a6312a2fe85fd78f0c6da0665164b49b320c ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out 3c30a315d9028fda114def015e41a6171341153a ql/src/test/results/clientpositive/auto_sortmerge_join_14.q.out 69bd43af9a8210b19cbea17181f90bf707d93e85 ql/src/test/results/clientpositive/auto_sortmerge_join_15.q.out 10b20d84eb06a30ed3655e346431bc52dfb486fe ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out 72242bbd713baa216d41c40749f9c732271102cb ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out 35fa02fa60f6c50d6acf55ed3fae1570a644c1e1 ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out 4fea70d4e47bbd75530e92f5b2a8be2edd66bdbd ql/src/test/results/clientpositive/auto_sortmerge_join_5.q.out 1904cc246729a8d3fd2cd1815e563b50e261da6a ql/src/test/results/clientpositive/auto_sortmerge_join_6.q.out e5e2a6a770d5064df944c69576d81d07b1d95c77 ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out abb1db4a87e6b8e820ff7df53d21a4036254b098 
ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 9226dc6b2929c2b185f5904bd607a7b18e356dca ql/src/test/results/clientpositive/auto_sortmerge_join_9.q.out 1a7fdf9650f3e5650400ecc24177637856701536 ql/src/test/results/clientpositive/bucket_map_join_1.q.out b194a2be3e39c0294df14c00fe69c6d6f9283702 ql/src/test/results/clientpositive/bucket_map_join_2.q.out 07c887854179e333e4c68d02c247216b1c06dee7 ql/src/test/results/clientpositive/bucketcontext_1.q.out 0ea304dbff38d878d271930cda22b852f0175329 ql/src/test/results/clientpositive/bucketcontext_2.q.out e961f062d1cf21d058566c6c9c6a73db16a3454e ql/src/test/results/clientpositive/bucketcontext_3.q.out 1de62119c2909f4ff49dcec0a50843df7a00419a
[jira] [Updated] (HIVE-8536) Enable SkewJoinResolver for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-8536: - Summary: Enable SkewJoinResolver for spark [Spark Branch] (was: Enable runtime skew join optimization for spark [Spark Branch]) Enable SkewJoinResolver for spark [Spark Branch] Key: HIVE-8536 URL: https://issues.apache.org/jira/browse/HIVE-8536 Project: Hive Issue Type: Improvement Components: Spark Reporter: Rui Li Assignee: Rui Li Sub-task of HIVE-8406
[jira] [Created] (HIVE-8913) Make SparkMapJoinResolver handle runtime skew join [Spark Branch]
Rui Li created HIVE-8913: Summary: Make SparkMapJoinResolver handle runtime skew join [Spark Branch] Key: HIVE-8913 URL: https://issues.apache.org/jira/browse/HIVE-8913 Project: Hive Issue Type: Improvement Components: Spark Reporter: Rui Li Sub-task of HIVE-8406. Now we have {{SparkMapJoinResolver}} in place, but at the moment it doesn't handle the map join tasks created by the upstream SkewJoinResolver, i.e. those wrapped in a ConditionalTask. We have to implement this part for runtime skew join to work on spark. To do so, we can borrow logic from {{MapJoinResolver}}.
[jira] [Updated] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8435: -- Attachment: HIVE-8435.09.patch This patch is the same as .08, but it contains the changes in the test result files. Add identity project remover optimization - Key: HIVE-8435 URL: https://issues.apache.org/jira/browse/HIVE-8435 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Ashutosh Chauhan Assignee: Jesús Camacho Rodríguez Attachments: HIVE-8435.02.patch, HIVE-8435.03.patch, HIVE-8435.03.patch, HIVE-8435.04.patch, HIVE-8435.05.patch, HIVE-8435.05.patch, HIVE-8435.06.patch, HIVE-8435.07.patch, HIVE-8435.08.patch, HIVE-8435.09.patch, HIVE-8435.1.patch, HIVE-8435.patch In some cases there is an identity project in the plan which is useless. Better to optimize it away to avoid evaluating it at runtime without any benefit.
[jira] [Commented] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217971#comment-14217971 ] Hive QA commented on HIVE-8435: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12682415/HIVE-8435.09.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6648 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1842/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1842/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1842/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12682415 - PreCommit-HIVE-TRUNK-Build Add identity project remover optimization - Key: HIVE-8435 URL: https://issues.apache.org/jira/browse/HIVE-8435 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Ashutosh Chauhan Assignee: Jesús Camacho Rodríguez Attachments: HIVE-8435.02.patch, HIVE-8435.03.patch, HIVE-8435.03.patch, HIVE-8435.04.patch, HIVE-8435.05.patch, HIVE-8435.05.patch, HIVE-8435.06.patch, HIVE-8435.07.patch, HIVE-8435.08.patch, HIVE-8435.09.patch, HIVE-8435.1.patch, HIVE-8435.patch In some cases there is an identity project in the plan which is useless. Better to optimize it away to avoid evaluating it at runtime without any benefit.
[jira] [Commented] (HIVE-8839) Support alter table .. add/replace columns cascade
[ https://issues.apache.org/jira/browse/HIVE-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218012#comment-14218012 ] Chaoyu Tang commented on HIVE-8839: --- Thanks [~szehon] and [~jdere] for reviewing the patch and getting it committed. [~leftylev] I have requested write permission to the Apache Hive Wiki. Once I get it, I will update https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterColumn with CASCADE|RESTRICT in the alter table syntax. We currently do not support the partition protection mode in alter table; maybe we could consider adding that to alter table partition as another enhancement? Support alter table .. add/replace columns cascade Key: HIVE-8839 URL: https://issues.apache.org/jira/browse/HIVE-8839 Project: Hive Issue Type: Improvement Components: SQL Environment: Reporter: Chaoyu Tang Assignee: Chaoyu Tang Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-8839.1.patch, HIVE-8839.2.patch, HIVE-8839.2.patch, HIVE-8839.patch We often run into issues like HIVE-6131 which are due to inconsistent column descriptors between table and partitions after alter table. HIVE-8441/HIVE-7971 provided the flexibility to alter the table at the partition level, but in most cases we need to change the table and partitions at the same time. In addition, alter table is usually required prior to alter table partition .., since querying table partition data also goes through the table. Instead of doing that in two steps, here we provide a convenient DDL like alter table ... cascade to cascade table changes to partitions as well. The changes are limited to add/replace columns and changing a column's name, datatype, position, and comment.
Urgent help: Not able to connect to hiveserver2 through beeline remote client
Hi All, I am trying to connect to hiveserver2 through the beeline remote client. I followed the steps below, but I am not able to connect through the remote client with HiveServer2 in TCP transport mode with SASL authentication:
1. I made the required configuration changes and started hiveserver2 in authorization mode, following https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization#SQLStandardBasedHiveAuthorization-Configuration
2. I started the server with: hive --service hiveserver2 --hiveconf hive.security.authorization.enabled=true --hiveconf hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator --hiveconf hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory --hiveconf hive.metastore.uris=' '
3. The server started successfully; I verified this in the server log file. Per the document, it should be running in standard authorization mode. I then tried to connect using the beeline steps at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.5/bk_dataintegration/content/ch_using-hive-clients-examples.html
4. When I run beeline and then !connect jdbc:hive2://localhost:1/default it asks for a username and password, and after that it hangs. Sometimes the server log shows an Out Of Memory error.
I am able to connect to hiveserver2 through beeline in embedded mode. I searched older conversations for similar issues and found a similar discussion at http://mail-archives.apache.org/mod_mbox/hive-user/201407.mbox/browser but no solution is mentioned there. Any pointer on this would be a great help. Thanks and Regards, Ravi
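For comparison, a typical remote beeline connection session looks like the following; the host and port are assumptions (10000 is the default HiveServer2 TCP port), not values confirmed in this thread:

```
$ beeline
beeline> !connect jdbc:hive2://localhost:10000/default
Enter username for jdbc:hive2://localhost:10000/default: ...
Enter password for jdbc:hive2://localhost:10000/default: ...
```

A hang right after the credential prompts often points at a transport or authentication mismatch between client and server, so checking that the URL's port and the server's configured transport mode agree is a reasonable first step.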
[jira] [Commented] (HIVE-8863) Cannot drop table with uppercase name after compute statistics for columns
[ https://issues.apache.org/jira/browse/HIVE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218031#comment-14218031 ] Chaoyu Tang commented on HIVE-8863: --- [~j...@cloudera.com] Which database are you using? I am not able to reproduce the issue with trunk code against PostgreSQL. Cannot drop table with uppercase name after compute statistics for columns Key: HIVE-8863 URL: https://issues.apache.org/jira/browse/HIVE-8863 Project: Hive Issue Type: Bug Components: Metastore Reporter: Juan Yu Create a table with an uppercase name, Test, then run analyze table Test compute statistics for columns col1 After this, you cannot drop the table with drop table Test; You get: NestedThrowablesStackTrace: java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails (hive2.TAB_COL_STATS, CONSTRAINT TAB_COL_STATS_FK FOREIGN KEY (TBL_ID) REFERENCES TBLS (TBL_ID)) The workaround is to use the lowercase table name: drop table test;
[jira] [Updated] (HIVE-8861) Use hiveconf:test.data.dir instead of hardcoded path
[ https://issues.apache.org/jira/browse/HIVE-8861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-8861: -- Resolution: Won't Fix Status: Resolved (was: Patch Available) Use hiveconf:test.data.dir instead of hardcoded path Key: HIVE-8861 URL: https://issues.apache.org/jira/browse/HIVE-8861 Project: Hive Issue Type: Test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Trivial Attachments: HIVE-8861.patch When loading the test schema to a standalone cluster, I got this error: {noformat} FAILED: SemanticException Line 3:23 Invalid path ''/home/jxiang/data/files/cbo_t1.txt'' {noformat} We should use hiveconf:test.data.dir instead of ../../data/files
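A sketch of the proposed substitution; the LOAD DATA form and table name are assumptions inferred from the error message, not taken from the patch:

```sql
-- Hardcoded relative path: resolves against the working directory, so it
-- only works inside the source tree and causes the SemanticException above
-- when the schema is loaded on a standalone cluster.
LOAD DATA LOCAL INPATH '../../data/files/cbo_t1.txt' INTO TABLE cbo_t1;

-- Using the hiveconf variable instead: substituted at parse time from
-- whatever --hiveconf test.data.dir=... the test harness passes in.
LOAD DATA LOCAL INPATH '${hiveconf:test.data.dir}/cbo_t1.txt' INTO TABLE cbo_t1;
```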
[jira] [Updated] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8435: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, [~jcamachorodriguez] Add identity project remover optimization - Key: HIVE-8435 URL: https://issues.apache.org/jira/browse/HIVE-8435 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Ashutosh Chauhan Assignee: Jesús Camacho Rodríguez Fix For: 0.15.0 Attachments: HIVE-8435.02.patch, HIVE-8435.03.patch, HIVE-8435.03.patch, HIVE-8435.04.patch, HIVE-8435.05.patch, HIVE-8435.05.patch, HIVE-8435.06.patch, HIVE-8435.07.patch, HIVE-8435.08.patch, HIVE-8435.09.patch, HIVE-8435.1.patch, HIVE-8435.patch In some cases there is an identity project in plan which is useless. Better to optimize it away to avoid evaluating it without any benefit at runtime.
[jira] [Commented] (HIVE-8910) Refactoring of PassThroughOutputFormat
[ https://issues.apache.org/jira/browse/HIVE-8910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218046#comment-14218046 ] Ashutosh Chauhan commented on HIVE-8910: I agree, it's more complicated than it needs to be. Let's simplify! Refactoring of PassThroughOutputFormat --- Key: HIVE-8910 URL: https://issues.apache.org/jira/browse/HIVE-8910 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-8910.1.patch.txt It's overly complicated just for doing a simple wrapping of an output format. Before things get worse, we should refactor this code.
[jira] [Updated] (HIVE-8836) Enable automatic tests with remote spark client.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-8836: Attachment: HIVE-8836.1-spark.patch Uploading the patch as a reference since I will be on leave for a week. Enable automatic tests with remote spark client.[Spark Branch] -- Key: HIVE-8836 URL: https://issues.apache.org/jira/browse/HIVE-8836 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M3 Attachments: HIVE-8836.1-spark.patch In a real production environment, the remote spark client will mostly be used to submit spark jobs for Hive, so we should enable automatic tests with the remote spark client to make sure Hive features work with it.
[jira] [Updated] (HIVE-8836) Enable automatic tests with remote spark client.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-8836: Assignee: Rui Li (was: Chengxiang Li) Enable automatic tests with remote spark client.[Spark Branch] -- Key: HIVE-8836 URL: https://issues.apache.org/jira/browse/HIVE-8836 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Rui Li Labels: Spark-M3 Attachments: HIVE-8836.1-spark.patch In a real production environment, the remote spark client will mostly be used to submit spark jobs for Hive, so we should enable automatic tests with the remote spark client to make sure Hive features work with it.
[jira] [Updated] (HIVE-8868) SparkSession and SparkClient mapping[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-8868: Assignee: Rui Li (was: Chengxiang Li) SparkSession and SparkClient mapping[Spark Branch] -- Key: HIVE-8868 URL: https://issues.apache.org/jira/browse/HIVE-8868 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Rui Li Labels: Spark-M3, TODOC-SPARK Attachments: HIVE-8868.1-spark.patch, HIVE-8868.2-spark.patch There should be a separate spark context for each user session; currently we share a singleton local spark context across all user sessions with local spark, and create a remote spark context for each spark job with a spark cluster. To bind one spark context to each user session, we may construct the spark client on session open. One thing to note: is SparkSession::conf consistent with Context::getConf?
[jira] [Created] (HIVE-8914) HDFSCleanup thread holds reference to FileSystem
shanyu zhao created HIVE-8914: - Summary: HDFSCleanup thread holds reference to FileSystem Key: HIVE-8914 URL: https://issues.apache.org/jira/browse/HIVE-8914 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.1 Reporter: shanyu zhao Assignee: shanyu zhao WebHCat server has a long running cleanup thread (HDFSCleanup) which holds a reference to FileSystem. Because of this reference, many FileSystem related objects (e.g. MetricsSystemImpl) cannot be garbage collected. Sometimes this causes OOM exception. Since the cleanup is done every 12 hours by default, we can simply recreate a FileSystem when we need to use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8914) HDFSCleanup thread holds reference to FileSystem
[ https://issues.apache.org/jira/browse/HIVE-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shanyu zhao updated HIVE-8914: -- Attachment: HIVE-8914.patch Patch attached. HDFSCleanup thread holds reference to FileSystem Key: HIVE-8914 URL: https://issues.apache.org/jira/browse/HIVE-8914 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.1 Reporter: shanyu zhao Assignee: shanyu zhao Attachments: HIVE-8914.patch WebHCat server has a long running cleanup thread (HDFSCleanup) which holds a reference to FileSystem. Because of this reference, many FileSystem related objects (e.g. MetricsSystemImpl) cannot be garbage collected. Sometimes this causes OOM exception. Since the cleanup is done every 12 hours by default, we can simply recreate a FileSystem when we need to use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
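The fix direction described in HIVE-8914 above (recreate the FileSystem per cleanup pass instead of pinning one for the lifetime of the thread) can be sketched as follows. This is a hedged, self-contained illustration of the resource-lifetime pattern only: `Resource`, `open()`, and `deleteExpired()` are hypothetical stand-ins for Hadoop's `FileSystem` and WebHCat's cleanup logic, not the actual patch.

```java
// Sketch: a long-lived cleanup thread should not cache one heavyweight
// resource forever. Acquire a fresh instance per pass and release it right
// after, so nothing is pinned between the (default 12-hour) passes.
public class CleanupSketch {
    // Hypothetical stand-in for org.apache.hadoop.fs.FileSystem.
    interface Resource extends AutoCloseable {
        void deleteExpired();
        @Override void close();  // narrowed: no checked exception
    }

    static int passes = 0;

    // Hypothetical factory, standing in for FileSystem.get(conf).
    static Resource open() {
        return new Resource() {
            public void deleteExpired() { passes++; }
            public void close() { /* release references eagerly */ }
        };
    }

    // One cleanup pass: the resource lives only for the duration of the pass.
    static void cleanupOnce() {
        try (Resource r = open()) {  // try-with-resources closes it even on error
            r.deleteExpired();
        }
    }

    public static void main(String[] args) {
        cleanupOnce();
        cleanupOnce();
        System.out.println(passes);  // 2
    }
}
```

Since the cleanup runs only every few hours, the cost of re-opening the resource each pass is negligible compared with keeping `MetricsSystemImpl` and friends uncollectable.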
[jira] [Commented] (HIVE-8863) Cannot drop table with uppercase name after compute statistics for columns
[ https://issues.apache.org/jira/browse/HIVE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218115#comment-14218115 ] Juan Yu commented on HIVE-8863: --- I tested with both MySQL and PostgreSQL, but I am not using the trunk version. Cannot drop table with uppercase name after compute statistics for columns Key: HIVE-8863 URL: https://issues.apache.org/jira/browse/HIVE-8863 Project: Hive Issue Type: Bug Components: Metastore Reporter: Juan Yu Create a table with the uppercase name Test, then run analyze table Test compute statistics for columns col1. After this, you cannot drop the table with drop table Test; you get the error: NestedThrowablesStackTrace: java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails (hive2.TAB_COL_STATS, CONSTRAINT TAB_COL_STATS_FK FOREIGN KEY (TBL_ID) REFERENCES TBLS (TBL_ID)) The workaround is to use the lowercase table name: drop table test; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6914) parquet-hive cannot write nested map (map value is map)
[ https://issues.apache.org/jira/browse/HIVE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218143#comment-14218143 ] Mickael Lacour commented on HIVE-6914: -- [~spena], [~brocknoland], [~rdblue] I will redo this patch using the path available for HIVE-8359. I think there is a link with this one too HIVE-8909. With the previous patch I can read and write parquet complex nested types. So maybe it will be better to add my qtests to HIVE-8909 and fix the writing bug ? What do you think ? parquet-hive cannot write nested map (map value is map) --- Key: HIVE-6914 URL: https://issues.apache.org/jira/browse/HIVE-6914 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.12.0, 0.13.0 Reporter: Tongjie Chen Labels: parquet, serialization Attachments: HIVE-6914.1.patch, HIVE-6914.2.patch // table schema (identical for both plain text version and parquet version) desc hive desc text_mmap; m map // sample nested map entry {level1:{level2_key1:value1,level2_key2:value2}} The following query will fail, insert overwrite table parquet_mmap select * from text_mmap; Caused by: parquet.io.ParquetEncodingException: This should be an ArrayWritable or MapWritable: org.apache.hadoop.hive.ql.io.parquet.writable.BinaryWritable@f2f8106 at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:85) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeArray(DataWritableWriter.java:118) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:80) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:82) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:55) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59) at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31) at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:115) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:77) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:90) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:622) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) ... 9 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Hive-0.14 - Build # 735 - Still Failing
Changes for Build #696 [rohini] PIG-4186: Fix e2e run against new build of pig and some enhancements (rohini) Changes for Build #697 Changes for Build #698 Changes for Build #699 Changes for Build #700 Changes for Build #701 Changes for Build #702 Changes for Build #703 [daijy] HIVE-8484: HCatalog throws an exception if Pig job is of type 'fetch' (Lorand Bendig via Daniel Dai) Changes for Build #704 [gunther] HIVE-8781: Nullsafe joins are busted on Tez (Gunther Hagleitner, reviewed by Prasanth J) Changes for Build #705 [gunther] HIVE-8760: Pass a copy of HiveConf to hooks (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #706 [thejas] HIVE-8772 : zookeeper info logs are always printed from beeline with service discovery mode (Thejas Nair, reviewed by Vaibhav Gumashta) Changes for Build #707 [gunther] HIVE-8782: HBase handler doesn't compile with hadoop-1 (Jimmy Xiang, reviewed by Xuefu and Sergey) Changes for Build #708 Changes for Build #709 [thejas] HIVE-8785 : HiveServer2 LogDivertAppender should be more selective for beeline getLogs (Thejas Nair, reviewed by Gopal V) Changes for Build #710 [vgumashta] HIVE-8764: Windows: HiveServer2 TCP SSL cannot recognize localhost (Vaibhav Gumashta reviewed by Thejas Nair) Changes for Build #711 [gunther] HIVE-8768: CBO: Fix filter selectivity for 'in clause' '' (Laljo John Pullokkaran via Gunther Hagleitner) Changes for Build #712 [gunther] HIVE-8794: Hive on Tez leaks AMs when killed before first dag is run (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #713 [gunther] HIVE-8798: Some Oracle deadlocks not being caught in TxnHandler (Alan Gates via Gunther Hagleitner) Changes for Build #714 [gunther] HIVE-8800: Update release notes and notice for hive .14 (Gunther Hagleitner, reviewed by Prasanth J) [gunther] HIVE-8799: boatload of missing apache headers (Gunther Hagleitner, reviewed by Thejas M Nair) Changes for Build #715 [gunther] Preparing for release 0.14.0 Changes for Build #716 [gunther] 
Preparing for release 0.14.0 [gunther] Preparing for release 0.14.0 Changes for Build #717 Changes for Build #718 Changes for Build #719 Changes for Build #720 [gunther] HIVE-8811: Dynamic partition pruning can result in NPE during query compilation (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #721 [gunther] HIVE-8805: CBO skipped due to SemanticException: Line 0:-1 Both left and right aliases encountered in JOIN 'avg_cs_ext_discount_amt' (Laljo John Pullokkaran via Gunther Hagleitner) [sershe] HIVE-8715 : Hive 14 upgrade scripts can fail for statistics if database was created using auto-create ADDENDUM (Sergey Shelukhin, reviewed by Ashutosh Chauhan and Gunther Hagleitner) Changes for Build #722 Changes for Build #723 Changes for Build #724 [gunther] HIVE-8845: Switch to Tez 0.5.2 (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #725 [sershe] HIVE-8295 : Add batch retrieve partition objects for metastore direct sql (Selina Zhang and Sergey Shelukhin, reviewed by Ashutosh Chauhan) Changes for Build #726 Changes for Build #727 [gunther] HIVE-8873: Switch to calcite 0.9.2 (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #728 [thejas] HIVE-8830 : hcatalog process don't exit because of non daemon thread (Thejas Nair, reviewed by Eugene Koifman, Sushanth Sowmyan) Changes for Build #729 Changes for Build #730 Changes for Build #731 Changes for Build #732 Changes for Build #733 Changes for Build #734 Changes for Build #735 No tests ran. The Apache Jenkins build system has built Hive-0.14 (build #735) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-0.14/735/ to view the results.
[jira] [Commented] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218168#comment-14218168 ] Chinna Rao Lalam commented on HIVE-8639: Hi [~brocknoland], I am investigating the test failures. I need some time for this issue; if folks free up, they can take it over. Convert SMBJoin to MapJoin [Spark Branch] - Key: HIVE-8639 URL: https://issues.apache.org/jira/browse/HIVE-8639 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Szehon Ho Assignee: Chinna Rao Lalam HIVE-8202 supports auto-conversion of SMB Join. However, if the tables are partitioned, there could be a slowdown, as each mapper would need to get a very small chunk of a partition which has a single key. Thus, in some scenarios it's beneficial to convert SMB join to map join. The task is to research and support the conversion from SMB join to map join for the Spark execution engine. See the MapReduce equivalent in SortMergeJoinResolver. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8850) ObjectStore:: rollbackTransaction() should set the transaction status to TXN_STATUS.ROLLBACK irrespective of whether it is active or not
[ https://issues.apache.org/jira/browse/HIVE-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218230#comment-14218230 ] Chaoyu Tang commented on HIVE-8850: --- [~sushanth] Thanks for the deep insight and analysis into the cause of the unbalanced openTransaction/commitTransaction calls. I think HIVE-8891 is slightly different from the issue here: it just ensures that the PersistenceManager cache is cleaned after rollback to avoid the NucleusObjectNotFoundException, and should not be involved in the nested txn count issue. Because rollbackTransaction always resets openTransactionCalls to 0, even if you set the transactionStatus to TXN_STATUS.ROLLBACK unconditionally in rollbackTransaction, in the nested transaction example you gave above (e.g. sqldirect fallback to jdo) the nested openTransaction following rollbackTransaction will set it back to TXN_STATUS.OPEN, so the patch provided here would still not solve the issue for this example, is that right? ObjectStore:: rollbackTransaction() should set the transaction status to TXN_STATUS.ROLLBACK irrespective of whether it is active or not Key: HIVE-8850 URL: https://issues.apache.org/jira/browse/HIVE-8850 Project: Hive Issue Type: Bug Components: Metastore Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-8850.1.patch We can run into issues as described below: A Hive script adds 2800 partitions to a table, and during this it can get a SQLState 08S01 [Communication Link Error], and bonecp kills all the connections in the pool. The partitions are added and a create table statement executes (Metering_IngestedData_Compressed). The map job finishes successfully, and while moving the table to the hive warehouse, ObjectStore.java's commitTransaction() raises the error: commitTransaction was called but openTransactionCalls = 0. 
This probably indicates that there are unbalanced calls to openTransaction/commitTransaction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
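The nested-transaction behavior Chaoyu describes above can be modeled in a few lines. This is a toy sketch, not the real ObjectStore code: it only mirrors the bookkeeping named in the discussion (openTransactionCalls, transactionStatus) to show why an unconditional ROLLBACK status is still forgotten once rollback also resets the nesting count.

```java
// Toy model of ObjectStore's transaction bookkeeping. rollbackTransaction()
// sets status to ROLLBACK but also resets the nested open count to 0, so a
// later nested openTransaction() sees count == 0 and flips status back to
// OPEN, losing the rollback marker - the scenario described in the comment.
public class TxnSketch {
    enum Status { NO_STATE, OPEN, COMMITED, ROLLBACK }

    int openTransactionCalls = 0;
    Status transactionStatus = Status.NO_STATE;

    void openTransaction() {
        openTransactionCalls++;
        if (openTransactionCalls == 1) transactionStatus = Status.OPEN;
    }

    void rollbackTransaction() {
        transactionStatus = Status.ROLLBACK;  // the proposed unconditional set
        openTransactionCalls = 0;             // ...but the count is reset too
    }

    public static void main(String[] args) {
        TxnSketch os = new TxnSketch();
        os.openTransaction();        // outer txn
        os.openTransaction();        // nested (e.g. direct-SQL path)
        os.rollbackTransaction();    // direct SQL fails, rolls back
        os.openTransaction();        // nested retry via JDO: count was 0, so...
        System.out.println(os.transactionStatus);  // OPEN - rollback forgotten
    }
}
```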
[jira] [Commented] (HIVE-6914) parquet-hive cannot write nested map (map value is map)
[ https://issues.apache.org/jira/browse/HIVE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218250#comment-14218250 ] Sergio Peña commented on HIVE-6914: --- Hi [~mickaellcr], It sounds good if you use the patch from HIVE-8359 for this bug. Regarding adding the qtests to HIVE-8909, I think that ticket is meant to fix the reading part of different nested types formats generated by Thrift and Avro tools (it does not touch the writing part); so I think it should be good to have these writing tests separated from the reading tests. parquet-hive cannot write nested map (map value is map) --- Key: HIVE-6914 URL: https://issues.apache.org/jira/browse/HIVE-6914 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.12.0, 0.13.0 Reporter: Tongjie Chen Labels: parquet, serialization Attachments: HIVE-6914.1.patch, HIVE-6914.2.patch // table schema (identical for both plain text version and parquet version) desc hive desc text_mmap; m map // sample nested map entry {level1:{level2_key1:value1,level2_key2:value2}} The following query will fail, insert overwrite table parquet_mmap select * from text_mmap; Caused by: parquet.io.ParquetEncodingException: This should be an ArrayWritable or MapWritable: org.apache.hadoop.hive.ql.io.parquet.writable.BinaryWritable@f2f8106 at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:85) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeArray(DataWritableWriter.java:118) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:80) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:82) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:55) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59) at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31) at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:115) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:77) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:90) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:622) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) ... 9 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8909) Hive doesn't correctly read Parquet nested types
[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218252#comment-14218252 ] Sergio Peña commented on HIVE-8909: --- [~rdblue], is this ticket related to the different nested types found in this document? https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md Hive doesn't correctly read Parquet nested types Key: HIVE-8909 URL: https://issues.apache.org/jira/browse/HIVE-8909 Project: Hive Issue Type: Bug Reporter: Ryan Blue Assignee: Ryan Blue Attachments: HIVE-8909-1.patch Parquet's Avro and Thrift object models don't produce the same parquet type representation for lists and maps that Hive does. In the Parquet community, we've defined what should be written and backward-compatibility rules for existing data written by parquet-avro and parquet-thrift in PARQUET-113. We need to implement those rules in the Hive Converter classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8909) Hive doesn't correctly read Parquet nested types
[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218286#comment-14218286 ] Ryan Blue commented on HIVE-8909: - Yes. It implements the rules for reading lists in existing data:
1. If the repeated field is not a group, then its type is the element type and elements are required.
2. If the repeated field is a group with multiple fields, then its type is the element type and elements are required.
3. If the repeated field is a group with one field and is named either array or uses the LIST-annotated group's name with _tuple appended, then the repeated type is the element type and elements are required.
4. Otherwise, the repeated field's type is the element type with the repeated field's repetition.
It also structures the converters to match the other projects. LIST and MAP will use ElementConverter and KeyValueConverter, and the list version supports these rules while matching the ArrayWritable structure expected by the SerDe (confirmed by tests that pass in both trunk and this patch). Repeated groups that aren't annotated are deserialized into lists as before, but I changed this to put less work on the DataWritableGroupConverter that is now called StructConverter. Struct needs to support repeated inner groups, but rather than keeping a second array of objects, it passes its start() and end() calls to the repeated children converters, which use them to add the correct object to the struct. It's an easier-to-follow method that produces the same result. (By all means, please verify this!) Hive doesn't correctly read Parquet nested types Key: HIVE-8909 URL: https://issues.apache.org/jira/browse/HIVE-8909 Project: Hive Issue Type: Bug Reporter: Ryan Blue Assignee: Ryan Blue Attachments: HIVE-8909-1.patch Parquet's Avro and Thrift object models don't produce the same parquet type representation for lists and maps that Hive does. 
In the Parquet community, we've defined what should be written and backward-compatibility rules for existing data written by parquet-avro and parquet-thrift in PARQUET-113. We need to implement those rules in the Hive Converter classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
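The four backward-compatibility rules Ryan lists above reduce to a small decision function. The sketch below is illustrative only: it works on hypothetical booleans and names rather than the real parquet Type objects the Hive converters inspect, but the branch order matches the quoted rules.

```java
// Self-contained sketch of the PARQUET-113 list backward-compat rules quoted
// above: given properties of a LIST's repeated field, decide which rule
// applies (and therefore where the element type lives).
public class ListRuleSketch {
    /**
     * Returns which rule (1-4) applies to the repeated field.
     * Rules 1-3: the repeated field itself is the element type;
     * rule 4: the repeated group's single child is the element type.
     */
    static int rule(boolean repeatedIsGroup, int fieldCount,
                    String repeatedName, String listName) {
        if (!repeatedIsGroup) return 1;        // not a group: it IS the element
        if (fieldCount > 1) return 2;          // multi-field group is the element
        if (repeatedName.equals("array")
            || repeatedName.equals(listName + "_tuple")) {
            return 3;                          // legacy parquet-avro/thrift names
        }
        return 4;                              // otherwise the single field is it
    }

    public static void main(String[] args) {
        System.out.println(rule(false, 0, "element", "my_list"));       // 1
        System.out.println(rule(true, 2, "element", "my_list"));        // 2
        System.out.println(rule(true, 1, "array", "my_list"));          // 3
        System.out.println(rule(true, 1, "my_list_tuple", "my_list"));  // 3
        System.out.println(rule(true, 1, "list", "my_list"));           // 4
    }
}
```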
[jira] [Commented] (HIVE-7073) Implement Binary in ParquetSerDe
[ https://issues.apache.org/jira/browse/HIVE-7073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218291#comment-14218291 ] Brock Noland commented on HIVE-7073: Instead of outputting raw binary to the out file, perhaps we should call hex as defined in HIVE-2482? Implement Binary in ParquetSerDe Key: HIVE-7073 URL: https://issues.apache.org/jira/browse/HIVE-7073 Project: Hive Issue Type: Sub-task Reporter: David Chen Assignee: Ferdinand Xu Attachments: HIVE-7073.1.patch, HIVE-7073.patch The ParquetSerDe currently does not support the BINARY data type. This ticket is to implement the BINARY data type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8359) Map containing null values are not correctly written in Parquet files
[ https://issues.apache.org/jira/browse/HIVE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8359: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Thank you so much Sergio, Ryan and Mickael!! I have committed this contribution to trunk! Map containing null values are not correctly written in Parquet files - Key: HIVE-8359 URL: https://issues.apache.org/jira/browse/HIVE-8359 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.1 Reporter: Frédéric TERRAZZONI Assignee: Sergio Peña Fix For: 0.15.0 Attachments: HIVE-8359.1.patch, HIVE-8359.2.patch, HIVE-8359.4.patch, HIVE-8359.5.patch, map_null_val.avro Tried to write a map<string,string> column in a Parquet file. The table should contain: {code} {key3:val3,key4:null} {key3:val3,key4:null} {key1:null,key2:val2} {key3:val3,key4:null} {key3:val3,key4:null} {code} ... and when you run a query like {code}SELECT * from mytable{code} we can see that the table is corrupted: {code} {key3:val3} {key4:val3} {key3:val2} {key4:val3} {key1:val3} {code} I've not been able to read the Parquet file in our software afterwards, and consequently I suspect it to be corrupted. For those who are interested, I generated this Parquet table from an Avro file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8909) Hive doesn't correctly read Parquet nested types
[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218323#comment-14218323 ] Brock Noland commented on HIVE-8909: FYI - I think this patch will need a rebase post HIVE-6914. Additionally once ready, please click {{Submit Patch}} to have the patch tested. Hive doesn't correctly read Parquet nested types Key: HIVE-8909 URL: https://issues.apache.org/jira/browse/HIVE-8909 Project: Hive Issue Type: Bug Reporter: Ryan Blue Assignee: Ryan Blue Attachments: HIVE-8909-1.patch Parquet's Avro and Thrift object models don't produce the same parquet type representation for lists and maps that Hive does. In the Parquet community, we've defined what should be written and backward-compatibility rules for existing data written by parquet-avro and parquet-thrift in PARQUET-113. We need to implement those rules in the Hive Converter classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-8854: - Status: Patch Available (was: Open) Guava dependency conflict between hive driver and remote spark context[Spark Branch] Key: HIVE-8854 URL: https://issues.apache.org/jira/browse/HIVE-8854 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Marcelo Vanzin Labels: Spark-M3 Attachments: HIVE-8854.1-spark.patch, hive-dirver-classloader-info.output Hive driver would load guava 11.0.2 from hadoop/tez, while remote spark context depends on guava 14.0.1, It should be JobMetrics deserialize failed on Hive driver side since Absent is used in Metrics, here is the hive driver log: {noformat} java.lang.IllegalAccessError: tried to access method com.google.common.base.Optional.init()V from class com.google.common.base.Absent at com.google.common.base.Absent.init(Absent.java:35) at com.google.common.base.Absent.clinit(Absent.java:33) at sun.misc.Unsafe.ensureClassInitialized(Native Method) at sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43) at sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:140) at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1057) at java.lang.reflect.Field.getFieldAccessor(Field.java:1038) at java.lang.reflect.Field.getLong(Field.java:591) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1663) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:468) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602) at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) at scala.util.Try$.apply(Try.scala:161) at akka.serialization.Serialization.deserialize(Serialization.scala:98) at akka.remote.serialization.MessageContainerSerializer.fromBinary(MessageContainerSerializer.scala:63) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) at scala.util.Try$.apply(Try.scala:161) at akka.serialization.Serialization.deserialize(Serialization.scala:98) at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23) at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:58) at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:58) at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:76) at 
akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at
[jira] [Updated] (HIVE-8829) Upgrade to Thrift 0.9.2
[ https://issues.apache.org/jira/browse/HIVE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8829: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thank you Prasad for the patch and Vaibhav for the report! I have committed this to trunk! Upgrade to Thrift 0.9.2 --- Key: HIVE-8829 URL: https://issues.apache.org/jira/browse/HIVE-8829 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Vaibhav Gumashta Assignee: Prasad Mujumdar Labels: HiveServer2, metastore Fix For: 0.15.0 Attachments: HIVE-8829.1.patch, HIVE-8829.1.patch Apache Thrift 0.9.2 was released recently (https://thrift.apache.org/download). It has a fix for THRIFT-2660, a bug that can cause HS2 (tcp mode) and Metastore processes to go OOM on getting a non-thrift request when they use the SASL transport. The reason ([thrift code|https://github.com/apache/thrift/blob/0.9.x/lib/java/src/org/apache/thrift/transport/TSaslTransport.java#L177]):
{code}
protected SaslResponse receiveSaslMessage() throws TTransportException {
  underlyingTransport.readAll(messageHeader, 0, messageHeader.length);
  byte statusByte = messageHeader[0];
  byte[] payload = new byte[EncodingUtils.decodeBigEndian(messageHeader, STATUS_BYTES)];
  underlyingTransport.readAll(payload, 0, payload.length);
  NegotiationStatus status = NegotiationStatus.byValue(statusByte);
  if (status == null) {
    sendAndThrowMessage(NegotiationStatus.ERROR, "Invalid status " + statusByte);
  } else if (status == NegotiationStatus.BAD || status == NegotiationStatus.ERROR) {
    try {
      String remoteMessage = new String(payload, "UTF-8");
      throw new TTransportException("Peer indicated failure: " + remoteMessage);
    } catch (UnsupportedEncodingException e) {
      throw new TTransportException(e);
    }
  }
{code}
Basically, since there are no message format checks / size checks before creating the byte array, on getting a non-SASL message this creates a huge byte array from some garbage size. 
For HS2, an attempt was made to fix it here: HIVE-6468, which never went in. I think for 0.15.0 it's best to upgrade to Thrift 0.9.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
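The failure mode in the quoted Thrift code is that the four length bytes of a SASL frame are taken from whatever arrives, so a stray non-SASL request (say an HTTP "GET " line) decodes to a huge array size. The sketch below mirrors the `decodeBigEndian` arithmetic; the size cap and the `checkedPayloadLength` helper are hypothetical and only illustrate the kind of sanity check the upgrade brings, not Thrift 0.9.2's exact code.

```java
// Sketch: decode a SASL frame header (1 status byte + 4 big-endian length
// bytes) and reject implausible payload sizes before allocating.
public class SaslFrameSketch {
    static final int MAX_SASL_PAYLOAD = 64 * 1024;  // hypothetical cap

    // Same arithmetic as Thrift's EncodingUtils.decodeBigEndian(buf, off).
    static int decodeBigEndian(byte[] buf, int off) {
        return ((buf[off] & 0xff) << 24) | ((buf[off + 1] & 0xff) << 16)
             | ((buf[off + 2] & 0xff) << 8) | (buf[off + 3] & 0xff);
    }

    // Hypothetical guard: validate before new byte[len].
    static int checkedPayloadLength(byte[] header) {
        int len = decodeBigEndian(header, 1);  // byte 0 is the status byte
        if (len < 0 || len > MAX_SASL_PAYLOAD) {
            throw new IllegalStateException("implausible SASL payload size: " + len);
        }
        return len;
    }

    public static void main(String[] args) {
        // A plausible frame: status byte, then length 16.
        byte[] ok = {0x01, 0, 0, 0, 16};
        System.out.println(checkedPayloadLength(ok));  // 16

        // The first 5 bytes of an HTTP "GET /" request misread as SASL:
        // bytes 'E','T',' ','/' decode to 0x4554202F, over a gigabyte.
        byte[] http = {'G', 'E', 'T', ' ', '/'};
        try {
            checkedPayloadLength(http);
        } catch (IllegalStateException e) {
            System.out.println("rejected");  // instead of a 1 GB allocation
        }
    }
}
```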
[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join
[ https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218342#comment-14218342 ] Sergio Peña commented on HIVE-8745: --- [~leftylev] I believe you added a documentation statement for the HIVE-7373 fix; but this patch reverts the trailing zeroes fix, so you might want to revert that documentation statement as well. Joins on decimal keys return different results whether they are run as reduce join or map join -- Key: HIVE-8745 URL: https://issues.apache.org/jira/browse/HIVE-8745 Project: Hive Issue Type: Bug Components: Types Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Jason Dere Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8745.1.patch, HIVE-8745.2.patch, HIVE-8745.3.patch, join_test.q See attached .q file to reproduce. The difference seems to be whether trailing 0s are considered the same value or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-8854: Attachment: HIVE-8854.1-spark.patch Not sure why precommit test not picking up, attaching again. Guava dependency conflict between hive driver and remote spark context[Spark Branch] Key: HIVE-8854 URL: https://issues.apache.org/jira/browse/HIVE-8854 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Marcelo Vanzin Labels: Spark-M3 Attachments: HIVE-8854.1-spark.patch, HIVE-8854.1-spark.patch, hive-dirver-classloader-info.output Hive driver would load guava 11.0.2 from hadoop/tez, while remote spark context depends on guava 14.0.1, It should be JobMetrics deserialize failed on Hive driver side since Absent is used in Metrics, here is the hive driver log: {noformat} java.lang.IllegalAccessError: tried to access method com.google.common.base.Optional.init()V from class com.google.common.base.Absent at com.google.common.base.Absent.init(Absent.java:35) at com.google.common.base.Absent.clinit(Absent.java:33) at sun.misc.Unsafe.ensureClassInitialized(Native Method) at sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43) at sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:140) at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1057) at java.lang.reflect.Field.getFieldAccessor(Field.java:1038) at java.lang.reflect.Field.getLong(Field.java:591) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1663) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:468) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365) at 
java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) at scala.util.Try$.apply(Try.scala:161) at akka.serialization.Serialization.deserialize(Serialization.scala:98) at akka.remote.serialization.MessageContainerSerializer.fromBinary(MessageContainerSerializer.scala:63) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) at scala.util.Try$.apply(Try.scala:161) at akka.serialization.Serialization.deserialize(Serialization.scala:98) at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23) at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:58) at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:58) at 
akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:76) at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
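A quick way to diagnose a conflict like the one described above is to ask the JVM where a given class was actually loaded from. This is a generic, illustrative probe, not part of the Hive patch; run it with the driver's classpath to see whether com.google.common.base.Optional resolves to the guava 11 jar from hadoop/tez or the guava 14 jar from spark.

```java
public class GuavaProbe {
    // Returns the jar/location a class was loaded from, or a marker when the
    // class sits on the bootstrap classpath or is missing entirely.
    static String locationOf(String className) {
        try {
            Class<?> cls = Class.forName(className);
            java.security.CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "bootstrap classpath" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println("Optional: " + locationOf("com.google.common.base.Optional"));
        System.out.println("Absent:   " + locationOf("com.google.common.base.Absent"));
    }
}
```

Printing both classes matters here: the IllegalAccessError arises when Absent (guava 14) and Optional (guava 11) come from different jars.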
[jira] [Commented] (HIVE-8850) ObjectStore:: rollbackTransaction() should set the transaction status to TXN_STATUS.ROLLBACK irrespective of whether it is active or not
[ https://issues.apache.org/jira/browse/HIVE-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218356#comment-14218356 ] Sushanth Sowmyan commented on HIVE-8850: Yeah, I agree HIVE-8891 is different, and is necessary as well - I called you guys in here so that I have more people familiar with this looking at this as well. :) And yes, the patch provided here would still not solve the issue. It's a good first step in that it solves the issue of commitTransaction after a rollbackTransaction where the connection is invalidated in the background by something else, such as bonecp, but it does not yet solve the issue of an openTransaction after a rollbackTransaction in a nested scope. ObjectStore:: rollbackTransaction() should set the transaction status to TXN_STATUS.ROLLBACK irrespective of whether it is active or not Key: HIVE-8850 URL: https://issues.apache.org/jira/browse/HIVE-8850 Project: Hive Issue Type: Bug Components: Metastore Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-8850.1.patch We can run into issues as described below: A Hive script adds 2800 partitions to a table and during this it can hit a SQLState 08S01 [Communication Link Error], and bonecp kills all the connections in the pool. The partitions are added and a create table statement executes (Metering_IngestedData_Compressed). The map job finishes successfully, and while moving the table to the hive warehouse the ObjectStore.java commitTransaction() raises the error: commitTransaction was called but openTransactionCalls = 0. This probably indicates that there are unbalanced calls to openTransaction/commitTransaction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
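The bookkeeping at issue can be modeled with a small counter sketch. This is illustrative, not ObjectStore's actual code: once rollback() runs, the transaction status is recorded as ROLLBACK unconditionally, and a stale commit() on the empty nesting is detected instead of silently succeeding.

```java
// Illustrative model of nested openTransaction/commitTransaction/rollbackTransaction
// bookkeeping; the class and method names are invented for this sketch.
public class TxnTracker {
    enum Status { NO_STATE, OPEN, ROLLBACK }

    private int openCalls = 0;
    private Status status = Status.NO_STATE;

    public void open() { openCalls++; status = Status.OPEN; }

    // Mirrors the error quoted in the description when calls are unbalanced.
    public void commit() {
        if (openCalls <= 0) {
            throw new IllegalStateException(
                "commitTransaction was called but openTransactionCalls = 0");
        }
        openCalls--;
    }

    // The point of the issue title: record ROLLBACK even when no transaction
    // is currently active, so later callers can see the failed state.
    public void rollback() { openCalls = 0; status = Status.ROLLBACK; }

    public Status status() { return status; }
}
```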
[jira] [Assigned] (HIVE-8638) Implement bucket map join optimization [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HIVE-8638: - Assignee: Jimmy Xiang (was: Na Yang) Implement bucket map join optimization [Spark Branch] - Key: HIVE-8638 URL: https://issues.apache.org/jira/browse/HIVE-8638 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Jimmy Xiang In the hive-on-mr implementation, bucket map join optimization has to depend on the map join hint, while in the hive-on-tez implementation a join can be automatically converted to a bucket map join if certain conditions are met, such as: 1. the optimization flag hive.convert.join.bucket.mapjoin.tez is ON 2. all join tables are bucketed, and each small table's bucket number is divisible by the big table's bucket number 3. bucket columns == join columns In the hive-on-spark implementation, it is ideal to have bucket map join auto-conversion support: when all the required criteria are met, a join can be automatically converted to a bucket map join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
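The three conditions above can be folded into a single eligibility check. A minimal sketch with invented names, following the divisibility direction exactly as stated in the description (this is not Hive's actual planner code):

```java
public class BucketMapJoinCheck {
    // flagOn                  -- hive.convert.join.bucket.mapjoin.tez (or a spark analogue)
    // bigTableBuckets         -- bucket count of the big table
    // smallTableBuckets       -- bucket counts of the small tables
    // bucketColsEqualJoinCols -- condition 3 above
    static boolean eligible(boolean flagOn, int bigTableBuckets,
                            int[] smallTableBuckets, boolean bucketColsEqualJoinCols) {
        if (!flagOn || !bucketColsEqualJoinCols || bigTableBuckets <= 0) return false;
        for (int n : smallTableBuckets) {
            // condition 2: each small table's bucket number divisible by the big table's
            if (n <= 0 || n % bigTableBuckets != 0) return false;
        }
        return true;
    }
}
```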
[jira] [Created] (HIVE-8915) Log file explosion due to non-existence of COMPACTION_QUEUE table
Sushanth Sowmyan created HIVE-8915: -- Summary: Log file explosion due to non-existence of COMPACTION_QUEUE table Key: HIVE-8915 URL: https://issues.apache.org/jira/browse/HIVE-8915 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0, 0.15.0, 0.14.1 Reporter: Sushanth Sowmyan I hit an issue with a fresh set up of hive in a vm, where I did not have db tables as specified by hive-txn-schema-0.14.0.mysql.sql created. On metastore startup, I got an endless loop of errors being populated to the log file, which caused the log file to grow to 1.7GB in 5 minutes, with 950k copies of the same error stack trace in it before I realized what was happening and killed it. We should either have a delay of sorts to make sure we don't endlessly respin on that error so quickly, or we should error out and fail if we're not able to start. The stack trace in question is as follows: {noformat} 2014-11-19 01:44:57,654 ERROR compactor.Cleaner (Cleaner.java:run(143)) - Caught an exception in the main loop of compactor cleaner, MetaException(message:Unable to connect to transaction database com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'hive.COMPACTION_QUEUE' doesn't exist at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at com.mysql.jdbc.Util.handleNewInstance(Util.java:411) at com.mysql.jdbc.Util.getInstance(Util.java:386) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2619) at 
com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2569) at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1524) at com.jolbox.bonecp.StatementHandle.executeQuery(StatementHandle.java:464) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:266) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86) ) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:291) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
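The "delay of sorts" suggested above is essentially capped exponential backoff between retries. A minimal sketch of the delay schedule (invented names, not the Cleaner's real loop):

```java
public class BackoffLoop {
    static final long INITIAL_DELAY_MS = 1000;       // first retry after 1s
    static final long MAX_DELAY_MS = 5 * 60 * 1000;  // cap at 5 minutes

    // Doubles the delay on each consecutive failure, up to the cap.
    static long nextDelayMs(long currentMs) {
        if (currentMs <= 0) return INITIAL_DELAY_MS;
        return Math.min(currentMs * 2, MAX_DELAY_MS);
    }

    // Shape of the daemon loop (doWork/log are placeholders):
    //   long delay = 0;
    //   while (running) {
    //       try { doWork(); delay = 0; }   // reset the delay after a success
    //       catch (Exception e) {
    //           log(e);                    // one log entry per failure
    //           delay = nextDelayMs(delay);
    //           Thread.sleep(delay);       // instead of respinning immediately
    //       }
    //   }
}
```

With this schedule, a persistent error like the missing COMPACTION_QUEUE table produces a handful of log entries per hour instead of hundreds of thousands.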
[jira] [Commented] (HIVE-8915) Log file explosion due to non-existence of COMPACTION_QUEUE table
[ https://issues.apache.org/jira/browse/HIVE-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218371#comment-14218371 ] Sushanth Sowmyan commented on HIVE-8915: [~alangates], you might be interested in this issue. Log file explosion due to non-existence of COMPACTION_QUEUE table - Key: HIVE-8915 URL: https://issues.apache.org/jira/browse/HIVE-8915 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0, 0.15.0, 0.14.1 Reporter: Sushanth Sowmyan I hit an issue with a fresh set up of hive in a vm, where I did not have db tables as specified by hive-txn-schema-0.14.0.mysql.sql created. On metastore startup, I got an endless loop of errors being populated to the log file, which caused the log file to grow to 1.7GB in 5 minutes, with 950k copies of the same error stack trace in it before I realized what was happening and killed it. We should either have a delay of sorts to make sure we don't endlessly respin on that error so quickly, or we should error out and fail if we're not able to start. 
The stack trace in question is as follows: {noformat} 2014-11-19 01:44:57,654 ERROR compactor.Cleaner (Cleaner.java:run(143)) - Caught an exception in the main loop of compactor cleaner, MetaException(message:Unable to connect to transaction database com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'hive.COMPACTION_QUEUE' doesn't exist at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at com.mysql.jdbc.Util.handleNewInstance(Util.java:411) at com.mysql.jdbc.Util.getInstance(Util.java:386) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2619) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2569) at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1524) at com.jolbox.bonecp.StatementHandle.executeQuery(StatementHandle.java:464) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:266) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86) ) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:291) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8848) data loading from text files or text file processing doesn't handle nulls correctly
[ https://issues.apache.org/jira/browse/HIVE-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-8848: --- Status: Patch Available (was: Open) data loading from text files or text file processing doesn't handle nulls correctly --- Key: HIVE-8848 URL: https://issues.apache.org/jira/browse/HIVE-8848 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Attachments: HIVE-8848.patch I am not sure how nulls are supposed to be stored in text tables, but after loading some data with null or NULL strings, or x00 characters, we get a bunch of annoying log messages from LazyPrimitive saying that the data is not in INT format and was converted to null, with the data being null (the string saying null, I assume, from the code). Either the load should store them as nulls, or there should be some defined way to load nulls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
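One "defined way to load nulls" is to compare each raw field against a configured null marker before attempting type conversion; Hive's text serde conventionally uses \N for this (the serialization.null.format serde property). The sketch below illustrates that shape only; the class and method names are invented, not LazyPrimitive's actual code.

```java
public class NullAwareParser {
    private final String nullFormat; // e.g. "\\N", per serialization.null.format

    public NullAwareParser(String nullFormat) { this.nullFormat = nullFormat; }

    // Returns null for the configured null marker or unparseable input, rather
    // than emitting a conversion warning for every bad row.
    public Integer parseInt(String field) {
        if (field == null || field.equals(nullFormat)) return null;
        try {
            return Integer.valueOf(field.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```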
[jira] [Commented] (HIVE-8739) handle Derby errors with joins and filters in Direct SQL in a Derby-specific path
[ https://issues.apache.org/jira/browse/HIVE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218385#comment-14218385 ] Sergey Shelukhin commented on HIVE-8739: It also affects Oracle handle Derby errors with joins and filters in Direct SQL in a Derby-specific path - Key: HIVE-8739 URL: https://issues.apache.org/jira/browse/HIVE-8739 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.15.0 Attachments: HIVE-8739.01.patch, HIVE-8739.02.patch, HIVE-8739.patch, HIVE-8739.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8739) handle Derby and Oracle errors with joins and filters in Direct SQL in a invalid-DB-specific path
[ https://issues.apache.org/jira/browse/HIVE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-8739: --- Summary: handle Derby and Oracle errors with joins and filters in Direct SQL in a invalid-DB-specific path (was: handle Derby errors with joins and filters in Direct SQL in a Derby-specific path) handle Derby and Oracle errors with joins and filters in Direct SQL in a invalid-DB-specific path - Key: HIVE-8739 URL: https://issues.apache.org/jira/browse/HIVE-8739 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.15.0 Attachments: HIVE-8739.01.patch, HIVE-8739.02.patch, HIVE-8739.patch, HIVE-8739.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8876) incorrect upgrade script for Oracle (13-14)
[ https://issues.apache.org/jira/browse/HIVE-8876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-8876: --- Fix Version/s: 0.14.1 incorrect upgrade script for Oracle (13-14) Key: HIVE-8876 URL: https://issues.apache.org/jira/browse/HIVE-8876 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Critical Fix For: 0.15.0, 0.14.1 Attachments: HIVE-8876.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8876) incorrect upgrade script for Oracle (13-14)
[ https://issues.apache.org/jira/browse/HIVE-8876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218393#comment-14218393 ] Sergey Shelukhin commented on HIVE-8876: committed to 14 incorrect upgrade script for Oracle (13-14) Key: HIVE-8876 URL: https://issues.apache.org/jira/browse/HIVE-8876 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Critical Fix For: 0.15.0, 0.14.1 Attachments: HIVE-8876.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8915) Log file explosion due to non-existence of COMPACTION_QUEUE table
[ https://issues.apache.org/jira/browse/HIVE-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned HIVE-8915: Assignee: Alan Gates Log file explosion due to non-existence of COMPACTION_QUEUE table - Key: HIVE-8915 URL: https://issues.apache.org/jira/browse/HIVE-8915 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0, 0.15.0, 0.14.1 Reporter: Sushanth Sowmyan Assignee: Alan Gates I hit an issue with a fresh set up of hive in a vm, where I did not have db tables as specified by hive-txn-schema-0.14.0.mysql.sql created. On metastore startup, I got an endless loop of errors being populated to the log file, which caused the log file to grow to 1.7GB in 5 minutes, with 950k copies of the same error stack trace in it before I realized what was happening and killed it. We should either have a delay of sorts to make sure we don't endlessly respin on that error so quickly, or we should error out and fail if we're not able to start. 
The stack trace in question is as follows: {noformat} 2014-11-19 01:44:57,654 ERROR compactor.Cleaner (Cleaner.java:run(143)) - Caught an exception in the main loop of compactor cleaner, MetaException(message:Unable to connect to transaction database com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'hive.COMPACTION_QUEUE' doesn't exist at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at com.mysql.jdbc.Util.handleNewInstance(Util.java:411) at com.mysql.jdbc.Util.getInstance(Util.java:386) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2619) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2569) at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1524) at com.jolbox.bonecp.StatementHandle.executeQuery(StatementHandle.java:464) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:266) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86) ) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:291) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Hive-0.14 - Build # 736 - Still Failing
Changes for Build #696 [rohini] PIG-4186: Fix e2e run against new build of pig and some enhancements (rohini) Changes for Build #697 Changes for Build #698 Changes for Build #699 Changes for Build #700 Changes for Build #701 Changes for Build #702 Changes for Build #703 [daijy] HIVE-8484: HCatalog throws an exception if Pig job is of type 'fetch' (Lorand Bendig via Daniel Dai) Changes for Build #704 [gunther] HIVE-8781: Nullsafe joins are busted on Tez (Gunther Hagleitner, reviewed by Prasanth J) Changes for Build #705 [gunther] HIVE-8760: Pass a copy of HiveConf to hooks (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #706 [thejas] HIVE-8772 : zookeeper info logs are always printed from beeline with service discovery mode (Thejas Nair, reviewed by Vaibhav Gumashta) Changes for Build #707 [gunther] HIVE-8782: HBase handler doesn't compile with hadoop-1 (Jimmy Xiang, reviewed by Xuefu and Sergey) Changes for Build #708 Changes for Build #709 [thejas] HIVE-8785 : HiveServer2 LogDivertAppender should be more selective for beeline getLogs (Thejas Nair, reviewed by Gopal V) Changes for Build #710 [vgumashta] HIVE-8764: Windows: HiveServer2 TCP SSL cannot recognize localhost (Vaibhav Gumashta reviewed by Thejas Nair) Changes for Build #711 [gunther] HIVE-8768: CBO: Fix filter selectivity for 'in clause' '' (Laljo John Pullokkaran via Gunther Hagleitner) Changes for Build #712 [gunther] HIVE-8794: Hive on Tez leaks AMs when killed before first dag is run (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #713 [gunther] HIVE-8798: Some Oracle deadlocks not being caught in TxnHandler (Alan Gates via Gunther Hagleitner) Changes for Build #714 [gunther] HIVE-8800: Update release notes and notice for hive .14 (Gunther Hagleitner, reviewed by Prasanth J) [gunther] HIVE-8799: boatload of missing apache headers (Gunther Hagleitner, reviewed by Thejas M Nair) Changes for Build #715 [gunther] Preparing for release 0.14.0 Changes for Build #716 [gunther] 
Preparing for release 0.14.0 [gunther] Preparing for release 0.14.0 Changes for Build #717 Changes for Build #718 Changes for Build #719 Changes for Build #720 [gunther] HIVE-8811: Dynamic partition pruning can result in NPE during query compilation (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #721 [gunther] HIVE-8805: CBO skipped due to SemanticException: Line 0:-1 Both left and right aliases encountered in JOIN 'avg_cs_ext_discount_amt' (Laljo John Pullokkaran via Gunther Hagleitner) [sershe] HIVE-8715 : Hive 14 upgrade scripts can fail for statistics if database was created using auto-create ADDENDUM (Sergey Shelukhin, reviewed by Ashutosh Chauhan and Gunther Hagleitner) Changes for Build #722 Changes for Build #723 Changes for Build #724 [gunther] HIVE-8845: Switch to Tez 0.5.2 (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #725 [sershe] HIVE-8295 : Add batch retrieve partition objects for metastore direct sql (Selina Zhang and Sergey Shelukhin, reviewed by Ashutosh Chauhan) Changes for Build #726 Changes for Build #727 [gunther] HIVE-8873: Switch to calcite 0.9.2 (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #728 [thejas] HIVE-8830 : hcatalog process don't exit because of non daemon thread (Thejas Nair, reviewed by Eugene Koifman, Sushanth Sowmyan) Changes for Build #729 Changes for Build #730 Changes for Build #731 Changes for Build #732 Changes for Build #733 Changes for Build #734 Changes for Build #735 Changes for Build #736 [sershe] HIVE-8876 : incorrect upgrade script for Oracle (13-14) (Sergey Shelukhin, reviewed by Ashutosh Chauhan) No tests ran. The Apache Jenkins build system has built Hive-0.14 (build #736) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-0.14/736/ to view the results.
[jira] [Created] (HIVE-8916) Handle user@domain username under LDAP authentication
Mohit Sabharwal created HIVE-8916: - Summary: Handle user@domain username under LDAP authentication Key: HIVE-8916 URL: https://issues.apache.org/jira/browse/HIVE-8916 Project: Hive Issue Type: Bug Components: Authentication Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal If LDAP is configured with multiple domains for authentication, users can be in different domains. Currently, LdapAuthenticationProviderImpl blindly appends the domain configured in hive.server2.authentication.ldap.Domain to the username, which limits users to that domain. However, under multi-domain authentication, the username may already include the domain (ex: u...@domain.foo.com). We should not append a domain if one is already present. Also, if the username already includes the domain, the rest of Hive and the authorization providers still expect the short name (user and not u...@domain.foo.com) for looking up privilege rules, etc. As such, any domain info in the username should be stripped off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
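The two behaviors requested above — append a domain only when none is present, and strip the domain to recover the short name — can be sketched as a pair of helpers. The class and method names are illustrative, not the actual LdapAuthenticationProviderImpl code.

```java
public class LdapNames {
    // Append the configured domain only when the login name has no domain part.
    static String qualify(String user, String configuredDomain) {
        if (user.contains("@") || configuredDomain == null || configuredDomain.isEmpty()) {
            return user;
        }
        return user + "@" + configuredDomain;
    }

    // Short name used for privilege-rule lookups: everything before the '@'.
    static String shortName(String user) {
        int at = user.indexOf('@');
        return at < 0 ? user : user.substring(0, at);
    }
}
```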
[jira] [Commented] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218453#comment-14218453 ] Hive QA commented on HIVE-8854: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12681608/HIVE-8854.1-spark.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 7181 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/393/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/393/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-393/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12681608 - PreCommit-HIVE-SPARK-Build Guava dependency conflict between hive driver and remote spark context[Spark Branch] Key: HIVE-8854 URL: https://issues.apache.org/jira/browse/HIVE-8854 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Marcelo Vanzin Labels: Spark-M3 Attachments: HIVE-8854.1-spark.patch, HIVE-8854.1-spark.patch, hive-dirver-classloader-info.output Hive driver would load guava 11.0.2 from hadoop/tez, while remote spark context depends on guava 14.0.1, It should be JobMetrics deserialize failed on Hive driver side since Absent is used in Metrics, here is the hive driver log: {noformat} java.lang.IllegalAccessError: tried to access method com.google.common.base.Optional.init()V from class com.google.common.base.Absent at com.google.common.base.Absent.init(Absent.java:35) at com.google.common.base.Absent.clinit(Absent.java:33) at sun.misc.Unsafe.ensureClassInitialized(Native Method) at sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43) at sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:140) at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1057) at java.lang.reflect.Field.getFieldAccessor(Field.java:1038) at java.lang.reflect.Field.getLong(Field.java:591) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1663) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:468) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
Review Request 28255: HIVE-8916 : Handle user@domain username under LDAP authentication
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28255/ --- Review request for hive. Bugs: HIVE-8916 https://issues.apache.org/jira/browse/HIVE-8916 Repository: hive-git Description --- HIVE-8916 : Handle user@domain username under LDAP authentication If LDAP is configured with multiple domains for authentication, users can be in different domains. Currently, LdapAuthenticationProviderImpl blindly appends the domain configured in hive.server2.authentication.ldap.Domain to the username, which limits users to that domain. However, under multi-domain authentication, the username may already include the domain (ex: u...@domain.foo.com). We should not append a domain if one is already present. Also, if the username already includes the domain, the rest of Hive and the authorization providers still expect the short name (user and not u...@domain.foo.com) for looking up privilege rules, etc. As such, any domain info in the username should be stripped off. Diffs - service/src/java/org/apache/hive/service/ServiceUtils.java PRE-CREATION service/src/java/org/apache/hive/service/auth/LdapAuthenticationProviderImpl.java d075761d079f8a18d7d317483783fe3b801e00d5 service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 3a8ae70d8bd31c9958ea6ae00a2d01c315c80615 Diff: https://reviews.apache.org/r/28255/diff/ Testing --- Configured HS2 for LDAP authentication: <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://foo.ldap.server.com</value> </property> <property> <name>hive.server2.authentication.ldap.Domain</name> <value>foo.ldap.domain.com</value> </property> Ran beeline with user names with and without the ldap domain; in both cases authentication works. 
Before the change, authentication failed if domain was present in username: beeline -u jdbc:hive2://localhost:1 -n u...@foo.ldap.domain.com -p TestPassword --debug beeline -u jdbc:hive2://localhost:1 -n user -p TestPassword --debug Thanks, Mohit Sabharwal
[jira] [Updated] (HIVE-8916) Handle user@domain username under LDAP authentication
[ https://issues.apache.org/jira/browse/HIVE-8916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohit Sabharwal updated HIVE-8916: -- Attachment: HIVE-8916.patch Handle user@domain username under LDAP authentication - Key: HIVE-8916 URL: https://issues.apache.org/jira/browse/HIVE-8916 Project: Hive Issue Type: Bug Components: Authentication Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Attachments: HIVE-8916.patch If LDAP is configured with multiple domains for authentication, users can be in different domains. Currently, LdapAuthenticationProviderImpl blindly appends the domain configured hive.server2.authentication.ldap.Domain to the username, which limits user to that domain. However, under multi-domain authentication, the username may already include the domain (ex: u...@domain.foo.com). We should not append a domain if one is already present. Also, if username already includes the domain, rest of Hive and authorization providers still expects the short name (user and not u...@domain.foo.com) for looking up privilege rules, etc. As such, any domain info in the username should be stripped off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8916) Handle user@domain username under LDAP authentication
[ https://issues.apache.org/jira/browse/HIVE-8916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohit Sabharwal updated HIVE-8916: -- Status: Patch Available (was: Open) Handle user@domain username under LDAP authentication - Key: HIVE-8916 URL: https://issues.apache.org/jira/browse/HIVE-8916 Project: Hive Issue Type: Bug Components: Authentication Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Attachments: HIVE-8916.patch If LDAP is configured with multiple domains for authentication, users can be in different domains. Currently, LdapAuthenticationProviderImpl blindly appends the domain configured hive.server2.authentication.ldap.Domain to the username, which limits user to that domain. However, under multi-domain authentication, the username may already include the domain (ex: u...@domain.foo.com). We should not append a domain if one is already present. Also, if username already includes the domain, rest of Hive and authorization providers still expects the short name (user and not u...@domain.foo.com) for looking up privilege rules, etc. As such, any domain info in the username should be stripped off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218496#comment-14218496 ] Szehon Ho commented on HIVE-8854: - +1, test failures dont look related Guava dependency conflict between hive driver and remote spark context[Spark Branch] Key: HIVE-8854 URL: https://issues.apache.org/jira/browse/HIVE-8854 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Marcelo Vanzin Labels: Spark-M3 Attachments: HIVE-8854.1-spark.patch, HIVE-8854.1-spark.patch, hive-dirver-classloader-info.output Hive driver would load guava 11.0.2 from hadoop/tez, while remote spark context depends on guava 14.0.1, It should be JobMetrics deserialize failed on Hive driver side since Absent is used in Metrics, here is the hive driver log: {noformat} java.lang.IllegalAccessError: tried to access method com.google.common.base.Optional.init()V from class com.google.common.base.Absent at com.google.common.base.Absent.init(Absent.java:35) at com.google.common.base.Absent.clinit(Absent.java:33) at sun.misc.Unsafe.ensureClassInitialized(Native Method) at sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43) at sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:140) at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1057) at java.lang.reflect.Field.getFieldAccessor(Field.java:1038) at java.lang.reflect.Field.getLong(Field.java:591) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1663) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:468) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365) at 
java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) at scala.util.Try$.apply(Try.scala:161) at akka.serialization.Serialization.deserialize(Serialization.scala:98) at akka.remote.serialization.MessageContainerSerializer.fromBinary(MessageContainerSerializer.scala:63) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) at scala.util.Try$.apply(Try.scala:161) at akka.serialization.Serialization.deserialize(Serialization.scala:98) at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23) at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:58) at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:58) at 
akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:76) at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at
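The IllegalAccessError in the log above is the classic symptom of two Guava versions on the same classpath (11.0.2 from hadoop/tez vs. 14.0.1 from Spark). A hedged, stdlib-only sketch of one way to diagnose such conflicts, printing which jar a class was actually loaded from (the class and method names here are illustrative, not part of any Hive patch):

```java
import java.security.CodeSource;

public class ClassOrigin {
    // Returns the jar/path a class was loaded from, or a placeholder for
    // bootstrap classes, which have no CodeSource.
    public static String originOf(Class<?> cls) {
        CodeSource cs = cls.getProtectionDomain().getCodeSource();
        return cs == null ? "bootstrap/unknown" : cs.getLocation().toString();
    }

    public static void main(String[] args) {
        // In a real Hive driver you would pass com.google.common.base.Absent
        // here to see whether it came from the guava-11 or guava-14 jar.
        System.out.println(originOf(ClassOrigin.class));
    }
}
```

Printing the origins of com.google.common.base.Optional and com.google.common.base.Absent side by side would make a version mismatch like the one in this stack trace immediately visible.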
[jira] [Commented] (HIVE-8893) Implement whitelist for builtin UDFs to avoid untrusted code execution in multiuser mode
[ https://issues.apache.org/jira/browse/HIVE-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218512#comment-14218512 ] Szehon Ho commented on HIVE-8893: - Hi Prasad, sorry about that, I was looking at the patch again and the whitespace is still there on the latest patch; I didn't notice it. Also, I took a look and the check for an empty black and white list in setupBlockedUdfs() is inconsistent. While we are changing this, can we also use the Guava splitter with the omitEmptyStrings() option for this situation, so the logic is cleaner? Again, sorry I didn't look that closely before. Implement whitelist for builtin UDFs to avoid untrusted code execution in multiuser mode --- Key: HIVE-8893 URL: https://issues.apache.org/jira/browse/HIVE-8893 Project: Hive Issue Type: Bug Components: Authorization, HiveServer2, SQL Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.15.0 Attachments: HIVE-8893.3.patch, HIVE-8893.4.patch, HIVE-8893.5.patch UDFs like reflect() or java_method() enable executing a Java method as a UDF. While this offers a lot of flexibility in standalone mode, it can become a security loophole in a secure multiuser environment. For example, in HiveServer2 one can execute any available Java code with user hive's credentials. We need a whitelist and blacklist to restrict builtin UDFs in HiveServer2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
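The review suggestion above is to parse the comma-separated white/blacklist with Guava's Splitter (trimming whitespace and omitting empty strings) so both lists are handled uniformly. A stdlib-only sketch of that behavior, assuming a hypothetical helper name (parseUdfList is illustrative, not from the patch):

```java
import java.util.ArrayList;
import java.util.List;

public class UdfListParser {
    // Splits a comma-separated UDF list, trimming whitespace and dropping
    // empty entries, mirroring Splitter.on(",").trimResults().omitEmptyStrings().
    public static List<String> parseUdfList(String raw) {
        List<String> result = new ArrayList<>();
        if (raw == null) {
            return result;
        }
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

With this, an empty property value and a ragged list like "reflect,, java_method " both yield consistent results, which is the cleanup the review asks for.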
Re: Review Request 28255: HIVE-8916 : Handle user@domain username under LDAP authentication
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28255/#review62235 --- Hi Mohit, looks great, just one suggestion on rb for your consideration. service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java https://reviews.apache.org/r/28255/#comment104245 Will it be simpler to use a regex like [^\@]+ to find this? - Szehon Ho On Nov. 19, 2014, 8:49 p.m., Mohit Sabharwal wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28255/ --- (Updated Nov. 19, 2014, 8:49 p.m.) Review request for hive. Bugs: HIVE-8916 https://issues.apache.org/jira/browse/HIVE-8916 Repository: hive-git Description --- HIVE-8916 : Handle user@domain username under LDAP authentication If LDAP is configured with multiple domains for authentication, users can be in different domains. Currently, LdapAuthenticationProviderImpl blindly appends the domain configured hive.server2.authentication.ldap.Domain to the username, which limits user to that domain. However, under multi-domain authentication, the username may already include the domain (ex: u...@domain.foo.com). We should not append a domain if one is already present. Also, if username already includes the domain, rest of Hive and authorization providers still expects the short name (user and not u...@domain.foo.com) for looking up privilege rules, etc. As such, any domain info in the username should be stripped off.
Diffs - service/src/java/org/apache/hive/service/ServiceUtils.java PRE-CREATION service/src/java/org/apache/hive/service/auth/LdapAuthenticationProviderImpl.java d075761d079f8a18d7d317483783fe3b801e00d5 service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 3a8ae70d8bd31c9958ea6ae00a2d01c315c80615 Diff: https://reviews.apache.org/r/28255/diff/ Testing --- Configured HS2 for LDAP authentication:

<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://foo.ldap.server.com</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.Domain</name>
  <value>foo.ldap.domain.com</value>
</property>

Ran beeline with user names with and without ldap domain, in both cases authentication works. Before the change, authentication failed if domain was present in username: beeline -u jdbc:hive2://localhost:1 -n u...@foo.ldap.domain.com -p TestPassword --debug beeline -u jdbc:hive2://localhost:1 -n user -p TestPassword --debug Thanks, Mohit Sabharwal
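The behavior this review request describes (append the configured domain only when the username has none, and strip any domain before privilege lookups) can be sketched in plain Java. The class and method names below are illustrative, not the actual ServiceUtils or LdapAuthenticationProviderImpl code:

```java
public class LdapNameUtils {
    // Append the configured domain only if the username does not already
    // contain one, so multi-domain users keep their own domain.
    public static String withDomain(String user, String configuredDomain) {
        if (user.indexOf('@') >= 0 || configuredDomain == null || configuredDomain.isEmpty()) {
            return user;
        }
        return user + "@" + configuredDomain;
    }

    // Strip a trailing @domain so the rest of Hive and the authorization
    // providers see the short name when looking up privilege rules.
    public static String shortName(String user) {
        int at = user.indexOf('@');
        return at < 0 ? user : user.substring(0, at);
    }
}
```

shortName is the indexOf-based equivalent of matching the leading [^\@]+ suggested in the review comment; either form yields the short name.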
[jira] [Commented] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218532#comment-14218532 ] Hive QA commented on HIVE-8854: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12682460/HIVE-8854.1-spark.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 7181 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/394/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/394/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-394/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12682460 - PreCommit-HIVE-SPARK-Build Guava dependency conflict between hive driver and remote spark context[Spark Branch] Key: HIVE-8854 URL: https://issues.apache.org/jira/browse/HIVE-8854 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Marcelo Vanzin Labels: Spark-M3 Attachments: HIVE-8854.1-spark.patch, HIVE-8854.1-spark.patch, hive-dirver-classloader-info.output Hive driver would load guava 11.0.2 from hadoop/tez, while remote spark context depends on guava 14.0.1, It should be JobMetrics deserialize failed on Hive driver side since Absent is used in Metrics, here is the hive driver log: {noformat} java.lang.IllegalAccessError: tried to access method com.google.common.base.Optional.init()V from class com.google.common.base.Absent at com.google.common.base.Absent.init(Absent.java:35) at com.google.common.base.Absent.clinit(Absent.java:33) at sun.misc.Unsafe.ensureClassInitialized(Native Method) at sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43) at sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:140) at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1057) at java.lang.reflect.Field.getFieldAccessor(Field.java:1038) at java.lang.reflect.Field.getLong(Field.java:591) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1663) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:468) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
[jira] [Commented] (HIVE-8916) Handle user@domain username under LDAP authentication
[ https://issues.apache.org/jira/browse/HIVE-8916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218529#comment-14218529 ] Szehon Ho commented on HIVE-8916: - Thanks for this fix! Left a comment on the review-board. Handle user@domain username under LDAP authentication - Key: HIVE-8916 URL: https://issues.apache.org/jira/browse/HIVE-8916 Project: Hive Issue Type: Bug Components: Authentication Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Attachments: HIVE-8916.patch If LDAP is configured with multiple domains for authentication, users can be in different domains. Currently, LdapAuthenticationProviderImpl blindly appends the domain configured hive.server2.authentication.ldap.Domain to the username, which limits user to that domain. However, under multi-domain authentication, the username may already include the domain (ex: u...@domain.foo.com). We should not append a domain if one is already present. Also, if username already includes the domain, rest of Hive and authorization providers still expects the short name (user and not u...@domain.foo.com) for looking up privilege rules, etc. As such, any domain info in the username should be stripped off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8863) Cannot drop table with uppercase name after compute statistics for columns
[ https://issues.apache.org/jira/browse/HIVE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-8863: -- Status: Patch Available (was: Open) Cannot drop table with uppercase name after compute statistics for columns Key: HIVE-8863 URL: https://issues.apache.org/jira/browse/HIVE-8863 Project: Hive Issue Type: Bug Components: Metastore Reporter: Juan Yu Assignee: Chaoyu Tang Attachments: HIVE-8863.patch Create a table with uppercase name Test, run analyze table Test compute statistics for columns col1 After this, you cannot drop the table by drop table Test; Got error: NestedThrowablesStackTrace: java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails (hive2.TAB_COL_STATS, CONSTRAINT TAB_COL_STATS_FK FOREIGN KEY (TBL_ID) REFERENCES TBLS (TBL_ID)) workaround is to use lowercase table name drop table test; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8863) Cannot drop table with uppercase name after compute statistics for columns
[ https://issues.apache.org/jira/browse/HIVE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang reassigned HIVE-8863: - Assignee: Chaoyu Tang Cannot drop table with uppercase name after compute statistics for columns Key: HIVE-8863 URL: https://issues.apache.org/jira/browse/HIVE-8863 Project: Hive Issue Type: Bug Components: Metastore Reporter: Juan Yu Assignee: Chaoyu Tang Attachments: HIVE-8863.patch Create a table with uppercase name Test, run analyze table Test compute statistics for columns col1 After this, you cannot drop the table by drop table Test; Got error: NestedThrowablesStackTrace: java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails (hive2.TAB_COL_STATS, CONSTRAINT TAB_COL_STATS_FK FOREIGN KEY (TBL_ID) REFERENCES TBLS (TBL_ID)) workaround is to use lowercase table name drop table test; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8863) Cannot drop table with uppercase name after compute statistics for columns
[ https://issues.apache.org/jira/browse/HIVE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-8863: -- Attachment: HIVE-8863.patch This case-sensitivity issue occurs not only with tables but also with databases. For example, drop table TESTDB.test will also fail after computing statistics. Uploaded a patch with tests. Cannot drop table with uppercase name after compute statistics for columns Key: HIVE-8863 URL: https://issues.apache.org/jira/browse/HIVE-8863 Project: Hive Issue Type: Bug Components: Metastore Reporter: Juan Yu Attachments: HIVE-8863.patch Create a table with uppercase name Test, run analyze table Test compute statistics for columns col1 After this, you cannot drop the table by drop table Test; Got error: NestedThrowablesStackTrace: java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails (hive2.TAB_COL_STATS, CONSTRAINT TAB_COL_STATS_FK FOREIGN KEY (TBL_ID) REFERENCES TBLS (TBL_ID)) workaround is to use lowercase table name drop table test; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
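The foreign-key failure above happens because the column-statistics row was stored under a different identifier case than the row the drop tries to delete. A minimal sketch of the usual remedy for this class of bug, normalizing identifiers before any metastore comparison; the helper name is hypothetical and not necessarily what HIVE-8863.patch does:

```java
public class IdentifierUtils {
    // Hive identifiers are case-insensitive; normalize to lowercase before
    // any metastore lookup so "Test" and "test" resolve to the same row.
    public static String normalizeIdentifier(String name) {
        return name == null ? null : name.trim().toLowerCase();
    }
}
```

Normalizing both the database and table name this way at every write and lookup keeps TAB_COL_STATS and TBLS consistent regardless of the case the user typed.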
[jira] [Updated] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-8854: Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to spark branch. Thanks Marcelo! Guava dependency conflict between hive driver and remote spark context[Spark Branch] Key: HIVE-8854 URL: https://issues.apache.org/jira/browse/HIVE-8854 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Marcelo Vanzin Labels: Spark-M3 Fix For: spark-branch Attachments: HIVE-8854.1-spark.patch, HIVE-8854.1-spark.patch, hive-dirver-classloader-info.output Hive driver would load guava 11.0.2 from hadoop/tez, while remote spark context depends on guava 14.0.1, It should be JobMetrics deserialize failed on Hive driver side since Absent is used in Metrics, here is the hive driver log: {noformat} java.lang.IllegalAccessError: tried to access method com.google.common.base.Optional.init()V from class com.google.common.base.Absent at com.google.common.base.Absent.init(Absent.java:35) at com.google.common.base.Absent.clinit(Absent.java:33) at sun.misc.Unsafe.ensureClassInitialized(Native Method) at sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43) at sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:140) at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1057) at java.lang.reflect.Field.getFieldAccessor(Field.java:1038) at java.lang.reflect.Field.getLong(Field.java:591) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1663) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:468) at 
java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) at scala.util.Try$.apply(Try.scala:161) at akka.serialization.Serialization.deserialize(Serialization.scala:98) at akka.remote.serialization.MessageContainerSerializer.fromBinary(MessageContainerSerializer.scala:63) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) at scala.util.Try$.apply(Try.scala:161) at akka.serialization.Serialization.deserialize(Serialization.scala:98) at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23) at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:58) at 
akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:58) at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:76) at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at
[jira] [Commented] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez
[ https://issues.apache.org/jira/browse/HIVE-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218546#comment-14218546 ] Gunther Hagleitner commented on HIVE-8888: -- Don't think the test failures are related. [~prasanth_j] thoughts? I'm +1 on the last patch. Mapjoin with LateralViewJoin generates wrong plan in Tez Key: HIVE-8888 URL: https://issues.apache.org/jira/browse/HIVE-8888 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8888.1.patch, HIVE-8888.2.patch, HIVE-8888.3.patch, HIVE-8888.4.patch Queries like these {code} with sub1 as (select aid, avalue from expod1 lateral view explode(av) avs as avalue ), sub2 as (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue) select sub1.aid, sub1.avalue, sub2.bvalue from sub1,sub2 where sub1.aid=sub2.bid; {code} generates twice the number of rows in Tez when compared to MR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez
[ https://issues.apache.org/jira/browse/HIVE-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218549#comment-14218549 ] Prasanth J commented on HIVE-8888: -- [~hagleitn] Even I don't think the test failure is related. The code changes should not affect TestCliDriver tests. I ran the test locally and it ran successfully. Also can we have this for 0.14.1? Mapjoin with LateralViewJoin generates wrong plan in Tez Key: HIVE-8888 URL: https://issues.apache.org/jira/browse/HIVE-8888 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8888.1.patch, HIVE-8888.2.patch, HIVE-8888.3.patch, HIVE-8888.4.patch Queries like these {code} with sub1 as (select aid, avalue from expod1 lateral view explode(av) avs as avalue ), sub2 as (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue) select sub1.aid, sub1.avalue, sub2.bvalue from sub1,sub2 where sub1.aid=sub2.bid; {code} generates twice the number of rows in Tez when compared to MR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7948) Add an E2E test to verify fix for HIVE-7155
[ https://issues.apache.org/jira/browse/HIVE-7948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218560#comment-14218560 ] Eugene Koifman commented on HIVE-7948: -- there is currently only 1 webhcat-site.xml. This patch modifies it to set the templeton.mapper.memory.mb such that every (at least most) job will fail. So basically I think this patch will break a lot of other tests. Add an E2E test to verify fix for HIVE-7155 Key: HIVE-7948 URL: https://issues.apache.org/jira/browse/HIVE-7948 Project: Hive Issue Type: Test Components: Tests, WebHCat Reporter: Aswathy Chellammal Sreekumar Assignee: Aswathy Chellammal Sreekumar Priority: Minor Attachments: HIVE-7948.1.patch, HIVE-7948.patch E2E Test to verify webhcat property templeton.mapper.memory.mb correctly overrides mapreduce.map.memory.mb. The feature was added as part of HIVE-7155. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8917) HIVE-5679 adds two thread safety problems
[ https://issues.apache.org/jira/browse/HIVE-8917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218567#comment-14218567 ] Brock Noland commented on HIVE-8917: FYI [~sershe] HIVE-5679 adds two thread safety problems - Key: HIVE-8917 URL: https://issues.apache.org/jira/browse/HIVE-8917 Project: Hive Issue Type: Bug Reporter: Brock Noland HIVE-5679 adds two static {{SimpleDateFormat}} objects and {{SimpleDateFormat}} is not thread safe. These should be converted to thread locals. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8917) HIVE-5679 adds two thread safety problems
Brock Noland created HIVE-8917: -- Summary: HIVE-5679 adds two thread safety problems Key: HIVE-8917 URL: https://issues.apache.org/jira/browse/HIVE-8917 Project: Hive Issue Type: Bug Reporter: Brock Noland HIVE-5679 adds two static {{SimpleDateFormat}} objects and {{SimpleDateFormat}} is not thread safe. These should be converted to thread locals. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
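The thread-local conversion suggested above typically looks like the following sketch; the field name and pattern are illustrative, since the actual formats added by HIVE-5679 may differ:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class SafeDateFormat {
    // SimpleDateFormat is not thread safe, so give each thread its own
    // instance instead of sharing one static object across threads.
    private static final ThreadLocal<SimpleDateFormat> DATE_FORMAT =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static String format(Date d) {
        return DATE_FORMAT.get().format(d);
    }
}
```

Each caller sees its own SimpleDateFormat, so concurrent format() calls can no longer corrupt each other's internal Calendar state.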
[jira] [Created] (HIVE-8918) Beeline terminal cannot be initialized due to jline2 change
Sergio Peña created HIVE-8918: - Summary: Beeline terminal cannot be initialized due to jline2 change Key: HIVE-8918 URL: https://issues.apache.org/jira/browse/HIVE-8918 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña I fetched the latest changes from trunk, and I got the following error when attempting to execute beeline: {noformat} ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at jline.TerminalFactory.create(TerminalFactory.java:101) at jline.TerminalFactory.get(TerminalFactory.java:158) at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:73) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Exception in thread main java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:101) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at 
org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} I executed the following command: {noformat} hive --service beeline -u jdbc:hive2://localhost:1 -n sergio {noformat} The commit before the jline2 is working fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8918) Beeline terminal cannot be initialized due to jline2 change
[ https://issues.apache.org/jira/browse/HIVE-8918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218574#comment-14218574 ] Sergio Peña commented on HIVE-8918: --- FYI [~Ferd]. You worked on moving jline2, so you might have some ideas about what is happening. Beeline terminal cannot be initialized due to jline2 change --- Key: HIVE-8918 URL: https://issues.apache.org/jira/browse/HIVE-8918 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña I fetched the latest changes from trunk, and I got the following error when attempting to execute beeline: {noformat} ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at jline.TerminalFactory.create(TerminalFactory.java:101) at jline.TerminalFactory.get(TerminalFactory.java:158) at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:73) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Exception in thread main java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:101) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} I executed the following command: {noformat} hive --service beeline -u jdbc:hive2://localhost:1 -n sergio {noformat} The commit before the jline2 is working fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8730) schemaTool failure when date partition has non-date value
[ https://issues.apache.org/jira/browse/HIVE-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang reassigned HIVE-8730: - Assignee: Chaoyu Tang schemaTool failure when date partition has non-date value - Key: HIVE-8730 URL: https://issues.apache.org/jira/browse/HIVE-8730 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0 Environment: CDH5.2 Reporter: Johndee Burks Assignee: Chaoyu Tang Priority: Minor If there is a non-date value in the PART_KEY_VAL column within the PARTITION_KEY_VALS table in the metastore db, this will cause the HIVE-5700 script to fail. The failure will be picked up by the schemaTool, causing the upgrade to fail. A classic example of a value that can be present without users really being aware is __HIVE_DEFAULT_PARTITION__, which is filled in by Hive automatically when doing dynamic partitioning and a value is not present in the source data for the partition column. The reason for the failure is that the upgrade script does not account for non-date values.
What it is currently:
{code}
UPDATE PARTITION_KEY_VALS
  INNER JOIN PARTITIONS ON PARTITION_KEY_VALS.PART_ID = PARTITIONS.PART_ID
  INNER JOIN PARTITION_KEYS ON PARTITION_KEYS.TBL_ID = PARTITIONS.TBL_ID
    AND PARTITION_KEYS.INTEGER_IDX = PARTITION_KEY_VALS.INTEGER_IDX
    AND PARTITION_KEYS.PKEY_TYPE = 'date'
SET PART_KEY_VAL = IFNULL(DATE_FORMAT(cast(PART_KEY_VAL as date),'%Y-%m-%d'), PART_KEY_VAL);
{code}
What it should be to avoid the issue:
{code}
UPDATE PARTITION_KEY_VALS
  INNER JOIN PARTITIONS ON PARTITION_KEY_VALS.PART_ID = PARTITIONS.PART_ID
  INNER JOIN PARTITION_KEYS ON PARTITION_KEYS.TBL_ID = PARTITIONS.TBL_ID
    AND PARTITION_KEYS.INTEGER_IDX = PARTITION_KEY_VALS.INTEGER_IDX
    AND PARTITION_KEYS.PKEY_TYPE = 'date'
    AND PART_KEY_VAL != '__HIVE_DEFAULT_PARTITION__'
SET PART_KEY_VAL = IFNULL(DATE_FORMAT(cast(PART_KEY_VAL as date),'%Y-%m-%d'), PART_KEY_VAL);
{code}
== Metastore DB
{code}
mysql> select * from PARTITION_KEY_VALS;
+---------+----------------------------+-------------+
| PART_ID | PART_KEY_VAL               | INTEGER_IDX |
+---------+----------------------------+-------------+
|     171 | 2099-12-31                 |           0 |
|     172 | __HIVE_DEFAULT_PARTITION__ |           0 |
|     184 | 2099-12-01                 |           0 |
|     185 | 2099-12-30                 |           0 |
+---------+----------------------------+-------------+
{code}
== stdout.log
{code}
0: jdbc:mysql://10.16.8.121:3306/metastore> !autocommit on
0: jdbc:mysql://10.16.8.121:3306/metastore> SELECT 'Upgrading MetaStore schema from 0.12.0 to 0.13.0' AS ' '
+---------------------------------------------------+
|                                                   |
+---------------------------------------------------+
| Upgrading MetaStore schema from 0.12.0 to 0.13.0  |
+---------------------------------------------------+
0: jdbc:mysql://10.16.8.121:3306/metastore> SELECT ' HIVE-5700 enforce single date format for partition column storage ' AS ' '
+---------------------------------------------------------------------+
|                                                                     |
+---------------------------------------------------------------------+
|  HIVE-5700 enforce single date format for partition column storage  |
+---------------------------------------------------------------------+
0: jdbc:mysql://10.16.8.121:3306/metastore> UPDATE PARTITION_KEY_VALS INNER JOIN PARTITIONS ON PARTITION_KEY_VALS.PART_ID = PARTITIONS.PART_ID INNER JOIN PARTITION_KEYS ON PARTITION_KEYS.TBL_ID = PARTITIONS.TBL_ID AND PARTITION_KEYS.INTEGER_IDX = PARTITION_KEY_VALS.INTEGER_IDX AND PARTITION_KEYS.PKEY_TYPE = 'date' SET PART_KEY_VAL = IFNULL(DATE_FORMAT(cast(PART_KEY_VAL as date),'%Y-%m-%d'),
PART_KEY_VAL) {code} == stderr.log {code} exec /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop/bin/hadoop jar /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib/hive-cli-0.13.1-cdh5.2.0.jar org.apache.hive.beeline.HiveSchemaTool -verbose -dbType mysql -upgradeSchema Connecting to jdbc:mysql://10.16.8.121:3306/metastore?useUnicode=true&characterEncoding=UTF-8 Connected to: MySQL (version 5.1.73) Driver: MySQL-AB JDBC Driver (version mysql-connector-java-5.1.17-SNAPSHOT ( Revision: ${bzr.revision-id} )) Transaction isolation: TRANSACTION_READ_COMMITTED Autocommit status: true 1 row selected (0.025 seconds) 1 row selected (0.004 seconds) Closing: 0: jdbc:mysql://10.16.8.121:3306/metastore?useUnicode=true&characterEncoding=UTF-8 org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be
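The effect of the fixed upgrade statement can also be sketched outside SQL. Below is a minimal Python sketch (the helper name and logic are illustrative only; the real fix is the extra WHERE clause in the corrected script): values that parse as dates are rewritten to the canonical form, while anything else, such as the dynamic-partition sentinel, is left untouched.

```python
from datetime import datetime

HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"

def normalize_part_key_val(val):
    """Mimic the fixed HIVE-5700 update (hypothetical helper):
    only values that parse as dates are rewritten to %Y-%m-%d."""
    if val == HIVE_DEFAULT_PARTITION:
        # Corresponds to the added `PART_KEY_VAL != '__HIVE_DEFAULT_PARTITION__'`
        return val
    try:
        return datetime.strptime(val, "%Y-%m-%d").strftime("%Y-%m-%d")
    except ValueError:
        # Corresponds to the IFNULL(...) fallback: keep the original value
        return val
```

Run against the sample PARTITION_KEY_VALS rows above, every row survives unchanged instead of failing the cast on the sentinel value.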
[jira] [Updated] (HIVE-6421) abs() should preserve precision/scale of decimal input
[ https://issues.apache.org/jira/browse/HIVE-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-6421: - Attachment: HIVE-6421.2.patch Re-upload to run precommit tests. abs() should preserve precision/scale of decimal input -- Key: HIVE-6421 URL: https://issues.apache.org/jira/browse/HIVE-6421 Project: Hive Issue Type: Bug Components: UDF Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6421.1.txt, HIVE-6421.2.patch
{noformat}
hive> describe dec1;
OK
c1	decimal(10,2)	None
hive> explain select c1, abs(c1) from dec1;
...
Select Operator expressions: c1 (type: decimal(10,2)), abs(c1) (type: decimal(38,18))
{noformat}
Given that abs() is a GenericUDF, it should be possible for the return type precision/scale to match the input precision/scale. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
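The reasoning behind this request can be checked with Python's decimal module. The sketch below (a hypothetical helper, not Hive's GenericUDFAbs code) demonstrates the claim: the absolute value of a decimal(p, s) value always fits back into decimal(p, s), so the return type can mirror the input type rather than widening to decimal(38,18).

```python
from decimal import Decimal

def abs_decimal(value, precision, scale):
    """Absolute value of a decimal, verifying the result still fits
    in the input's (precision, scale) -- abs never adds digits."""
    result = abs(Decimal(value))
    t = result.as_tuple()
    # digit count and fractional digits never exceed the input type
    assert len(t.digits) <= precision and -t.exponent <= scale
    return result
```

Since negation can never increase the digit count, the assertion holds for any in-range input, which is exactly why the UDF can preserve the input type.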
[jira] [Updated] (HIVE-8909) Hive doesn't correctly read Parquet nested types
[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue updated HIVE-8909: Attachment: HIVE-8909-2.patch Rebased patch on Sergio's changes. This didn't conflict except for the change to ArrayWritableGroupConverter, which was removed (so any change would conflict). Hive doesn't correctly read Parquet nested types Key: HIVE-8909 URL: https://issues.apache.org/jira/browse/HIVE-8909 Project: Hive Issue Type: Bug Reporter: Ryan Blue Assignee: Ryan Blue Attachments: HIVE-8909-1.patch, HIVE-8909-2.patch Parquet's Avro and Thrift object models don't produce the same parquet type representation for lists and maps that Hive does. In the Parquet community, we've defined what should be written and backward-compatibility rules for existing data written by parquet-avro and parquet-thrift in PARQUET-113. We need to implement those rules in the Hive Converter classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8909) Hive doesn't correctly read Parquet nested types
[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue updated HIVE-8909: Affects Version/s: 0.13.1 Status: Patch Available (was: Open) Hive doesn't correctly read Parquet nested types Key: HIVE-8909 URL: https://issues.apache.org/jira/browse/HIVE-8909 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Ryan Blue Assignee: Ryan Blue Attachments: HIVE-8909-1.patch, HIVE-8909-2.patch Parquet's Avro and Thrift object models don't produce the same parquet type representation for lists and maps that Hive does. In the Parquet community, we've defined what should be written and backward-compatibility rules for existing data written by parquet-avro and parquet-thrift in PARQUET-113. We need to implement those rules in the Hive Converter classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 28145: HIVE-8883 - Investigate test failures on auto_join30.q [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/ --- (Updated Nov. 19, 2014, 11:35 p.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8883 https://issues.apache.org/jira/browse/HIVE-8883 Repository: hive-git Description --- This test fails with the following stack trace: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor 
task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) ... 17 more auto_join27.q and auto_join31.q seem to fail with the same error. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 2895d80 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 141ae6f Diff: https://reviews.apache.org/r/28145/diff/ Testing --- Tested with auto_join30.q, auto_join31.q, and auto_join27.q. They now generate correct results. Thanks, Chao Sun
[jira] [Updated] (HIVE-8883) Investigate test failures on auto_join30.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8883: --- Attachment: HIVE-8883.3-spark.patch This patch solves two issues when a MapJoinOperator is in a ReduceWork: 1. Like in SparkMapRecordHandler, in SparkReduceRecordHandler, we also need to initialize all the dummy operators associated with the MJ operator, and close them at the end; 2. In HashTableLoader, the currentInputPath will be null, since it's only set in a MapWork. It looks hard to pass the path info between MapWork and ReduceWork. Currently, if this is the case, we just pass null to {{getBucketFileName}}, which will treat it as a non-bucket join case. This should be fine since for a bucket join the MJ operator will never be in a ReduceWork. Investigate test failures on auto_join30.q [Spark Branch] - Key: HIVE-8883 URL: https://issues.apache.org/jira/browse/HIVE-8883 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8883.1-spark.patch, HIVE-8883.2-spark.patch, HIVE-8883.3-spark.patch This test fails with the following stack trace: {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at
org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at
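The null-path fallback described in the HIVE-8883 patch comment can be sketched as follows. This is a hypothetical illustration only (the name mirrors getBucketFileName, but this is not Hive's actual implementation): a missing input path, as happens when the MapJoin runs inside a ReduceWork, is treated as the non-bucketed case.

```python
def get_bucket_file_name(current_input_path):
    """Hypothetical sketch of the fallback: inside a ReduceWork there
    is no current input file, so a None path is handled as the
    non-bucket join case instead of raising a NullPointerException."""
    if current_input_path is None:
        # non-bucket join: the returned name is never matched against
        # bucket files, so any fixed placeholder is safe here
        return ""
    return current_input_path.rsplit("/", 1)[-1]
```

This is safe per the comment above because, for a bucket join, the MapJoin operator will never be placed in a ReduceWork.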
Re: Review Request 28145: HIVE-8883 - Investigate test failures on auto_join30.q [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/#review62285 --- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java https://reviews.apache.org/r/28145/#comment104311 We don't need this any more? - Jimmy Xiang On Nov. 19, 2014, 11:35 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/ --- (Updated Nov. 19, 2014, 11:35 p.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8883 https://issues.apache.org/jira/browse/HIVE-8883 Repository: hive-git Description --- This test fails with the following stack trace: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at 
org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) ... 17 more
[jira] [Updated] (HIVE-8893) Implement whitelist for builtin UDFs to avoid untrusted code execution in multiuser mode
[ https://issues.apache.org/jira/browse/HIVE-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-8893: -- Attachment: HIVE-8893.6.patch Updated patch that addresses review feedback. Implement whitelist for builtin UDFs to avoid untrusted code execution in multiuser mode --- Key: HIVE-8893 URL: https://issues.apache.org/jira/browse/HIVE-8893 Project: Hive Issue Type: Bug Components: Authorization, HiveServer2, SQL Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.15.0 Attachments: HIVE-8893.3.patch, HIVE-8893.4.patch, HIVE-8893.5.patch, HIVE-8893.6.patch UDFs like reflect() or java_method() enable executing a Java method as a UDF. While this offers a lot of flexibility in the standalone mode, it can become a security loophole in a secure multiuser environment. For example, in HiveServer2 one can execute any available Java code with user hive's credentials. We need a whitelist and blacklist to restrict builtin UDFs in HiveServer2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
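The whitelist/blacklist policy the issue asks for can be sketched in a few lines. This is an assumed semantics (the function name and the empty-whitelist-means-allow-all rule are illustrative, not Hive's actual configuration behavior): a UDF is runnable only if it passes an optional whitelist and is not explicitly blacklisted.

```python
def udf_allowed(name, whitelist, blacklist):
    """Hypothetical policy check: blacklist wins over whitelist;
    an empty whitelist is treated as 'allow everything not blacklisted'."""
    name = name.lower()
    if name in blacklist:
        return False
    return not whitelist or name in whitelist
```

Under this policy, an administrator could blacklist reflect() and java_method() in HiveServer2 while leaving ordinary builtins untouched.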
[jira] [Updated] (HIVE-8850) ObjectStore:: rollbackTransaction() and getHelper class needs to be looked into further.
[ https://issues.apache.org/jira/browse/HIVE-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-8850: Summary: ObjectStore:: rollbackTransaction() and getHelper class needs to be looked into further. (was: ObjectStore:: rollbackTransaction() should set the transaction status to TXN_STATUS.ROLLBACK irrespective of whether it is active or not) ObjectStore:: rollbackTransaction() and getHelper class needs to be looked into further. Key: HIVE-8850 URL: https://issues.apache.org/jira/browse/HIVE-8850 Project: Hive Issue Type: Bug Components: Metastore Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-8850.1.patch We can run into issues as described below: A Hive script adds 2800 partitions to a table, and during this it can get a SQLState 08S01 [Communication Link Error], after which BoneCP kills all the connections in the pool. The partitions are added and a create table statement executes (Metering_IngestedData_Compressed). The map job finishes successfully, and while moving the table to the Hive warehouse, ObjectStore.java's commitTransaction() raises the error: commitTransaction was called but openTransactionCalls = 0. This probably indicates that there are unbalanced calls to openTransaction/commitTransaction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
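The error message quoted above comes from nested-transaction bookkeeping: opens and commits must balance. A minimal sketch of that counter discipline (assumed semantics, not the real ObjectStore code) shows why a connection drop that resets state mid-flight can later surface as "openTransactionCalls = 0":

```python
class TxnTracker:
    """Hypothetical sketch of openTransaction/commitTransaction
    bookkeeping: each commit must be matched by a prior open."""
    def __init__(self):
        self.open_calls = 0

    def open_transaction(self):
        self.open_calls += 1

    def commit_transaction(self):
        if self.open_calls == 0:
            raise RuntimeError(
                "commitTransaction was called but openTransactionCalls = 0")
        self.open_calls -= 1
```

If rollbackTransaction() (or a connection failure) zeroes the counter while a caller still believes a transaction is open, the next commit trips exactly this check.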
[jira] [Created] (HIVE-8919) Fix FileUtils.copy() method to call distcp only for HDFS files (not local files)
Sergio Peña created HIVE-8919: - Summary: Fix FileUtils.copy() method to call distcp only for HDFS files (not local files) Key: HIVE-8919 URL: https://issues.apache.org/jira/browse/HIVE-8919 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña When loading a big file (> 32Mb) from the local filesystem to the HDFS filesystem, Hive fails because the local filesystem cannot load the 'distcp' class. The 'distcp' class is used only by the HDFS filesystem. We should use distcp only when copying files within the HDFS filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HIVE-8919) Fix FileUtils.copy() method to call distcp only for HDFS files (not local files)
[ https://issues.apache.org/jira/browse/HIVE-8919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-8919 started by Sergio Peña. - Fix FileUtils.copy() method to call distcp only for HDFS files (not local files) Key: HIVE-8919 URL: https://issues.apache.org/jira/browse/HIVE-8919 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña When loading a big file (> 32Mb) from the local filesystem to the HDFS filesystem, Hive fails because the local filesystem cannot load the 'distcp' class. The 'distcp' class is used only by the HDFS filesystem. We should use distcp only when copying files within the HDFS filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
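The check HIVE-8919 calls for can be sketched as a scheme test on both endpoints. This is a hedged illustration only (the helper name and the 32 MB threshold constant are assumptions for this sketch, not Hive's FileUtils API): distcp is considered only when both source and destination are HDFS.

```python
from urllib.parse import urlparse

def needs_distcp(src_uri, dst_uri, size_bytes, threshold=32 * 1024 * 1024):
    """Hypothetical decision helper: use distcp only for large copies
    where both endpoints live in HDFS; otherwise fall back to a
    plain filesystem copy."""
    both_hdfs = (urlparse(src_uri).scheme == "hdfs"
                 and urlparse(dst_uri).scheme == "hdfs")
    return both_hdfs and size_bytes > threshold
```

With this guard, a large local-to-HDFS load takes the plain copy path and never tries to instantiate the distcp class on the local filesystem.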
Re: Review Request 28145: HIVE-8883 - Investigate test failures on auto_join30.q [Spark Branch]
On Nov. 19, 2014, 11:50 p.m., Jimmy Xiang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java, line 74 https://reviews.apache.org/r/28145/diff/3/?file=770558#file770558line74 We don't need this any more? I was thinking about cleaning it and then restoring the code in the non-staged map join JIRA. But, after talking with Szehon, I decided to keep it anyway. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/#review62285 --- On Nov. 19, 2014, 11:35 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/ --- (Updated Nov. 19, 2014, 11:35 p.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8883 https://issues.apache.org/jira/browse/HIVE-8883 Repository: hive-git Description --- This test fails with the following stack trace: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at
Re: Review Request 28145: HIVE-8883 - Investigate test failures on auto_join30.q [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/ --- (Updated Nov. 19, 2014, 11:57 p.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8883 https://issues.apache.org/jira/browse/HIVE-8883 Repository: hive-git Description --- This test fails with the following stack trace: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor 
task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) ... 17 more auto_join27.q and auto_join31.q seem to fail with the same error. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 2895d80 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 141ae6f Diff: https://reviews.apache.org/r/28145/diff/ Testing --- Tested with auto_join30.q, auto_join31.q, and auto_join27.q. They now generate correct results. Thanks, Chao Sun
[jira] [Updated] (HIVE-8883) Investigate test failures on auto_join30.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8883: --- Attachment: HIVE-8883.4-spark.patch Investigate test failures on auto_join30.q [Spark Branch] - Key: HIVE-8883 URL: https://issues.apache.org/jira/browse/HIVE-8883 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8883.1-spark.patch, HIVE-8883.2-spark.patch, HIVE-8883.3-spark.patch, HIVE-8883.4-spark.patch This test fails with the following stack trace: {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null 
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) ... 17 more {noformat} {{auto_join27.q}} and {{auto_join31.q}} seem to fail with the same error. -- This message was sent by
[jira] [Updated] (HIVE-8919) Fix FileUtils.copy() method to call distcp only for HDFS files (not local files)
[ https://issues.apache.org/jira/browse/HIVE-8919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8919: -- Status: Patch Available (was: In Progress) Fix FileUtils.copy() method to call distcp only for HDFS files (not local files) Key: HIVE-8919 URL: https://issues.apache.org/jira/browse/HIVE-8919 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: CDH-23392.1.patch When loading a big file (> 32Mb) from the local filesystem to the HDFS filesystem, Hive fails because the local filesystem cannot load the 'distcp' class. The 'distcp' class is used only by the HDFS filesystem. We should use distcp only when copying files within the HDFS filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8919) Fix FileUtils.copy() method to call distcp only for HDFS files (not local files)
[ https://issues.apache.org/jira/browse/HIVE-8919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8919: -- Attachment: CDH-23392.1.patch Fix FileUtils.copy() method to call distcp only for HDFS files (not local files) Key: HIVE-8919 URL: https://issues.apache.org/jira/browse/HIVE-8919 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: CDH-23392.1.patch When loading a big file (> 32Mb) from the local filesystem to the HDFS filesystem, Hive fails because the local filesystem cannot load the 'distcp' class. The 'distcp' class is used only by the HDFS filesystem. We should use distcp only when copying files within the HDFS filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
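[Editor's note] The fix described in HIVE-8919 amounts to checking the filesystem scheme before falling back to distcp. A minimal sketch of that decision follows; the class and method names here are illustrative stand-ins, not the actual Hive FileUtils API:

```java
// Hypothetical sketch of the scheme check described above: only consider
// distcp when both source and destination are HDFS, and the file is large
// enough to justify it. distcp is an HDFS tool; a local filesystem copy
// should never try to load it.
public class FileCopyHelper {
    // Threshold mirroring the "big file" case from the report (32 MB).
    static final long DISTCP_THRESHOLD = 32L * 1024 * 1024;

    static boolean shouldUseDistCp(String srcScheme, String dstScheme, long fileSize) {
        boolean bothHdfs = "hdfs".equals(srcScheme) && "hdfs".equals(dstScheme);
        return bothHdfs && fileSize > DISTCP_THRESHOLD;
    }

    public static void main(String[] args) {
        // Large HDFS-to-HDFS copy: distcp is appropriate.
        System.out.println(shouldUseDistCp("hdfs", "hdfs", 64L * 1024 * 1024));
        // Local source: fall back to a plain copy regardless of size.
        System.out.println(shouldUseDistCp("file", "hdfs", 64L * 1024 * 1024));
    }
}
```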
Re: Review Request 28145: HIVE-8883 - Investigate test failures on auto_join30.q [Spark Branch]
On Nov. 19, 2014, 11:50 p.m., Jimmy Xiang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java, line 74 https://reviews.apache.org/r/28145/diff/3/?file=770558#file770558line74 We don't need this any more? Chao Sun wrote: I was thinking about cleaning it and then restoring the code in the non-staged map join JIRA. But, after talking with Szehon, I decided to keep it anyway. I see. Perhaps, you can move it around in the non-staged map join JIRA. - Jimmy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/#review62285 --- On Nov. 19, 2014, 11:57 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/ --- (Updated Nov. 19, 2014, 11:57 p.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8883 https://issues.apache.org/jira/browse/HIVE-8883 Repository: hive-git Description --- This test fails with the following stack trace: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at 
org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at
[jira] [Commented] (HIVE-8266) create function using resource statement compilation should include resource URI entity
[ https://issues.apache.org/jira/browse/HIVE-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218743#comment-14218743 ] Prasad Mujumdar commented on HIVE-8266: --- [~leftylev] That's correct, it's not changing any user experience. It doesn't need a doc change. Thanks! create function using resource statement compilation should include resource URI entity - Key: HIVE-8266 URL: https://issues.apache.org/jira/browse/HIVE-8266 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.1 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.15.0 Attachments: HIVE-8266.2.patch, HIVE-8266.3.patch The compiler adds the function name and db name as write entities for the create function using resource statement. We should also include the resource URI path in the write entity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-4009) CLI Tests fail randomly due to MapReduce LocalJobRunner race condition
[ https://issues.apache.org/jira/browse/HIVE-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland reopened HIVE-4009: CLI Tests fail randomly due to MapReduce LocalJobRunner race condition -- Key: HIVE-4009 URL: https://issues.apache.org/jira/browse/HIVE-4009 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4009-0.patch Hadoop has a race condition MAPREDUCE-5001 which causes tests to fail randomly when using LocalJobRunner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4009) CLI Tests fail randomly due to MapReduce LocalJobRunner race condition
[ https://issues.apache.org/jira/browse/HIVE-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218748#comment-14218748 ] Brock Noland commented on HIVE-4009: I've seen this again. Time to fix it. CLI Tests fail randomly due to MapReduce LocalJobRunner race condition -- Key: HIVE-4009 URL: https://issues.apache.org/jira/browse/HIVE-4009 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4009-0.patch Hadoop has a race condition MAPREDUCE-5001 which causes tests to fail randomly when using LocalJobRunner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-4009) CLI Tests fail randomly due to MapReduce LocalJobRunner race condition
[ https://issues.apache.org/jira/browse/HIVE-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4009: --- Attachment: HIVE-4009.patch CLI Tests fail randomly due to MapReduce LocalJobRunner race condition -- Key: HIVE-4009 URL: https://issues.apache.org/jira/browse/HIVE-4009 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4009-0.patch, HIVE-4009.patch Hadoop has a race condition MAPREDUCE-5001 which causes tests to fail randomly when using LocalJobRunner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4009) CLI Tests fail randomly due to MapReduce LocalJobRunner race condition
[ https://issues.apache.org/jira/browse/HIVE-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218751#comment-14218751 ] Brock Noland commented on HIVE-4009: To be clear, although MAPREDUCE-5001 improves the situation in that an exception is not thrown, it's still possible for LJR to return null and fail. This happens on hosts which are very busy. Let's just skip the racy status section of code when in local mode. CLI Tests fail randomly due to MapReduce LocalJobRunner race condition -- Key: HIVE-4009 URL: https://issues.apache.org/jira/browse/HIVE-4009 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4009-0.patch, HIVE-4009.patch Hadoop has a race condition MAPREDUCE-5001 which causes tests to fail randomly when using LocalJobRunner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
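[Editor's note] The suggestion in the comment above — skip the racy status-polling code when running under LocalJobRunner — can be sketched with a simple guard. This is an illustrative shape only, not the actual HIVE-4009 patch; `isLocalMode` and the status supplier are hypothetical stand-ins for Hive's real job-monitoring calls:

```java
import java.util.function.Supplier;

// Illustrative guard for the race described above: when the runtime is
// LocalJobRunner, skip status polling entirely instead of risking a null
// status on a busy host. Names here are hypothetical stand-ins.
public class JobMonitor {
    /** Returns a progress string, avoiding the racy status call in local mode. */
    static String progress(boolean isLocalMode, Supplier<String> fetchStatus) {
        if (isLocalMode) {
            return "local job: status polling skipped";
        }
        String status = fetchStatus.get();
        // Even in cluster mode, tolerate a null status rather than failing.
        return status != null ? status : "status unavailable";
    }

    public static void main(String[] args) {
        System.out.println(progress(true, () -> null));
        System.out.println(progress(false, () -> "map 50% reduce 0%"));
    }
}
```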
Re: Review Request 28145: HIVE-8883 - Investigate test failures on auto_join30.q [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/#review62295 --- Ship it! Ship It! - Szehon Ho On Nov. 19, 2014, 11:57 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/ --- (Updated Nov. 19, 2014, 11:57 p.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8883 https://issues.apache.org/jira/browse/HIVE-8883 Repository: hive-git Description --- This test fails with the following stack trace: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) ... 17 more auto_join27.q and auto_join31.q seem to fail with the same error. Diffs -
[jira] [Commented] (HIVE-8893) Implement whitelist for builtin UDFs to avoid untrusted code execution in multiuser mode
[ https://issues.apache.org/jira/browse/HIVE-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218759#comment-14218759 ] Szehon Ho commented on HIVE-8893: - Thanks! This looks great, will commit it once tests pass. Implement whitelist for builtin UDFs to avoid untrusted code execution in multiuser mode --- Key: HIVE-8893 URL: https://issues.apache.org/jira/browse/HIVE-8893 Project: Hive Issue Type: Bug Components: Authorization, HiveServer2, SQL Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.15.0 Attachments: HIVE-8893.3.patch, HIVE-8893.4.patch, HIVE-8893.5.patch, HIVE-8893.6.patch UDFs like reflect() or java_method() enable executing a java method as a UDF. While this offers a lot of flexibility in standalone mode, it can become a security loophole in a secure multiuser environment. For example, in HiveServer2 one can execute any available java code with user hive's credentials. We need a whitelist and blacklist to restrict builtin UDFs in HiveServer2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
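[Editor's note] A whitelist/blacklist gate of the kind proposed in HIVE-8893 is conceptually a set lookup performed before a UDF is resolved. The sketch below is hypothetical and does not reflect the actual patch or Hive's configuration property names; it only illustrates the precedence rule (blacklist wins, empty whitelist means unrestricted):

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of builtin-UDF gating: a UDF is allowed only if the
// blacklist does not contain it and the whitelist (when non-empty) does.
public class UdfGate {
    static boolean isAllowed(String udfName, Set<String> whitelist, Set<String> blacklist) {
        String name = udfName.toLowerCase(Locale.ROOT);
        if (blacklist.contains(name)) {
            return false; // blacklist always wins
        }
        // An empty whitelist means "no restriction" in this sketch.
        return whitelist.isEmpty() || whitelist.contains(name);
    }
}
```

With a blacklist of {"reflect", "java_method"}, ordinary UDFs like upper() remain usable while the risky reflection-based ones are refused.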
[jira] [Commented] (HIVE-8883) Investigate test failures on auto_join30.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218763#comment-14218763 ] Szehon Ho commented on HIVE-8883: - Thanks Chao, +1 on latest patch Investigate test failures on auto_join30.q [Spark Branch] - Key: HIVE-8883 URL: https://issues.apache.org/jira/browse/HIVE-8883 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8883.1-spark.patch, HIVE-8883.2-spark.patch, HIVE-8883.3-spark.patch, HIVE-8883.4-spark.patch This test fails with the following stack trace: {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at 
org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) ... 17 more {noformat} {{auto_join27.q}} and {{auto_join31.q}} seem to
[jira] [Commented] (HIVE-8889) JDBC Driver ResultSet.getXXXXXX(String columnLabel) methods Broken
[ https://issues.apache.org/jira/browse/HIVE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218769#comment-14218769 ] G Lingle commented on HIVE-8889: yep, I don't use the AS name; my code is like below. Try this: {code} String sql = "select * from src"; ResultSet res = stmt.executeQuery(sql); while (res.next()) { System.out.println("key: " + res.getString("key")); } {code} When it runs, an exception is thrown on the res.getString() call. When I step into the code I see that normalizedColumnNames contains entries of the form table name.column name. In the example above you'd see src.key and src.value in the normalizedColumnNames list; neither of those matches the requested column name "key", so the exception is thrown. HTH and Thanks for the prompt response, G JDBC Driver ResultSet.getXX(String columnLabel) methods Broken -- Key: HIVE-8889 URL: https://issues.apache.org/jira/browse/HIVE-8889 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: G Lingle Assignee: Chaoyu Tang Priority: Critical Using hive-jdbc-0.13.1-cdh5.2.0.jar. All of the get-by-column-label methods of HiveBaseResultSet are now broken. They don't take just the column label as they should; instead you have to pass in table name.column name. This requirement doesn't conform to the java ResultSet API, which specifies: columnLabel - the label for the column specified with the SQL AS clause. If the SQL AS clause was not specified, then the label is the name of the column. Looking at the code, it seems that the problem is that the findColumn() method is looking in normalizedColumnNames instead of columnNames. Another annoying issue with the code is that the SQLException thrown gives no indication of what the problem is; it should at least say that the column name wasn't found in the description string. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
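[Editor's note] The behavior described above can be reproduced with a small model of the label lookup: if the driver matches requested labels only against fully qualified names like "src.key", a bare "key" never matches. One plausible fix is to also compare the unqualified part, as sketched below. This is hypothetical code, not the actual HiveBaseResultSet implementation; the real driver would throw SQLException, unchecked here to keep the sketch self-contained:

```java
import java.util.List;
import java.util.Locale;

// Model of the findColumn() lookup discussed above. The driver stores
// normalized names like "src.key"; this sketch falls back to comparing the
// part after the last dot so a bare label such as "key" still resolves.
public class ColumnLookup {
    /** Returns the 1-based column index for a label, or throws with a clear message. */
    static int findColumn(String label, List<String> normalizedColumnNames) {
        String wanted = label.toLowerCase(Locale.ROOT);
        for (int i = 0; i < normalizedColumnNames.size(); i++) {
            String name = normalizedColumnNames.get(i).toLowerCase(Locale.ROOT);
            String unqualified = name.substring(name.lastIndexOf('.') + 1);
            if (name.equals(wanted) || unqualified.equals(wanted)) {
                return i + 1;
            }
        }
        // Per the report, the exception should say which label was not found.
        throw new IllegalArgumentException("Could not find column with label: " + label);
    }
}
```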
[jira] [Commented] (HIVE-8918) Beeline terminal cannot be initialized due to jline2 change
[ https://issues.apache.org/jira/browse/HIVE-8918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218771#comment-14218771 ] Ferdinand Xu commented on HIVE-8918: Have you done the steps mentioned in HIVE-8609 (which will be documented) before starting Beeline? One thing that needs to be documented is that users should back up and remove the jline-0.9.94.jar file under the path $HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar, which conflicts with Beeline's dependency, before using Beeline. Once YARN-2815 is resolved, the jline-0.9.94.jar will be removed. Beeline terminal cannot be initialized due to jline2 change --- Key: HIVE-8918 URL: https://issues.apache.org/jira/browse/HIVE-8918 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña I fetched the latest changes from trunk, and I got the following error when attempting to execute beeline: {noformat} [ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at jline.TerminalFactory.create(TerminalFactory.java:101) at jline.TerminalFactory.get(TerminalFactory.java:158) at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:73) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Exception in thread main java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at
org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:101) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} I executed the following command: {noformat} hive --service beeline -u jdbc:hive2://localhost:1 -n sergio {noformat} The commit before the jline2 change works fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8889) JDBC Driver ResultSet.getXXXXXX(String columnLabel) methods Broken
[ https://issues.apache.org/jira/browse/HIVE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218780#comment-14218780 ] G Lingle commented on HIVE-8889: We'd been using this code for over a year and it was working fine before upgrading to the cdh5 release. JDBC Driver ResultSet.getXX(String columnLabel) methods Broken -- Key: HIVE-8889 URL: https://issues.apache.org/jira/browse/HIVE-8889 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: G Lingle Assignee: Chaoyu Tang Priority: Critical Using hive-jdbc-0.13.1-cdh5.2.0.jar. All of the get-by-column-label methods of HiveBaseResultSet are now broken. They don't take just the column label as they should; instead you have to pass in table name.column name. This requirement doesn't conform to the java ResultSet API, which specifies: columnLabel - the label for the column specified with the SQL AS clause. If the SQL AS clause was not specified, then the label is the name of the column. Looking at the code, it seems that the problem is that the findColumn() method is looking in normalizedColumnNames instead of columnNames. Another annoying issue with the code is that the SQLException thrown gives no indication of what the problem is; it should at least say that the column name wasn't found in the description string. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8920) SplitSparkWorkResolver doesn't work with UnionWork
Chao created HIVE-8920: -- Summary: SplitSparkWorkResolver doesn't work with UnionWork Key: HIVE-8920 URL: https://issues.apache.org/jira/browse/HIVE-8920 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao The following query will not work: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently, the plan for this query, before SplitSparkWorkResolver, looks like this:
{noformat}
M1    M2
  \   / \
   U3    R5
   |
   R4
{noformat}
{{SplitSparkWorkResolver#splitBaseWork}} assumes that the {{childWork}} is a ReduceWork, but in this case, for M2 the childWork can be the UnionWork U3, so the code will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
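The failure mode described above can be illustrated with a minimal sketch. These are hypothetical stand-in classes that only mirror the shape of Hive's work hierarchy (BaseWork, ReduceWork, UnionWork), not the actual SparkWork code: an unconditional cast to ReduceWork throws once the child is a UnionWork such as U3, whereas branching on the runtime type handles both shapes.

```java
// Hypothetical stand-ins for Hive's work graph; not the real classes.
abstract class BaseWork {
    final String name;
    BaseWork(String name) { this.name = name; }
}
class ReduceWork extends BaseWork { ReduceWork(String n) { super(n); } }
class UnionWork  extends BaseWork { UnionWork(String n)  { super(n); } }

public class SplitSketch {
    // Buggy pattern: assume every child of a split work is a ReduceWork.
    static ReduceWork connectAssumingReduce(BaseWork childWork) {
        return (ReduceWork) childWork; // ClassCastException when childWork is U3
    }

    // Safer pattern: branch on the actual child type.
    static String connectCheckingType(BaseWork childWork) {
        if (childWork instanceof ReduceWork) return "reconnect reduce " + childWork.name;
        if (childWork instanceof UnionWork)  return "reconnect union "  + childWork.name;
        throw new IllegalStateException("unexpected child work: " + childWork.name);
    }
}
```

In the plan above, M2 has two children, U3 (a UnionWork) and R5 (a ReduceWork), so any resolver that splits M2 has to tolerate both.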
[jira] [Updated] (HIVE-8920) SplitSparkWorkResolver doesn't work with UnionWork [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8920: --- Summary: SplitSparkWorkResolver doesn't work with UnionWork [Spark Branch] (was: SplitSparkWorkResolver doesn't work with UnionWork) SplitSparkWorkResolver doesn't work with UnionWork [Spark Branch] - Key: HIVE-8920 URL: https://issues.apache.org/jira/browse/HIVE-8920 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao The following query will not work: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently, the plan for this query, before SplitSparkWorkResolver, looks like this:
{noformat}
M1    M2
  \   / \
   U3    R5
   |
   R4
{noformat}
{{SplitSparkWorkResolver#splitBaseWork}} assumes that the {{childWork}} is a ReduceWork, but in this case, for M2 the childWork can be the UnionWork U3, so the code will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez
[ https://issues.apache.org/jira/browse/HIVE-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218809#comment-14218809 ] Prasanth J commented on HIVE-8888: -- Committed to trunk. Mapjoin with LateralViewJoin generates wrong plan in Tez Key: HIVE-8888 URL: https://issues.apache.org/jira/browse/HIVE-8888 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.15.0 Attachments: HIVE-8888.1.patch, HIVE-8888.2.patch, HIVE-8888.3.patch, HIVE-8888.4.patch Queries like this {code} with sub1 as (select aid, avalue from expod1 lateral view explode(av) avs as avalue), sub2 as (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue) select sub1.aid, sub1.avalue, sub2.bvalue from sub1, sub2 where sub1.aid=sub2.bid; {code} generate twice the number of rows in Tez compared to MR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)