[jira] [Updated] (HIVE-6960) Set Hive pom to use Hadoop-2.4
[ https://issues.apache.org/jira/browse/HIVE-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-6960:
-----------------------------------
    Status: Open  (was: Patch Available)

TestJdbcwithKDC looks like a genuine failure. [~jdere], can you take a look?

> Set Hive pom to use Hadoop-2.4
> ------------------------------
>
> Key: HIVE-6960
> URL: https://issues.apache.org/jira/browse/HIVE-6960
> Project: Hive
> Issue Type: Bug
> Components: Build Infrastructure
> Affects Versions: 0.14.0
> Reporter: Jason Dere
> Assignee: Jason Dere
> Attachments: HIVE-6960.1.patch
>
> A number of the hadoop-2 unit test failures are due to HADOOP-10425, fixed in Hadoop 2.4. Perhaps we should move onto that version.
> - org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullgroup3
> - org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join4
> - org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_dummy_source
> - org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_symlink_text_input_format
> - org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_current_database
> - org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketizedhiveinputformat

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982207#comment-13982207 ]

Lefty Leverenz commented on HIVE-6430:
--------------------------------------
This adds *hive.mapjoin.optimized.hashtable* and *hive.mapjoin.optimized.hashtable.wbsize* to HiveConf.java. They both need descriptions -- I assume wb means write buffer. The descriptions can go in HiveConf comments or a release note for now, or you can patch hive-default.xml.template and I'll add a comment on HIVE-6586 (for HIVE-6037, Synchronize HiveConf with hive-default.xml.template and support show conf).

> MapJoin hash table has large memory overhead
> --------------------------------------------
>
> Key: HIVE-6430
> URL: https://issues.apache.org/jira/browse/HIVE-6430
> Project: Hive
> Issue Type: Improvement
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.patch
>
> Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 for row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need to have a Java hash table there. We can either use a primitive-friendly hashtable like the one from HPPC (Apache-licensed), or some variation, to map primitive keys to a single row storage structure without an object per row (similar to vectorization).

--
This message was sent by Atlassian JIRA
(v6.2#6252)
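The primitive-friendly hashtable described in the issue — mapping primitive keys to values without a boxed key/value/entry object per row — can be sketched with parallel long[] arrays and open addressing. A minimal illustration in plain Java, in the spirit of HPPC; the class and method names are invented for this sketch and this is not Hive's or HPPC's actual implementation:

```java
import java.util.Arrays;

// Open-addressing map from long keys to long values. Each mapping costs
// 16 bytes (one slot in each array) versus a java.util.HashMap entry,
// which needs a boxed Long key, a boxed Long value, and a Node object --
// easily a few hundred bytes with object headers and references.
// Long.MIN_VALUE is reserved as the empty-slot sentinel and cannot be a key.
class LongLongHashMap {
    private static final long EMPTY = Long.MIN_VALUE;
    private long[] keys;
    private long[] values;
    private int size;

    LongLongHashMap(int expectedEntries) {
        int cap = Integer.highestOneBit(Math.max(expectedEntries * 4, 16));
        keys = new long[cap];
        values = new long[cap];
        Arrays.fill(keys, EMPTY);
    }

    private int slot(long key) {
        // Cheap bit mixing; collisions are resolved by linear probing.
        int h = (int) (key ^ (key >>> 32)) * 0x9E3779B9;
        return h & (keys.length - 1);
    }

    void put(long key, long value) {
        if (size * 2 >= keys.length) grow();
        int i = slot(key);
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) & (keys.length - 1);
        }
        if (keys[i] == EMPTY) size++;
        keys[i] = key;
        values[i] = value;
    }

    long get(long key, long missing) {
        int i = slot(key);
        while (keys[i] != EMPTY) {
            if (keys[i] == key) return values[i];
            i = (i + 1) & (keys.length - 1);
        }
        return missing;
    }

    private void grow() {
        // Double capacity and re-insert every occupied slot.
        long[] ok = keys, ov = values;
        keys = new long[ok.length * 2];
        values = new long[ok.length * 2];
        Arrays.fill(keys, EMPTY);
        size = 0;
        for (int i = 0; i < ok.length; i++) {
            if (ok[i] != EMPTY) put(ok[i], ov[i]);
        }
    }
}
```

The two flat arrays keep keys and rows object-free, which is the point of the proposal: no per-row allocation, and the whole table stays friendly to the GC and the CPU cache.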
Re: What is the minimal required version of Hadoop for Hive 0.13.0?
This needs to be documented somewhere. -- Lefty

On Wed, Apr 23, 2014 at 11:45 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

  LOL. I thought I was the last 0.20.2 holdout.

On Wed, Apr 23, 2014 at 4:01 PM, Thejas Nair the...@hortonworks.com wrote:

  There is a jira for the hadoop 1.0.x compatibility issue: https://issues.apache.org/jira/browse/HIVE-6962. I have suggested a possible workaround there. There is no patch for it yet. I am planning to propose a 0.13.1 release, primarily for the issues around use of Oracle as the metastore database, and one in SQL standard authorization (HIVE-6945, HIVE-6919). We can also include a patch to get Hive working with older versions of hadoop, especially 1.x versions. Hive builds and tests are currently run against hadoop 1.2.1 and 2.3.0 (as you can see in pom.xml), but I don't believe there was a conscious decision to make 1.2.1 the *minimum* required version.

On Wed, Apr 23, 2014 at 7:20 AM, David Gayou david.ga...@kxen.com wrote:

  I actually have much the same issue with Hadoop 1.1.2. There is a jira issue opened here: https://issues.apache.org/jira/browse/HIVE-6962 with a link to the issue that created our problem. A quick search in the release notes seems to indicate that the unset method appeared in Hadoop 1.2.1. Is it now the minimal required version? If not, will there be a Hive 0.13.1 for older hadoop? Regards, David

On Wed, Apr 23, 2014 at 4:00 PM, Dmitry Vasilenko dvasi...@gmail.com wrote:

  Hive 0.12.0 (and previous versions) worked with Hadoop 0.20.x, 0.23.x.y, 1.x.y, and 2.x.y. Hive 0.13.0 did not work with Hadoop 0.20.x out of the box; to make it work I had to patch the Hadoop installation and add a Path(URL) constructor and a Configuration.unset() method. After that the basic functionality seems to be working. Both issues originate from org.apache.hadoop.hive.ql.exec.Utilities. I know that Hadoop 0.20.x is old, but some of us still have to work with that version.
  So does anyone know what is the minimal required version of Hadoop for Hive 0.13.0? Thanks, Dmitry Vasilenko
[jira] [Updated] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly
[ https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-6979:
-----------------------------
    Attachment: HIVE-6979.2.patch

Addressed [~ashutoshc]'s review comments. [~ashutoshc], I fixed the recent test failures. Can you please take a look at the changes in RB?

> Hadoop-2 test failures related to quick stats not being populated correctly
> ---------------------------------------------------------------------------
>
> Key: HIVE-6979
> URL: https://issues.apache.org/jira/browse/HIVE-6979
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.14.0
> Reporter: Prasanth J
> Assignee: Prasanth J
> Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch
>
> The test failures currently reported by Hive QA running on hadoop-2 (https://issues.apache.org/jira/browse/HIVE-6968?focusedCommentId=13980570&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13980570) are related to a difference in the way the hadoop FileSystem.globStatus() API behaves. For a directory structure like below
> {code}
> dir1/file1
> dir1/file2
> {code}
> a two-level path pattern like dir1/*/* will return both files in hadoop 1.x but an empty result in hadoop 2.x (in fact it will say no such file or directory and return an empty file status array). Hadoop 2.x seems to be compliant with the Linux behaviour (ls dir1/*/*) but hadoop 1.x is not. As a result, the fast statistics (NUM_FILES and TOTAL_SIZE) are populated wrongly, causing diffs in qfile tests between hadoop-1 and hadoop-2.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
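The globStatus() difference described above comes down to standard glob semantics: each * matches exactly one path component, so a two-level pattern cannot match a file sitting directly under dir1. The same rule can be checked with java.nio's glob PathMatcher — a standalone sketch of the POSIX-style semantics Hadoop 2.x follows, not a call into Hadoop's FileSystem API:

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Demonstrates why dir1/*/* matches dir1/sub/file1 but not dir1/file1:
// each * consumes exactly one path component and never crosses a slash.
class GlobDemo {
    static boolean matches(String pattern, String path) {
        PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        return m.matches(Paths.get(path));
    }
}
```

Under these semantics, for the dir1/file1 and dir1/file2 layout above, dir1/* matches both files while dir1/*/* matches nothing, which is the Hadoop 2.x (and ls) behaviour; Hadoop 1.x was the outlier in returning the files anyway.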
[jira] [Updated] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly
[ https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-6979:
-----------------------------
    Status: Patch Available  (was: Open)

> Hadoop-2 test failures related to quick stats not being populated correctly
> ---------------------------------------------------------------------------
>
> Key: HIVE-6979
> URL: https://issues.apache.org/jira/browse/HIVE-6979
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.14.0
> Reporter: Prasanth J
> Assignee: Prasanth J
> Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch
>
> The test failures currently reported by Hive QA running on hadoop-2 (https://issues.apache.org/jira/browse/HIVE-6968?focusedCommentId=13980570&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13980570) are related to a difference in the way the hadoop FileSystem.globStatus() API behaves. For a directory structure like below
> {code}
> dir1/file1
> dir1/file2
> {code}
> a two-level path pattern like dir1/*/* will return both files in hadoop 1.x but an empty result in hadoop 2.x (in fact it will say no such file or directory and return an empty file status array). Hadoop 2.x seems to be compliant with the Linux behaviour (ls dir1/*/*) but hadoop 1.x is not. As a result, the fast statistics (NUM_FILES and TOTAL_SIZE) are populated wrongly, causing diffs in qfile tests between hadoop-1 and hadoop-2.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly
[ https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982305#comment-13982305 ]

Hive QA commented on HIVE-6979:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12642117/HIVE-6979.2.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5419 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_25
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dynamic_partitions_with_whitelist
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/59/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/59/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12642117

> Hadoop-2 test failures related to quick stats not being populated correctly
> ---------------------------------------------------------------------------
>
> Key: HIVE-6979
> URL: https://issues.apache.org/jira/browse/HIVE-6979
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.14.0
> Reporter: Prasanth J
> Assignee: Prasanth J
> Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch
>
> The test failures currently reported by Hive QA running on hadoop-2 (https://issues.apache.org/jira/browse/HIVE-6968?focusedCommentId=13980570&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13980570) are related to a difference in the way the hadoop FileSystem.globStatus() API behaves. For a directory structure like below
> {code}
> dir1/file1
> dir1/file2
> {code}
> a two-level path pattern like dir1/*/* will return both files in hadoop 1.x but an empty result in hadoop 2.x (in fact it will say no such file or directory and return an empty file status array). Hadoop 2.x seems to be compliant with the Linux behaviour (ls dir1/*/*) but hadoop 1.x is not. As a result, the fast statistics (NUM_FILES and TOTAL_SIZE) are populated wrongly, causing diffs in qfile tests between hadoop-1 and hadoop-2.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HIVE-6901) Explain plan doesn't show operator tree for the fetch operator
[ https://issues.apache.org/jira/browse/HIVE-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-6901:
------------------------------
    Attachment: HIVE-6901.5.patch

> Explain plan doesn't show operator tree for the fetch operator
> --------------------------------------------------------------
>
> Key: HIVE-6901
> URL: https://issues.apache.org/jira/browse/HIVE-6901
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.12.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Priority: Minor
> Attachments: HIVE-6901.1.patch, HIVE-6901.2.patch, HIVE-6901.3.patch, HIVE-6901.4.patch, HIVE-6901.5.patch, HIVE-6901.patch
>
> Explaining a simple select query that involves an MR phase doesn't show the processor tree for the fetch operator.
> {code}
> hive> explain select d from test;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>         ...
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> {code}
> It would be nice if the operator tree is shown even if there is only one node. Please note that in local execution, the operator tree is complete:
> {code}
> hive> explain select * from test;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         TableScan
>           alias: test
>           Statistics: Num rows: 8 Data size: 34 Basic stats: COMPLETE Column stats: NONE
>           Select Operator
>             expressions: d (type: int)
>             outputColumnNames: _col0
>             Statistics: Num rows: 8 Data size: 34 Basic stats: COMPLETE Column stats: NONE
>           ListSink
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-6901) Explain plan doesn't show operator tree for the fetch operator
[ https://issues.apache.org/jira/browse/HIVE-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982400#comment-13982400 ]

Hive QA commented on HIVE-6901:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12642131/HIVE-6901.5.patch

{color:red}ERROR:{color} -1 due to 55 failed/errored test(s), 5484 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_numeric
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullformatCTAS
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullgroup3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_createas1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_dummy_source
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_alter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_tblproperties
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_symlink_text_input_format
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_truncate_column_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_current_database
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_19
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_21
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_24
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unset_table_view_property
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_count
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union7
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketizedhiveinputformat
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dynamic_partitions_with_whitelist
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_partialscan_autogether
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_unset_table_property
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/60/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/60/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing
[jira] [Updated] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated HIVE-6785:
-------------------------------
    Resolution: Fixed
    Status: Resolved  (was: Patch Available)

Committed to trunk! Thank you for the contribution! Thank you Szehon for the review!

> query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-6785
> URL: https://issues.apache.org/jira/browse/HIVE-6785
> Project: Hive
> Issue Type: Bug
> Components: File Formats, Serializers/Deserializers
> Affects Versions: 0.13.0
> Reporter: Tongjie Chen
> Assignee: Tongjie Chen
> Fix For: 0.14.0
> Attachments: HIVE-6785.1.patch.txt, HIVE-6785.2.patch.txt, HIVE-6785.3.patch
>
> When a Hive table's SerDe is ParquetHiveSerDe, while some partitions are of another SerDe, AND this table has string column[s], Hive generates a confusing error message:
>
> Failed with exception java.io.IOException:java.lang.ClassCastException: parquet.hive.serde.primitive.ParquetStringInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableTimestampObjectInspector
>
> This is confusing because timestamp is mentioned even though it is not used by the table. The reason is that when there is a SerDe difference between table and partition, Hive tries to convert between the object inspectors of the two SerDes. ParquetHiveSerDe's object inspector for the string type is ParquetStringInspector (newly introduced), which is neither a subclass of WritableStringObjectInspector nor of JavaStringObjectInspector, which ObjectInspectorConverters expects for a string-category object inspector. There is no break statement in the STRING case statement, hence the following TIMESTAMP case statement is executed, generating the confusing error message.
>
> See also the following parquet issue: https://github.com/Parquet/parquet-mr/issues/324
>
> The fix is relatively easy: just make ParquetStringInspector a subclass of JavaStringObjectInspector instead of AbstractPrimitiveJavaObjectInspector. But because the constructor of JavaStringObjectInspector is package scope instead of public or protected, we would need to move ParquetStringInspector to the same package as JavaStringObjectInspector. Also, ArrayWritableObjectInspector's setStructFieldData needs to accept List data as well, since the corresponding setStructFieldData and create methods return a list. This is also needed when the table SerDe is ParquetHiveSerDe and the partition SerDe is something else.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
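The missing-break bug described in the issue can be shown generically: without a break, control falls from the STRING case into the TIMESTAMP case, which is why the error message mentions timestamps for a table that has none. A hypothetical sketch with invented names, not the actual ObjectInspectorConverters code:

```java
// Generic illustration of switch fall-through: when the STRING case does not
// recognize the inspector and lacks a break, execution continues into the
// TIMESTAMP case, whose failure produces the misleading timestamp message.
class FallThroughDemo {
    enum Category { STRING, TIMESTAMP }

    static String convert(Category c, boolean stringInspectorRecognized) {
        String result = "unconverted";
        switch (c) {
            case STRING:
                if (stringInspectorRecognized) {
                    result = "string-converter";
                    break; // handled: never reaches the TIMESTAMP case
                }
                // BUG: no break here -- an unrecognized string inspector
                // falls through to the TIMESTAMP case below.
            case TIMESTAMP:
                result = "timestamp-converter"; // source of the confusing error
                break;
        }
        return result;
    }
}
```

In the real code path the unrecognized inspector is ParquetStringInspector, which is neither of the two String inspector classes the STRING case checks for, so the fall-through lands string data in timestamp-conversion code and produces the SettableTimestampObjectInspector ClassCastException.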
[jira] [Updated] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated HIVE-6785:
-------------------------------
    Assignee: Tongjie Chen

> query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-6785
> URL: https://issues.apache.org/jira/browse/HIVE-6785
> Project: Hive
> Issue Type: Bug
> Components: File Formats, Serializers/Deserializers
> Affects Versions: 0.13.0
> Reporter: Tongjie Chen
> Assignee: Tongjie Chen
> Fix For: 0.14.0
> Attachments: HIVE-6785.1.patch.txt, HIVE-6785.2.patch.txt, HIVE-6785.3.patch
>
> When a Hive table's SerDe is ParquetHiveSerDe, while some partitions are of another SerDe, AND this table has string column[s], Hive generates a confusing error message:
>
> Failed with exception java.io.IOException:java.lang.ClassCastException: parquet.hive.serde.primitive.ParquetStringInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableTimestampObjectInspector
>
> This is confusing because timestamp is mentioned even though it is not used by the table. The reason is that when there is a SerDe difference between table and partition, Hive tries to convert between the object inspectors of the two SerDes. ParquetHiveSerDe's object inspector for the string type is ParquetStringInspector (newly introduced), which is neither a subclass of WritableStringObjectInspector nor of JavaStringObjectInspector, which ObjectInspectorConverters expects for a string-category object inspector. There is no break statement in the STRING case statement, hence the following TIMESTAMP case statement is executed, generating the confusing error message.
>
> See also the following parquet issue: https://github.com/Parquet/parquet-mr/issues/324
>
> The fix is relatively easy: just make ParquetStringInspector a subclass of JavaStringObjectInspector instead of AbstractPrimitiveJavaObjectInspector. But because the constructor of JavaStringObjectInspector is package scope instead of public or protected, we would need to move ParquetStringInspector to the same package as JavaStringObjectInspector. Also, ArrayWritableObjectInspector's setStructFieldData needs to accept List data as well, since the corresponding setStructFieldData and create methods return a list. This is also needed when the table SerDe is ParquetHiveSerDe and the partition SerDe is something else.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly
[ https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982429#comment-13982429 ]

Ashutosh Chauhan commented on HIVE-6979:
----------------------------------------
+1

> Hadoop-2 test failures related to quick stats not being populated correctly
> ---------------------------------------------------------------------------
>
> Key: HIVE-6979
> URL: https://issues.apache.org/jira/browse/HIVE-6979
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.14.0
> Reporter: Prasanth J
> Assignee: Prasanth J
> Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch
>
> The test failures currently reported by Hive QA running on hadoop-2 (https://issues.apache.org/jira/browse/HIVE-6968?focusedCommentId=13980570&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13980570) are related to a difference in the way the hadoop FileSystem.globStatus() API behaves. For a directory structure like below
> {code}
> dir1/file1
> dir1/file2
> {code}
> a two-level path pattern like dir1/*/* will return both files in hadoop 1.x but an empty result in hadoop 2.x (in fact it will say no such file or directory and return an empty file status array). Hadoop 2.x seems to be compliant with the Linux behaviour (ls dir1/*/*) but hadoop 1.x is not. As a result, the fast statistics (NUM_FILES and TOTAL_SIZE) are populated wrongly, causing diffs in qfile tests between hadoop-1 and hadoop-2.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Comment Edited] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly
[ https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982429#comment-13982429 ]

Ashutosh Chauhan edited comment on HIVE-6979 at 4/27/14 7:10 PM:
-----------------------------------------------------------------
union_remove_25 wasn't expected to fail. Prasanth, can you take a look?

was (Author: ashutoshc):
+1

> Hadoop-2 test failures related to quick stats not being populated correctly
> ---------------------------------------------------------------------------
>
> Key: HIVE-6979
> URL: https://issues.apache.org/jira/browse/HIVE-6979
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.14.0
> Reporter: Prasanth J
> Assignee: Prasanth J
> Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch
>
> The test failures currently reported by Hive QA running on hadoop-2 (https://issues.apache.org/jira/browse/HIVE-6968?focusedCommentId=13980570&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13980570) are related to a difference in the way the hadoop FileSystem.globStatus() API behaves. For a directory structure like below
> {code}
> dir1/file1
> dir1/file2
> {code}
> a two-level path pattern like dir1/*/* will return both files in hadoop 1.x but an empty result in hadoop 2.x (in fact it will say no such file or directory and return an empty file status array). Hadoop 2.x seems to be compliant with the Linux behaviour (ls dir1/*/*) but hadoop 1.x is not. As a result, the fast statistics (NUM_FILES and TOTAL_SIZE) are populated wrongly, causing diffs in qfile tests between hadoop-1 and hadoop-2.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
Delete old website
Hi, The old website content is located here: https://svn.apache.org/repos/asf/hive/site/ while the new CMS based website is here https://svn.apache.org/repos/asf/hive/cms/. Instructions on how to edit the website are here: https://cwiki.apache.org/confluence/display/Hive/How+to+edit+the+website I think we should delete the old website content since it has confused folks. Brock
Re: Delete old website
+1

On Sun, Apr 27, 2014 at 12:33 PM, Brock Noland br...@cloudera.com wrote:

  Hi, The old website content is located here: https://svn.apache.org/repos/asf/hive/site/ while the new CMS based website is here https://svn.apache.org/repos/asf/hive/cms/. Instructions on how to edit the website are here: https://cwiki.apache.org/confluence/display/Hive/How+to+edit+the+website I think we should delete the old website content since it has confused folks. Brock
Re: Delete old website
+1 I was about to ask where to find the new menu, then found it under templates here: https://svn.apache.org/repos/asf/hive/cms/trunk/templates/sidenav.mdtext. -- Lefty

On Sun, Apr 27, 2014 at 3:41 PM, Ashutosh Chauhan hashut...@apache.org wrote:

  +1

On Sun, Apr 27, 2014 at 12:33 PM, Brock Noland br...@cloudera.com wrote:

  Hi, The old website content is located here: https://svn.apache.org/repos/asf/hive/site/ while the new CMS based website is here https://svn.apache.org/repos/asf/hive/cms/. Instructions on how to edit the website are here: https://cwiki.apache.org/confluence/display/Hive/How+to+edit+the+website I think we should delete the old website content since it has confused folks. Brock
[jira] [Commented] (HIVE-2853) Add pre event listeners to metastore
[ https://issues.apache.org/jira/browse/HIVE-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982480#comment-13982480 ]

Lefty Leverenz commented on HIVE-2853:
--------------------------------------
For the record: This added *hive.metastore.pre.event.listeners* to HiveConf.java.

> Add pre event listeners to metastore
> ------------------------------------
>
> Key: HIVE-2853
> URL: https://issues.apache.org/jira/browse/HIVE-2853
> Project: Hive
> Issue Type: Improvement
> Reporter: Kevin Wilfong
> Assignee: Kevin Wilfong
> Fix For: 0.9.0
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2853.D2175.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2853.D2175.2.patch
>
> Currently there are event listeners in the metastore which run after the completion of a method. It would be useful to have similar hooks which run before the metastore method is executed. These can be used to make validating names, locations, etc. customizable.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HIVE-6901) Explain plan doesn't show operator tree for the fetch operator
[ https://issues.apache.org/jira/browse/HIVE-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-6901:
------------------------------
    Attachment: HIVE-6901.6.patch

> Explain plan doesn't show operator tree for the fetch operator
> --------------------------------------------------------------
>
> Key: HIVE-6901
> URL: https://issues.apache.org/jira/browse/HIVE-6901
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.12.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Priority: Minor
> Attachments: HIVE-6901.1.patch, HIVE-6901.2.patch, HIVE-6901.3.patch, HIVE-6901.4.patch, HIVE-6901.5.patch, HIVE-6901.6.patch, HIVE-6901.patch
>
> Explaining a simple select query that involves an MR phase doesn't show the processor tree for the fetch operator.
> {code}
> hive> explain select d from test;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>         ...
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> {code}
> It would be nice if the operator tree is shown even if there is only one node. Please note that in local execution, the operator tree is complete:
> {code}
> hive> explain select * from test;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         TableScan
>           alias: test
>           Statistics: Num rows: 8 Data size: 34 Basic stats: COMPLETE Column stats: NONE
>           Select Operator
>             expressions: d (type: int)
>             outputColumnNames: _col0
>             Statistics: Num rows: 8 Data size: 34 Basic stats: COMPLETE Column stats: NONE
>           ListSink
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Created] (HIVE-6980) Drop table by using direct sql
Selina Zhang created HIVE-6980:
-------------------------------

Summary: Drop table by using direct sql
Key: HIVE-6980
URL: https://issues.apache.org/jira/browse/HIVE-6980
Project: Hive
Issue Type: Improvement
Components: Metastore
Affects Versions: 0.12.0
Reporter: Selina Zhang
Assignee: Selina Zhang

Dropping a table which has lots of partitions is slow. Even after applying the patch of HIVE-6265, the drop table still takes hours (100K+ partitions). The fix comes in two parts:

1. Use direct SQL to query the partitions' protect mode. The current implementation needs to transfer the Partition objects to the client and check the protect mode for each partition. I'd like to move this part of the logic to the metastore. The check will be done by direct SQL (if direct SQL is disabled, execute the same logic in the ObjectStore).

2. Use direct SQL to drop the partitions of the table. There may be two solutions here:
   1. Add DELETE CASCADE to the schema. In this way we only need to delete entries from the partitions table using direct SQL. May need to change datanucleus.deletionPolicy = DataNucleus.
   2. Clean up the dependent tables by issuing DELETE statements. This also needs datanucleus.query.sql.allowAll to be turned on.

Both of the above solutions should be able to fix the problem. DELETE CASCADE has to change schemas and prepare upgrade scripts. The second solution adds maintenance cost if new tables are added in future releases. Please advise.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-6901) Explain plan doesn't show operator tree for the fetch operator
[ https://issues.apache.org/jira/browse/HIVE-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982734#comment-13982734 ]

Hive QA commented on HIVE-6901:
---

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12642168/HIVE-6901.6.patch

{color:red}ERROR:{color} -1 due to 46 failed/errored test(s), 5419 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_numeric
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullformatCTAS
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullgroup3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_createas1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_dummy_source
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_alter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_tblproperties
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_symlink_text_input_format
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_truncate_column_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_current_database
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_19
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_21
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_24
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unset_table_view_property
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketizedhiveinputformat
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dynamic_partitions_with_whitelist
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_partialscan_autogether
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_unset_table_property
org.apache.hadoop.hive.metastore.TestMetaStoreAuthorization.testMetaStoreAuthorization
org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler
org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/61/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/61/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 46 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12642168

Explain plan doesn't show operator tree for the fetch operator
--
Key: HIVE-6901
URL: https://issues.apache.org/jira/browse/HIVE-6901
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.12.0
Reporter: Xuefu Zhang
[jira] [Updated] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly
[ https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-6979:
--
    Attachment: HIVE-6979.3.patch

union_remove_25 was failing because it wasn't able to find the partition, so I increased the limit count so that the specific partition value is always present. Updated the stats_partscan_1_23 test as well. The other tests seem to pass in my local setup (Mac OS X).

Hadoop-2 test failures related to quick stats not being populated correctly
---
Key: HIVE-6979
URL: https://issues.apache.org/jira/browse/HIVE-6979
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch, HIVE-6979.3.patch

The test failures currently reported by Hive QA running on hadoop-2 (https://issues.apache.org/jira/browse/HIVE-6968?focusedCommentId=13980570&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13980570) are caused by a difference in how the Hadoop FileSystem.globStatus() API behaves. For a directory structure like the one below
{code}
dir1/file1
dir1/file2
{code}
a two-level path pattern like dir1/*/* will return both files in Hadoop 1.x but will return an empty result in Hadoop 2.x (in fact it will say "no such file or directory" and return an empty file status array). Hadoop 2.x is compliant with the Linux shell behaviour (ls dir1/*/*), but Hadoop 1.x is not. As a result, the fast statistics (NUM_FILES and TOTAL_SIZE) are populated wrongly, causing diffs in qfile tests between hadoop-1 and hadoop-2.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
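The Hadoop 2.x behaviour described above matches ordinary shell globbing, which can be sketched on the local filesystem with Python's glob module (an assumption: local glob is used here as a stand-in for HDFS globStatus, which really returns a FileStatus array rather than path strings):

```python
import glob
import os
import tempfile

# Build the directory layout from the issue description: files directly
# under dir1, with no intermediate directory level.
root = tempfile.mkdtemp()
dir1 = os.path.join(root, "dir1")
os.makedirs(dir1)
for name in ("file1", "file2"):
    open(os.path.join(dir1, name), "w").close()

# A one-level pattern matches both files.
one_level = glob.glob(os.path.join(root, "dir1", "*"))

# A two-level pattern (dir1/*/*) matches nothing, because there is no
# directory under dir1 to descend into -- the Hadoop 2.x / POSIX-shell
# behaviour, whereas Hadoop 1.x would still have returned both files.
two_level = glob.glob(os.path.join(root, "dir1", "*", "*"))

print(len(one_level))  # 2
print(len(two_level))  # 0
```

If quick stats are computed by globbing one pattern per partition-depth level, this divergence explains why NUM_FILES and TOTAL_SIZE come out differently on the two Hadoop lines.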
[jira] [Updated] (HIVE-6469) skipTrash option in hive command line
[ https://issues.apache.org/jira/browse/HIVE-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-6469:
--
    Fix Version/s: (was: 0.12.1)

skipTrash option in hive command line
-
Key: HIVE-6469
URL: https://issues.apache.org/jira/browse/HIVE-6469
Project: Hive
Issue Type: New Feature
Components: CLI
Affects Versions: 0.12.0
Reporter: Jayesh
Assignee: Jayesh
Fix For: 0.14.0
Attachments: HIVE-6469.1.patch, HIVE-6469.2.patch, HIVE-6469.3.patch, HIVE-6469.patch

The hive drop table command deletes the data from the HDFS warehouse and puts it into the Trash. Currently there is no way to provide a flag telling the warehouse to skip the trash while deleting table data. This ticket is to add a skipTrash feature to the hive command line, which looks like the following:
hive -e "drop table skipTrash testTable"
This would be a good feature to add, so that the user can specify when not to put data into the trash directory, and thus not fill HDFS space, instead of relying on the trash interval and policy configuration to take care of disk-filling issues.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HIVE-6469) skipTrash option in hive command line
[ https://issues.apache.org/jira/browse/HIVE-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-6469:
--
    Resolution: Fixed
    Fix Version/s: 0.14.0
    Status: Resolved (was: Patch Available)

Patch committed to trunk. Thanks Jayesh for the contribution.

skipTrash option in hive command line
Key: HIVE-6469
URL: https://issues.apache.org/jira/browse/HIVE-6469
Fix For: 0.12.1, 0.14.0
Attachments: HIVE-6469.1.patch, HIVE-6469.2.patch, HIVE-6469.3.patch, HIVE-6469.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)
Re: Apache Hive 0.13.1
I would like to have https://issues.apache.org/jira/browse/HIVE-6953 also be part of 0.13.1. I still could not figure out the reason for the test failures; when all tests are run, the CompactorTests fail.

Thanks
Amareshwari

On Sun, Apr 27, 2014 at 2:49 AM, Sushanth Sowmyan khorg...@gmail.com wrote:

Added. If others have difficulty editing the page (I can't figure out how to change editing privileges, but it seems to indicate that others can edit), I'll accept replies to this thread as well and can add it in.

On Apr 25, 2014 6:25 PM, Sergey Shelukhin ser...@hortonworks.com wrote:

I don't have access to edit this page (or cannot figure out the UI). Username sershe. Can you add HIVE-6961: Drop partitions treats partition columns as strings (area - metastore)

On Fri, Apr 25, 2014 at 4:20 PM, Sushanth Sowmyan khorg...@gmail.com wrote:

I've created the following wiki link: https://cwiki.apache.org/confluence/display/Hive/Hive+0.13.1+Release+tracking
People should be able to request additional jiras by adding them to the list. I think it might make sense to halt additions to the list 3 days before the RC is cut, so as to prevent an endless-tail scenario, unless the bug in question is a severe breaking issue, in which case, after discussion, we can vote to add it to the list. That also gives us time to run a full suite of tests on a stable build before we cut the RC. I propose that the first RC (RC0) be built on Monday May 5th at 6pm PDT, and that the jira list on the wiki be closed to open/easy additions at 6pm PDT on Friday May 2nd.

On Fri, Apr 25, 2014 at 2:40 PM, Gunther Hagleitner ghagleit...@hortonworks.com wrote:

Sorry - HIVE-6824 isn't needed. Just the other 3. My bad.
Thanks,
Gunther.

On Fri, Apr 25, 2014 at 2:10 PM, Gunther Hagleitner ghagleit...@hortonworks.com wrote:

I'd like to request to include these Tez fixes: HIVE-6824, HIVE-6826, HIVE-6828, HIVE-6898
Thanks,
Gunther.
On Fri, Apr 25, 2014 at 11:59 AM, Sushanth Sowmyan khorg...@gmail.com wrote:

True, I was counting two weeks from today, but 0.13 has already been out for a week. I'm amenable to having an RC1 out on May 5th. If any further blocking issues appear, we can deal with them in an RC2/etc. modification to that.

On Fri, Apr 25, 2014 at 11:45 AM, Thejas Nair the...@hortonworks.com wrote:

On Fri, Apr 25, 2014 at 11:33 AM, Sushanth Sowmyan khorg...@gmail.com wrote:

I think it's important to get a bugfix/stabilization release out reasonably quickly, but it's also important to give people a little time to try out 0.13, discover/report bugs, and fix them. So I think about two weeks is a good point? And instead of releasing an RC on a Friday, I'm thinking of pushing it out to a Monday - does 12th May sound good to everyone?

I think we can aim for an earlier date. Most of these issues seem to be already committed to trunk or have patches available, so the remaining ones also might get committed to trunk by early next week. How about shooting for May 5th (Monday)? By then 0.13 would also have been out for 2 weeks. If we have any new critical bug reported that needs a fix, we can hold off on the RC for a few days. What do you think?

Thanks,
Thejas

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
[jira] [Updated] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly
[ https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-6979:
---
    Status: Patch Available (was: Open)

Hadoop-2 test failures related to quick stats not being populated correctly
Key: HIVE-6979
URL: https://issues.apache.org/jira/browse/HIVE-6979
Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch, HIVE-6979.3.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly
[ https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-6979:
---
    Status: Open (was: Patch Available)

Hadoop-2 test failures related to quick stats not being populated correctly
Key: HIVE-6979
URL: https://issues.apache.org/jira/browse/HIVE-6979
Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch, HIVE-6979.3.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-6979) Hadoop-2 test failures related to quick stats not being populated correctly
[ https://issues.apache.org/jira/browse/HIVE-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982761#comment-13982761 ]

Ashutosh Chauhan commented on HIVE-6979:

+1

Hadoop-2 test failures related to quick stats not being populated correctly
Key: HIVE-6979
URL: https://issues.apache.org/jira/browse/HIVE-6979
Attachments: HIVE-6979.1.patch, HIVE-6979.2.patch, HIVE-6979.3.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-6980) Drop table by using direct sql
[ https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982763#comment-13982763 ]

Ashutosh Chauhan commented on HIVE-6980:

Can you try with the patch of HIVE-6809? That should help.

Drop table by using direct sql
Key: HIVE-6980
URL: https://issues.apache.org/jira/browse/HIVE-6980

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-6469) skipTrash option in hive command line
[ https://issues.apache.org/jira/browse/HIVE-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982769#comment-13982769 ]

Lefty Leverenz commented on HIVE-6469:
--

What user doc does this need, besides adding *hive.warehouse.data.skipTrash* to the Configuration Properties wikidoc? If it's just an administrative config, it could be mentioned in the Admin Manual's section on Configuration Variables (link below). But if it's also for users, it should be mentioned in the DDL doc's Drop Table section, which includes this sentence and further discussion: "The data is actually moved to the .Trash/Current directory if Trash is configured."

* [Admin Manual -- Configuration Variables|https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfigurationVariables]
* [Language Manual -- DDL -- Drop Table|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropTable]

skipTrash option in hive command line
Key: HIVE-6469
URL: https://issues.apache.org/jira/browse/HIVE-6469
Fix For: 0.14.0

--
This message was sent by Atlassian JIRA
(v6.2#6252)
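As context for the doc question, the two drop behaviours being documented can be sketched with local directories (a toy model only: the function and path names below are illustrative, not Hive's implementation -- by default dropped table data is moved into .Trash/Current, while skipTrash deletes it outright):

```python
import os
import shutil
import tempfile

def drop_table_data(warehouse, table_dir, trash, skip_trash=False):
    """Toy model of dropping a table's warehouse data."""
    src = os.path.join(warehouse, table_dir)
    if skip_trash:
        shutil.rmtree(src)  # deleted outright: space freed immediately
    else:
        # default behaviour: moved into trash, recoverable until expiry
        shutil.move(src, os.path.join(trash, table_dir))

root = tempfile.mkdtemp()
warehouse = os.path.join(root, "warehouse")
trash = os.path.join(root, ".Trash", "Current")
os.makedirs(os.path.join(warehouse, "t1"))
os.makedirs(os.path.join(warehouse, "t2"))
os.makedirs(trash)

drop_table_data(warehouse, "t1", trash)                   # default drop
drop_table_data(warehouse, "t2", trash, skip_trash=True)  # skipTrash drop

in_trash = os.path.isdir(os.path.join(trash, "t1"))       # True: recoverable
t2_anywhere = os.path.isdir(os.path.join(trash, "t2"))    # False: gone
print(in_trash, t2_anywhere)
```

This is the user-visible difference the Drop Table doc would need to state: with skipTrash enabled the data is unrecoverable, which is also why it may warrant an admin-level rather than user-level description.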