[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries
[ https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089861#comment-14089861 ]

Lefty Leverenz commented on HIVE-5091:
--

Doc note: This changed the default value of configuration parameter *hive.exec.orc.write.format* from 0.11 to null (before it was first released in 0.12).

*hive.exec.orc.write.format* was created in HIVE-4123, and it's documented in the wiki here:

* [Configuration Properties -- hive.exec.orc.write.format | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.orc.write.format]

ORC files should have an option to pad stripes to the HDFS block boundaries
---
Key: HIVE-5091
URL: https://issues.apache.org/jira/browse/HIVE-5091
Project: Hive
Issue Type: Bug
Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Fix For: 0.12.0
Attachments: HIVE-5091.D12249.1.patch, HIVE-5091.D12249.2.patch, HIVE-5091.D12249.3.patch

With ORC stripes being large, if a stripe straddles an HDFS block, the locality of read is suboptimal. It would be good to add padding to ensure that stripes don't straddle HDFS blocks.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries
[ https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754651#comment-13754651 ]

Hudson commented on HIVE-5091:
--

ABORTED: Integrated in Hive-trunk-hadoop2 #389 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/389/])
HIVE-5091: ORC files should have an option to pad stripes to the HDFS block boundaries (Owen O'Malley via Gunther Hagleitner) (gunther: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518830)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestNewIntegerEncoding.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
* /hive/trunk/ql/src/test/resources/orc-file-dump.out

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries
[ https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754701#comment-13754701 ]

Hudson commented on HIVE-5091:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #77 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/77/])
HIVE-5091: ORC files should have an option to pad stripes to the HDFS block boundaries (Owen O'Malley via Gunther Hagleitner) (gunther: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518830)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestNewIntegerEncoding.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
* /hive/trunk/ql/src/test/resources/orc-file-dump.out
[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries
[ https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754755#comment-13754755 ]

Hudson commented on HIVE-5091:
--

SUCCESS: Integrated in Hive-trunk-h0.21 #2298 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2298/])
HIVE-5091: ORC files should have an option to pad stripes to the HDFS block boundaries (Owen O'Malley via Gunther Hagleitner) (gunther: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518830)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestNewIntegerEncoding.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
* /hive/trunk/ql/src/test/resources/orc-file-dump.out
[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries
[ https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754794#comment-13754794 ]

Hudson commented on HIVE-5091:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #145 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/145/])
HIVE-5091: ORC files should have an option to pad stripes to the HDFS block boundaries (Owen O'Malley via Gunther Hagleitner) (gunther: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518830)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestNewIntegerEncoding.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
* /hive/trunk/ql/src/test/resources/orc-file-dump.out
[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries
[ https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753379#comment-13753379 ]

Hive QA commented on HIVE-5091:
---

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12600486/HIVE-5091.D12249.3.patch

{color:green}SUCCESS:{color} +1 2902 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/555/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/555/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.
[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries
[ https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754073#comment-13754073 ]

Gunther Hagleitner commented on HIVE-5091:
--

Committed to trunk. Thanks Owen!
[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries
[ https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747154#comment-13747154 ]

Phabricator commented on HIVE-5091:
---

hagleitn has commented on the revision HIVE-5091 [jira] ORC files should have an option to pad stripes to the HDFS block boundaries.

LGTM. I like the new WriterOptions. Nice and clean. +1

REVISION DETAIL
https://reviews.facebook.net/D12249

To: JIRA, omalley
Cc: hagleitn
[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries
[ https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747165#comment-13747165 ]

Gunther Hagleitner commented on HIVE-5091:
--

Looked at the failing tests. The problem is the *NOT*USED* value, which is passed as the desired version and leads to an exception.

[~owen.omalley]: I think you want to change that to null. Other than that, looks good.
[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries
[ https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740776#comment-13740776 ]

Hive QA commented on HIVE-5091:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12598033/HIVE-5091.D12249.1.patch

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 2859 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_diff_part_cols
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_createas1
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_empty_files
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_date_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ends_with_nulls
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_empty_strings
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_create
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udtf_not_supported2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_dictionary_threshold
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/444/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/444/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.
[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries
[ https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740052#comment-13740052 ]

Owen O'Malley commented on HIVE-5091:
-

This patch:
* Adds a new table property, orc.block.padding, which defaults to true.
* Pads with zeros up to the start of the next block when a stripe smaller than a block would otherwise straddle the block boundary.
* Sets the maximum block size to 1.5GB, because 2GB - 1 caused problems: block sizes must be divisible by the checksum length (512).
* Cleans up the interface to OrcFile.createWriter so that the user can set parameters by name.
* Cleans up the ability to write the 0.11 version of ORC files that was added in HIVE-4123, and ensures that the direct string encoding isn't used for 0.11 ORC files.
* Updates most of the tests to use the new createWriter API.
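
For readers following the thread, the padding rule in the second bullet can be sketched roughly as below. This is only an illustration of the arithmetic, not the code in WriterImpl.java: the class name StripePaddingSketch, the method paddingBefore, and the sizes used in main are all hypothetical, and the sketch assumes the writer already knows the stripe's size before deciding whether to pad.

```java
/**
 * Illustrative sketch only; NOT the WriterImpl code from the patch.
 * All names here are hypothetical and exist just for this example.
 */
final class StripePaddingSketch {
    private StripePaddingSketch() {}

    /**
     * Returns how many zero bytes to write before a stripe so that it does
     * not straddle an HDFS block boundary.
     *
     * @param offset     current write position in the file, in bytes
     * @param stripeSize size of the stripe about to be written, in bytes
     * @param blockSize  HDFS block size, in bytes
     */
    public static long paddingBefore(long offset, long stripeSize, long blockSize) {
        if (offset < 0 || stripeSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException(
                "offset and stripeSize must be >= 0 and blockSize must be > 0");
        }
        long remainingInBlock = blockSize - (offset % blockSize);
        // The stripe fits in what is left of the current block: no padding.
        if (stripeSize <= remainingInBlock) {
            return 0L;
        }
        // The comment above only describes padding stripes smaller than a
        // block, so larger stripes are left alone in this sketch.
        if (stripeSize >= blockSize) {
            return 0L;
        }
        // Otherwise skip to the start of the next block.
        return remainingInBlock;
    }

    public static void main(String[] args) {
        long blockSize = 256L * 1024 * 1024;   // assumed block size
        long stripeSize = 100L * 1024 * 1024;  // assumed stripe size
        long offset = 0L;
        for (int i = 0; i < 4; i++) {
            long pad = paddingBefore(offset, stripeSize, blockSize);
            System.out.println("stripe " + i + ": pad " + pad
                + " bytes, starts at offset " + (offset + pad));
            offset += pad + stripeSize;
        }
    }
}
```

With these assumed sizes, the third stripe is the first one that would cross a block boundary, so it is the one that gets padded.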