[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries

2014-08-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089861#comment-14089861
 ] 

Lefty Leverenz commented on HIVE-5091:
--

Doc note: This changed the default value of the configuration parameter 
*hive.exec.orc.write.format* from 0.11 to null. The change was made before the 
parameter's first release (0.12), so no released version had the 0.11 default.

*hive.exec.orc.write.format* was created in HIVE-4123, and it's documented in 
the wiki here:

* [Configuration Properties -- hive.exec.orc.write.format|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.orc.write.format]

 ORC files should have an option to pad stripes to the HDFS block boundaries
 ---

 Key: HIVE-5091
 URL: https://issues.apache.org/jira/browse/HIVE-5091
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.12.0

 Attachments: HIVE-5091.D12249.1.patch, HIVE-5091.D12249.2.patch, 
 HIVE-5091.D12249.3.patch


 Because ORC stripes are large, a stripe that straddles an HDFS block boundary 
 has suboptimal read locality. It would be good to add padding to ensure that 
 stripes don't straddle HDFS blocks.
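The locality problem can be made concrete with a small check. The Java sketch below is not Hive code: the class StripeLocality and its methods are hypothetical, and it assumes every HDFS block in the file has the same size. A stripe occupying bytes [offset, offset + length) straddles a boundary when its first and last bytes fall in different blocks, in which case a reader local to one block may have to fetch the rest of the stripe from another node.

```java
/**
 * Hypothetical helper (not Hive code) that reports whether a stripe crosses an
 * HDFS block boundary, assuming a single fixed block size for the whole file.
 */
final class StripeLocality {
    private StripeLocality() {}

    /** Index of the block that contains the given byte offset. */
    static long blockIndex(long offset, long blockSize) {
        return offset / blockSize;
    }

    /** True if the stripe [offset, offset + length) spans more than one block. */
    static boolean straddles(long offset, long length, long blockSize) {
        if (length <= 0) {
            return false; // an empty stripe occupies no bytes
        }
        long lastByte = offset + length - 1;
        return blockIndex(offset, blockSize) != blockIndex(lastByte, blockSize);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long block = 256 * mb;  // example block size
        long stripe = 64 * mb;  // example stripe size
        // Starts 32 MB before the first boundary, so the second half of the
        // stripe lands in the next block.
        System.out.println(straddles(block - 32 * mb, stripe, block)); // true
        // Starts exactly on a boundary, so the whole stripe sits in one block.
        System.out.println(straddles(block, stripe, block)); // false
    }
}
```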



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries

2013-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754651#comment-13754651
 ] 

Hudson commented on HIVE-5091:
--

ABORTED: Integrated in Hive-trunk-hadoop2 #389 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/389/])
HIVE-5091: ORC files should have an option to pad stripes to the HDFS block 
boundaries (Owen O'Malley via Gunther Hagleitner) (gunther: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518830)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestNewIntegerEncoding.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
* /hive/trunk/ql/src/test/resources/orc-file-dump.out


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries

2013-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754701#comment-13754701
 ] 

Hudson commented on HIVE-5091:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #77 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/77/])
HIVE-5091: ORC files should have an option to pad stripes to the HDFS block 
boundaries (Owen O'Malley via Gunther Hagleitner) (gunther: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518830)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestNewIntegerEncoding.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
* /hive/trunk/ql/src/test/resources/orc-file-dump.out




[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries

2013-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754755#comment-13754755
 ] 

Hudson commented on HIVE-5091:
--

SUCCESS: Integrated in Hive-trunk-h0.21 #2298 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2298/])
HIVE-5091: ORC files should have an option to pad stripes to the HDFS block 
boundaries (Owen O'Malley via Gunther Hagleitner) (gunther: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518830)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestNewIntegerEncoding.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
* /hive/trunk/ql/src/test/resources/orc-file-dump.out




[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries

2013-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754794#comment-13754794
 ] 

Hudson commented on HIVE-5091:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #145 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/145/])
HIVE-5091: ORC files should have an option to pad stripes to the HDFS block 
boundaries (Owen O'Malley via Gunther Hagleitner) (gunther: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518830)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestNewIntegerEncoding.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
* /hive/trunk/ql/src/test/resources/orc-file-dump.out




[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries

2013-08-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753379#comment-13753379
 ] 

Hive QA commented on HIVE-5091:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12600486/HIVE-5091.D12249.3.patch

{color:green}SUCCESS:{color} +1 2902 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/555/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/555/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.



[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries

2013-08-29 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754073#comment-13754073
 ] 

Gunther Hagleitner commented on HIVE-5091:
--

Committed to trunk. Thanks Owen!



[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries

2013-08-21 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747154#comment-13747154
 ] 

Phabricator commented on HIVE-5091:
---

hagleitn has commented on the revision HIVE-5091 [jira] ORC files should have 
an option to pad stripes to the HDFS block boundaries.

  LGTM. I like the new WriterOptions. Nice and clean. +1
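The idea behind the WriterOptions change, per the patch notes on this issue, is that callers of OrcFile.createWriter set parameters by name rather than through a long positional argument list. A minimal Java sketch of that builder style follows. The class WriterOptionsSketch, its setters, and its 256 MB stripe-size default are hypothetical and only illustrate the pattern; they are not Hive's real OrcFile.WriterOptions. The padding default of true mirrors the orc.block.padding default described in the patch notes.

```java
/**
 * Hypothetical named-parameter options object in the spirit of WriterOptions.
 * Field names and defaults are illustrative assumptions, not Hive's API.
 */
final class WriterOptionsSketch {
    private long stripeSize = 256L * 1024 * 1024; // illustrative default only
    private boolean blockPadding = true;          // mirrors orc.block.padding = true
    private String version = null;                // null = write the current format

    WriterOptionsSketch stripeSize(long value) {
        if (value <= 0) {
            throw new IllegalArgumentException("stripe size must be positive");
        }
        this.stripeSize = value;
        return this; // returning this allows chained, self-describing calls
    }

    WriterOptionsSketch blockPadding(boolean value) {
        this.blockPadding = value;
        return this;
    }

    WriterOptionsSketch version(String value) {
        this.version = value;
        return this;
    }

    long getStripeSize() { return stripeSize; }
    boolean getBlockPadding() { return blockPadding; }
    String getVersion() { return version; }

    public static void main(String[] args) {
        // Each setting is named at the call site; unset ones keep their defaults.
        WriterOptionsSketch opts = new WriterOptionsSketch()
            .stripeSize(64L * 1024 * 1024)
            .blockPadding(false)
            .version("0.11");
        System.out.println(opts.getStripeSize() + " " + opts.getBlockPadding()
            + " " + opts.getVersion()); // 67108864 false 0.11
    }
}
```

Compared with a positional createWriter signature, adding a new option (such as padding) in this style does not change any existing call sites.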

REVISION DETAIL
  https://reviews.facebook.net/D12249

To: JIRA, omalley
Cc: hagleitn




[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries

2013-08-21 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747165#comment-13747165
 ] 

Gunther Hagleitner commented on HIVE-5091:
--

Looked at the failing tests. The problem is the *NOT*USED* value, which is 
passed as the desired version and leads to an Exception. [~owen.omalley]: I 
think you want to change that to null. Other than that, it looks good.
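The failure described above can be illustrated with a minimal Java sketch of how a hive.exec.orc.write.format value might be resolved to a file version. The enum OrcWriteVersion and its fromConfig method are hypothetical (this is not Hive's OrcFile code). The sketch assumes that null means "write the current format", taken here to be 0.12 because that is the release this fix targets, and that any unrecognized string is rejected with an exception. Under those assumptions a placeholder such as *NOT*USED* fails while null does not.

```java
/**
 * Hypothetical model (not Hive's OrcFile code) of resolving the
 * hive.exec.orc.write.format setting to an ORC file version.
 */
enum OrcWriteVersion {
    V_0_11("0.11"),
    V_0_12("0.12");

    /** Assumed version written when the setting is null. */
    static final OrcWriteVersion CURRENT = V_0_12;

    private final String label;

    OrcWriteVersion(String label) {
        this.label = label;
    }

    String getLabel() {
        return label;
    }

    /** Maps a config value to a version; null selects the current format. */
    static OrcWriteVersion fromConfig(String value) {
        if (value == null) {
            return CURRENT;
        }
        for (OrcWriteVersion v : values()) {
            if (v.label.equals(value)) {
                return v;
            }
        }
        // A placeholder string is not a known version, so it is rejected.
        throw new IllegalArgumentException("Unknown ORC file version: " + value);
    }

    public static void main(String[] args) {
        System.out.println(fromConfig(null).getLabel());   // 0.12
        System.out.println(fromConfig("0.11").getLabel()); // 0.11
        try {
            fromConfig("*NOT*USED*");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unknown ORC file version: *NOT*USED*
        }
    }
}
```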



[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries

2013-08-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740776#comment-13740776
 ] 

Hive QA commented on HIVE-5091:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12598033/HIVE-5091.D12249.1.patch

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 2859 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_diff_part_cols
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_createas1
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_empty_files
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_date_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ends_with_nulls
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_empty_strings
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_create
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udtf_not_supported2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_dictionary_threshold
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/444/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/444/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.



[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries

2013-08-14 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740052#comment-13740052
 ] 

Owen O'Malley commented on HIVE-5091:
-

This patch:
* Adds a new table property orc.block.padding, which defaults to true.
* Pads with zeros to the start of the next block when a stripe smaller than a 
block would otherwise straddle the block boundary.
* Sets the max block size to 1.5GB, because 2GB - 1 caused problems: block 
sizes must be divisible by the checksum length (512).
* Cleans up the interface to OrcFile.createWriter so that the user can set 
parameters by name.
* Cleans up the ability, added in HIVE-4123, to write the 0.11 version of ORC 
files, and ensures that the direct string encoding isn't used for 0.11 ORC 
files.
* Updates most of the tests to use the new createWriter API.
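The padding rule and the block-size constraint in the list above can be sketched in a few lines of Java. This is a simplified, hypothetical model: BlockPaddingSketch is not part of Hive, and the real writer may track additional state that is not modeled here. Given the current file position, it computes how many zero bytes to write so that a stripe smaller than a block starts at the next block boundary instead of straddling it, and it checks the 512-byte divisibility rule that ruled out 2GB - 1 as a maximum block size.

```java
/**
 * Simplified, hypothetical model (not Hive's WriterImpl) of the block-padding
 * rule from HIVE-5091 and of the block-size divisibility check.
 */
final class BlockPaddingSketch {
    /** Checksum chunk length that block sizes must be a multiple of. */
    static final long CHECKSUM_CHUNK = 512;

    private BlockPaddingSketch() {}

    /**
     * Zero bytes to write at position before a stripe of stripeLength bytes.
     * Returns 0 when padding is off, when the stripe is at least one block long
     * (it cannot fit inside a block anyway), or when it already fits in the
     * space left in the current block.
     */
    static long paddingBefore(long position, long stripeLength, long blockSize,
                              boolean padding) {
        if (!padding || stripeLength >= blockSize) {
            return 0;
        }
        long remaining = blockSize - (position % blockSize);
        return stripeLength > remaining ? remaining : 0;
    }

    /** True if blockSize is a positive multiple of the checksum chunk. */
    static boolean validBlockSize(long blockSize) {
        return blockSize > 0 && blockSize % CHECKSUM_CHUNK == 0;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long block = 256 * mb;
        // 200 MB written; a 100 MB stripe would cross the 256 MB boundary.
        System.out.println(paddingBefore(200 * mb, 100 * mb, block, true) / mb); // 56
        // A 40 MB stripe fits in the 56 MB left in the block: no padding.
        System.out.println(paddingBefore(200 * mb, 40 * mb, block, true)); // 0
        // 2GB - 1 leaves a remainder of 511 modulo 512; 1.5GB divides evenly.
        System.out.println(validBlockSize(2048 * mb - 1)); // false
        System.out.println(validBlockSize(1536 * mb));     // true
    }
}
```

In this model the cost of padding is the unused tail of the current block, traded for a stripe that can be read entirely from one block.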

