[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-05-10 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537540#comment-14537540
 ] 

Selina Zhang commented on HIVE-10036:
-------------------------------------

The two unit test failures above appear unrelated to this patch.

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 ------------------------------------------------------------------------------

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
  Labels: orcfile
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, 
 HIVE-10036.7.patch, HIVE-10036.8.patch, HIVE-10036.9.patch


 The ORC writer keeps multiple output streams for each column, and each output 
 stream is allocated a fixed-size ByteBuffer (configurable, default 256K). For 
 a big table, the memory cost is unbearable, especially when HCatalog dynamic 
 partitioning is involved: several hundred files may be open and being written 
 at the same time (the same problem exists for FileSinkOperator). 
 The global ORC memory manager controls the buffer size, but it only kicks in 
 at 5000-row intervals. That check could be improved, but the underlying 
 problem is that reducing the buffer size leads to worse compression and more 
 IOs on the read path, and sacrificing read performance is never a good choice. 
 I changed the fixed-size ByteBuffer to a dynamically growing buffer bounded by 
 the existing configurable buffer size. Most streams do not need a large 
 buffer, so performance improved significantly. Compared to Facebook's 
 hive-dwrf, I measured a 2x performance gain with this fix. 
 Solving OOM for ORC completely may take a lot of effort, but this is 
 definitely low-hanging fruit. 
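
To make the change concrete: a minimal Java sketch of a grow-on-demand stream
buffer capped at the configured size. The class and its sizing policy are
illustrative assumptions, not the actual patch (which, per the comments below,
builds on netty's buffer classes):

{noformat}
// Illustrative sketch only -- names, initial size, and overflow handling
// are assumptions, not taken from HIVE-10036.
import java.nio.ByteBuffer;

final class GrowableStreamBuffer {
  private static final int INITIAL_SIZE = 4 * 1024; // start small
  private final int maxSize;                        // configured cap, e.g. 256K
  private ByteBuffer buffer = ByteBuffer.allocate(INITIAL_SIZE);

  GrowableStreamBuffer(int maxSize) {
    this.maxSize = maxSize;
  }

  void write(byte[] src, int off, int len) {
    ensureCapacity(len);
    buffer.put(src, off, len);
  }

  private void ensureCapacity(int needed) {
    if (buffer.remaining() >= needed) {
      return;
    }
    int required = buffer.position() + needed;
    if (required > maxSize) {
      // At the cap the writer would spill/flush, just as with a fixed buffer.
      throw new IllegalStateException("buffer full, flush required");
    }
    int newSize = Math.min(Math.max(buffer.capacity() * 2, required), maxSize);
    // Growing is not free: it costs an allocation plus a copy of the bytes
    // written so far (the trade-off Owen raises in a later comment).
    ByteBuffer bigger = ByteBuffer.allocate(newSize);
    buffer.flip();
    bigger.put(buffer);
    buffer = bigger;
  }
}
{noformat}

Streams that stay small never grow past a few KB, which is where the memory
saving over a fixed 256K allocation per stream comes from.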



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-05-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537415#comment-14537415
 ] 

Hive QA commented on HIVE-10036:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12731823/HIVE-10036.9.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8921 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3841/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3841/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3841/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12731823 - PreCommit-HIVE-TRUNK-Build

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 ------------------------------------------------------------------------------

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
  Labels: orcfile
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, 
 HIVE-10036.7.patch, HIVE-10036.8.patch, HIVE-10036.9.patch


[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-05-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536335#comment-14536335
 ] 

Hive QA commented on HIVE-10036:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12731308/HIVE-10036.8.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8919 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testMemoryManagementV12[0]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testMemoryManagementV12[1]
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3824/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3824/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3824/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12731308 - PreCommit-HIVE-TRUNK-Build

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 ------------------------------------------------------------------------------

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
  Labels: orcfile
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, 
 HIVE-10036.7.patch, HIVE-10036.8.patch


[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-05-03 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526023#comment-14526023
 ] 

Selina Zhang commented on HIVE-10036:
-------------------------------------

[~gopalv] Thank you! I added the io.netty dependency to ql/pom.xml and uploaded 
a new patch.

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 ------------------------------------------------------------------------------

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
  Labels: orcfile
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, HIVE-10036.7.patch


[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-04-23 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510096#comment-14510096
 ] 

Owen O'Malley commented on HIVE-10036:
--------------------------------------

I understand the problem, but this patch creates more trouble than it fixes. 
The original design is such that you don't do buffer copies. This patch 
destroys that design and adds both buffer copies and reallocations. We should 
set the buffer sizes down for the bit vectors, but this patch is going the 
wrong way.
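
A hypothetical sketch of the direction suggested here (not from any posted
patch): keep fixed, copy-free buffers, but choose a smaller fixed size up front
for streams that rarely fill, such as the PRESENT bit vectors. The StreamKind
enum and the 8K size are assumptions for illustration:

{noformat}
final class StreamSizing {
  enum StreamKind { PRESENT, ROW_INDEX, DATA, LENGTH, DICTIONARY_DATA }

  // Streams that hold row-level bit vectors or small indexes rarely come
  // close to the configured buffer size, so give them a small fixed buffer
  // up front and keep the copy-free design intact.
  static int bufferSizeFor(StreamKind kind, int configuredSize) {
    switch (kind) {
      case PRESENT:
      case ROW_INDEX:
        return Math.min(configuredSize, 8 * 1024);
      default:
        return configuredSize;
    }
  }
}
{noformat}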

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 ------------------------------------------------------------------------------

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
  Labels: orcfile
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch


[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-04-15 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495849#comment-14495849
 ] 

Gopal V commented on HIVE-10036:


[~selinazh]: I added this to my ORC bench for Hadoop Summit, and it looks like 
we're not shipping io.netty inside hive-exec.jar.

The io.netty:netty-buffer artifact needs to be included in both the maven-shade 
includes and the dependencies in ql/pom.xml.
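
Roughly what that asks for in ql/pom.xml; a sketch only, since the exact
shade-plugin layout in Hive's build may differ:

{noformat}
<!-- In the <dependencies> section of ql/pom.xml -->
<dependency>
  <groupId>io.netty</groupId>
  <artifactId>netty-buffer</artifactId>
  <version>${netty.version}</version>
</dependency>

<!-- And in the maven-shade-plugin <artifactSet><includes>, so the classes
     are bundled into hive-exec.jar -->
<include>io.netty:netty-buffer</include>
{noformat}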

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 ------------------------------------------------------------------------------

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch


[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-04-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493925#comment-14493925
 ] 

Hive QA commented on HIVE-10036:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12725128/HIVE-10036.6.patch

{color:red}ERROR:{color} -1 due to 323 failed/errored test(s), 8688 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_project
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_date_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_orig_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_tmp_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_no_match
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_whole_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization_acid
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial

[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-04-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14491458#comment-14491458
 ] 

Hive QA commented on HIVE-10036:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12724721/HIVE-10036.5.patch

{color:red}ERROR:{color} -1 due to 316 failed/errored test(s), 8672 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_project
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_date_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_orig_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_tmp_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_no_match
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_whole_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization_acid
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial

[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-04-02 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392404#comment-14392404
 ] 

Prasanth Jayachandran commented on HIVE-10036:
----------------------------------------------

[~selinazh] Thanks for the patch! Left some comments in RB.

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 ------------------------------------------------------------------------------

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch


[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-03-24 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378863#comment-14378863
 ] 

Selina Zhang commented on HIVE-10036:
-------------------------------------

[~gopalv] I checked my local maven repository; it seems avro 1.7.5 pulls in 
netty-3.4.0.Final, which overrides the version defined in the main pom.xml.
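
If that diagnosis holds, one conventional fix is to exclude the transitive
netty from the avro dependency so the version pinned in the root pom wins. A
sketch only; which avro module actually drags netty in (avro vs. avro-ipc)
would need verifying:

{noformat}
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.7.5</version>
  <exclusions>
    <!-- Let the netty version pinned in the root pom win. -->
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{noformat}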

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 ------------------------------------------------------------------------------

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch


[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-03-24 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378176#comment-14378176
 ] 

Selina Zhang commented on HIVE-10036:
-------------------------------------

The review request:
https://reviews.apache.org/r/32445/

Thank you!

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 ------------------------------------------------------------------------------

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch


[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-03-24 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378260#comment-14378260
 ] 

Gopal V commented on HIVE-10036:


[~selinazh]: quick q - the netty version in hive-1.2 is 4.0. Do you know where 
the netty-3 ChannelBuffer in the patch comes from? The 4.x release renamed that 
to ByteBuf.
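
For comparison, netty 4 expresses the bounded-growth idea directly:
Unpooled.buffer takes both an initial and a maximum capacity. A standalone
sketch against the netty 4.0 API, not code from the patch:

{noformat}
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

final class NettyBufferSketch {
  public static void main(String[] args) {
    // Heap ByteBuf starting at 4K that grows on demand, but never beyond
    // the explicit 256K maxCapacity.
    ByteBuf buf = Unpooled.buffer(4 * 1024, 256 * 1024);
    buf.writeBytes(new byte[16 * 1024]); // capacity expands to fit
    System.out.println(buf.capacity() + " / max " + buf.maxCapacity());
    buf.release();
  }
}
{noformat}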

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 ------------------------------------------------------------------------------

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch


[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-03-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377054#comment-14377054
 ] 

Hive QA commented on HIVE-10036:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12706769/HIVE-10036.3.patch

{color:green}SUCCESS:{color} +1 7824 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3123/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3123/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3123/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12706769 - PreCommit-HIVE-TRUNK-Build

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 ------------------------------------------------------------------------------

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch


[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372158#comment-14372158
 ] 

Hive QA commented on HIVE-10036:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12706001/HIVE-10036.1.patch

{color:red}ERROR:{color} -1 due to 39 failed/errored test(s), 7819 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization_acid
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_tmp_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_empty_files
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_vectorization_ppd
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_acid
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_after_multiple_inserts
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_values_tmp_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge6
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_vectorization_ppd
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_after_multiple_inserts
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testBloomFilter2
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump
org.apache.hadoop.hive.ql.io.orc.TestInStream.testCompressed
org.apache.hadoop.hive.ql.io.orc.TestInStream.testDisjointBuffers
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testMemoryManagementV12[0]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testMemoryManagementV12[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcRawRecordMerger.testEmpty
org.apache.hive.hcatalog.pig.TestHCatLoader.testColumnarStorePushdown[3]
org.apache.hive.hcatalog.pig.TestHCatStorer.testEmptyStore[3]
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3095/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3095/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3095/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 39 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12706001 - PreCommit-HIVE-TRUNK-Build

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 ------------------------------------------------------------------------------

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10036.1.patch