Review Request: Improve RCFile::sync(long) by 10x

2013-04-26 Thread Gopal V

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10795/
---

Review request for hive, Ashutosh Chauhan and Gunther Hagleitner.


Description
---

Speed up RCFile::sync() by reading large blocks of data from HDFS rather than 
using readByte() on the input stream. 

This tightens the scan loop and reduces the number of calls to the 
synchronized read() methods within HDFS, resulting in a 10x performance boost 
to this function.

In wall-clock terms, a call that used to take up to a second now completes in 
under 100ms, because data is read in 512-byte chunks instead of 1 byte at a time.
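
To illustrate the idea, here is a minimal sketch of scanning for a sync marker 
with chunked reads. It is not the code from the RCFile.java diff: the class 
ChunkedSyncScan, the method findMarker and the 4-byte marker are hypothetical, 
and it uses a plain java.io.InputStream rather than the HDFS input stream. The 
only detail taken from the description is the 512-byte chunk size.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not the code from the RCFile.java patch.
final class ChunkedSyncScan {
    // Chunk size mentioned in the description above.
    static final int CHUNK_SIZE = 512;

    /**
     * Returns the offset (relative to the stream's starting position) of the
     * first occurrence of marker, or -1 if the stream ends without a match.
     */
    static long findMarker(InputStream in, byte[] marker) throws IOException {
        if (marker.length == 0) {
            throw new IllegalArgumentException("marker must not be empty");
        }
        int overlap = marker.length - 1;      // tail kept so a marker can span chunks
        byte[] buf = new byte[overlap + CHUNK_SIZE];
        int kept = 0;                         // bytes carried over from the last chunk
        long base = 0;                        // stream offset of buf[0]
        while (true) {
            // One bulk read replaces up to CHUNK_SIZE single-byte reads.
            int n = in.read(buf, kept, CHUNK_SIZE);
            if (n < 0) {
                return -1;
            }
            int filled = kept + n;
            for (int i = 0; i + marker.length <= filled; i++) {
                if (matchesAt(buf, i, marker)) {
                    return base + i;
                }
            }
            // Carry the unchecked tail over to the front of the buffer.
            int keep = Math.min(overlap, filled);
            System.arraycopy(buf, filled - keep, buf, 0, keep);
            base += filled - keep;
            kept = keep;
        }
    }

    private static boolean matchesAt(byte[] buf, int pos, byte[] marker) {
        for (int j = 0; j < marker.length; j++) {
            if (buf[pos + j] != marker[j]) {
                return false;
            }
        }
        return true;
    }

    // Convenience overload for in-memory data (used by the demo below).
    static long findMarker(byte[] data, byte[] marker) {
        try {
            return findMarker(new ByteArrayInputStream(data), marker);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] marker = {1, 2, 3, 4};         // made-up marker for the demo
        byte[] data = new byte[1000];
        // Place the marker across the first 512-byte chunk boundary.
        System.arraycopy(marker, 0, data, 510, marker.length);
        System.out.println(findMarker(data, marker)); // prints 510
    }
}
```

The carried-over tail of marker.length - 1 bytes is what lets the scan find a 
marker that straddles two chunks. The saving comes from issuing one read() per 
chunk instead of one per byte.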


This addresses bug HIVE-4423.
https://issues.apache.org/jira/browse/HIVE-4423


Diffs
---

  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java d3d98d0 

Diff: https://reviews.apache.org/r/10795/diff/


Testing
---

ant test -Dtestcase=TestRCFile -Dmodule=ql
ant test -Dtestcase=TestCliDriver -Dqfile_regex=.*rcfile.* -Dmodule=ql

Also benchmarked with count(1) on the store_sales RCFile table at scale=10:

before: 43.8, after: 39.5 (about 10% lower)


Thanks,

Gopal V



Re: Review Request: Improve RCFile::sync(long) by 10x

2013-04-26 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10795/#review19770
---

Ship it!



- Ashutosh Chauhan


On April 26, 2013, 11:25 a.m., Gopal V wrote:
 
 (Updated April 26, 2013, 11:25 a.m.)
 
 