[jira] [Created] (HIVE-5100) RCFile::sync(long) missing 1 byte in System.arraycopy()

2013-08-14 Thread tagus wang (JIRA)
tagus wang created HIVE-5100:


 Summary:  RCFile::sync(long)  missing 1 byte in System.arraycopy()
 Key: HIVE-5100
 URL: https://issues.apache.org/jira/browse/HIVE-5100
 Project: Hive
  Issue Type: Bug
Reporter: tagus wang


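RCFile::sync(long) contains an off-by-one error in this line:
System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix);
It should be:
System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix);
As written, the copy starts one byte too early, so the last byte of the buffer is never carried over.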
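
To make the off-by-one concrete, here is a small standalone sketch. It is not the RCFile code: the class name, the helper names, and the 8-byte sample buffer are made up for illustration, and only the two System.arraycopy calls mirror the lines quoted above.

```java
import java.util.Arrays;

final class PrefixCarryDemo {
    // Corrected form: moves the LAST `prefix` bytes of the buffer to its front.
    static byte[] carryTail(byte[] buffer, int prefix) {
        byte[] copy = buffer.clone();
        System.arraycopy(copy, copy.length - prefix, copy, 0, prefix);
        return Arrays.copyOf(copy, prefix);
    }

    // Reported form: starts one byte too early, so the final byte is left out.
    static byte[] carryTailOffByOne(byte[] buffer, int prefix) {
        byte[] copy = buffer.clone();
        System.arraycopy(copy, copy.length - prefix - 1, copy, 0, prefix);
        return Arrays.copyOf(copy, prefix);
    }

    public static void main(String[] args) {
        byte[] buffer = {0, 1, 2, 3, 4, 5, 6, 7};
        int prefix = 3;
        System.out.println(Arrays.toString(carryTail(buffer, prefix)));         // [5, 6, 7]
        System.out.println(Arrays.toString(carryTailOffByOne(buffer, prefix))); // [4, 5, 6]
    }
}
```

With the extra "- 1", byte 7 (the last one in the buffer) never reaches the front, so a marker that straddles two reads could be missed.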

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x

2013-08-14 Thread tagus wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740559#comment-13740559
 ] 

tagus wang commented on HIVE-4423:
--

Gopal V, I reported it in HIVE-5100, but I cannot assign it to you,
so you will need to assign it to yourself.


 Improve RCFile::sync(long) 10x
 --

 Key: HIVE-4423
 URL: https://issues.apache.org/jira/browse/HIVE-4423
 Project: Hive
  Issue Type: Improvement
 Environment: Ubuntu LXC (1 SSD, 1 disk, 32 gigs of RAM)
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
  Labels: optimization
 Fix For: 0.12.0

 Attachments: HIVE-4423.patch


 RCFile::sync(long) takes approximately 1 second every time it is called because of 
 the inner loops in the function.
 From what was observed with HDFS-4710, single-byte reads are an order of 
 magnitude slower than larger 512-byte buffer reads. 
 Even when disk I/O is buffered to this size, there is overhead due to the 
 synchronized read() methods in the BlockReaderLocal and RemoteBlockReader classes.
 Replacing the readByte() calls in RCFile.sync(long) with a readFully() call on a 
 512-byte buffer will speed this function up 10x.
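
 The block-read idea described above can be sketched roughly as follows. This is not the attached HIVE-4423.patch: the class SyncScanSketch, the methods findMarker and readBlock, and the use of a plain InputStream are assumptions made for illustration. The sketch reads a block at a time, scans it in memory, and carries the last marker.length - 1 bytes to the front of the buffer so that a marker split across two blocks is still found.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class SyncScanSketch {
    // Reads up to `len` bytes into buf[off..]; returns how many were read (0 at end of stream).
    static int readBlock(InputStream in, byte[] buf, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n < 0) break;
            total += n;
        }
        return total;
    }

    // Returns the stream offset of the first occurrence of `marker`, or -1 if it is absent.
    static long findMarker(InputStream in, byte[] marker, int blockSize) throws IOException {
        if (marker.length == 0 || blockSize <= 0) {
            throw new IllegalArgumentException("marker and blockSize must be non-empty/positive");
        }
        int prefix = marker.length - 1;            // bytes carried over between blocks
        byte[] buffer = new byte[prefix + blockSize];
        int carried = 0;                           // carried-over bytes currently at the front
        long base = 0;                             // stream offset of buffer[0]
        while (true) {
            int n = readBlock(in, buffer, carried, blockSize);
            if (n == 0) return -1;                 // end of stream, nothing new to scan
            int valid = carried + n;
            for (int i = 0; i + marker.length <= valid; i++) {
                if (matchesAt(buffer, i, marker)) return base + i;
            }
            // Keep the LAST `keep` bytes: note there is no "- 1" in the source index.
            int keep = Math.min(prefix, valid);
            System.arraycopy(buffer, valid - keep, buffer, 0, keep);
            base += valid - keep;
            carried = keep;
        }
    }

    private static boolean matchesAt(byte[] buf, int pos, byte[] marker) {
        for (int j = 0; j < marker.length; j++) {
            if (buf[pos + j] != marker[j]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1000];
        byte[] marker = {1, 2, 3, 4};
        System.arraycopy(marker, 0, data, 510, marker.length); // straddles the 512-byte boundary
        System.out.println(findMarker(new ByteArrayInputStream(data), marker, 512)); // 510
    }
}
```

 Each loop iteration issues one bulk read instead of hundreds of single-byte reads, which is where the speed-up described above would come from.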



[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x

2013-08-13 Thread tagus wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739221#comment-13739221
 ] 

tagus wang commented on HIVE-4423:
--

There is a bug in this line:
System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix);
It should be:
System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix);


