[jira] [Created] (HIVE-5100) RCFile::sync(long) missing 1 byte in System.arraycopy()
tagus wang created HIVE-5100:

Summary: RCFile::sync(long) missing 1 byte in System.arraycopy()
Key: HIVE-5100
URL: https://issues.apache.org/jira/browse/HIVE-5100
Project: Hive
Issue Type: Bug
Reporter: tagus wang

There is a bug in this line:

    System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix);

It should be:

    System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix);

System.arraycopy(src, srcPos, dest, destPos, length) copies src[srcPos] through src[srcPos + length - 1], so starting at buffer.length - prefix - 1 stops at buffer[buffer.length - 2]. The copy is missing 1 byte at the end.

-- 
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
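A minimal sketch of the off-by-one, assuming a 10-byte buffer whose bytes hold their own index and a 4-byte prefix (illustrative values, not taken from RCFile). `carryOver` is a hypothetical helper that applies one System.arraycopy() variant to a copy of the buffer and returns the bytes that land in the prefix area.

```java
import java.util.Arrays;

public class PrefixCarryOver {

    // Hypothetical helper: copies `prefix` bytes starting at `srcPos` to the
    // front of a copy of `buffer` and returns the resulting prefix area.
    static byte[] carryOver(byte[] buffer, int prefix, int srcPos) {
        byte[] copy = buffer.clone();
        System.arraycopy(copy, srcPos, copy, 0, prefix);
        return Arrays.copyOf(copy, prefix);
    }

    public static void main(String[] args) {
        int prefix = 4;
        byte[] buffer = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

        // Reported offset: copies buffer[5..8] and drops buffer[9].
        System.out.println(Arrays.toString(
                carryOver(buffer, prefix, buffer.length - prefix - 1))); // [5, 6, 7, 8]

        // Corrected offset: copies the last 4 bytes, buffer[6..9].
        System.out.println(Arrays.toString(
                carryOver(buffer, prefix, buffer.length - prefix)));     // [6, 7, 8, 9]
    }
}
```

With the reported offset the carried-over bytes are also shifted back by one position, so they no longer line up with the bytes read next.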
[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x
[ https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740559#comment-13740559 ]

tagus wang commented on HIVE-4423:

Gopal V, I reported it in HIVE-5100, but I cannot assign it to you, so you will need to pick it up yourself.

Improve RCFile::sync(long) 10x

Key: HIVE-4423
URL: https://issues.apache.org/jira/browse/HIVE-4423
Project: Hive
Issue Type: Improvement
Environment: Ubuntu LXC (1 SSD, 1 disk, 32 GB of RAM)
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
Labels: optimization
Fix For: 0.12.0
Attachments: HIVE-4423.patch

RCFile::sync(long) takes approximately 1 second every time it is called because of the inner loops in the function. As observed in HDFS-4710, single-byte reads are an order of magnitude slower than larger 512-byte buffered reads. Even when disk I/O is buffered to this size, there is overhead from the synchronized read() methods in the BlockReaderLocal and RemoteBlockReader classes. Replacing the readByte() calls in RCFile.sync(long) with a single 512-byte readFully() call will speed this function up 10x.
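Reading fixed-size chunks with readFully() instead of one readByte() per byte means the last few bytes of each chunk must be carried into the next one, so that a sync marker straddling a chunk boundary is still found; that carry-over is the System.arraycopy() line reported in HIVE-5100. The following is a sketch under stated assumptions, not the RCFile code: `findMarker` is a hypothetical helper that scans a DataInputStream of known length for an arbitrary marker, and the 4-byte marker and 16-byte chunk size are illustrative (the description above uses 512-byte reads).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Arrays;

public class ChunkedSyncScan {

    // Hypothetical helper (not the Hive API): returns the stream offset of the
    // first occurrence of `marker`, or -1. The stream is assumed to hold exactly
    // `length` bytes and is read `chunkSize` bytes at a time with readFully().
    static long findMarker(DataInputStream in, long length, byte[] marker, int chunkSize)
            throws IOException {
        int prefix = marker.length;
        byte[] buffer = new byte[prefix + chunkSize];
        // Fill with a byte that can never begin a match, so windows starting in
        // the not-yet-filled prefix area are rejected on the first chunk.
        Arrays.fill(buffer, (byte) ~marker[0]);
        long pos = 0; // stream offset of buffer[prefix]
        while (pos < length) {
            int n = (int) Math.min(chunkSize, length - pos);
            in.readFully(buffer, prefix, n);
            // Candidate windows buffer[i .. i + prefix - 1] for i = 0..n; the
            // last one ends at buffer[prefix + n - 1], the last byte just read.
            for (int i = 0; i <= n; i++) {
                int j = 0;
                while (j < prefix && buffer[i + j] == marker[j]) {
                    j++;
                }
                if (j == prefix) {
                    return pos + i - prefix;
                }
            }
            // Carry the last `prefix` bytes to the front. After a full chunk,
            // n == buffer.length - prefix, i.e. the corrected offset.
            System.arraycopy(buffer, n, buffer, 0, prefix);
            pos += n;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] marker = {7, 7, 7, 7};
        byte[] data = new byte[40];
        // Marker at offsets 14..17, straddling the first 16-byte chunk boundary.
        System.arraycopy(marker, 0, data, 14, marker.length);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(findMarker(in, data.length, marker, 16)); // 14
    }
}
```

Changing the carry-over source offset to `buffer.length - prefix - 1` (the offset reported in HIVE-5100) makes this example return -1 instead of 14: the byte at offset 15 is dropped, so the marker that straddles the first chunk boundary is never matched.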
[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x
[ https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739221#comment-13739221 ]

tagus wang commented on HIVE-4423:

There is a bug in this line:

    System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix);

It should be:

    System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix);