[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x
[ https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740051#comment-13740051 ]

Gopal V commented on HIVE-4423:
-------------------------------

Good catch [~taguswang], it is in fact missing 1 byte at the end. Please log a new bug and assign it to me. I will fix this and add an extra test case for this off-by-one error.

Improve RCFile::sync(long) 10x
------------------------------

Key: HIVE-4423
URL: https://issues.apache.org/jira/browse/HIVE-4423
Project: Hive
Issue Type: Improvement
Environment: Ubuntu LXC (1 SSD, 1 disk, 32 gigs of RAM)
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
Labels: optimization
Fix For: 0.12.0
Attachments: HIVE-4423.patch

RCFile::sync(long) takes approximately 1 second every time it is called because of the inner loops in the function.
From what was observed with HDFS-4710, single-byte reads are an order of magnitude slower than larger 512-byte buffer reads. Even when disk I/O is buffered to this size, there is overhead due to the synchronized read() methods in the BlockReaderLocal and RemoteBlockReader classes.
Replacing the readByte() calls in RCFile.sync(long) with a readFully(512 byte) call will speed this function up 10x.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
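The change proposed in the issue description (bulk readFully() calls instead of per-byte readByte() calls while searching for a sync marker) can be sketched roughly as follows. This is not the code from HIVE-4423.patch: the class BufferedSyncScan, the helpers findMarker, matchesAt and findMarkerInArray, and the explicit length parameter are all hypothetical names chosen for illustration, and the marker is assumed to be non-empty.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class BufferedSyncScan {

    /**
     * Hypothetical helper (not the RCFile code): returns the stream offset of the
     * first occurrence of marker within the next `length` bytes, or -1 if absent.
     * Assumes marker.length >= 1 and chunkSize >= 1.
     */
    static long findMarker(DataInputStream in, long length, byte[] marker, int chunkSize)
            throws IOException {
        int prefix = marker.length - 1;          // tail bytes carried into the next chunk
        byte[] buffer = new byte[prefix + chunkSize];
        int carried = 0;                         // carried bytes held in buffer[0..carried)
        long consumed = 0;                       // bytes read from the stream so far
        while (consumed < length) {
            int n = (int) Math.min(chunkSize, length - consumed);
            in.readFully(buffer, carried, n);    // one bulk read instead of n readByte() calls
            int valid = carried + n;
            long base = consumed - carried;      // stream offset of buffer[0]
            for (int i = 0; i + marker.length <= valid; i++) {
                if (matchesAt(buffer, i, marker)) {
                    return base + i;
                }
            }
            consumed += n;
            int keep = Math.min(prefix, valid);
            // Carry the last `keep` bytes so a marker straddling two chunks is still found.
            System.arraycopy(buffer, valid - keep, buffer, 0, keep);
            carried = keep;
        }
        return -1;
    }

    /** True if marker occurs in buffer starting at offset. */
    static boolean matchesAt(byte[] buffer, int offset, byte[] marker) {
        for (int j = 0; j < marker.length; j++) {
            if (buffer[offset + j] != marker[j]) {
                return false;
            }
        }
        return true;
    }

    /** Convenience wrapper for in-memory data; wraps IOException as unchecked. */
    static long findMarkerInArray(byte[] data, byte[] marker, int chunkSize) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return findMarker(in, data.length, marker, chunkSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] marker = new byte[16];
        for (int i = 0; i < marker.length; i++) {
            marker[i] = (byte) (i + 1);
        }
        byte[] data = new byte[600];
        // Place the marker so that it straddles the first 512-byte chunk boundary.
        System.arraycopy(marker, 0, data, 510, marker.length);
        System.out.println(findMarkerInArray(data, marker, 512)); // prints 510
    }
}
```

The sketch keeps the last marker.length - 1 bytes of each chunk at the front of the buffer, which is the carry-over step where an off-by-one in the source index would drop the final byte.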
[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x
[ https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740559#comment-13740559 ]

tagus wang commented on HIVE-4423:
----------------------------------

Gopal V, I reported it in HIVE-5100, but I cannot assign it to you, so you will need to assign it to yourself.
[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x
[ https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739221#comment-13739221 ]

tagus wang commented on HIVE-4423:
----------------------------------

There is a bug in this line:

    System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix);

It should be:

    System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix);
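The difference between the two System.arraycopy calls quoted above can be seen on a small array. The following is a minimal sketch, not code from RCFile: the class CarryOverSketch and the helper carryTail are hypothetical, and the 8-byte buffer with prefix = 3 is an arbitrary example.

```java
import java.util.Arrays;

public class CarryOverSketch {

    /**
     * Copies `prefix` bytes from the tail of a copy of buffer to its front and
     * returns those front bytes. With offByOne set, the source index is the
     * one reported as buggy (buffer.length - prefix - 1).
     */
    static byte[] carryTail(byte[] buffer, int prefix, boolean offByOne) {
        byte[] copy = buffer.clone();
        int srcPos = offByOne ? copy.length - prefix - 1 : copy.length - prefix;
        System.arraycopy(copy, srcPos, copy, 0, prefix);
        return Arrays.copyOf(copy, prefix);
    }

    public static void main(String[] args) {
        byte[] buffer = {10, 11, 12, 13, 14, 15, 16, 17};
        // Buggy index: starts one byte too early, so the last byte (17) is never carried.
        System.out.println(Arrays.toString(carryTail(buffer, 3, true)));  // prints [14, 15, 16]
        // Corrected index: carries exactly the last three bytes.
        System.out.println(Arrays.toString(carryTail(buffer, 3, false))); // prints [15, 16, 17]
    }
}
```

With the buggy index the carried window is shifted one byte to the left, so the final byte of the buffer is lost.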
[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x
[ https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739224#comment-13739224 ]

Edward Capriolo commented on HIVE-4423:
---------------------------------------

Maybe we can add a test when we fix.
[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x
[ https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643774#comment-13643774 ]

Hudson commented on HIVE-4423:
------------------------------

Integrated in Hive-trunk-h0.21 #2082 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2082/])
HIVE-4423 : Improve RCFile::sync(long) 10x (Gopal V via Ashutosh Chauhan) (Revision 1476648)

Result = FAILURE
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476648
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x
[ https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643778#comment-13643778 ]

Hudson commented on HIVE-4423:
------------------------------

Integrated in Hive-trunk-hadoop2 #179 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/179/])
HIVE-4423 : Improve RCFile::sync(long) 10x (Gopal V via Ashutosh Chauhan) (Revision 1476648)

Result = FAILURE
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476648
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x
[ https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642745#comment-13642745 ]

Gopal V commented on HIVE-4423:
-------------------------------

|| split location || before || after ||
| store_sales/00_0:67108864+67108864 | 748 ms | 81 ms |
| store_sales/02_0:67108864+67108864 | 966 ms | 54 ms |
| store_sales/04_0:67108864+67108864 | 948 ms | 51 ms |
| store_sales/06_0:67108864+67108864 | 922 ms | 42 ms |
| store_sales/08_0:67108864+67108864 | 842 ms | 40 ms |
| store_sales/10_0:67108864+67108864 | 1302 ms | 82 ms |
| store_sales/12_0:67108864+67108864 | 989 ms | 50 ms |
| store_sales/14_0:67108864+67108864 | 970 ms | 43 ms |
| store_sales/01_0:67108864+67108864 | 829 ms | 47 ms |
| store_sales/03_0:67108864+67108864 | 811 ms | 43 ms |
| store_sales/07_0:67108864+67108864 | 865 ms | 51 ms |
| store_sales/05_0:67108864+67108864 | 1042 ms | 59 ms |
| store_sales/09_0:67108864+67108864 | 902 ms | 39 ms |
| store_sales/11_0:67108864+67108864 | 1046 ms | 42 ms |
| store_sales/13_0:67108864+67108864 | 1048 ms | 44 ms |

As expected, the function is faster by an order of magnitude, fast enough that the inner sync.length for loop does not need further optimization.
Overall, the query was faster by 2+ seconds for a 28 second query (we have 8 slots and 15 mappers, so that is expected).

Improve RCFile::sync(long) 10x
------------------------------

Key: HIVE-4423
URL: https://issues.apache.org/jira/browse/HIVE-4423
Project: Hive
Issue Type: Improvement
Environment: Ubuntu LXC (1 SSD, 1 disk, 32 gigs of RAM)
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
Fix For: 0.11.0
Attachments: HIVE-4423.patch
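The arithmetic behind the "order of magnitude" and "2+ seconds" remarks can be checked with a short sketch over the numbers in the table. The class SyncTimingSummary and its methods are hypothetical, and the wave model (15 mappers on 8 slots run in two waves, each wave saving the mean per-split difference) is an assumption made here for illustration, not something the comment states.

```java
public class SyncTimingSummary {

    // Per-split sync() timings in milliseconds, copied from the before/after table.
    static final int[] BEFORE_MS =
            {748, 966, 948, 922, 842, 1302, 989, 970, 829, 811, 865, 1042, 902, 1046, 1048};
    static final int[] AFTER_MS =
            {81, 54, 51, 42, 40, 82, 50, 43, 47, 43, 51, 59, 39, 42, 44};

    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    /** Ratio of total time before the patch to total time after it. */
    static double speedup() {
        return (double) sum(BEFORE_MS) / sum(AFTER_MS);
    }

    /**
     * Assumed model: mappers run in ceil(mappers / slots) waves, and each wave
     * saves the mean per-split difference between the two columns.
     */
    static double estimatedSavingMs(int mappers, int slots) {
        int waves = (mappers + slots - 1) / slots;
        double meanSavedMs = (double) (sum(BEFORE_MS) - sum(AFTER_MS)) / BEFORE_MS.length;
        return waves * meanSavedMs;
    }

    public static void main(String[] args) {
        System.out.println("total before (ms): " + sum(BEFORE_MS)); // 14230
        System.out.println("total after (ms): " + sum(AFTER_MS));   // 768
        System.out.println("speedup: " + speedup());
        System.out.println("estimated saving (ms): " + estimatedSavingMs(15, 8));
    }
}
```

Under these assumptions the totals give a speedup of roughly 18x and an estimated saving of about 1.8 seconds, which is in the same range as the reported 2+ seconds.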
[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x
[ https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642944#comment-13642944 ]

Ashutosh Chauhan commented on HIVE-4423:
----------------------------------------

+1 will commit if tests pass