[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x

2013-08-14 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740051#comment-13740051
 ] 

Gopal V commented on HIVE-4423:
---

Good catch [~taguswang], it is in fact missing 1 byte at the end.

Please log a new bug & assign it to me - I will fix this and add an extra
test case for this off-by-one error.


 Improve RCFile::sync(long) 10x
 --

 Key: HIVE-4423
 URL: https://issues.apache.org/jira/browse/HIVE-4423
 Project: Hive
  Issue Type: Improvement
 Environment: Ubuntu LXC (1 SSD, 1 disk, 32 gigs of RAM)
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
  Labels: optimization
 Fix For: 0.12.0

 Attachments: HIVE-4423.patch


 RCFile::sync(long) takes approximately 1 second every time it gets called because of
 the inner loops in the function.
 From what was observed with HDFS-4710, single-byte reads are an order of
 magnitude slower than larger 512-byte buffer reads.
 Even when disk I/O is buffered to this size, there is overhead due to the
 synchronized read() methods in the BlockReaderLocal & RemoteBlockReader classes.
 Replacing the readByte() calls in RCFile.sync(long) with a readFully(512 byte)
 call will speed up this function 10x.
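
The idea described above can be sketched as follows. This is a minimal illustration, not the RCFile code or the attached patch: the class ChunkedMarkerScan, the method findMarker and its parameters are hypothetical names chosen for the example. It searches a stream for a marker using one readFully() call per chunk instead of one readByte() call per byte, and carries the last marker.length - 1 bytes of each chunk forward so that a marker straddling two chunks is still found.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper, not the RCFile code: finds a marker using chunked reads. */
final class ChunkedMarkerScan {

    /**
     * Returns the offset of the first occurrence of marker within the next
     * length bytes of in, or -1 if it does not occur.
     */
    static long findMarker(DataInputStream in, long length, byte[] marker, int chunk)
            throws IOException {
        int prefix = marker.length - 1;       // tail kept so a match may span two chunks
        byte[] buffer = new byte[prefix + chunk];
        int carried = 0;                      // carried-over bytes live in buffer[0..carried)
        long consumed = 0;                    // bytes read from the stream so far
        while (consumed < length) {
            int n = (int) Math.min(chunk, length - consumed);
            in.readFully(buffer, carried, n); // one bulk read instead of n readByte() calls
            int filled = carried + n;
            long base = consumed - carried;   // stream offset of buffer[0]
            consumed += n;
            for (int i = 0; i + marker.length <= filled; i++) {
                int j = 0;
                while (j < marker.length && buffer[i + j] == marker[j]) {
                    j++;
                }
                if (j == marker.length) {
                    return base + i;
                }
            }
            // Move the last `carried` bytes to the front for the next iteration.
            carried = Math.min(prefix, filled);
            System.arraycopy(buffer, filled - carried, buffer, 0, carried);
        }
        return -1;
    }

    /** Convenience wrapper over an in-memory array, used by the demo below. */
    static long findMarker(byte[] data, byte[] marker, int chunk) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return findMarker(in, data.length, marker, chunk);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] marker = {1, 2, 3, 4};
        byte[] data = new byte[2000];
        System.arraycopy(marker, 0, data, 510, marker.length); // straddles the 512-byte boundary
        System.out.println(findMarker(data, marker, 512));     // prints 510
    }
}
```

The carry-over copy at the end of the loop is the part that is easy to get wrong by one byte, so a test with a marker placed across a chunk boundary is worth having.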

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x

2013-08-14 Thread tagus wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740559#comment-13740559
 ] 

tagus wang commented on HIVE-4423:
--

Gopal V, I reported it as HIVE-5100, but I cannot assign it to you,
so you will need to assign it to yourself.




[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x

2013-08-13 Thread tagus wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739221#comment-13739221
 ] 

tagus wang commented on HIVE-4423:
--

There is a bug in this line:
System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix);
It should be:
System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix);
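
The difference between the two calls quoted above can be seen on a small array. The sketch below is only an illustration: the class CarryOverOffByOne and its methods are hypothetical names, and the 10-byte buffer with a prefix of 3 is made up for the example. It copies the "last prefix bytes" with both source offsets and shows that the reported form skips the final byte of the buffer.

```java
import java.util.Arrays;

/** Hypothetical demo of the off-by-one in the carry-over copy; not Hive code. */
final class CarryOverOffByOne {

    /** Copies prefix bytes starting at srcPos to the front of a copy of buffer and returns them. */
    static byte[] carry(byte[] buffer, int srcPos, int prefix) {
        byte[] work = buffer.clone();
        System.arraycopy(work, srcPos, work, 0, prefix);
        return Arrays.copyOf(work, prefix);
    }

    /** Source offset as in the reported line: buffer.length - prefix - 1. */
    static byte[] reported(byte[] buffer, int prefix) {
        return carry(buffer, buffer.length - prefix - 1, prefix);
    }

    /** Source offset as in the proposed fix: buffer.length - prefix. */
    static byte[] proposed(byte[] buffer, int prefix) {
        return carry(buffer, buffer.length - prefix, prefix);
    }

    /** Formatting helper for the demo. */
    static String show(byte[] bytes) {
        return Arrays.toString(bytes);
    }

    public static void main(String[] args) {
        byte[] buffer = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        System.out.println(show(reported(buffer, 3))); // prints [6, 7, 8], last byte 9 is lost
        System.out.println(show(proposed(buffer, 3))); // prints [7, 8, 9]
    }
}
```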




[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x

2013-08-13 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739224#comment-13739224
 ] 

Edward Capriolo commented on HIVE-4423:
---

Maybe we can add a test when we fix this.



[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x

2013-04-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643774#comment-13643774
 ] 

Hudson commented on HIVE-4423:
--

Integrated in Hive-trunk-h0.21 #2082 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2082/])
HIVE-4423 : Improve RCFile::sync(long) 10x (Gopal V via Ashutosh Chauhan) 
(Revision 1476648)

 Result = FAILURE
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476648
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java




[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x

2013-04-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643778#comment-13643778
 ] 

Hudson commented on HIVE-4423:
--

Integrated in Hive-trunk-hadoop2 #179 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/179/])
HIVE-4423 : Improve RCFile::sync(long) 10x (Gopal V via Ashutosh Chauhan) 
(Revision 1476648)

 Result = FAILURE
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476648
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java




[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x

2013-04-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642745#comment-13642745
 ] 

Gopal V commented on HIVE-4423:
---

|| split location || before || after ||
| store_sales/00_0:67108864+67108864 | 748 ms | 81 ms |
| store_sales/02_0:67108864+67108864 | 966 ms | 54 ms |
| store_sales/04_0:67108864+67108864 | 948 ms | 51 ms |
| store_sales/06_0:67108864+67108864 | 922 ms | 42 ms |
| store_sales/08_0:67108864+67108864 | 842 ms | 40 ms |
| store_sales/10_0:67108864+67108864 | 1302 ms | 82 ms |
| store_sales/12_0:67108864+67108864 | 989 ms | 50 ms |
| store_sales/14_0:67108864+67108864 | 970 ms | 43 ms |
| store_sales/01_0:67108864+67108864 | 829 ms | 47 ms |
| store_sales/03_0:67108864+67108864 | 811 ms | 43 ms |
| store_sales/07_0:67108864+67108864 | 865 ms | 51 ms |
| store_sales/05_0:67108864+67108864 | 1042 ms | 59 ms |
| store_sales/09_0:67108864+67108864 | 902 ms | 39 ms |
| store_sales/11_0:67108864+67108864 | 1046 ms | 42 ms |
| store_sales/13_0:67108864+67108864 | 1048 ms | 44 ms |

As expected, the function is faster by an order of magnitude & fast enough to
not need more optimization in the inner sync.length for loop.

Overall, the query was faster by 2+ seconds for a 28 second query (since we
have 8 slots and 15 mappers, so that's expected).
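
The arithmetic behind these numbers can be checked with a short sketch. The timings are the ones in the table above; the class SyncSpeedup and its methods are hypothetical names. The "waves" estimate is an assumption about the reasoning, namely that 15 mappers on 8 slots run in two rounds and each round saves roughly one per-split sync() difference, and it ignores every other effect on query time.

```java
/** Hypothetical helper for the arithmetic only; timings are taken from the table. */
final class SyncSpeedup {
    static final int[] BEFORE_MS =
        {748, 966, 948, 922, 842, 1302, 989, 970, 829, 811, 865, 1042, 902, 1046, 1048};
    static final int[] AFTER_MS =
        {81, 54, 51, 42, 40, 82, 50, 43, 47, 43, 51, 59, 39, 42, 44};

    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    /** Ratio of total time before to total time after. */
    static double speedup() {
        return (double) sum(BEFORE_MS) / sum(AFTER_MS);
    }

    /** Number of rounds needed to run all mappers on the available slots. */
    static int waves(int mappers, int slots) {
        return (mappers + slots - 1) / slots;
    }

    /** Assumed model: each wave saves one average per-split difference. */
    static double estimatedSavingSeconds(int mappers, int slots) {
        double perSplitMs = (sum(BEFORE_MS) - sum(AFTER_MS)) / (double) BEFORE_MS.length;
        return waves(mappers, slots) * perSplitMs / 1000.0;
    }

    public static void main(String[] args) {
        System.out.printf("speedup: %.1fx%n", speedup());
        System.out.printf("estimated saving: %.1f s%n", estimatedSavingSeconds(15, 8));
    }
}
```

With the table's numbers this gives a speedup of roughly 18x and an estimated saving of about 1.8 seconds, which is in the same ballpark as the reported 2+ seconds.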

 Improve RCFile::sync(long) 10x
 --

 Key: HIVE-4423
 URL: https://issues.apache.org/jira/browse/HIVE-4423
 Project: Hive
  Issue Type: Improvement
 Environment: Ubuntu LXC (1 SSD, 1 disk, 32 gigs of RAM)
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Fix For: 0.11.0

 Attachments: HIVE-4423.patch




[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x

2013-04-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642944#comment-13642944
 ] 

Ashutosh Chauhan commented on HIVE-4423:


+1 will commit if tests pass
