[jira] [Commented] (HBASE-9102) HFile block pre-loading for large sequential scan

2013-08-01 Thread Chao Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726224#comment-13726224
 ] 

Chao Shi commented on HBASE-9102:
---------------------------------

I don't think the block cache should be used for this kind of prefetch, because a 
large sequential scan will evict the blocks that random reads depend on.
If we use the HDFS client for prefetch, we also need to implement a 
scanner-sticky DFSInputStream, because a seek issued by another scanner will 
discard all the prefetch work.
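
To illustrate the cache-pollution concern, here is a minimal sketch using a toy 
LRU cache built on java.util.LinkedHashMap. It is not HBase's actual block 
cache implementation; the class name ToyLruCache, the capacities and the block 
ids are assumptions made purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy LRU cache keyed by block id (hypothetical; not HBase's block cache).
class ToyLruCache extends LinkedHashMap<Long, byte[]> {
    private final int capacity;

    ToyLruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true gives LRU ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        // Evict the least recently used block once the cache is over capacity.
        return size() > capacity;
    }

    // Caches `hot` blocks (ids 0..hot-1), then lets a sequential scan insert
    // `scanBlocks` other blocks, and returns how many hot blocks are still cached.
    static int hotBlocksSurviving(int capacity, int hot, int scanBlocks) {
        ToyLruCache cache = new ToyLruCache(capacity);
        for (long id = 0; id < hot; id++) {
            cache.put(id, new byte[0]);
        }
        for (long id = 1000; id < 1000 + scanBlocks; id++) {
            cache.put(id, new byte[0]);
        }
        int surviving = 0;
        for (long id = 0; id < hot; id++) {
            if (cache.containsKey(id)) { // containsKey does not change LRU order
                surviving++;
            }
        }
        return surviving;
    }
}
```

With a capacity of 8 and 4 hot blocks, a scan that inserts 8 blocks leaves none 
of the hot blocks in this toy cache.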

Another question is how we decide whether a scan is sequential or random. The 
current implementation (before Lars's patch HBASE-7336) treats only Get as 
random and thus uses pread for it. In our scenario, there are two kinds of 
scans: a) from the online system and b) from MR. Most scans of kind a) touch no 
more than 1 block and are expected to return within tens of milliseconds.
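
One possible way to make that decision, sketched below, is to start every 
scanner in positional-read (pread) mode and switch it to streaming reads only 
after it has consumed a threshold number of blocks. This is an illustration 
only, not the behaviour of the code discussed in this issue; ReadModeChooser 
and switchThreshold are hypothetical names.

```java
// Hypothetical helper: decides per scanner whether to use pread or streaming reads.
class ReadModeChooser {
    enum Mode { PREAD, STREAM }

    // Number of blocks a scanner reads with pread before it is treated as sequential.
    private final int switchThreshold;
    private int blocksRead = 0;

    ReadModeChooser(int switchThreshold) {
        this.switchThreshold = switchThreshold;
    }

    // Call once per data block the scanner loads; returns the mode to use for that block.
    Mode nextBlock() {
        Mode mode = blocksRead < switchThreshold ? Mode.PREAD : Mode.STREAM;
        blocksRead++;
        return mode;
    }
}
```

With a threshold of 1 or 2, short online scans never leave pread mode, while 
long MR scans move to streaming reads after the first few blocks.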

 HFile block pre-loading for large sequential scan
 -------------------------------------------------

                 Key: HBASE-9102
                 URL: https://issues.apache.org/jira/browse/HBASE-9102
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 0.89-fb
            Reporter: Liyin Tang
            Assignee: Liyin Tang

 The current HBase scan model cannot take full advantage of the aggregate 
 disk throughput, especially for large sequential scans. For a large 
 sequential scan, it is easy to predict the next block to read in advance, so 
 HBase can pre-load and decompress/decode these data blocks from HDFS into the 
 block cache just ahead of the current read point. 
 Therefore, this jira is to optimize large sequential scan performance by 
 pre-loading the HFile blocks into the block cache in a streaming fashion so 
 that the scan query can read from the cache directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9102) HFile block pre-loading for large sequential scan

2013-08-01 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726635#comment-13726635
 ] 

Liyin Tang commented on HBASE-9102:
-----------------------------------

Chao, you are right. The pre-load will run in a rate-limited fashion to make 
sure it won't pollute the block cache substantially.
The pre-loading targets the large sequential scan case. The client is able 
to enable or disable it on a per-request basis.
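
A rate-limited pre-load could look roughly like the sketch below, where the 
prefetcher is never allowed to run more than a fixed number of blocks ahead of 
the scanner's current read point. PrefetchWindow and maxAhead are hypothetical 
names, and the actual patch may limit the rate in a different way.

```java
// Hypothetical sketch: caps how far prefetching may run ahead of the read point.
class PrefetchWindow {
    private final int maxAhead;   // max blocks prefetched but not yet consumed
    private long readPoint = 0;   // index of the next block the scanner will read
    private long prefetched = 0;  // index of the next block to prefetch

    PrefetchWindow(int maxAhead) {
        this.maxAhead = maxAhead;
    }

    // Returns the index of the next block to prefetch, or -1 if the window is full.
    synchronized long tryPrefetch() {
        if (prefetched - readPoint >= maxAhead) {
            return -1;
        }
        return prefetched++;
    }

    // Called when the scanner consumes a block, which frees one slot in the window.
    synchronized void blockConsumed() {
        readPoint++;
        if (prefetched < readPoint) {
            prefetched = readPoint; // the scanner overtook the prefetcher
        }
    }
}
```

Because at most maxAhead prefetched blocks are outstanding per scanner, the 
amount of cache a single large scan can occupy stays bounded.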




[jira] [Commented] (HBASE-9102) HFile block pre-loading for large sequential scan

2013-08-01 Thread Chao Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13727242#comment-13727242
 ] 

Chao Shi commented on HBASE-9102:
---------------------------------

bq. The client is able to enable/disable on each request basis.

Is this switch available now? I guess it is enough to improve performance 
under our workload (as most of our scan requests touch only 1 block). For such 
requests, we can set this switch so that they use pread.



[jira] [Commented] (HBASE-9102) HFile block pre-loading for large sequential scan

2013-07-31 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725446#comment-13725446
 ] 

Lars Hofhansl commented on HBASE-9102:
--------------------------------------

Should we handle this from the client instead? Part of the problem is that the 
network pipe is not kept full, so by triggering prefetching from the client we 
can solve both problems.



[jira] [Commented] (HBASE-9102) HFile block pre-loading for large sequential scan

2013-07-31 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725453#comment-13725453
 ] 

Vladimir Rodionov commented on HBASE-9102:
------------------------------------------

This is what the OS actually does, and what all modern disk controllers do at 
a lower level. If short-circuit reads are enabled, the HBase cluster has good 
HDFS locality, and the data is compacted, you will get disk block pre-fetching 
for free.



[jira] [Commented] (HBASE-9102) HFile block pre-loading for large sequential scan

2013-07-31 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725515#comment-13725515
 ] 

Liyin Tang commented on HBASE-9102:
-----------------------------------

It is true that the OS caches the compressed/encoded blocks, and that the 
DFSClient non-pread operation is also able to pre-load all the bytes up to the 
end of that DFS block. This feature is to pre-load (decompress/decode) these 
data blocks in addition to the OS cache/disk read-ahead.
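
The idea of decompressing ahead of the read point can be sketched as follows. 
This is a simplified illustration and not the patch itself: it uses 
java.util.zip instead of HBase's compression codecs and data block encoders, 
in-memory byte arrays instead of HFile blocks, and the class DecompressAhead 
and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: decompress block i+1 in the background while block i is consumed.
class DecompressAhead {
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated input; stop instead of looping forever
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Scans all blocks in order and returns the total number of decompressed bytes.
    static long scan(List<byte[]> packedBlocks) {
        long total = 0;
        CompletableFuture<byte[]> next = packedBlocks.isEmpty()
            ? null
            : CompletableFuture.supplyAsync(() -> decompress(packedBlocks.get(0)));
        for (int i = 0; i < packedBlocks.size(); i++) {
            byte[] current = next.join(); // usually ready already
            final int ahead = i + 1;
            // Start decompressing the following block before consuming the current one.
            next = ahead < packedBlocks.size()
                ? CompletableFuture.supplyAsync(() -> decompress(packedBlocks.get(ahead)))
                : null;
            total += current.length; // stands in for the scanner consuming the block
        }
        return total;
    }
}
```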

Also, the scan prefetch is currently implemented at the RegionScanner level. I 
think it is a good idea to implement some prefetch logic in the HBase client as 
well.
