[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-03-22 Thread Andrew Purtell (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944209#comment-13944209 ]

Andrew Purtell commented on HBASE-10191:


Just for documentary purposes at this point, since the implementation is early 
and has a long way to go: Red Hat recently announced ongoing work on a new GC 
called Shenandoah, with the stated goal to "Reduce GC pause times on extremely 
large heaps by doing evacuation work concurrently with Java threads and making 
pause times independent of heap size."
- JEP: http://openjdk.java.net/jeps/189
- Project: http://icedtea.classpath.org/shenandoah/
- Source: http://icedtea.classpath.org/hg/shenandoah

 Move large arena storage off heap
 ---------------------------------

 Key: HBASE-10191
 URL: https://issues.apache.org/jira/browse/HBASE-10191
 Project: HBase
  Issue Type: Umbrella
Reporter: Andrew Purtell

 Even with the improved G1 GC in Java 7, Java processes that want to address 
 large regions of memory while also providing low high-percentile latencies 
 continue to be challenged. Fundamentally, a Java server process that has high 
 data throughput and also tight latency SLAs will be stymied by the fact that 
 the JVM does not provide a fully concurrent collector: there is simply not 
 enough throughput to copy data during GC under safepoint (all application 
 threads suspended) within the available time bounds. This is increasingly an 
 issue for HBase users operating under dual pressures: 1. tight response SLAs, 
 and 2. the increasing amount of RAM available in commodity server 
 configurations, because GC load is roughly proportional to heap size.

 We can address this using parallel strategies. We should talk with the Java 
 platform developer community about the possibility of a fully concurrent 
 collector appearing in OpenJDK somehow. Setting aside the question of whether 
 this would be too little too late, if one becomes available the benefit will 
 be immediate, though subject to qualification for production, and transparent 
 in terms of code changes. However, in the meantime we need an answer for Java 
 versions already in production. This requires that we move the large arena 
 allocations off heap, those being the blockcache and the memstore. There has 
 been related discussion recently on other JIRAs about combining the blockcache 
 and memstore (HBASE-9399) and about flushing memstore into blockcache 
 (HBASE-5311).

 We should build off-heap allocation for memstore and blockcache, perhaps a 
 unified pool for both, and plumb zero-copy direct access to these allocations 
 (via direct buffers) through the read and write I/O paths. This may require 
 the construction of classes that provide object views over data contained 
 within direct buffers. This is something else we could talk with the Java 
 platform developer community about: it could be possible to provide 
 language-level object views over off-heap memory, where on-heap objects could 
 hold references to objects backed by off-heap memory but not vice versa, 
 perhaps facilitated by new intrinsics in Unsafe. Again, we need an answer for 
 today as well. We should investigate what existing libraries may be available 
 in this regard. Key will be avoiding marshalling/unmarshalling costs: at most 
 we should be copying primitives out of the direct buffers to register or 
 stack locations, until finally copying data to construct protobuf Messages. A 
 related issue there is HBASE-9794, which proposes scatter-gather access to 
 KeyValues when constructing RPC messages. We should see how far we can get 
 with that, and also with zero-copy construction of protobuf Messages backed 
 by direct buffer allocations. Some amount of native code may be required.
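
 To make the "object views over data contained within direct buffers" idea 
 concrete, here is a minimal illustrative sketch. It is not HBase's actual 
 Cell/KeyValue layout; the [int keyLen][int valueLen][key][value] format and 
 all class names are assumptions for the example:

{code:java}
import java.nio.ByteBuffer;

// Sketch only: a flyweight view over cells packed into a shared direct
// buffer. Assumed layout per cell: [int keyLen][int valueLen][key][value].
final class OffHeapCellView {
    private final ByteBuffer buf; // the off-heap arena (direct buffer)
    private int offset;           // start of the cell currently viewed

    OffHeapCellView(ByteBuffer directBuffer) {
        this.buf = directBuffer;
    }

    // Re-point this view at another cell: no allocation, no data copy.
    void reset(int cellOffset) {
        this.offset = cellOffset;
    }

    // Absolute reads copy only primitives out of the buffer.
    int keyLength()   { return buf.getInt(offset); }
    int valueLength() { return buf.getInt(offset + 4); }

    // Copy bytes out only when the data must leave the arena.
    void copyKeyTo(byte[] dst) {
        ByteBuffer dup = buf.duplicate(); // independent position/limit
        dup.position(offset + 8);
        dup.get(dst, 0, keyLength());
    }
}
{code}

 A reader resets the same view from cell offset to cell offset, copying only 
 primitives onto the stack and deferring byte copies until the data must leave 
 the buffer, e.g. when finally building a protobuf Message for an RPC response.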





[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-03-13 Thread Liyin Tang (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934565#comment-13934565 ]

Liyin Tang commented on HBASE-10191:


Just curious, has anyone experienced imbalanced memory allocation across the 
NUMA nodes when allocating a large off-heap arena?



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-03-10 Thread Yu Li (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925530#comment-13925530 ]

Yu Li commented on HBASE-10191:
---

Hi [~mcorgan] and [~stack],

I see you had a discussion about this long ago in HBASE-3484 ([here| 
https://issues.apache.org/jira/browse/HBASE-3484?focusedCommentId=13410934&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13410934]),
 but it seems there has been no further progress since then. And [~mcorgan], 
judging from your comment above you now have more detailed design thoughts, so 
I'm wondering whether you have done any real work to implement this design, or 
have any plan to?

Actually, I think the design you proposed is somewhat different from the topic 
of this JIRA and of HBASE-3484, since it's more like an in-memory flush to 
reduce memory fragmentation than a move off heap. I'm wondering whether it 
would be better to open another JIRA to make that discussion more explicit, 
while leaving the off-heap discussion here?

I've been watching this thread, or rather this topic, for a while, and we've 
now decided to make a similar improvement to our online HBase service, so I'd 
really like to work with the community to complete the design and 
implementation of the in-memory-flush work. :-)

I'm a totally new face in this discussion, so please kindly forgive me if I've 
said anything naive. :-)



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-03-10 Thread Anoop Sam John (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925606#comment-13925606 ]

Anoop Sam John commented on HBASE-10191:


[~carp84] I am working on this CellBlocks stuff (yes, in-memory flushes). 
Coding-wise it is mostly done, and I will do perf tests as well. Some time back 
I worked on HBASE-3484 but later dropped it. Yes, here along with the off-heap 
work, the discussion of CellBlocks also came in. This can greatly reduce the 
issue we face today with CSLM (when there are too many KVs in it). We are 
working on the off-heap stuff in parallel as well. My code is in a combined 
form now; let me separate it out. Also see HBASE-10648, which will allow us to 
have different MemStore impls.



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-03-10 Thread Yu Li (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925855#comment-13925855 ]

Yu Li commented on HBASE-10191:
---

Hi [~anoop.hbase],

Thanks for the info, it's really good to know the progress; I had almost 
started on the implementation myself. :-) It's also great to see that the 
patch making MemStore impls pluggable is almost ready.
{quote}
My code is in a combined form now; let me separate it out.
{quote}
I guess the code changes for CellBlocks will be based on HBASE-10648? I 
searched but found no separate JIRA for the CellBlocks impl; would you create 
one after separating the code out? I really cannot wait to take a look at it. :-)



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-03-10 Thread Matt Corgan (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925850#comment-13925850 ]

Matt Corgan commented on HBASE-10191:
-

[~carp84] you're right, flushing the memstore to memory is a separate issue 
from off-heap storage, but it's important to mention here so that off-heap 
storage can be designed to support it.  My comments about splitting the 
memstore into stripes could also be a separate issue, since it's just an 
improvement that saves some in-memory compaction work on non-uniform data 
distributions.



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-03-10 Thread Yu Li (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925861#comment-13925861 ]

Yu Li commented on HBASE-10191:
---

Hi [~mcorgan],

Got it, thanks for the explanation :-)



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-03-10 Thread Anoop Sam John (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925889#comment-13925889 ]

Anoop Sam John commented on HBASE-10191:


bq. would you create one
HBASE-10713.  Will come up with a patch soon. Your suggestions are welcome.  
Please keep all such discussions under this new JIRA issue.



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-03-04 Thread ramkrishna.s.vasudevan (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920599#comment-13920599 ]

ramkrishna.s.vasudevan commented on HBASE-10191:


bq. Would be sweet if the value at least was not on heap
Yes, this could be nice.



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-03-03 Thread stack (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918407#comment-13918407 ]

stack commented on HBASE-10191:
---

[~mcorgan]

bq.  It's basically creating small in-memory HFiles that can be compacted 
several times in memory without going to disk, and holding on to the WAL 
entries until they do go to disk.

Pardon the dumb questions. "creating small in-memory HFiles..." -- from a small 
CSLM that does the sort for us?  Or -- I remember talking to Martin Thompson 
once, trying to ask how he'd go about the MemStore 'problem', and I'm sure he 
didn't follow what I was on about (I was doing a crappy job explaining, I'm 
sure), but other than his usual adage of "try everything and measure", he 
suggested just trying a sort on the fly... Are you thinking the same, Matt?  So 
we'd keep Cells around and then, once we had a batch or after some nanos had 
elapsed, we'd do a merge sort with the current set of in-memory edits, put the 
new sorted 'in-memory-hfile' in place, and bump the mvcc read point so it was 
readable?  Once they got to a certain size, we'd do as we do now with snapshot 
and start up a new foreground set of edits to merge into?


bq. ...and holding on to the WAL entries until they do go to disk

What are you thinking here?  It would be good if the WAL system were not tied 
to the MemStore system (though chatting with [~liyin] recently, he had an idea 
that would make the WAL sync more 'live' by having WAL sync update mvcc, mvcc 
and seqid being tied).

bq. Anoop, Ram, and I were throwing around ideas of making in-memory HFiles out 
of memstore snapshots

Would be sweet if at least the value was not on heap.  Sounds like a nice 
experiment, Andrew.






[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-03-03 Thread Matt Corgan (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918632#comment-13918632 ]

Matt Corgan commented on HBASE-10191:
-

{quote}creating small in-memory HFiles... -- from a small CSLM that does the 
sort for us?{quote}yes, that is all I meant.  The CSLM would remain small 
because it gets flushed more often.  I don't doubt there are better ways to do 
it than the CSLM (like the deferred sorting you mention), but even just 
shrinking the size of the CSLM would be an improvement, without having to 
re-think the memstore's concurrency mechanisms.

Let's say you have a 500MB memstore limit, and that encodes (not compresses) to 
100MB.  You could (a sketch of the bookkeeping follows below):
* split it into 10 stripes, each with a ~50MB limit, and flush each of the 10 
stripes (to memory) individually
** you probably get a performance boost already, because 10 50MB CSLMs are 
better than 1 500MB CSLM
* for a given stripe, flush the CSLM each time it reaches 25MB, which will spit 
out a 5MB encoded memory hfile to the off-heap storage
* optionally compact a stripe's memory hfiles in the background to increase 
read performance
* when a stripe has a 25MB CSLM + 5 encoded snapshots, flush/compact the whole 
thing to disk
* release the WAL entries for the stripe

On the WAL entries, I was just pointing out that you can no longer release them 
when you flush the CSLM; you have to hold on to them until you flush the 
memory hfiles to disk.
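
A rough sketch of the per-stripe bookkeeping this scheme implies, using the 
example thresholds above; all class and method names here are hypothetical, 
not an actual implementation:

{code:java}
// Hypothetical per-stripe bookkeeping for the scheme above. Thresholds
// mirror the example numbers; flush/compact bodies are stubbed.
final class MemstoreStripe {
    static final long CSLM_FLUSH_BYTES = 25L * 1024 * 1024; // flush CSLM at 25MB
    static final int  MAX_SNAPSHOTS    = 5;                 // then go to disk

    private long cslmBytes;        // live CSLM size for this stripe
    private int  encodedSnapshots; // ~5MB encoded in-memory hfiles held

    // Called after each write lands in this stripe's CSLM.
    void onWrite(long cellBytes) {
        cslmBytes += cellBytes;
        if (cslmBytes < CSLM_FLUSH_BYTES) {
            return;
        }
        if (encodedSnapshots < MAX_SNAPSHOTS) {
            flushCslmToMemory(); // encode CSLM into an in-memory hfile
            encodedSnapshots++;
        } else {
            flushStripeToDisk(); // compact CSLM + snapshots into one HFile
            releaseWalEntries(); // WAL entries were pinned until now
            encodedSnapshots = 0;
        }
        cslmBytes = 0;
    }

    private void flushCslmToMemory() { /* encode, hand off to off-heap store */ }
    private void flushStripeToDisk() { /* merge snapshots, write an HFile */ }
    private void releaseWalEntries() { /* safe only once data is on disk */ }
}
{code}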



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-03-02 Thread Andrew Purtell (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917352#comment-13917352 ]

Andrew Purtell commented on HBASE-10191:


bq. (Matt Corgan) I could see using an allocator based on huge on or off-heap 
slabs where smaller pages/blocks are referenced by reusable ByteRanges. The 
allocator could recycle memory by continuously picking the least utilized slab 
and copying (moving) its occupied ByteRanges to the slab at the head of the 
queue. This would provide constant compaction via fast sequential copying.

We could make the investment of writing our own slab allocator. The experiments 
with Netty 4 ByteBufs are partly about seeing whether we can reuse open source 
that is already proven in production rather than redo the work. On the other 
hand, this could be a crucial component, so maybe it's necessary to have 
complete control. Perhaps we can move additional comments on this sub-topic 
over to HBASE-10573?
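
To illustrate the recycling idea in the quote, a toy sketch; this is not 
HBASE-10573's design, and the slab size, bookkeeping, and names are all made 
up for the example:

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Toy illustration of slab recycling: sequentially copy the occupied
// ranges of the least-utilized slab into the head slab, then return the
// drained slab to the pool. Real code would also re-point the owning
// ByteRanges and check that the head slab has room.
final class SlabRecycler {
    static final int SLAB_SIZE = 64 * 1024 * 1024; // e.g. 64MB slabs

    static final class Slab {
        final ByteBuffer mem = ByteBuffer.allocateDirect(SLAB_SIZE);
        final List<int[]> occupied = new ArrayList<>(); // {offset, length}
    }

    private final ArrayDeque<Slab> slabs = new ArrayDeque<>();

    void compactOnce(Slab leastUtilized) {
        Slab head = slabs.peekFirst();
        for (int[] range : leastUtilized.occupied) {
            ByteBuffer src = leastUtilized.mem.duplicate();
            src.position(range[0]).limit(range[0] + range[1]);
            head.mem.put(src); // fast sequential copy into the head slab
        }
        leastUtilized.occupied.clear();
        slabs.addLast(leastUtilized); // empty again; ready for reuse
    }
}
{code}

The appeal of the scheme is that compaction is a bulk sequential copy rather 
than per-object tracing, which is exactly the work a GC would otherwise do 
under safepoint.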




[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-03-02 Thread Matt Corgan (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917682#comment-13917682 ]

Matt Corgan commented on HBASE-10191:
-

{quote}How then to have KeyValues/Cells w/o calling them out as individual 
objects?   For MemStore, once we hit some upper bound -- say 64k, 1M? -- 
'flush' it to an in-memory, sorted cellblock? Reading, we'd consult the (small) 
CSLM memstore and some tiering of cellblocks?{quote}I think there's been talk 
of this before, and it makes sense to me.  It's basically creating small 
in-memory HFiles that can be compacted several times in memory without going 
to disk, while holding on to the WAL entries until they do go to disk.  We'd 
get huge space savings from the reduction in objects, references, and 
repetition via block encoding.  The problem is that if you have hundreds of 
1MB in-memory HFiles, it becomes too expensive to merge them all (via KVHeap) 
when scanning.  A possible solution is to subdivide the memstore into stripes 
(probably smaller than the stripe-compaction stripes) and periodically compact 
the in-memory stripes.  It sounds complicated compared to the current 
memstore, but it's probably simpler than other parts of HBase because you 
don't have to deal with IOExceptions, retries, etc.
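
For context on that merge cost: scanning k sorted segments through a heap pays 
an O(log k) comparison per emitted cell, so hundreds of segments per memstore 
would hurt. A generic sketch, with plain iterators standing in for HFile 
scanners (this is not HBase's KeyValueHeap):

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Generic k-way merge of sorted iterators: every element emitted costs
// O(log k) heap work, which is the cost that grows with segment count.
final class SegmentMerge {
    static <T> List<T> merge(List<Iterator<T>> segments, Comparator<T> cmp) {
        final class Head {
            T value;
            final Iterator<T> rest;
            Head(T v, Iterator<T> r) { value = v; rest = r; }
        }
        PriorityQueue<Head> heap =
            new PriorityQueue<>((a, b) -> cmp.compare(a.value, b.value));
        for (Iterator<T> it : segments) {
            if (it.hasNext()) heap.add(new Head(it.next(), it));
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head h = heap.poll();    // smallest current key: O(log k)
            out.add(h.value);
            if (h.rest.hasNext()) {
                h.value = h.rest.next();
                heap.add(h);         // reinsert the segment's next key
            }
        }
        return out;
    }
}
{code}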



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-03-02 Thread Andrew Purtell (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917703#comment-13917703 ]

Andrew Purtell commented on HBASE-10191:


bq. The problem is that if you have hundreds of 1MB in-memory HFiles, then it 
becomes too expensive to merge them all (via KVHeap) when scanning. A possible 
solution is to subdivide the memstore into stripes (probably smaller than the 
stripe compaction stripes) and periodically compact the in-memory stripes

Anoop, Ram, and I were throwing around ideas of making in-memory HFiles out of 
memstore snapshots and then doing in-memory compaction over them. If we have 
off-heap backing for the memstore, we could potentially carry larger datasets, 
leading to less frequent flushes and significantly less write amplification 
overall.



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-02-21 Thread stack (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909235#comment-13909235 ]

stack commented on HBASE-10191:
---

(Good discussion going on here)

How then to have KeyValues/Cells w/o calling them out as individual objects?
Keep cellblocks of KeyValues/Cells w/ a CellScanner to read over 64k blocks of
them? For MemStore, once we hit some upper bound -- say 64k, or 1M? -- 'flush'
it to an in-memory, sorted cellblock? Reading, we'd consult the (small) CSLM
memstore and some tiering of cellblocks?
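
To make the cellblock idea concrete, here is a minimal sketch (hypothetical class, simplified length-prefixed cell layout: a 4-byte key length, a 4-byte value length, then the key and value bytes) of scanning a cellblock held in a single, possibly direct, ByteBuffer without materializing per-cell objects:

{code:java}
import java.nio.ByteBuffer;

/**
 * Hypothetical scanner over a cellblock: length-prefixed cells packed
 * back-to-back in one (possibly direct) ByteBuffer. Assumes the block
 * is well formed; only the scanner itself is a heap object.
 */
final class CellBlockScanner {
  private final ByteBuffer block;  // duplicated so the owner's position is untouched
  private int keyLen, valLen, cellOffset = -1;

  CellBlockScanner(ByteBuffer cellBlock) {
    this.block = cellBlock.duplicate();
  }

  /** Advance to the next cell; returns false when the block is exhausted. */
  boolean advance() {
    if (block.remaining() < 8) return false;
    keyLen = block.getInt();
    valLen = block.getInt();
    cellOffset = block.position();
    block.position(cellOffset + keyLen + valLen);  // skip over the cell body
    return true;
  }

  /** Copy the current key onto the heap only when actually needed. */
  byte[] copyKey() {
    byte[] key = new byte[keyLen];
    ByteBuffer dup = block.duplicate();
    dup.position(cellOffset);
    dup.get(key);
    return key;
  }
}
{code}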



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-02-21 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13909242#comment-13909242
 ] 

Lars Hofhansl commented on HBASE-10191:
---

HBASE-5311 and HBASE-9440 have related discussion. If we're smart we can build
all these things such that they work both on- and off-heap.



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-02-20 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907299#comment-13907299
 ] 

Vladimir Rodionov commented on HBASE-10191:
---

{quote}
We also want to consider addressing up to 1 TB of usable memory without
loading up cores with redundant work / multiple processes. 
{quote}
6TB of RAM. 
http://www.supermicro.nl/newsroom/pressreleases/2014/press140218_4U_4-Way.cfm

{quote}
Collection times are not a function of the heap size but rather of heap 
complexity, i.e. the number of objects to track 
{quote}

Heap compaction is a function of heap size (at least in CMS).



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-02-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907889#comment-13907889
 ] 

Lars Hofhansl commented on HBASE-10191:
---

bq. Heap compaction is a function of a heap size (at least in CMS).

Not to start a long, tangential argument here... Last I looked, CMS was
non-compacting, and thus the only relevant metric is the number of objects to
trace, not their size. A 100G heap with a small number of objects is far easier
to manage than a 100G heap with 100 million objects.




[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-02-20 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907925#comment-13907925
 ] 

Vladimir Rodionov commented on HBASE-10191:
---

Right, CMS is not compacting, but compaction nevertheless happens from time to
time (during a full GC), and that is a function of heap size.



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-02-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907967#comment-13907967
 ] 

Lars Hofhansl commented on HBASE-10191:
---

(Not if all objects are of roughly the same size; then you will never need a
full GC.)

In any case, nobody is arguing (at least I am not) that 1T or more (6T? Wow)
should be managed off-heap with contemporary HotSpot JVMs. I'm looking forward
to what Andrew and folks will produce here.




[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-02-20 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13908002#comment-13908002
 ] 

Matt Corgan commented on HBASE-10191:
-

I hate to continue the tangent, but I'd add that even the occasional compaction
that CMS triggers depends on how many objects need to be compacted. Random
access memory isn't as random anymore: there are enormous speed boosts when
copying long swaths of sequential memory. So compacting 100 slabs of 1GB each
should be far faster than compacting 1 billion 100-byte KeyValues scattered
around the heap. I also wonder if there's a slab size big enough that HotSpot
won't bother moving it during a compaction (but I have no idea).

Separately, one of the reasons Nick and I thought ByteRange should be an
interface was that we could back it with varying implementations including
arrays, HeapByteBuffers, DirectByteBuffers, netty ByteBufs, etc. A utility
similar to IOUtils.copy could help optimize the copies between the different
implementations. Another advantage of using it as the primary interface is
that its internal compareTo method uses HBase-friendly unsigned byte
comparison, making it easy to put ByteRanges into traditional sorted
collections like TreeSet/CSLM without passing an external comparator.
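
As an illustration of that point (hypothetical, simplified signatures, not the actual ByteRange API), an interface with two backings plus the unsigned comparison that makes ranges sort correctly in a plain TreeSet or CSLM:

{code:java}
import java.nio.ByteBuffer;

/** Sketch of a ByteRange-style abstraction over varying backing stores. */
interface Range {
  int length();
  byte byteAt(int index);
}

/** byte[]-backed implementation. */
final class ArrayRange implements Range {
  private final byte[] bytes; private final int offset, len;
  ArrayRange(byte[] bytes, int offset, int len) {
    this.bytes = bytes; this.offset = offset; this.len = len;
  }
  public int length() { return len; }
  public byte byteAt(int i) { return bytes[offset + i]; }
}

/** ByteBuffer-backed implementation; works for heap and direct buffers. */
final class BufferRange implements Range {
  private final ByteBuffer buf; private final int offset, len;
  BufferRange(ByteBuffer buf, int offset, int len) {
    this.buf = buf; this.offset = offset; this.len = len;
  }
  public int length() { return len; }
  public byte byteAt(int i) { return buf.get(offset + i); }
}

final class Ranges {
  /** Unsigned lexicographic comparison, the ordering HBase keys need. */
  static int compare(Range a, Range b) {
    int n = Math.min(a.length(), b.length());
    for (int i = 0; i < n; i++) {
      int d = (a.byteAt(i) & 0xff) - (b.byteAt(i) & 0xff);
      if (d != 0) return d;
    }
    return a.length() - b.length();
  }
}
{code}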

I could see using an allocator based on huge on- or off-heap slabs where
smaller pages/blocks are referenced by reusable ByteRanges. The allocator could
recycle memory by continuously picking the least-utilized slab and copying
(moving) its occupied ByteRanges to the slab at the head of the queue. This
would provide constant compaction via fast sequential copying.
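
A rough sketch of that recycling loop (entirely hypothetical types; assumes each slab can report its live bytes and enumerate the ranges it holds):

{code:java}
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch: constant background compaction by evacuating the emptiest slab. */
final class SlabRecycler {
  interface MovableRange {
    void moveTo(Slab destination);  // one long sequential copy, then repoint
  }
  interface Slab {
    int liveBytes();                // bytes still referenced
    List<MovableRange> liveRanges();
    void reset();                   // make the slab reusable
  }

  private final PriorityQueue<Slab> byUtilization =
      new PriorityQueue<Slab>(11, new Comparator<Slab>() {
        public int compare(Slab a, Slab b) { return a.liveBytes() - b.liveBytes(); }
      });
  private final ArrayDeque<Slab> fillQueue = new ArrayDeque<Slab>();

  /** Evacuate the least-utilized slab into the slab at the head of the queue. */
  void recycleOne() {
    Slab victim = byUtilization.poll();
    Slab target = fillQueue.peekFirst();
    if (victim == null || target == null) return;
    for (MovableRange r : victim.liveRanges()) {
      r.moveTo(target);             // sequential copies, cheap per byte
    }
    victim.reset();
    fillQueue.addLast(victim);      // victim becomes a future fill target
  }
}
{code}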



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-02-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906456#comment-13906456
 ] 

Andrew Purtell commented on HBASE-10191:


I'm looking at Netty 4's netty-buffer module
(http://netty.io/4.0/api/io/netty/buffer/package-summary.html), which has some
nice properties, including composite buffers, arena allocation, dynamic buffer
resizing, and reference counting, never mind dev and testing by another
community. I also like it because you can plug in your own allocators and
specialize the abstract ByteBuf base type. More on this later.

When I get closer to seeing what exactly needs to be done I will post a design
doc. Current thinking follows. Below, the term 'buffer' means Netty ByteBufs or
derived classes backed by off-heap allocated direct buffers.
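
For illustration only, the netty-buffer features mentioned above (pooled arena-backed direct allocation, composite buffers, reference counting) look roughly like this in Netty 4; a minimal sketch, not HBase code:

{code:java}
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.Unpooled;

public class NettyBufferDemo {
  public static void main(String[] args) {
    // Pooled, arena-based, off-heap (direct) allocation.
    ByteBuf block = PooledByteBufAllocator.DEFAULT.directBuffer(64 * 1024);
    block.writeBytes(new byte[] { 1, 2, 3 });

    // Composite view over two buffers without copying either.
    ByteBuf header = Unpooled.wrappedBuffer("hdr".getBytes());
    ByteBuf composite = Unpooled.wrappedBuffer(header, block);

    // Reference counting: the memory goes back to the arena only when
    // the last reference is released.
    System.out.println("refCnt=" + block.refCnt());
    composite.release();  // releasing the composite releases its components
  }
}
{code}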

*Write*

When coming in from RPC, cells are laid out by codecs into cellblocks in
buffers and the cellblocks/buffers are handed to the memstore. Netty's
allocation arenas replace the MemStoreLAB. The memstore data structure evolves
into an index over cellblocks.

Per [~mcorgan]'s comment above, we should think about how the memstore index
can be built with fewer object allocations than the number of cells in the
memstore, yet stay in the ballpark of CSLM's concurrent-access efficiency. A
tall order. CSLM wouldn't be the right choice as it allocates at least one list
entry per key, but we could punt and use it initially, making a replacement
data structure a follow-on task.
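
As a strawman for that punt (hypothetical encoding; still one CSLM entry plus a boxed Long per cell, which is exactly why it is only a stopgap), the index could map each key to its cellblock and offset packed into a primitive long, so no cell data lives in the index itself:

{code:java}
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Sketch: memstore index over off-heap cellblocks. The CSLM stores only
 * keys and packed locations; cell bodies stay in the cellblock buffers.
 */
final class CellBlockIndex {
  static final Comparator<byte[]> UNSIGNED = new Comparator<byte[]>() {
    public int compare(byte[] a, byte[] b) {
      int n = Math.min(a.length, b.length);
      for (int i = 0; i < n; i++) {
        int d = (a[i] & 0xff) - (b[i] & 0xff);
        if (d != 0) return d;
      }
      return a.length - b.length;
    }
  };

  private final ConcurrentSkipListMap<byte[], Long> index =
      new ConcurrentSkipListMap<byte[], Long>(UNSIGNED);

  /** Pack a (cellblock id, offset within block) pair into one long. */
  static long location(int blockId, int offset) {
    return ((long) blockId << 32) | (offset & 0xffffffffL);
  }

  void put(byte[] key, int blockId, int offset) {
    index.put(key, location(blockId, offset));
  }

  static int blockId(long loc) { return (int) (loc >>> 32); }
  static int offset(long loc) { return (int) loc; }
}
{code}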

*Read*

We feed buffers down to HDFS to fill with file block data. We pick which pool
to get a buffer from for a read depending on the family's caching strategy.
Pools could be backed by arenas that match up with LRU policy strata, with a
common pool/arena for noncaching reads. (Or, for noncaching reads, can we
optionally use a new API for getting buffers up from HDFS, perhaps backed by
the pinned shared RAM cache, since we know we will be referring to the contents
only briefly?) It will be important to get reference counting right as we will
be servicing scans while attempting to evict. Relatedly, eviction of a block
may not immediately return a buffer to its pool if there is more than one block
in the buffer.
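
A hedged sketch (hypothetical names) of the invariant just described: a buffer returns to its pool only when every block in it has been evicted and no scan still pins it:

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: a pooled buffer holding several file blocks. */
final class RefCountedBuffer {
  private final AtomicInteger refs = new AtomicInteger(1);  // 1 = the cache's own reference
  private final AtomicInteger blocksResident;
  private final Runnable returnToPool;

  RefCountedBuffer(int blockCount, Runnable returnToPool) {
    this.blocksResident = new AtomicInteger(blockCount);
    this.returnToPool = returnToPool;
  }

  /** A scan pins the buffer for the duration of its read. */
  void retain() { refs.incrementAndGet(); }

  /** Scans release when done; the cache releases via the last eviction. */
  void release() {
    if (refs.decrementAndGet() == 0) {
      returnToPool.run();  // safe: no readers, no resident blocks
    }
  }

  /** Evicting one block frees the buffer only once all blocks are gone. */
  void evictBlock() {
    if (blocksResident.decrementAndGet() == 0) {
      release();  // drop the cache's own reference
    }
  }
}
{code}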

We maintain new metrics on the number of buffers allocated, stats on arenas,
stats on wastage and internal fragmentation of the buffers, etc., and use these
to guide optimizations and refinements.


[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-02-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906599#comment-13906599
 ] 

Lars Hofhansl commented on HBASE-10191:
---

This might not be a very popular viewpoint these days, but anyway. My office
neighbor used to work on a proprietary Java database, and he says they used
128GB or even 192GB Java heaps and larger all the time without any significant
GC impact.

(Non-moving) collection times are not a function of the heap size but rather of
heap complexity, i.e. the number of objects to track (HBase also produces a lot
of garbage, but that is short-lived and can be quickly collected by a moving
collector for the young gen).
With the MemStoreLAB and the block cache HBase already does a good job on this.
Even as things stand currently, if we fill an entire 128GB heap with 64k blocks
from the blockcache, that would only be about 2 million objects (128 GiB / 64
KiB = 2^21).
Now, if we want to forage into the sub-100ms latency area we need to rethink
things, but then Java might just not be the right choice.

Before we embark on an all-or-nothing adventure and move everything out of the
Java heap, we should also investigate whether we can make the GC's life easier
yet.




[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-02-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906612#comment-13906612
 ] 

Andrew Purtell commented on HBASE-10191:


I intend to prototype something so we don't have to argue supposition. 

Yes, enabling sub-100 ms collections at the 95th or 99th percentile is an
important consideration. We also want to consider addressing up to 1 TB of
usable memory without loading up cores with redundant work / multiple
processes. 

Some GC overheads are a linear function of the heap size, at least for G1. 



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-02-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906682#comment-13906682
 ] 

Lars Hofhansl commented on HBASE-10191:
---

Yeah, I was talking about CMS, and definitely less than 1TB.

Please do not read my comment as criticism, this is very important work.
No doubt you can drive max latency down significantly by going off heap; at the
same time there are probably a lot of further improvements we can make to
current HBase in the heap allocation area.




[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2013-12-18 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851990#comment-13851990
 ] 

Vladimir Rodionov commented on HBASE-10191:
---

{quote}
It's abundantly clear once using heaps larger than ~8 GB that collection pauses 
under safepoint blow out latency SLAs at the high percentiles.
{quote}

What HBase version are you using? No bucket cache yet?



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2013-12-18 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852031#comment-13852031
 ] 

Andrew Purtell commented on HBASE-10191:


bq. What HBase version are you using? No bucket cache yet?

Trunk, what is now 0.98. 

As you point out above, serialization/deserialization costs limit the bucket 
cache, which is why I propose the goal of direct operation on allocations 
backed by off-heap memory. This has to be approached in stages. 

The bucket cache encourages looking at this approach. Although you'll see
reduced throughput, it will smooth out the latency tail and allow the
blockcache to address RAM without increasing heap size, which also helps smooth
out the latency tail with respect to the collection pause distribution.
However, with large heaps, e.g. 128+ GB, mixed-generation collections exceeding
the ZooKeeper heartbeat timeout are inevitable under mixed read+write load;
nothing I have found mitigates that sufficiently. 



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2013-12-18 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852086#comment-13852086
 ] 

Vladimir Rodionov commented on HBASE-10191:
---

{quote}
The bucket cache encourages looking at this approach. Although you'll see
reduced throughput, it will smooth out the latency tail and allow the
blockcache to address RAM without increasing heap size, which also helps smooth
out the latency tail with respect to the collection pause distribution.
However, with large heaps, e.g. 128+ GB, mixed-generation collections exceeding
the ZooKeeper heartbeat timeout are inevitable under mixed read+write load;
nothing I have found mitigates that sufficiently.
{quote}

It looks like you have done some bucket cache research and tests. Are there any
numbers available? We are considering upgrading to the 0.96 release and the
bucket cache is the major attraction for us. According to you, is it not that
usable, or does it not give any performance advantage? I really doubt that an
80GB on-heap block cache is a viable alternative to an off-heap cache in a
mixed read/write load scenario, even in Java 7 with G1. 

One thing to note: having a serialization barrier has one huge advantage over
direct off-heap access: you can compress blocks in off-heap memory. For our
application the compression ratio is close to 4. 
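
To illustrate the point (a sketch using only java.util.zip; not the bucket cache's actual code path), a serialization barrier lets you deflate a block while copying it off heap and inflate it back on read:

{code:java}
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class CompressingBarrier {
  /** Compress a block while writing it into an off-heap buffer. */
  static ByteBuffer writeCompressed(byte[] block) {
    Deflater deflater = new Deflater();
    deflater.setInput(block);
    deflater.finish();
    // Worst-case bound so one deflate() call always suffices.
    byte[] scratch = new byte[block.length + block.length / 1000 + 64];
    int n = deflater.deflate(scratch);
    deflater.end();
    ByteBuffer offHeap = ByteBuffer.allocateDirect(n);
    offHeap.put(scratch, 0, n);
    offHeap.flip();
    return offHeap;
  }

  /** Inflate back onto the heap when the block is actually read. */
  static byte[] readDecompressed(ByteBuffer offHeap, int originalLength)
      throws DataFormatException {
    byte[] compressed = new byte[offHeap.remaining()];
    offHeap.duplicate().get(compressed);
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    byte[] out = new byte[originalLength];
    inflater.inflate(out);
    inflater.end();
    return out;
  }
}
{code}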



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2013-12-18 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852093#comment-13852093
 ] 

Andrew Purtell commented on HBASE-10191:


It is on my to-do list to produce a technical report, but my time is quite
constrained and that item is not close to the top of the list. As always, you
should evaluate HBase using your own application and environment. You may be
quite happy with 0.96, with or without the bucket cache.

bq. having serialization barrier has one huge advantage over direct off heap 
access. You can compress blocks in off heap

That's a great point. I would actually like to operate on an encoded block 
representation all the way from disk to socket. This is a trick in-memory 
databases have been using for years, and one that will let us push through the 
memory wall, but that is several steps down a long road. The scope of this 
JIRA is described in the 'Description' field above.



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2013-12-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852328#comment-13852328
 ] 

stack commented on HBASE-10191:
---

[~apurtell] There are a couple of off-heap experiments ongoing.  This JIRA 
covers memstore and blockcache allocations.  Seems like we need a larger 
umbrella issue than this allows?  If you agree I'll open one, because it would 
be useful to be able to tie all of these efforts together.  Good on you 
[~apurtell].



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2013-12-18 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852333#comment-13852333
 ] 

Andrew Purtell commented on HBASE-10191:


If you want to reparent this somewhere that's fine with me [~stack]. We're 
going to start with memstore and blockcache (likely a unified pool) and go 
from there based on results. If there are other things going on, it would be 
good to put them all together so we can try to coordinate.



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2013-12-18 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852436#comment-13852436
 ] 

Matt Corgan commented on HBASE-10191:
-

Something to keep in mind is that GC pauses can be influenced as much or more 
by the number of live objects as by the raw size of the heap.  32 GB of block 
cache could be made up of only ~1 million 32 KB blocks (32 GB / 32 KB).  That 
particular 32 GB of memory may not stop the world for very long.  It's all the 
small remaining objects that keep the garbage collector busy, and I bet the 
biggest culprit here is the individual KeyValues in the memstores.

MemstoreLAB combines the backing arrays into big chunks to reduce heap 
fragmentation, but there is still one object per KeyValue, and each object 
needs to be considered by the collector.  A big heap has big memstores, which 
hold lots of KeyValues - possibly far more than the ~1 million blocks in the 
block cache.  A big advantage of flattening the memstores into blocks of key 
values is that you might reduce ~500 KeyValues to a single block object.  That 
500x reduction in object count strikes me as a significant GC pause 
improvement that is independent of off-heap techniques.
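
For concreteness, a hedged sketch of what such flattening could look like (the 
class is hypothetical, not an HBase structure): many cells share one backing 
array plus an offset index, so the collector traces two objects instead of 
hundreds.

{code:java}
// Hypothetical flattened block: ~500 cells packed into one byte[] chunk
// plus an int[] offset index. The GC sees 2 objects instead of ~500.
final class FlattenedCellBlock {
  private final byte[] data;    // all cell bytes, concatenated
  private final int[] offsets;  // start offset of each cell within data

  FlattenedCellBlock(byte[] data, int[] offsets) {
    this.data = data;
    this.offsets = offsets;
  }

  int cellCount() { return offsets.length; }

  /** Length of cell i, derived from neighboring offsets. */
  int cellLength(int i) {
    int end = (i + 1 < offsets.length) ? offsets[i + 1] : data.length;
    return end - offsets[i];
  }
}
{code}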


Moving blocks off-heap and operating on them directly will be very cool.  
DataBlockEncoders should be able to read off-heap blocks much as they do now, 
namely by copying only the modified bytes from the previous cell into an array 
buffer.  Vladimir makes a good point that it would be tough to match the scan 
performance of unencoded data, so that would need some thinking.
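
A minimal sketch of that decode pattern, assuming a hypothetical fixed-width 
delta layout (HBase's real encodings use varints and richer framing): only the 
suffix bytes that differ from the previous key are copied out of the (possibly 
off-heap) buffer into a reusable scratch array.

{code:java}
import java.nio.ByteBuffer;

// Hypothetical prefix-delta key decoder reading straight from an (off-heap)
// ByteBuffer; the on-heap scratch array is reused across cells.
final class PrefixDeltaKeyDecoder {
  private byte[] key = new byte[0]; // current key, reused across cells
  private int keyLen;

  /** Demo entry layout: [commonLen:int][suffixLen:int][suffix bytes]. */
  void decodeNext(ByteBuffer encoded) {
    int common = encoded.getInt();  // bytes shared with the previous key
    int suffix = encoded.getInt();  // bytes that differ
    int newLen = common + suffix;
    if (key.length < newLen) {      // grow scratch, preserving the prefix
      byte[] bigger = new byte[newLen];
      System.arraycopy(key, 0, bigger, 0, keyLen);
      key = bigger;
    }
    encoded.get(key, common, suffix); // copy only the changed tail
    keyLen = newLen;
  }
}
{code}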




[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2013-12-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852645#comment-13852645
 ] 

stack commented on HBASE-10191:
---

If we supplied DFSClient our own DBB (direct ByteBuffer), then maybe we could 
read from DFS and put data into an off-heap blockcache without going over the 
heap (see HDFS-2834, ByteBuffer-based read API for DFSInputStream).
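
A hedged sketch of that path using the ByteBuffer read API that HDFS-2834 
added (whether a given stream supports it depends on the underlying 
implementation; the offsets and sizes here are illustrative):

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirectReadDemo {
  /** Fill a direct buffer straight from DFS; no intermediate byte[]. */
  public static ByteBuffer readBlock(Path file, long offset, int size)
      throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    ByteBuffer block = ByteBuffer.allocateDirect(size); // off-heap target
    try (FSDataInputStream in = fs.open(file)) {
      in.seek(offset);
      while (block.hasRemaining()) {
        if (in.read(block) < 0) break; // EOF before block filled
      }
    }
    block.flip();
    return block; // ready to hand to an off-heap blockcache
  }
}
{code}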



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2013-12-17 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851145#comment-13851145
 ] 

Nick Dimiduk commented on HBASE-10191:
--

Apparently you're reading my mind :)

Nicely articulated [~apurtell]. I'd like to see a body of evidence that points 
to the specific components it makes meaningful sense to move off-heap. 
Memstore and BlockCache are commonly cited as the offending components, but 
I've not seen anyone present conclusive profiling results making this clear. 
Nor is there clear advice on at what point a heap becomes too large. I've 
started work to track down some real data on both of these points before 
pressing forward with recommendations.

See also [~nkeywal]'s recent profiling work reducing the GC burden imposed by 
the protobuf RPC implementation. That is an example of a major offender that 
isn't on the above short-list. I am excited to work toward and experiment with 
an entirely off-heap data flow, at least for the read path (HDFS -> BlockCache 
-> RPC send buffer)!



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2013-12-17 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851148#comment-13851148
 ] 

Andrew Purtell commented on HBASE-10191:


bq. Memstore and BlockCache are commonly cited as the offending components, but 
I've not seen anyone present conclusive profiling results making this clear

It's abundantly clear once you are using heaps larger than ~8 GB that 
collection pauses under safepoint blow out latency SLAs at the high 
percentiles. I've observed this directly under mixed read+write load. 
(Read-only loads work OK with G1 even with very large heaps, e.g. 192 GB.) Why 
would we need heaps larger than this? To take direct advantage of large server 
RAM. Memstore and blockcache are then the largest allocators of heap memory. 
If we move them off heap, they can soak up most of the available RAM while 
leaving the remaining heap demand relatively small - that is the idea.
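
To make the split concrete, a hedged illustration (the numbers are 
hypothetical, not tuning advice): on a 192 GB server one might run a small 
GC-managed heap, say -Xmx8g, with the bulk of RAM granted to direct buffers 
via -XX:MaxDirectMemorySize. A toy sketch touching both pools:

{code:java}
import java.nio.ByteBuffer;

// Toy demonstration only; run with e.g.:
//   java -Xmx8g -XX:MaxDirectMemorySize=160g OffHeapArenaDemo
public class OffHeapArenaDemo {
  public static void main(String[] args) {
    long heapMax = Runtime.getRuntime().maxMemory();       // bounded by -Xmx
    ByteBuffer slab = ByteBuffer.allocateDirect(1 << 30);  // 1 GB off-heap slab
    System.out.printf("heap max: %,d bytes; direct slab: %,d bytes%n",
        heapMax, slab.capacity());
  }
}
{code}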



[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2013-12-17 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851154#comment-13851154
 ] 

Vladimir Rodionov commented on HBASE-10191:
---

This will require redesigning the whole data flow in HBase. Currently, the 
minimum (and maximum) unit of data exchange in HBase's internal pipeline is 
the KeyValue, which is a heavyweight, on-heap (byte-array-backed) data 
structure. Moving data allocations off heap is only half the problem; the 
other half is how to avoid copying data on read and on write (from/to off-heap 
memory). Serialization is quite expensive.
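
For illustration, a hedged sketch of the kind of zero-copy view that would be 
needed (a hypothetical class, not HBase's Cell API): key and value are exposed 
as buffer slices rather than copied into on-heap arrays, so only offsets and 
lengths live on the heap.

{code:java}
import java.nio.ByteBuffer;

// Hypothetical cell view over a shared (possibly direct) buffer: nothing is
// copied; accessors hand back slices over the same backing bytes.
final class BufferBackedCell {
  private final ByteBuffer buf;
  private final int keyOffset, keyLen, valOffset, valLen;

  BufferBackedCell(ByteBuffer buf, int keyOffset, int keyLen,
                   int valOffset, int valLen) {
    this.buf = buf;
    this.keyOffset = keyOffset; this.keyLen = keyLen;
    this.valOffset = valOffset; this.valLen = valLen;
  }

  /** Zero-copy view of the key bytes. */
  ByteBuffer key() { return slice(keyOffset, keyLen); }

  /** Zero-copy view of the value bytes. */
  ByteBuffer value() { return slice(valOffset, valLen); }

  private ByteBuffer slice(int off, int len) {
    ByteBuffer d = buf.duplicate(); // independent cursor, shared bytes
    d.position(off);
    d.limit(off + len);
    return d.slice();
  }
}
{code}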
