[jira] [Commented] (HBASE-10191) Move large arena storage off heap
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944209#comment-13944209 ]

Andrew Purtell commented on HBASE-10191:

Just for documentary purposes at this point, since the implementation is early and has a long way to go: Red Hat recently announced ongoing work on a new GC called Shenandoah, with the stated goal: "Reduce GC pause times on extremely large heaps by doing evacuation work concurrently with Java threads and making pause times independent of heap size."

- JEP: http://openjdk.java.net/jeps/189
- Project: http://icedtea.classpath.org/shenandoah/
- Source: http://icedtea.classpath.org/hg/shenandoah

Move large arena storage off heap
---------------------------------

        Key: HBASE-10191
        URL: https://issues.apache.org/jira/browse/HBASE-10191
    Project: HBase
 Issue Type: Umbrella
   Reporter: Andrew Purtell

Even with the improved G1 GC in Java 7, Java processes that want to address large regions of memory while also providing low high-percentile latencies continue to be challenged. Fundamentally, a Java server process that has high data throughput and also tight latency SLAs will be stymied by the fact that the JVM does not provide a fully concurrent collector. There is simply not enough throughput to copy data during GC under safepoint (all application threads suspended) within available time bounds. This is increasingly an issue for HBase users operating under dual pressures: 1. tight response SLAs, 2. the increasing amount of RAM available in commodity server configurations, because GC load is roughly proportional to heap size. We can address this with parallel strategies.

We should talk with the Java platform developer community about the possibility of a fully concurrent collector appearing in OpenJDK somehow. Set aside the question of whether this is too little too late; if one becomes available the benefit will be immediate, though subject to qualification for production, and transparent in terms of code changes. However, in the meantime we need an answer for Java versions already in production. This requires we move the large arena allocations off heap, those being the blockcache and memstore. On other JIRAs recently there has been related discussion about combining the blockcache and memstore (HBASE-9399) and about flushing memstore into blockcache (HBASE-5311), which is related work.

We should build off heap allocation for memstore and blockcache, perhaps a unified pool for both, and plumb zero copy direct access to these allocations (via direct buffers) through the read and write I/O paths. This may require the construction of classes that provide object views over data contained within direct buffers. This is something else we could talk with the Java platform developer community about: it could be possible to provide language level object views over off heap memory (on heap objects could hold references to objects backed by off heap memory, but not vice versa), maybe facilitated by new intrinsics in Unsafe. Again, we need an answer for today also. We should investigate what existing libraries may be available in this regard. Key will be avoiding marshalling/unmarshalling costs. At most we should be copying primitives out of the direct buffers to register or stack locations until finally copying data to construct protobuf Messages. A related issue there is HBASE-9794, which proposes scatter-gather access to KeyValues when constructing RPC messages. We should see how far we can get with that, and also with zero copy construction of protobuf Messages backed by direct buffer allocations. Some amount of native code may be required.

--
This message was sent by Atlassian JIRA (v6.2#6252)
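The "object views over data contained within direct buffers" idea from the description can be sketched as a flyweight that decodes fields on demand rather than materializing an on-heap object per cell. This is a hypothetical illustration, not HBase code; the `[keyLen][key][valueLen][value]` layout and the `OffHeapCellView` class are assumptions for the sake of the example, and the real KeyValue encoding is more involved.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a zero-copy "object view" over cell data held in a
// direct buffer. Primitives are copied out to stack locations on demand; the
// key/value bytes stay off heap until something (e.g. a protobuf Message)
// finally needs them. The layout here is illustrative only.
public class OffHeapCellView {
    private ByteBuffer buf;   // backing direct buffer (shared, never copied)
    private int offset;       // position of this cell within the buffer

    // Reusable flyweight: repoint the view instead of allocating per cell.
    public OffHeapCellView reset(ByteBuffer buf, int offset) {
        this.buf = buf;
        this.offset = offset;
        return this;
    }

    public int keyLength()   { return buf.getInt(offset); }
    public int valueLength() { return buf.getInt(offset + 4 + keyLength()); }

    public byte keyByteAt(int i) { return buf.get(offset + 4 + i); }

    // Writer side: append one cell, returning the offset of the next slot.
    public static int write(ByteBuffer buf, int offset, byte[] key, byte[] value) {
        buf.putInt(offset, key.length);
        for (int i = 0; i < key.length; i++) buf.put(offset + 4 + i, key[i]);
        buf.putInt(offset + 4 + key.length, value.length);
        for (int i = 0; i < value.length; i++) buf.put(offset + 8 + key.length + i, value[i]);
        return offset + 8 + key.length + value.length;
    }
}
```

Because the view uses only absolute `get`/`put` operations, reads never mutate buffer position and many threads can share one backing buffer; the GC sees only the small flyweight, not the data it points at.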
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934565#comment-13934565 ]

Liyin Tang commented on HBASE-10191:

Just curious, has anyone experienced imbalanced memory allocation among the NUMA nodes when allocating a large off heap arena?
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925530#comment-13925530 ]

Yu Li commented on HBASE-10191:
---

Hi [~mcorgan] and [~stack], I see you had a discussion long ago in HBASE-3484 ([here|https://issues.apache.org/jira/browse/HBASE-3484?focusedCommentId=13410934&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13410934]), but it seems there has been no further progress since then. And [~mcorgan], judging from your comment above you now have a more detailed design in mind, so I'm wondering whether you have done some real work to implement it? Or have any plan to?

Actually I think the design you proposed is somewhat different from the topic of this JIRA and of HBASE-3484, since it's more like an in-memory flush to reduce memory fragmentation rather than a move off heap. Would it be better to open another JIRA to make that discussion more explicit, while leaving the off heap discussion here?

I've been watching this thread, or rather this topic, for a while, and now we've decided to make a similar improvement to our online HBase service here, so I'd really like to work together with the community to complete the design and implementation of the in-memory flush. :-) I'm a totally new face in this discussion, so please kindly forgive me if I've said anything naive. :-)
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925606#comment-13925606 ]

Anoop Sam John commented on HBASE-10191:

[~carp84] I am working on this CellBlocks stuff (yes, in memory flushes). Coding wise it is mostly done, and I will do perf tests also. Some time back I worked on HBASE-3484 but later dropped it. Here, along with off heap, the discussion of CellBlocks also came in. This can greatly reduce the issue we face today with CSLM (when there are too many KVs in it). We are working on the off heap stuff in parallel. My code is in a combined form now; let me separate it out. Also see HBASE-10648, which will allow us to have different MemStore impls.
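The CSLM pain point mentioned above can be illustrated with a sketch of the in-memory flush idea: drain a ConcurrentSkipListMap snapshot into flat, sorted parallel arrays that reads then binary-search. This is a hypothetical simplification (long keys stand in for real row keys; `FlatCellBlock` is an invented name, not an HBase class), but it shows where the win comes from: no per-entry node objects for the GC to trace and no pointer chasing on lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative sketch (not HBase code): flatten a CSLM snapshot into compact
// sorted arrays. The CSLM keeps absorbing new writes; reads consult this
// immutable block via binary search instead of walking skip-list nodes.
public class FlatCellBlock {
    private final long[] keys;      // sorted keys (longs stand in for row keys)
    private final byte[][] values;  // values parallel to keys

    public FlatCellBlock(ConcurrentSkipListMap<Long, byte[]> snapshot) {
        keys = new long[snapshot.size()];
        values = new byte[keys.length][];
        int i = 0;
        // entrySet() iterates in ascending key order, so no re-sort is needed
        for (Map.Entry<Long, byte[]> e : snapshot.entrySet()) {
            keys[i] = e.getKey();
            values[i++] = e.getValue();
        }
    }

    public byte[] get(long key) {
        int lo = 0, hi = keys.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[mid] < key) lo = mid + 1;
            else if (keys[mid] > key) hi = mid - 1;
            else return values[mid];
        }
        return null;
    }
}
```

The same flat layout is also what makes an off-heap variant plausible: two primitive arrays translate to buffer regions far more directly than a web of skip-list nodes does.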
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925855#comment-13925855 ]

Yu Li commented on HBASE-10191:
---

Hi [~anoop.hbase], Thanks for the info, really good to know about the progress; I had almost started on the implementation myself. :-) It's also great to see the patch making MemStore impls pluggable almost ready.

{quote}
My code is in a combined form now. Let me separate it out.
{quote}

I guess the code changes for CellBlocks would be based on HBASE-10648? I searched but found no separate JIRA for the CellBlocks impl; would you create one after separating the code out? I really can't wait to take a look at it. :-)
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925850#comment-13925850 ]

Matt Corgan commented on HBASE-10191:

[~carp84] you're right, flushing the memstore to memory is a separate issue from off-heap storage, but it's important to mention here so off-heap storage can be designed to support it. My comments about splitting the memstore into stripes could also be a separate issue, since it's just an improvement that saves you some in-memory compaction work on non-uniform data distributions.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925861#comment-13925861 ]

Yu Li commented on HBASE-10191:
---

Hi [~mcorgan], Got it, thanks for the explanation. :-)
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925889#comment-13925889 ]

Anoop Sam John commented on HBASE-10191:

bq. would you create one

HBASE-10713. Will come up with a patch soon. Your suggestions are welcome. Please keep all such discussions under that new JIRA issue.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920599#comment-13920599 ]

ramkrishna.s.vasudevan commented on HBASE-10191:

bq. Would be sweet if the value at least was not on heap

Yes, this could be nice.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918407#comment-13918407 ] stack commented on HBASE-10191: --- [~mcorgan] bq. It's basically creating small in-memory HFiles that can be compacted several times in memory without going to disk, and holding on to the WAL entries until they do go to disk. Pardon dumb questions, creating small in-memory HFiles... -- from a small CSLM that does the sort for us? Or, I remember talking to Martin Thompson once trying to ask how he'd go about the MemStore 'problem' and I'm sure he didn't follow what I was on about (I was doing a crappy job explaining I'm sure), but other than his usual adage of try everything and measure, he suggested just trying a sort on the fly... Are you thinking the same Matt? So we'd keep around Cells and then once we had a batch or if after some nanos had elapsed, we'd do a merge sort w/ current set of in-memory edits and then put in place the new sorted 'in-memory-hfile' and up the mvcc read point so it was readable? Once they got to a certain size we'd do like we do now with snapshot and start up a new foreground set of edits to merge into? bq. ...and holding on to the WAL entries until they do go to disk What are you thinking here? Would be good if the WAL system was not related to the MemStore system (though chatting w/ [~liyin] recently, he had an idea that would make the WAL sync more 'live' if WAL sync updated mvcc (mvcc and seqid being tied)). bq. Anoop, Ram, and I were throwing around ideas of making in-memory HFiles out of memstore snapshots Would be sweet if the value at least was not on heap Sounds like a nice experiment, Andrew.
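The "sort on the fly" scheme sketched above — accumulate edits unsorted in a small foreground batch, then periodically sort that batch and merge it with the current immutable sorted run, advancing the MVCC read point — might look roughly like this. A minimal sketch with invented names; this is not HBase code, and cells are stand-in strings:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of deferred sorting: edits land unsorted in a small
// foreground batch; once the batch fills (or a timer would fire) it is
// sorted and merged with the existing immutable sorted run. In a real
// memstore the swap would be paired with an mvcc read-point advance.
class DeferredSortMemstore {
    private final List<String> batch = new ArrayList<>();   // unsorted foreground edits
    private List<String> sortedRun = new ArrayList<>();     // immutable "in-memory hfile"
    private final int batchLimit;

    DeferredSortMemstore(int batchLimit) { this.batchLimit = batchLimit; }

    void add(String cell) {
        batch.add(cell);
        if (batch.size() >= batchLimit) {
            rollBatch();
        }
    }

    // Sort the small batch, then linear-merge it with the current run.
    private void rollBatch() {
        Collections.sort(batch);
        List<String> merged = new ArrayList<>(sortedRun.size() + batch.size());
        int i = 0, j = 0;
        while (i < sortedRun.size() && j < batch.size()) {
            merged.add(sortedRun.get(i).compareTo(batch.get(j)) <= 0
                ? sortedRun.get(i++) : batch.get(j++));
        }
        while (i < sortedRun.size()) merged.add(sortedRun.get(i++));
        while (j < batch.size()) merged.add(batch.get(j++));
        sortedRun = merged;   // swap in the new run; readers see it after mvcc advance
        batch.clear();
    }

    List<String> snapshot() {
        rollBatch();  // fold in any pending edits for a consistent view
        return sortedRun;
    }
}
```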
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918632#comment-13918632 ] Matt Corgan commented on HBASE-10191: - {quote}creating small in-memory HFiles... – from a small CSLM that does the sort for us?{quote} Yes, that is all I meant. The CSLM would remain small because it gets flushed more often. I don't doubt there are better ways to do it than the CSLM (like the deferred sorting you mention), but even just shrinking the size of the CSLM would be an improvement without having to re-think the memstore's concurrency mechanisms. Let's say you have a 500MB memstore limit, and that encodes (not compresses) to 100MB. You could:
* split it into 10 stripes, each with ~50MB limit, and flush each of the 10 stripes (to memory) individually
** you probably have a performance boost already because 10 50MB CSLMs is better than 1 500MB CSLM
* for a given stripe, flush the CSLM each time it reaches 25MB, which will spit out a 5MB encoded memory hfile to the off-heap storage
* optionally compact a stripe's memory hfiles in the background to increase read performance
* when a stripe has 25MB CSLM + 5 encoded snapshots, flush/compact the whole thing to disk
* release the WAL entries for the stripe
On the WAL entries, I was just pointing out that you can no longer release the WAL entries when you flush the CSLM. You have to hold on to the WAL entries until you flush the memory hfiles to disk.
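The per-stripe arithmetic above (25MB CSLM flush point, roughly 5:1 encoding, five in-memory snapshots before a disk flush) can be sketched as a small decision policy. The constants come from the comment; the class, enum, and method names are invented for illustration:

```java
// Hypothetical sketch of the per-stripe thresholds described in the comment.
// Not HBase code; a real policy would also weigh global memory pressure.
class StripeFlushPolicy {
    static final long CSLM_FLUSH_BYTES = 25L * 1024 * 1024; // flush CSLM at 25MB
    static final int ENCODE_RATIO = 5;                      // ~5MB encoded per 25MB raw
    static final int MAX_SNAPSHOTS = 5;                     // then flush stripe to disk

    enum Action { NONE, FLUSH_CSLM_TO_MEMORY, FLUSH_STRIPE_TO_DISK }

    // Decide what a stripe should do given its current CSLM size and how
    // many encoded in-memory snapshots it already holds.
    static Action decide(long cslmBytes, int snapshots) {
        if (cslmBytes >= CSLM_FLUSH_BYTES && snapshots >= MAX_SNAPSHOTS) {
            return Action.FLUSH_STRIPE_TO_DISK;  // also releases the stripe's WAL entries
        }
        if (cslmBytes >= CSLM_FLUSH_BYTES) {
            return Action.FLUSH_CSLM_TO_MEMORY;  // emit a ~5MB encoded hfile off heap
        }
        return Action.NONE;
    }

    static long encodedSize(long rawBytes) { return rawBytes / ENCODE_RATIO; }
}
```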
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917352#comment-13917352 ] Andrew Purtell commented on HBASE-10191: bq. (Matt Corgan) I could see using an allocator based on huge on or off-heap slabs where smaller pages/blocks are referenced by reusable ByteRanges. The allocator could recycle memory by continuously picking the least utilized slab and copying (moving) its occupied ByteRanges to the slab at the head of the queue. This would provide constant compaction via fast sequential copying. We could make the investment of writing our own slab allocator. Experiments with Netty 4 ByteBufs are in part about seeing if we can re-use open source in production already rather than redo the work. On the other hand, it could be a crucial component so maybe it's necessary to have complete control. Perhaps we can move additional comments on this sub-topic over to HBASE-10573?
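The recycling idea quoted above — continuously pick the least-utilized slab and sequentially copy its occupied ranges into the slab at the head of the queue, freeing the donor for reuse — might be sketched like this. Illustrative only; all names are invented and live ranges are tracked naively:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of slab recycling by sequential copy. A real allocator
// would also fix up the ByteRange references that point into the donor slab.
class SlabCompactor {
    static class Slab {
        final byte[] data;
        final List<int[]> live = new ArrayList<>(); // live {offset, length} ranges
        int used;                                    // bytes occupied by live ranges
        Slab(int size) { data = new byte[size]; }
    }

    // Move every live range of the least-utilized slab into 'head' via fast
    // sequential copies, then return the emptied donor slab to the free pool.
    static void compact(List<Slab> slabs, Slab head) {
        Slab victim = slabs.get(0);
        for (Slab s : slabs) if (s.used < victim.used) victim = s;
        for (int[] r : victim.live) {
            System.arraycopy(victim.data, r[0], head.data, head.used, r[1]);
            head.live.add(new int[] { head.used, r[1] });
            head.used += r[1];
        }
        victim.live.clear();
        victim.used = 0;    // donor slab is now empty and reusable
    }
}
```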
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917682#comment-13917682 ] Matt Corgan commented on HBASE-10191: - {quote}How then to have KeyValues/Cells w/o calling them out as individual objects? For MemStore, once we hit some upper bound – say 64k, 1M? – 'flush' it to an inmemory, sorted, cellblock? Reading, we'd consult the (small) CSLM memstore and some tiering of cellblocks?{quote} I think there's been talk of this before, and it makes sense to me. It's basically creating small in-memory HFiles that can be compacted several times in memory without going to disk, and holding on to the WAL entries until they do go to disk. We'd get huge space savings from reduction in objects, references, and repetition via block encoding. The problem is that if you have hundreds of 1MB in-memory HFiles, then it becomes too expensive to merge them all (via KVHeap) when scanning. A possible solution is to subdivide the memstore into stripes (probably smaller than the stripe compaction stripes) and periodically compact the in-memory stripes. It sounds complicated compared to the current memstore, but it's probably simpler than other parts of HBase because you don't have to deal with IOExceptions, retries, etc.
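The merge cost described above comes from KVHeap-style k-way merging: every cell read pays O(log k) against a heap of k sorted runs, so hundreds of small in-memory hfiles make scans expensive, and striping bounds k per stripe. A generic k-way merge sketch (illustrative only, not the actual KeyValueHeap code; runs are stand-in string lists):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of a KVHeap-style k-way merge over k sorted runs. Heap entries are
// {runIndex, positionInRun} pairs ordered by the cell they currently point at;
// each emitted cell costs one heap poll plus at most one re-insert, O(log k).
class KWayMerge {
    static List<String> merge(List<List<String>> runs) {
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> runs.get(x[0]).get(x[1]).compareTo(runs.get(y[0]).get(y[1])));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) heap.add(new int[] { i, 0 });
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(runs.get(top[0]).get(top[1]));
            if (top[1] + 1 < runs.get(top[0]).size()) {
                heap.add(new int[] { top[0], top[1] + 1 });  // advance that run
            }
        }
        return out;
    }
}
```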
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917703#comment-13917703 ] Andrew Purtell commented on HBASE-10191: bq. The problem is that if you have hundreds of 1MB in-memory HFiles, then it becomes too expensive to merge them all (via KVHeap) when scanning. A possible solution is to subdivide the memstore into stripes (probably smaller than the stripe compaction stripes) and periodically compact the in-memory stripes Anoop, Ram, and I were throwing around ideas of making in-memory HFiles out of memstore snapshots, and then doing in-memory compaction over them. If we have off-heap backing for memstore we could potentially carry larger datasets leading to less frequent flushes and significantly less write amplification overall.
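The write-amplification point can be made concrete with rough arithmetic: for the same ingest volume, a larger (off-heap-backed) memstore produces fewer, larger flushed files, so fewer files feed later compactions. An illustrative model only, not a measurement:

```java
// Rough illustrative model: the number of flushes needed to persist a given
// ingest volume is the ceiling of dataBytes / memstoreBytes. Doubling the
// memstore halves the flushed-file count, which in turn trims how often
// compaction rewrites the same data.
class WriteAmp {
    static long flushes(long dataBytes, long memstoreBytes) {
        return (dataBytes + memstoreBytes - 1) / memstoreBytes; // ceiling division
    }
}
```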
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13909235#comment-13909235 ] stack commented on HBASE-10191: --- (Good discussion going on here) How then to have KeyValues/Cells w/o calling them out as individual objects? Keep cellblocks of KeyValues/Cells w/ a CellScanner to read over 64k blocks of them? For MemStore, once we hit some upper bound -- say 64k, 1M? -- 'flush' it to an inmemory, sorted, cellblock? Reading, we'd consult the (small) CSLM memstore and some tiering of cellblocks?
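The cellblock idea above — cells packed into one flat (possibly direct) buffer, read back by a scanner without materializing a per-cell object up front — might be sketched like this. The length-prefixed layout is invented for illustration and is not the real cellblock/CellScanner wire format:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: cells serialized as [int length][bytes] entries into a
// single buffer. Passing direct=true puts the block off heap; the scan path
// is identical either way, which is the point of coding against ByteBuffer.
class CellBlock {
    static ByteBuffer pack(List<String> cells, boolean direct) {
        int size = 0;
        for (String c : cells) size += 4 + c.getBytes(StandardCharsets.UTF_8).length;
        ByteBuffer buf = direct ? ByteBuffer.allocateDirect(size) : ByteBuffer.allocate(size);
        for (String c : cells) {
            byte[] b = c.getBytes(StandardCharsets.UTF_8);
            buf.putInt(b.length).put(b);
        }
        buf.flip();
        return buf;
    }

    // Walk the block sequentially, copying each cell out only when read.
    static List<String> scan(ByteBuffer block) {
        List<String> out = new ArrayList<>();
        ByteBuffer buf = block.duplicate();   // don't disturb the caller's position
        while (buf.hasRemaining()) {
            byte[] b = new byte[buf.getInt()];
            buf.get(b);
            out.add(new String(b, StandardCharsets.UTF_8));
        }
        return out;
    }
}
```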
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13909242#comment-13909242 ] Lars Hofhansl commented on HBASE-10191: --- HBASE-5311 and HBASE-9440 have related discussion. If we're smart we can build all these things such that they work on- and off-heap.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907299#comment-13907299 ] Vladimir Rodionov commented on HBASE-10191: --- {quote} We also want to consider addressing up 1 TB of usable memory without loading up cores with redundant work / multiple processes. {quote} 6TB of RAM. http://www.supermicro.nl/newsroom/pressreleases/2014/press140218_4U_4-Way.cfm {quote} Collection times are not a function of the heap size but rather of heap complexity, i.e. the number of objects to track {quote} Heap compaction is a function of heap size (at least in CMS).
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907889#comment-13907889 ] Lars Hofhansl commented on HBASE-10191: ---

bq. Heap compaction is a function of the heap size (at least in CMS).

Not to start a long, tangential argument here... but last I looked, CMS was non-compacting, and thus the only relevant metric is the number of objects to trace, not their size. A 100 GB heap with 1 million objects is far easier to manage than a 100 GB heap with 100 million objects.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907925#comment-13907925 ] Vladimir Rodionov commented on HBASE-10191: ---

Right, CMS is non-compacting, but compaction nevertheless happens from time to time (full GC), and that is a function of the heap size.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907967#comment-13907967 ] Lars Hofhansl commented on HBASE-10191: ---

(Not if all objects are of roughly the same size; then you will never need a full GC.) In any case, nobody is arguing (at least I am not) that 1 TB or more (6 TB? Wow) should be managed on heap with contemporary HotSpot JVMs. I'm looking forward to what Andrew and folks will produce here.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13908002#comment-13908002 ] Matt Corgan commented on HBASE-10191: ---

I hate to continue the tangent, but I'd add that even the occasional compaction that CMS triggers depends on how many objects need to be compacted. Random access memory isn't really "random" anymore: there are enormous speed boosts when copying long swaths of sequential memory. So compacting 100 slabs of 1 GB each should be far faster than compacting 1 billion 100-byte KeyValues scattered around the heap. I also wonder if there's a slab size big enough that HotSpot won't bother moving it during a compaction (but I have no idea).

Separately, one of the reasons Nick and I thought ByteRange should be an interface was that we could back it with varying implementations including arrays, HeapByteBuffers, DirectByteBuffers, Netty ByteBufs, etc. A utility similar to IOUtils.copy could help optimize copies between the different implementations. Another advantage of using it as the primary interface is that its internal compareTo method uses HBase-friendly unsigned byte comparison, making it easy to put ByteRanges into traditional sorted collections like TreeSet/CSLM without passing an external comparator.

I could see using an allocator based on huge on- or off-heap slabs where smaller pages/blocks are referenced by reusable ByteRanges. The allocator could recycle memory by continuously picking the least utilized slab and copying (moving) its occupied ByteRanges to the slab at the head of the queue. This would provide constant compaction via fast sequential copying.
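The unsigned compareTo property mentioned above is the piece that lets ranges drop into TreeSet/CSLM directly. A minimal sketch, assuming a simplified `SimpleByteRange` class rather than the actual HBase ByteRange API:

```java
import java.util.TreeSet;

// Sketch of a ByteRange-style view over a backing array with an unsigned,
// lexicographic compareTo, so ranges sort correctly in TreeSet/CSLM
// without an external comparator. Simplified and hypothetical; not the
// real org.apache.hadoop.hbase.util.ByteRange interface.
public class SimpleByteRange implements Comparable<SimpleByteRange> {
    final byte[] bytes;
    final int offset, length;

    SimpleByteRange(byte[] bytes, int offset, int length) {
        this.bytes = bytes; this.offset = offset; this.length = length;
    }

    @Override
    public int compareTo(SimpleByteRange o) {
        int n = Math.min(length, o.length);
        for (int i = 0; i < n; i++) {
            // Mask to compare bytes as unsigned, matching HBase's
            // lexicographic row-key ordering.
            int a = bytes[offset + i] & 0xff;
            int b = o.bytes[o.offset + i] & 0xff;
            if (a != b) return a - b;
        }
        return length - o.length;
    }

    public static void main(String[] args) {
        // Three one-byte ranges sharing one backing "slab".
        byte[] slab = new byte[] { 0x01, (byte) 0xff, 0x7f };
        TreeSet<SimpleByteRange> set = new TreeSet<>();
        set.add(new SimpleByteRange(slab, 0, 1));  // 0x01
        set.add(new SimpleByteRange(slab, 1, 1));  // 0xff
        set.add(new SimpleByteRange(slab, 2, 1));  // 0x7f
        // Unsigned order is 0x01 < 0x7f < 0xff; a naive signed byte
        // compare would wrongly sort 0xff (i.e. -1) first.
        System.out.println(Integer.toHexString(
            set.last().bytes[set.last().offset] & 0xff));  // ff
    }
}
```

Note how many ranges can share one backing slab, which is exactly what makes the sequential-copy compaction scheme above plausible.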
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906456#comment-13906456 ] Andrew Purtell commented on HBASE-10191: ---

I'm looking at Netty 4's netty-buffer module (http://netty.io/4.0/api/io/netty/buffer/package-summary.html), which has some nice properties, including composite buffers, arena allocation, dynamic buffer resizing, and reference counting, never mind dev and testing by another community. I also like it because you can plug in your own allocators and specialize the abstract ByteBuf base type. More on this later. When I get closer to seeing what exactly needs to be done I will post a design doc. Current thinking follows. Below, the term 'buffer' means Netty ByteBufs, or derived classes, backed by off-heap allocated direct buffers.

*Write*

When coming in from RPC, cells are laid out by codecs into cellblocks in buffers, and the cellblocks/buffers are handed to the memstore. Netty's allocation arenas replace the MemstoreLAB. The memstore data structure evolves into an index over cellblocks. Per [~mcorgan]'s comment above, we should think about how the memstore index can be built with fewer object allocations than the number of cells in the memstore, yet be in the ballpark on efficiency of concurrent access. A tall order. CSLM wouldn't be the right choice as it allocates at least one list entry per key, but we could punt and use it initially, and make a replacement data structure a follow-on task.

*Read*

We feed buffers down to HDFS to fill with file block data. We pick which pool to get a buffer from for a read depending on the family's caching strategy. Pools could be backed by arenas that match up with LRU policy strata, with a common pool/arena for noncaching reads.
(Or, for noncaching reads, can we optionally use a new API for getting buffers up from HDFS, perhaps backed by the pinned shared RAM cache, since we know we will be referring to the contents only briefly?) It will be important to get reference counting right, as we will be servicing scans while attempting to evict. Relatedly, eviction of a block may not immediately return a buffer to a pool if there is more than one block in a buffer. We maintain new metrics on the number of buffers allocated, stats on arenas, stats on wastage and internal fragmentation of the buffers, etc., and use these to guide optimizations and refinements.
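The write-path idea above, where the memstore becomes an index over off-heap cellblocks, can be sketched with stdlib pieces only. This is the "punt and use CSLM initially" variant the comment describes: the heap holds one skip-list entry per key, while the cell bytes live in a direct buffer. Class and method names here are illustrative, and the single direct buffer with a synchronized writer is a simplification (a real implementation would use pooled, reference-counted arenas).

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentSkipListMap;

// Sketch: memstore as an index over an off-heap cellblock. Values are
// length-prefixed in a direct buffer; the on-heap CSLM maps keys to
// offsets only, so heap object count tracks keys, not cell bytes.
public class CellblockIndex {
    private final ByteBuffer cellblock = ByteBuffer.allocateDirect(1 << 20);
    private final ConcurrentSkipListMap<String, Integer> index =
        new ConcurrentSkipListMap<>();   // key -> offset into cellblock
    private int writePos = 0;

    // Single-writer for simplicity; ByteBuffer position state is not
    // thread-safe, which a real allocator would have to deal with.
    public synchronized void put(String key, byte[] value) {
        int off = writePos;
        cellblock.position(off);
        cellblock.putInt(value.length).put(value);  // length-prefixed cell
        writePos = cellblock.position();
        index.put(key, off);             // heap holds only key + offset
    }

    public byte[] get(String key) {
        Integer off = index.get(key);
        if (off == null) return null;
        int len = cellblock.getInt(off); // absolute reads, no position use
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) out[i] = cellblock.get(off + 4 + i);
        return out;
    }

    public static void main(String[] args) {
        CellblockIndex memstore = new CellblockIndex();
        memstore.put("row1", "v1".getBytes());
        memstore.put("row2", "v2".getBytes());
        System.out.println(new String(memstore.get("row1")));  // v1
    }
}
```

Even in this toy form, the CSLM cost the comment flags is visible: one entry object per key survives on the heap, which is what the proposed replacement data structure would shrink.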
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906599#comment-13906599 ] Lars Hofhansl commented on HBASE-10191: ---

This might not be a very popular viewpoint these days, but anyway. My office neighbor used to work on a proprietary Java database, and he says they used 128 GB or even 192 GB Java heaps and larger all the time without any significant GC impact. (Non-moving) collection times are not a function of the heap size but rather of heap complexity, i.e. the number of objects to track. (HBase also produces a lot of garbage, but that is short-lived and can be quickly collected by a moving collector for the young gen.) With the MemstoreLAB and the block cache, HBase already does a good job on this. Even as is, if we fill an entire 128 GB heap with 64 KB blocks from the blockcache, that would only be about 2 million objects. Now, if we want to forage into the sub-100 ms latency area we need to rethink things, but then Java might just not be the right choice. Before we embark on an all-or-nothing adventure and move everything out of the Java heap, we should also investigate whether we can make the GC's life easier yet.
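The back-of-the-envelope count in the comment above (a 128 GB heap of 64 KB blockcache blocks is only about 2 million objects) checks out:

```java
// Worked version of the comment's arithmetic: object count for a heap
// filled entirely with fixed-size block cache blocks.
public class BlockCount {
    public static void main(String[] args) {
        long heapBytes  = 128L << 30;   // 128 GB
        long blockBytes = 64L  << 10;   // 64 KB per block
        System.out.println(heapBytes / blockBytes);  // 2097152, i.e. ~2m
    }
}
```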
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906612#comment-13906612 ] Andrew Purtell commented on HBASE-10191: ---

I intend to prototype something so we don't have to argue supposition. Yes, enabling sub-100 ms collections at the 95th or 99th percentile is an important consideration. We also want to consider addressing up to 1 TB of usable memory without loading up cores with redundant work / multiple processes. Some GC overheads are a linear function of the heap size, at least for G1.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906682#comment-13906682 ] Lars Hofhansl commented on HBASE-10191: ---

Yeah, I was talking about CMS, and definitely less than 1 TB. Please do not read my comment as criticism; this is very important work. No doubt you can drive max latency down significantly by going off heap; at the same time, there are probably a lot of further improvements we can make to current HBase in the heap allocation area.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851990#comment-13851990 ] Vladimir Rodionov commented on HBASE-10191:
---
{quote}
It's abundantly clear once using heaps larger than ~8 GB that collection pauses under safepoint blow out latency SLAs at the high percentiles.
{quote}
What HBase version are you using? No bucket cache yet?

Move large arena storage off heap
Key: HBASE-10191
URL: https://issues.apache.org/jira/browse/HBASE-10191
Project: HBase
Issue Type: Umbrella
Reporter: Andrew Purtell

Umbrella issue for moving large arena storage off heap.

Even with the improved G1 GC in Java 7, Java processes that want to address large regions of memory while also providing low high-percentile latencies continue to be challenged. Fundamentally, a Java server process that has high data throughput and also tight latency SLAs will be stymied by the fact that the JVM does not provide a fully concurrent collector. There is simply not enough throughput to copy data during GC under safepoint (all application threads suspended) within available time bounds. This is increasingly an issue for HBase users operating under dual pressures: 1. tight response SLAs, 2. the increasing amount of RAM available in commodity server configurations, because GC load is roughly proportional to heap size.

We can address this using parallel strategies. We should talk with the Java platform developer community about the possibility of a fully concurrent collector appearing in OpenJDK somehow. Setting aside the question of whether this is too little, too late: if one becomes available, the benefit will be immediate (though subject to qualification for production) and transparent in terms of code changes. However, in the meantime we need an answer for Java versions already in production. This requires that we move the large arena allocations off heap, those being the blockcache and memstore.

On other JIRAs recently there has been related discussion about combining the blockcache and memstore (HBASE-9399) and about flushing memstore into blockcache (HBASE-5311), which is related work. We should build off-heap allocation for memstore and blockcache, perhaps a unified pool for both, and plumb zero-copy direct access to these allocations (via direct buffers) through the read and write I/O paths. This may require the construction of classes that provide object views over data contained within direct buffers. This is something else we could talk with the Java platform developer community about: it could be possible to provide language-level object views over off-heap memory, where on-heap objects could hold references to objects backed by off-heap memory but not vice versa, perhaps facilitated by new intrinsics in Unsafe. Again, we need an answer for today as well. We should investigate what existing libraries may be available in this regard. Key will be avoiding marshalling/unmarshalling costs. At most we should be copying primitives out of the direct buffers to register or stack locations until finally copying data to construct protobuf Messages. A related issue there is HBASE-9794, which proposes scatter-gather access to KeyValues when constructing RPC messages. We should see how far we can get with that, and also with zero-copy construction of protobuf Messages backed by direct buffer allocations. Some amount of native code may be required.

--
This message was sent by Atlassian JIRA (v6.1.4#6159)
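The "object views over data contained within direct buffers" idea in the description can be sketched as a flyweight: a single reusable on-heap object that reads primitives straight out of a direct ByteBuffer at a given offset, so no per-record object is materialized. The field layout here (8-byte timestamp, 4-byte value length, value bytes) is hypothetical, not HBase's actual KeyValue encoding.

```java
import java.nio.ByteBuffer;

// Flyweight view over records stored in off-heap (direct) memory. One view
// instance is re-pointed at successive records instead of allocating an
// object per record, which is the GC-load reduction the issue is after.
public class OffHeapRecordView {
    private ByteBuffer buf;   // direct buffer backing the arena
    private int offset;       // start of the current record

    // Re-point the flyweight rather than allocating a new object.
    public OffHeapRecordView wrap(ByteBuffer buf, int offset) {
        this.buf = buf;
        this.offset = offset;
        return this;
    }

    // Primitives are copied straight to the stack, never boxed on heap.
    public long timestamp()  { return buf.getLong(offset); }
    public int valueLength() { return buf.getInt(offset + 8); }

    // Only when a caller truly needs the bytes do we copy them on heap.
    public byte[] copyValue() {
        byte[] v = new byte[valueLength()];
        for (int i = 0; i < v.length; i++) {
            v[i] = buf.get(offset + 12 + i);
        }
        return v;
    }

    public static void main(String[] args) {
        ByteBuffer arena = ByteBuffer.allocateDirect(64); // off-heap storage
        arena.putLong(0, 42L);  // timestamp
        arena.putInt(8, 3);     // value length
        arena.put(12, (byte) 'a');
        arena.put(13, (byte) 'b');
        arena.put(14, (byte) 'c');

        OffHeapRecordView view = new OffHeapRecordView().wrap(arena, 0);
        System.out.println(view.timestamp() + " " + new String(view.copyValue()));
        // prints: 42 abc
    }
}
```

Note the design constraint the description mentions: the on-heap view may reference the off-heap buffer, but nothing off heap references on-heap objects, so the collector never traces into the arena.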
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852031#comment-13852031 ] Andrew Purtell commented on HBASE-10191:
bq. What HBase version are you using? No bucket cache yet?
Trunk, what is now 0.98. As you point out above, serialization/deserialization costs limit the bucket cache, which is why I propose the goal of direct operation on allocations backed by off-heap memory. This has to be approached in stages. The bucket cache encourages looking at this approach: although you'll see reduced throughput, it will smooth out the latency tail and allow the blockcache to address RAM without increasing heap size, which also helps smooth out the latency tail with respect to the collection pause distribution. However, with large heaps, e.g. 128+ GB, mixed generation collections exceeding the ZooKeeper heartbeat timeout are inevitable under mixed read+write load; nothing I have found mitigates that sufficiently.
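The serialization cost referred to above can be made concrete: a bucket-cache-style read copies a whole cached block back on heap before use, while direct operation reads only the primitives it needs in place. This is an illustrative sketch, not HBase's BucketCache implementation; the block layout and sizes are invented.

```java
import java.nio.ByteBuffer;

// Contrast of the two access styles discussed in the comment. copyOut()
// models the deserialization barrier (whole block copied on heap per
// access); readFieldDirect() models zero-copy direct operation.
public class CopyVsDirect {
    static final int BLOCK_SIZE = 64 * 1024;

    // Bucket-cache style: every read materializes an on-heap copy.
    static byte[] copyOut(ByteBuffer offHeapBlock) {
        byte[] onHeap = new byte[offHeapBlock.capacity()];
        ByteBuffer dup = offHeapBlock.duplicate(); // don't disturb shared position
        dup.position(0);
        dup.get(onHeap);
        return onHeap;
    }

    // Direct style: pull a single field without copying the block.
    static long readFieldDirect(ByteBuffer offHeapBlock, int offset) {
        return offHeapBlock.getLong(offset);
    }

    public static void main(String[] args) {
        ByteBuffer block = ByteBuffer.allocateDirect(BLOCK_SIZE);
        block.putLong(128, 99L);

        byte[] copy = copyOut(block);             // 64 KB copied per access
        long field = readFieldDirect(block, 128); // 8 bytes read, zero copy

        System.out.println(copy.length + " " + field); // prints: 65536 99
    }
}
```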
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852086#comment-13852086 ] Vladimir Rodionov commented on HBASE-10191:
---
{quote}
The bucket cache encourages looking at this approach: although you'll see reduced throughput, it will smooth out the latency tail and allow the blockcache to address RAM without increasing heap size, which also helps smooth out the latency tail with respect to the collection pause distribution. However, with large heaps, e.g. 128+ GB, mixed generation collections exceeding the ZooKeeper heartbeat timeout are inevitable under mixed read+write load; nothing I have found mitigates that sufficiently.
{quote}
It looks like you have done some bucket cache research and tests. Are there any numbers available? We are considering upgrading to the 0.96 release, and the bucket cache is the major attraction for us. According to you, is it not that usable, or does it not give any performance advantage? I really doubt that an 80 GB on-heap block cache is a viable alternative to an off-heap cache in a mixed read/write load scenario, even in Java 7 with G1. One thing to note: having a serialization barrier has one huge advantage over direct off-heap access. You can compress blocks in the off-heap cache. For our application the compression ratio is close to 4.
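The compression advantage behind a serialization barrier can be sketched as follows: blocks are compressed as they cross into the off-heap cache and decompressed on the way back. Deflater/Inflater stand in for whatever codec a real cache would use, and the repetitive block content stands in for encodable HBase data; this is a sketch, not the BucketCache code path.

```java
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Compress-on-store / decompress-on-load across a serialization barrier.
// Because every access already pays a copy, compression is nearly free to
// add, trading CPU for off-heap capacity (a ~4x ratio per the comment).
public class CompressedOffHeapCache {

    // Compress the block and park the compressed bytes in a direct buffer.
    static ByteBuffer store(byte[] block) {
        Deflater deflater = new Deflater();
        deflater.setInput(block);
        deflater.finish();
        byte[] scratch = new byte[block.length + 64]; // worst case for this sketch
        int n = 0;
        while (!deflater.finished()) {
            n += deflater.deflate(scratch, n, scratch.length - n);
        }
        deflater.end();
        ByteBuffer offHeap = ByteBuffer.allocateDirect(n);
        offHeap.put(scratch, 0, n);
        offHeap.flip();
        return offHeap;
    }

    // Crossing back on heap: copy out and decompress.
    static byte[] load(ByteBuffer offHeap, int originalLength) throws DataFormatException {
        byte[] compressed = new byte[offHeap.remaining()];
        offHeap.duplicate().get(compressed);
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] block = new byte[originalLength];
        inflater.inflate(block);
        inflater.end();
        return block;
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] block = new byte[64 * 1024];            // highly repetitive "block"
        ByteBuffer cached = store(block);
        byte[] roundTrip = load(cached, block.length);
        System.out.println(cached.remaining() < block.length); // prints: true
        System.out.println(roundTrip.length);                  // prints: 65536
    }
}
```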
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852093#comment-13852093 ] Andrew Purtell commented on HBASE-10191:
It is on my to-do list to produce a technical report, but my time is quite constrained and that item is not close to the top of the list. As always, you should evaluate HBase using your application and environment. You may be quite happy with 0.96, with or without the bucket cache.
bq. having serialization barrier has one huge advantage over direct off heap access. You can compress blocks in off heap
That's a great point. I would actually like to operate on an encoded block representation from disk to socket. This is a trick in-memory databases have been using for years, and it will let us push through the memory wall, but that is several steps down a long road. The scope of this JIRA is described in the 'Description' field above.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852328#comment-13852328 ] stack commented on HBASE-10191:
---
[~apurtell] There are a couple of off-heap experiments ongoing. This JIRA covers memstore and blockcache allocations. Seems like we need a larger umbrella issue than this allows? If you agree, I'll open one, because it would be useful to be able to tie all efforts together. Good on you [~apurtell]
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852333#comment-13852333 ] Andrew Purtell commented on HBASE-10191:
If you want to reparent this somewhere, that's fine with me [~stack]. We're going to start with memstore and blockcache (likely a unified pool) and go from there based on results. If there are other things going on, it would be good to put them all together so we can try to coordinate.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852436#comment-13852436 ] Matt Corgan commented on HBASE-10191:
---
Something to keep in mind is that GC pauses can be influenced as much or more by the number of live objects as by the raw size of the heap. 32 GB of block cache could be made of only ~1 million 32 KB blocks, so that particular 32 GB of memory may not stop the world for very long. It's all the small remaining objects that keep the garbage collector busy, and I bet the biggest culprit here is the individual KeyValues in the memstores. MemstoreLAB combines the backing arrays into big chunks to reduce heap fragmentation, but there is still one object per KeyValue, and each object needs to be considered by the collector. A big heap has big memstores, which have lots of KeyValues - possibly far more than the ~1 million blocks in the block cache. A big advantage of flattening the memstores into blocks of key values is that you might be reducing ~500 KeyValues to a single block object. This ~500x reduction in object count strikes me as a significant GC pause improvement that is independent of off-heap techniques. Moving blocks off heap and operating on them directly will be very cool. DataBlockEncoders should be able to read off-heap blocks similarly to how they do now, namely, copying only the modified bytes from the previous cell into an array buffer. Vladimir makes a good point that it would be tough to match the scan performance of unencoded data, so that would need some thinking.
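The flattening idea above can be sketched as a block that packs N cells into one byte array with an int offset index: the collector then traces a handful of objects (the block, the payload array, the index) instead of N KeyValue objects. The length-prefix-free, offset-indexed cell format here is invented for the sketch, not HBase's encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// A flattened cell block: ~500 cells become one object graph of constant
// size, which is the object-count reduction Matt describes.
public class FlattenedCellBlock {
    private final byte[] payload;
    private final int[] offsets;   // start of each cell within payload

    FlattenedCellBlock(List<byte[]> cells) {
        int total = 0;
        for (byte[] c : cells) total += c.length;
        payload = new byte[total];
        offsets = new int[cells.size() + 1];
        int pos = 0, i = 0;
        for (byte[] c : cells) {
            offsets[i++] = pos;
            System.arraycopy(c, 0, payload, pos, c.length);
            pos += c.length;
        }
        offsets[i] = pos; // sentinel so every cell has an end offset
    }

    int cellCount() { return offsets.length - 1; }

    // Copy-out accessor; a flyweight view over payload would avoid even this.
    byte[] cell(int i) {
        byte[] out = new byte[offsets[i + 1] - offsets[i]];
        System.arraycopy(payload, offsets[i], out, 0, out.length);
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> cells = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            cells.add(("row" + i).getBytes(StandardCharsets.UTF_8));
        }
        FlattenedCellBlock block = new FlattenedCellBlock(cells);
        System.out.println(block.cellCount());                                // prints: 500
        System.out.println(new String(block.cell(42), StandardCharsets.UTF_8)); // prints: row42
    }
}
```

Because the payload is a plain byte array, the same layout works unchanged whether it lives on heap or is copied into a direct buffer, which is why this improvement is independent of the off-heap work.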
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852645#comment-13852645 ] stack commented on HBASE-10191:
---
If we supplied DFSClient our own DBB, then maybe we could read from dfs and put into an off-heap blockcache w/o going over the heap (see HDFS-2834, "ByteBuffer-based read API for DFSInputStream").
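The shape of the zero-heap read path stack describes: hand the reader a direct ByteBuffer so data lands off heap with no intermediate byte[]. FileChannel stands in here for DFSInputStream; HDFS-2834's ByteBuffer read API follows the same fill-the-caller's-buffer contract for reads from HDFS.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Read a block straight into a caller-supplied direct buffer (an off-heap
// blockcache slot), never staging the bytes on the Java heap.
public class DirectReadIntoCache {

    static int readBlockInto(FileChannel ch, long position, ByteBuffer dst)
            throws IOException {
        int total = 0;
        while (dst.hasRemaining()) {
            int n = ch.read(dst, position + total); // positional read fills dst
            if (n < 0) break; // EOF before the slot was full
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("block", ".dat");
        Files.write(file, new byte[] {1, 2, 3, 4});

        ByteBuffer cacheSlot = ByteBuffer.allocateDirect(4); // off-heap slot
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            System.out.println(readBlockInto(ch, 0, cacheSlot)); // prints: 4
        }
        System.out.println(cacheSlot.get(3));                    // prints: 4
        Files.delete(file);
    }
}
```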
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851145#comment-13851145 ] Nick Dimiduk commented on HBASE-10191: -- Apparently you're reading my mind :) Nicely articulated [~apurtell]. I'd like to see a body of evidence pointing to the specific components for which moving off heap makes meaningful sense. Memstore and BlockCache are commonly cited as the offending components, but I've not seen anyone present conclusive profiling results making this clear. Nor is there clear advice on the point at which a heap becomes too large. I've started work to track down hard data on both of these points before pressing forward with recommendations. See also [~nkeywal]'s recent profiling work reducing the GC burden imposed by the protobuf RPC implementation; that's an example of a major offender that isn't on the above short-list. I am excited to work toward and experiment with an entirely off-heap data flow, at least for the read path (HDFS -> BlockCache -> RPC send buffer)!
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851148#comment-13851148 ] Andrew Purtell commented on HBASE-10191: bq. Memstore and BlockCache are commonly cited as the offending components, but I've not seen anyone present conclusive profiling results making this clear
It's abundantly clear once you are using heaps larger than ~8 GB: collection pauses under safepoint blow out latency SLAs at the high percentiles. I've observed this directly under mixed read+write load. (Read-only loads work OK with G1 even with very large heaps, e.g. 192 GB.) Why would we need heaps larger than this? To take direct advantage of large server RAM. Memstore and blockcache are then the largest allocators of heap memory. If we move them off heap, they can soak up most of the available RAM, leaving the remaining heap demand relatively small. That is the idea.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851154#comment-13851154 ] Vladimir Rodionov commented on HBASE-10191: --- This will require a redesign of the whole data flow in HBase. Currently, the minimum (and maximum) unit of data exchange in HBase's internal pipeline is the KeyValue, which is a heavy, on-heap (byte-array-backed) data structure. Moving data allocations off heap is half the problem; the other half is avoiding copy-on-read and copy-on-write between off-heap and on-heap memory. Serialization is quite expensive.
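One way around the copy-on-read cost raised above is a flyweight view that decodes fields in place. The sketch below is hypothetical, not the real KeyValue class, and uses an invented toy layout ([int rowLen][row bytes][long timestamp][int valueLen][value bytes]): only primitives are copied out of the buffer, to registers or stack slots, and the cell bytes are never materialized as an on-heap byte[].

```java
import java.nio.ByteBuffer;

// Hypothetical sketch (not the real KeyValue): a flyweight view decoding
// fields directly from an off-heap buffer with absolute reads. Assumed toy
// layout: [int rowLen][row bytes][long timestamp][int valueLen][value bytes].
final class CellView {
    private final ByteBuffer buf; // shared, typically direct
    private final int offset;     // start of this cell within buf

    CellView(ByteBuffer buf, int offset) {
        this.buf = buf;
        this.offset = offset;
    }

    int rowLength()   { return buf.getInt(offset); }
    long timestamp()  { return buf.getLong(offset + 4 + rowLength()); }
    int valueLength() { return buf.getInt(offset + 4 + rowLength() + 8); }

    /** Compares the row to a probe without copying the row onto the heap. */
    boolean rowEquals(byte[] probe) {
        int len = rowLength();
        if (len != probe.length) return false;
        for (int i = 0; i < len; i++) {
            if (buf.get(offset + 4 + i) != probe[i]) return false;
        }
        return true;
    }
}
```

The point of the design is that comparisons and field reads happen against the off-heap bytes directly; the only heap allocation per cell is the small view object, which could itself be reused across a scan.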