[jira] [Commented] (HBASE-10191) Move large arena storage off heap
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944209#comment-13944209 ]

Andrew Purtell commented on HBASE-10191:

Just for documentary purposes at this point, since the implementation is early and has a long way to go: Red Hat recently announced ongoing work on a new GC called Shenandoah, with the stated goal: "Reduce GC pause times on extremely large heaps by doing evacuation work concurrently with Java threads and making pause times independent of heap size."

- JEP: http://openjdk.java.net/jeps/189
- Project: http://icedtea.classpath.org/shenandoah/
- Source: http://icedtea.classpath.org/hg/shenandoah

Move large arena storage off heap
---------------------------------

        Key: HBASE-10191
        URL: https://issues.apache.org/jira/browse/HBASE-10191
    Project: HBase
 Issue Type: Umbrella
   Reporter: Andrew Purtell

Even with the improved G1 GC in Java 7, Java processes that want to address large regions of memory while also providing low high-percentile latencies continue to be challenged. Fundamentally, a Java server process that has high data throughput and also tight latency SLAs will be stymied by the fact that the JVM does not provide a fully concurrent collector. There is simply not enough throughput to copy data during GC under safepoint (all application threads suspended) within available time bounds. This is increasingly an issue for HBase users operating under dual pressures: 1. tight response SLAs, 2. the increasing amount of RAM available in commodity server configurations, because GC load is roughly proportional to heap size. We can address this with parallel strategies.

We should talk with the Java platform developer community about the possibility of a fully concurrent collector appearing in OpenJDK somehow. Set aside the question of whether this is too little too late; if one becomes available the benefit will be immediate, though subject to qualification for production, and transparent in terms of code changes. However, in the meantime we need an answer for Java versions already in production. This requires we move the large arena allocations off heap, those being the blockcache and memstore. On other JIRAs recently there has been related discussion about combining the blockcache and memstore (HBASE-9399) and about flushing memstore into blockcache (HBASE-5311), which is related work.

We should build off heap allocation for memstore and blockcache, perhaps a unified pool for both, and plumb zero copy direct access to these allocations (via direct buffers) through the read and write I/O paths. This may require the construction of classes that provide object views over data contained within direct buffers. This is something else we could talk with the Java platform developer community about: it could be possible to provide language level object views over off heap memory (on heap objects could hold references to objects backed by off heap memory, but not vice versa), maybe facilitated by new intrinsics in Unsafe. Again, we need an answer for today also. We should investigate what existing libraries may be available in this regard. Key will be avoiding marshalling/unmarshalling costs. At most we should be copying primitives out of the direct buffers to register or stack locations until finally copying data to construct protobuf Messages. A related issue there is HBASE-9794, which proposes scatter-gather access to KeyValues when constructing RPC messages. We should see how far we can get with that, and also with zero copy construction of protobuf Messages backed by direct buffer allocations. Some amount of native code may be required.

--
This message was sent by Atlassian JIRA (v6.2#6252)
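The "object views over data contained within direct buffers" idea from the description can be sketched as a flyweight that decodes fields on demand rather than materializing an on-heap object per cell. This is a hypothetical illustration, not HBase code; the `[keyLen][key][valueLen][value]` layout and the `OffHeapCellView` class are assumptions for the sake of the example, and the real KeyValue encoding is more involved.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a zero-copy "object view" over cell data held in a
// direct buffer. Primitives are copied out to stack locations on demand; the
// key/value bytes stay off heap until something (e.g. a protobuf Message)
// finally needs them. The layout here is illustrative only.
public class OffHeapCellView {
    private ByteBuffer buf;   // backing direct buffer (shared, never copied)
    private int offset;       // position of this cell within the buffer

    // Reusable flyweight: repoint the view instead of allocating per cell.
    public OffHeapCellView reset(ByteBuffer buf, int offset) {
        this.buf = buf;
        this.offset = offset;
        return this;
    }

    public int keyLength()   { return buf.getInt(offset); }
    public int valueLength() { return buf.getInt(offset + 4 + keyLength()); }

    public byte keyByteAt(int i) { return buf.get(offset + 4 + i); }

    // Writer side: append one cell, returning the offset of the next slot.
    public static int write(ByteBuffer buf, int offset, byte[] key, byte[] value) {
        buf.putInt(offset, key.length);
        for (int i = 0; i < key.length; i++) buf.put(offset + 4 + i, key[i]);
        buf.putInt(offset + 4 + key.length, value.length);
        for (int i = 0; i < value.length; i++) buf.put(offset + 8 + key.length + i, value[i]);
        return offset + 8 + key.length + value.length;
    }
}
```

Because the view uses only absolute `get`/`put` operations, reads never mutate buffer position and many threads can share one backing buffer; the GC sees only the small flyweight, not the data it points at.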
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934565#comment-13934565 ]

Liyin Tang commented on HBASE-10191:

Just curious, has anyone experienced imbalanced memory allocation among the NUMA nodes when allocating a large off heap arena?
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925530#comment-13925530 ]

Yu Li commented on HBASE-10191:
---

Hi [~mcorgan] and [~stack], I see you had a discussion long ago in HBASE-3484 ([here|https://issues.apache.org/jira/browse/HBASE-3484?focusedCommentId=13410934&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13410934]), but it seems there has been no further progress since then. And [~mcorgan], judging from your comment above you now have a more detailed design in mind, so I'm wondering whether you have done some real work to implement it? Or have any plan to?

Actually I think the design you proposed is somewhat different from the topic of this JIRA and of HBASE-3484, since it's more like an in-memory flush to reduce memory fragmentation rather than a move off heap. Would it be better to open another JIRA to make that discussion more explicit, while leaving the off heap discussion here?

I've been watching this thread, or rather this topic, for a while, and now we've decided to make a similar improvement to our online HBase service here, so I'd really like to work together with the community to complete the design and implementation of the in-memory flush. :-) I'm a totally new face in this discussion, so please kindly forgive me if I've said anything naive. :-)
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925606#comment-13925606 ]

Anoop Sam John commented on HBASE-10191:

[~carp84] I am working on this CellBlocks stuff (yes, in memory flushes). Coding wise it is mostly done, and I will do perf tests also. Some time back I worked on HBASE-3484 but later dropped it. Here, along with off heap, the discussion of CellBlocks also came in. This can greatly reduce the issue we face today with CSLM (when there are too many KVs in it). We are working on the off heap stuff in parallel. My code is in a combined form now; let me separate it out. Also see HBASE-10648, which will allow us to have different MemStore impls.
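The CSLM pain point mentioned above can be illustrated with a sketch of the in-memory flush idea: drain a ConcurrentSkipListMap snapshot into flat, sorted parallel arrays that reads then binary-search. This is a hypothetical simplification (long keys stand in for real row keys; `FlatCellBlock` is an invented name, not an HBase class), but it shows where the win comes from: no per-entry node objects for the GC to trace and no pointer chasing on lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative sketch (not HBase code): flatten a CSLM snapshot into compact
// sorted arrays. The CSLM keeps absorbing new writes; reads consult this
// immutable block via binary search instead of walking skip-list nodes.
public class FlatCellBlock {
    private final long[] keys;      // sorted keys (longs stand in for row keys)
    private final byte[][] values;  // values parallel to keys

    public FlatCellBlock(ConcurrentSkipListMap<Long, byte[]> snapshot) {
        keys = new long[snapshot.size()];
        values = new byte[keys.length][];
        int i = 0;
        // entrySet() iterates in ascending key order, so no re-sort is needed
        for (Map.Entry<Long, byte[]> e : snapshot.entrySet()) {
            keys[i] = e.getKey();
            values[i++] = e.getValue();
        }
    }

    public byte[] get(long key) {
        int lo = 0, hi = keys.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[mid] < key) lo = mid + 1;
            else if (keys[mid] > key) hi = mid - 1;
            else return values[mid];
        }
        return null;
    }
}
```

The same flat layout is also what makes an off-heap variant plausible: two primitive arrays translate to buffer regions far more directly than a web of skip-list nodes does.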
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925855#comment-13925855 ]

Yu Li commented on HBASE-10191:
---

Hi [~anoop.hbase], Thanks for the info, really good to know about the progress; I had almost started on the implementation myself. :-) It's also great to see the patch making MemStore impls pluggable almost ready.

{quote}
My code is in a combined form now. Let me separate it out.
{quote}

I guess the code changes for CellBlocks would be based on HBASE-10648? I searched but found no separate JIRA for the CellBlocks impl; would you create one after separating the code out? I really can't wait to take a look at it. :-)
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925850#comment-13925850 ]

Matt Corgan commented on HBASE-10191:

[~carp84] you're right, flushing the memstore to memory is a separate issue from off-heap storage, but it's important to mention here so off-heap storage can be designed to support it. My comments about splitting the memstore into stripes could also be a separate issue, since it's just an improvement that saves you some in-memory compaction work on non-uniform data distributions.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925861#comment-13925861 ]

Yu Li commented on HBASE-10191:
---

Hi [~mcorgan], Got it, thanks for the explanation. :-)
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925889#comment-13925889 ]

Anoop Sam John commented on HBASE-10191:

bq. would you create one

HBASE-10713. Will come up with a patch soon. Your suggestions are welcome. Please keep all such discussions under that new JIRA issue.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920599#comment-13920599 ]

ramkrishna.s.vasudevan commented on HBASE-10191:

bq. Would be sweet if the value at least was not on heap

Yes, this could be nice.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918407#comment-13918407 ] stack commented on HBASE-10191: --- [~mcorgan] bq. It's basically creating small in-memory HFiles that can be compacted several times in memory without going to disk, and holding on to the WAL entries until they do go to disk. Pardon dumb questions, creating small in-memory HFiles... -- from a small CSLM that does the sort for us? Or, I remember talking to Martin Thompson once trying to ask how he'd go about the MemStore 'problem' and I'm sure he didn't follow what I was on about (I was doing a crappy job explaining I'm sure), but other than his usual adage of try everything and measure, he suggested just trying a sort on the fly... Are you thinking the same Matt? So we'd keep around Cells and then once we had a batch or if after some nanos had elapsed, we'd do a merge sort w/ current set of in-memory edits and then put in place the new sorted 'in-memory-hfile' and up the mvcc read point so it was readable? Once they got to a certain size we'd do like we do now with snapshot and start up a new foreground set of edits to merge into? bq. ...and holding on to the WAL entries until they do go to disk What are you thinking here? Would be good if the WAL system was not related to the MemStore system (though chatting w/ [~liyin] recently, he had an idea that would make the WAL sync more 'live' if WAL sync updated mvcc (mvcc and seqid being tied)). bq. Anoop, Ram, and I were throwing around ideas of making in-memory HFiles out of memstore snapshots Would be sweet if the value at least was not on heap Sounds like a nice experiment, Andrew.
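The "sort on the fly" scheme sketched above — accumulate edits unsorted in a small foreground batch, then periodically sort that batch and merge it with the current immutable sorted run, advancing the MVCC read point — might look roughly like this. A minimal sketch with invented names; this is not HBase code, and cells are stand-in strings:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of deferred sorting: edits land unsorted in a small
// foreground batch; once the batch fills (or a timer would fire) it is
// sorted and merged with the existing immutable sorted run. In a real
// memstore the swap would be paired with an mvcc read-point advance.
class DeferredSortMemstore {
    private final List<String> batch = new ArrayList<>();   // unsorted foreground edits
    private List<String> sortedRun = new ArrayList<>();     // immutable "in-memory hfile"
    private final int batchLimit;

    DeferredSortMemstore(int batchLimit) { this.batchLimit = batchLimit; }

    void add(String cell) {
        batch.add(cell);
        if (batch.size() >= batchLimit) {
            rollBatch();
        }
    }

    // Sort the small batch, then linear-merge it with the current run.
    private void rollBatch() {
        Collections.sort(batch);
        List<String> merged = new ArrayList<>(sortedRun.size() + batch.size());
        int i = 0, j = 0;
        while (i < sortedRun.size() && j < batch.size()) {
            merged.add(sortedRun.get(i).compareTo(batch.get(j)) <= 0
                ? sortedRun.get(i++) : batch.get(j++));
        }
        while (i < sortedRun.size()) merged.add(sortedRun.get(i++));
        while (j < batch.size()) merged.add(batch.get(j++));
        sortedRun = merged;   // swap in the new run; readers see it after mvcc advance
        batch.clear();
    }

    List<String> snapshot() {
        rollBatch();  // fold in any pending edits for a consistent view
        return sortedRun;
    }
}
```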
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918632#comment-13918632 ] Matt Corgan commented on HBASE-10191: - {quote}creating small in-memory HFiles... – from a small CSLM that does the sort for us?{quote} Yes, that is all I meant. The CSLM would remain small because it gets flushed more often. I don't doubt there are better ways to do it than the CSLM (like the deferred sorting you mention), but even just shrinking the size of the CSLM would be an improvement without having to re-think the memstore's concurrency mechanisms. Let's say you have a 500MB memstore limit, and that encodes (not compresses) to 100MB. You could:
* split it into 10 stripes, each with ~50MB limit, and flush each of the 10 stripes (to memory) individually
** you probably have a performance boost already because 10 50MB CSLMs is better than 1 500MB CSLM
* for a given stripe, flush the CSLM each time it reaches 25MB, which will spit out a 5MB encoded memory hfile to the off-heap storage
* optionally compact a stripe's memory hfiles in the background to increase read performance
* when a stripe has 25MB CSLM + 5 encoded snapshots, flush/compact the whole thing to disk
* release the WAL entries for the stripe
On the WAL entries, I was just pointing out that you can no longer release the WAL entries when you flush the CSLM. You have to hold on to the WAL entries until you flush the memory hfiles to disk.
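The per-stripe arithmetic above (25MB CSLM flush point, roughly 5:1 encoding, five in-memory snapshots before a disk flush) can be sketched as a small decision policy. The constants come from the comment; the class, enum, and method names are invented for illustration:

```java
// Hypothetical sketch of the per-stripe thresholds described in the comment.
// Not HBase code; a real policy would also weigh global memory pressure.
class StripeFlushPolicy {
    static final long CSLM_FLUSH_BYTES = 25L * 1024 * 1024; // flush CSLM at 25MB
    static final int ENCODE_RATIO = 5;                      // ~5MB encoded per 25MB raw
    static final int MAX_SNAPSHOTS = 5;                     // then flush stripe to disk

    enum Action { NONE, FLUSH_CSLM_TO_MEMORY, FLUSH_STRIPE_TO_DISK }

    // Decide what a stripe should do given its current CSLM size and how
    // many encoded in-memory snapshots it already holds.
    static Action decide(long cslmBytes, int snapshots) {
        if (cslmBytes >= CSLM_FLUSH_BYTES && snapshots >= MAX_SNAPSHOTS) {
            return Action.FLUSH_STRIPE_TO_DISK;  // also releases the stripe's WAL entries
        }
        if (cslmBytes >= CSLM_FLUSH_BYTES) {
            return Action.FLUSH_CSLM_TO_MEMORY;  // emit a ~5MB encoded hfile off heap
        }
        return Action.NONE;
    }

    static long encodedSize(long rawBytes) { return rawBytes / ENCODE_RATIO; }
}
```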
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917352#comment-13917352 ] Andrew Purtell commented on HBASE-10191: bq. (Matt Corgan) I could see using an allocator based on huge on or off-heap slabs where smaller pages/blocks are referenced by reusable ByteRanges. The allocator could recycle memory by continuously picking the least utilized slab and copying (moving) its occupied ByteRanges to the slab at the head of the queue. This would provide constant compaction via fast sequential copying. We could make the investment of writing our own slab allocator. Experiments with Netty 4 ByteBufs are in part about seeing if we can re-use open source in production already rather than redo the work. On the other hand, it could be a crucial component so maybe it's necessary to have complete control. Perhaps we can move additional comments on this sub-topic over to HBASE-10573?
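The recycling idea quoted above — continuously pick the least-utilized slab and sequentially copy its occupied ranges into the slab at the head of the queue, freeing the donor for reuse — might be sketched like this. Illustrative only; all names are invented and live ranges are tracked naively:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of slab recycling by sequential copy. A real allocator
// would also fix up the ByteRange references that point into the donor slab.
class SlabCompactor {
    static class Slab {
        final byte[] data;
        final List<int[]> live = new ArrayList<>(); // live {offset, length} ranges
        int used;                                    // bytes occupied by live ranges
        Slab(int size) { data = new byte[size]; }
    }

    // Move every live range of the least-utilized slab into 'head' via fast
    // sequential copies, then return the emptied donor slab to the free pool.
    static void compact(List<Slab> slabs, Slab head) {
        Slab victim = slabs.get(0);
        for (Slab s : slabs) if (s.used < victim.used) victim = s;
        for (int[] r : victim.live) {
            System.arraycopy(victim.data, r[0], head.data, head.used, r[1]);
            head.live.add(new int[] { head.used, r[1] });
            head.used += r[1];
        }
        victim.live.clear();
        victim.used = 0;    // donor slab is now empty and reusable
    }
}
```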
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917682#comment-13917682 ] Matt Corgan commented on HBASE-10191: - {quote}How then to have KeyValues/Cells w/o calling them out as individual objects? For MemStore, once we hit some upper bound – say 64k, 1M? – 'flush' it to an inmemory, sorted, cellblock? Reading, we'd consult the (small) CSLM memstore and some tiering of cellblocks?{quote} I think there's been talk of this before, and it makes sense to me. It's basically creating small in-memory HFiles that can be compacted several times in memory without going to disk, and holding on to the WAL entries until they do go to disk. We'd get huge space savings from reduction in objects, references, and repetition via block encoding. The problem is that if you have hundreds of 1MB in-memory HFiles, then it becomes too expensive to merge them all (via KVHeap) when scanning. A possible solution is to subdivide the memstore into stripes (probably smaller than the stripe compaction stripes) and periodically compact the in-memory stripes. It sounds complicated compared to the current memstore, but it's probably simpler than other parts of HBase because you don't have to deal with IOExceptions, retries, etc.
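The merge cost described above comes from KVHeap-style k-way merging: every cell read pays O(log k) against a heap of k sorted runs, so hundreds of small in-memory hfiles make scans expensive, and striping bounds k per stripe. A generic k-way merge sketch (illustrative only, not the actual KeyValueHeap code; runs are stand-in string lists):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of a KVHeap-style k-way merge over k sorted runs. Heap entries are
// {runIndex, positionInRun} pairs ordered by the cell they currently point at;
// each emitted cell costs one heap poll plus at most one re-insert, O(log k).
class KWayMerge {
    static List<String> merge(List<List<String>> runs) {
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> runs.get(x[0]).get(x[1]).compareTo(runs.get(y[0]).get(y[1])));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) heap.add(new int[] { i, 0 });
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(runs.get(top[0]).get(top[1]));
            if (top[1] + 1 < runs.get(top[0]).size()) {
                heap.add(new int[] { top[0], top[1] + 1 });  // advance that run
            }
        }
        return out;
    }
}
```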
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917703#comment-13917703 ] Andrew Purtell commented on HBASE-10191: bq. The problem is that if you have hundreds of 1MB in-memory HFiles, then it becomes too expensive to merge them all (via KVHeap) when scanning. A possible solution is to subdivide the memstore into stripes (probably smaller than the stripe compaction stripes) and periodically compact the in-memory stripes Anoop, Ram, and I were throwing around ideas of making in-memory HFiles out of memstore snapshots, and then doing in-memory compaction over them. If we have off-heap backing for memstore we could potentially carry larger datasets leading to less frequent flushes and significantly less write amplification overall.
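The write-amplification point can be made concrete with rough arithmetic: for the same ingest volume, a larger (off-heap-backed) memstore produces fewer, larger flushed files, so fewer files feed later compactions. An illustrative model only, not a measurement:

```java
// Rough illustrative model: the number of flushes needed to persist a given
// ingest volume is the ceiling of dataBytes / memstoreBytes. Doubling the
// memstore halves the flushed-file count, which in turn trims how often
// compaction rewrites the same data.
class WriteAmp {
    static long flushes(long dataBytes, long memstoreBytes) {
        return (dataBytes + memstoreBytes - 1) / memstoreBytes; // ceiling division
    }
}
```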
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13909235#comment-13909235 ] stack commented on HBASE-10191: --- (Good discussion going on here) How then to have KeyValues/Cells w/o calling them out as individual objects? Keep cellblocks of KeyValues/Cells w/ a CellScanner to read over 64k blocks of them? For MemStore, once we hit some upper bound -- say 64k, 1M? -- 'flush' it to an inmemory, sorted, cellblock? Reading, we'd consult the (small) CSLM memstore and some tiering of cellblocks?
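The cellblock idea above — cells packed into one flat (possibly direct) buffer, read back by a scanner without materializing a per-cell object up front — might be sketched like this. The length-prefixed layout is invented for illustration and is not the real cellblock/CellScanner wire format:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: cells serialized as [int length][bytes] entries into a
// single buffer. Passing direct=true puts the block off heap; the scan path
// is identical either way, which is the point of coding against ByteBuffer.
class CellBlock {
    static ByteBuffer pack(List<String> cells, boolean direct) {
        int size = 0;
        for (String c : cells) size += 4 + c.getBytes(StandardCharsets.UTF_8).length;
        ByteBuffer buf = direct ? ByteBuffer.allocateDirect(size) : ByteBuffer.allocate(size);
        for (String c : cells) {
            byte[] b = c.getBytes(StandardCharsets.UTF_8);
            buf.putInt(b.length).put(b);
        }
        buf.flip();
        return buf;
    }

    // Walk the block sequentially, copying each cell out only when read.
    static List<String> scan(ByteBuffer block) {
        List<String> out = new ArrayList<>();
        ByteBuffer buf = block.duplicate();   // don't disturb the caller's position
        while (buf.hasRemaining()) {
            byte[] b = new byte[buf.getInt()];
            buf.get(b);
            out.add(new String(b, StandardCharsets.UTF_8));
        }
        return out;
    }
}
```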
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13909242#comment-13909242 ] Lars Hofhansl commented on HBASE-10191: --- HBASE-5311 and HBASE-9440 have related discussion. If we're smart we can build all these things such that they work on- and off-heap.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907299#comment-13907299 ] Vladimir Rodionov commented on HBASE-10191: --- {quote} We also want to consider addressing up 1 TB of usable memory without loading up cores with redundant work / multiple processes. {quote} 6TB of RAM. http://www.supermicro.nl/newsroom/pressreleases/2014/press140218_4U_4-Way.cfm {quote} Collection times are not a function of the heap size but rather of heap complexity, i.e. the number of objects to track {quote} Heap compaction is a function of heap size (at least in CMS).
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907889#comment-13907889 ] Lars Hofhansl commented on HBASE-10191: ---

bq. Heap compaction is a function of the heap size (at least in CMS).

Not to start a long, tangential argument here... but last I looked, CMS was non-compacting, and thus the only relevant metric is the number of objects to trace, not their size. A 100 GB heap with 1 million objects is far easier to manage than a 100 GB heap with 100 million objects.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907925#comment-13907925 ] Vladimir Rodionov commented on HBASE-10191: ---

Right, CMS is non-compacting, but compaction nevertheless happens from time to time (full GC), and that is a function of the heap size.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907967#comment-13907967 ] Lars Hofhansl commented on HBASE-10191: ---

(Not if all objects are of roughly the same size; then you will never need a full GC.) In any case, nobody is arguing (at least I am not) that 1 TB or more (6 TB? Wow) should be managed on heap with contemporary HotSpot JVMs. I'm looking forward to what Andrew and folks will produce here.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13908002#comment-13908002 ] Matt Corgan commented on HBASE-10191: ---

I hate to continue the tangent, but I'd add that even the occasional compaction that CMS triggers depends on how many objects need to be compacted. Random access memory isn't really "random" anymore: there are enormous speed boosts when copying long swaths of sequential memory. So compacting 100 slabs of 1 GB each should be far faster than compacting 1 billion 100-byte KeyValues scattered around the heap. I also wonder if there's a slab size big enough that HotSpot won't bother moving it during a compaction (but I have no idea).

Separately, one of the reasons Nick and I thought ByteRange should be an interface was that we could back it with varying implementations including arrays, HeapByteBuffers, DirectByteBuffers, Netty ByteBufs, etc. A utility similar to IOUtils.copy could help optimize copies between the different implementations. Another advantage of using it as the primary interface is that its internal compareTo method uses HBase-friendly unsigned byte comparison, making it easy to put ByteRanges into traditional sorted collections like TreeSet/CSLM without passing an external comparator.

I could see using an allocator based on huge on- or off-heap slabs where smaller pages/blocks are referenced by reusable ByteRanges. The allocator could recycle memory by continuously picking the least utilized slab and copying (moving) its occupied ByteRanges to the slab at the head of the queue. This would provide constant compaction via fast sequential copying.
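The unsigned compareTo property mentioned above is the piece that lets ranges drop into TreeSet/CSLM directly. A minimal sketch, assuming a simplified `SimpleByteRange` class rather than the actual HBase ByteRange API:

```java
import java.util.TreeSet;

// Sketch of a ByteRange-style view over a backing array with an unsigned,
// lexicographic compareTo, so ranges sort correctly in TreeSet/CSLM
// without an external comparator. Simplified and hypothetical; not the
// real org.apache.hadoop.hbase.util.ByteRange interface.
public class SimpleByteRange implements Comparable<SimpleByteRange> {
    final byte[] bytes;
    final int offset, length;

    SimpleByteRange(byte[] bytes, int offset, int length) {
        this.bytes = bytes; this.offset = offset; this.length = length;
    }

    @Override
    public int compareTo(SimpleByteRange o) {
        int n = Math.min(length, o.length);
        for (int i = 0; i < n; i++) {
            // Mask to compare bytes as unsigned, matching HBase's
            // lexicographic row-key ordering.
            int a = bytes[offset + i] & 0xff;
            int b = o.bytes[o.offset + i] & 0xff;
            if (a != b) return a - b;
        }
        return length - o.length;
    }

    public static void main(String[] args) {
        // Three one-byte ranges sharing one backing "slab".
        byte[] slab = new byte[] { 0x01, (byte) 0xff, 0x7f };
        TreeSet<SimpleByteRange> set = new TreeSet<>();
        set.add(new SimpleByteRange(slab, 0, 1));  // 0x01
        set.add(new SimpleByteRange(slab, 1, 1));  // 0xff
        set.add(new SimpleByteRange(slab, 2, 1));  // 0x7f
        // Unsigned order is 0x01 < 0x7f < 0xff; a naive signed byte
        // compare would wrongly sort 0xff (i.e. -1) first.
        System.out.println(Integer.toHexString(
            set.last().bytes[set.last().offset] & 0xff));  // ff
    }
}
```

Note how many ranges can share one backing slab, which is exactly what makes the sequential-copy compaction scheme above plausible.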
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906456#comment-13906456 ] Andrew Purtell commented on HBASE-10191: ---

I'm looking at Netty 4's netty-buffer module (http://netty.io/4.0/api/io/netty/buffer/package-summary.html), which has some nice properties, including composite buffers, arena allocation, dynamic buffer resizing, and reference counting, never mind dev and testing by another community. I also like it because you can plug in your own allocators and specialize the abstract ByteBuf base type. More on this later. When I get closer to seeing what exactly needs to be done I will post a design doc. Current thinking follows. Below, the term 'buffer' means Netty ByteBufs, or derived classes, backed by off-heap allocated direct buffers.

*Write*

When coming in from RPC, cells are laid out by codecs into cellblocks in buffers, and the cellblocks/buffers are handed to the memstore. Netty's allocation arenas replace the MemstoreLAB. The memstore data structure evolves into an index over cellblocks. Per [~mcorgan]'s comment above, we should think about how the memstore index can be built with fewer object allocations than the number of cells in the memstore, yet be in the ballpark on efficiency of concurrent access. A tall order. CSLM wouldn't be the right choice as it allocates at least one list entry per key, but we could punt and use it initially, and make a replacement data structure a follow-on task.

*Read*

We feed buffers down to HDFS to fill with file block data. We pick which pool to get a buffer from for a read depending on the family's caching strategy. Pools could be backed by arenas that match up with LRU policy strata, with a common pool/arena for noncaching reads.
(Or, for noncaching reads, can we optionally use a new API for getting buffers up from HDFS, perhaps backed by the pinned shared RAM cache, since we know we will be referring to the contents only briefly?) It will be important to get reference counting right, as we will be servicing scans while attempting to evict. Relatedly, eviction of a block may not immediately return a buffer to a pool if there is more than one block in a buffer. We maintain new metrics on the number of buffers allocated, stats on arenas, stats on wastage and internal fragmentation of the buffers, etc., and use these to guide optimizations and refinements.
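The write-path idea above, where the memstore becomes an index over off-heap cellblocks, can be sketched with stdlib pieces only. This is the "punt and use CSLM initially" variant the comment describes: the heap holds one skip-list entry per key, while the cell bytes live in a direct buffer. Class and method names here are illustrative, and the single direct buffer with a synchronized writer is a simplification (a real implementation would use pooled, reference-counted arenas).

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentSkipListMap;

// Sketch: memstore as an index over an off-heap cellblock. Values are
// length-prefixed in a direct buffer; the on-heap CSLM maps keys to
// offsets only, so heap object count tracks keys, not cell bytes.
public class CellblockIndex {
    private final ByteBuffer cellblock = ByteBuffer.allocateDirect(1 << 20);
    private final ConcurrentSkipListMap<String, Integer> index =
        new ConcurrentSkipListMap<>();   // key -> offset into cellblock
    private int writePos = 0;

    // Single-writer for simplicity; ByteBuffer position state is not
    // thread-safe, which a real allocator would have to deal with.
    public synchronized void put(String key, byte[] value) {
        int off = writePos;
        cellblock.position(off);
        cellblock.putInt(value.length).put(value);  // length-prefixed cell
        writePos = cellblock.position();
        index.put(key, off);             // heap holds only key + offset
    }

    public byte[] get(String key) {
        Integer off = index.get(key);
        if (off == null) return null;
        int len = cellblock.getInt(off); // absolute reads, no position use
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) out[i] = cellblock.get(off + 4 + i);
        return out;
    }

    public static void main(String[] args) {
        CellblockIndex memstore = new CellblockIndex();
        memstore.put("row1", "v1".getBytes());
        memstore.put("row2", "v2".getBytes());
        System.out.println(new String(memstore.get("row1")));  // v1
    }
}
```

Even in this toy form, the CSLM cost the comment flags is visible: one entry object per key survives on the heap, which is what the proposed replacement data structure would shrink.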
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906599#comment-13906599 ] Lars Hofhansl commented on HBASE-10191: ---

This might not be a very popular viewpoint these days, but anyway. My office neighbor used to work on a proprietary Java database, and he says they used 128 GB or even 192 GB Java heaps and larger all the time without any significant GC impact. (Non-moving) collection times are not a function of the heap size but rather of heap complexity, i.e. the number of objects to track. (HBase also produces a lot of garbage, but that is short-lived and can be quickly collected by a moving collector for the young gen.) With the MemstoreLAB and the block cache, HBase already does a good job on this. Even as is, if we fill an entire 128 GB heap with 64 KB blocks from the blockcache, that would only be about 2 million objects. Now, if we want to forage into the sub-100 ms latency area we need to rethink things, but then Java might just not be the right choice. Before we embark on an all-or-nothing adventure and move everything out of the Java heap, we should also investigate whether we can make the GC's life easier yet.
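The back-of-the-envelope count in the comment above (a 128 GB heap of 64 KB blockcache blocks is only about 2 million objects) checks out:

```java
// Worked version of the comment's arithmetic: object count for a heap
// filled entirely with fixed-size block cache blocks.
public class BlockCount {
    public static void main(String[] args) {
        long heapBytes  = 128L << 30;   // 128 GB
        long blockBytes = 64L  << 10;   // 64 KB per block
        System.out.println(heapBytes / blockBytes);  // 2097152, i.e. ~2m
    }
}
```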
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906612#comment-13906612 ] Andrew Purtell commented on HBASE-10191: ---

I intend to prototype something so we don't have to argue supposition. Yes, enabling sub-100 ms collections at the 95th or 99th percentile is an important consideration. We also want to consider addressing up to 1 TB of usable memory without loading up cores with redundant work / multiple processes. Some GC overheads are a linear function of the heap size, at least for G1.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906682#comment-13906682 ] Lars Hofhansl commented on HBASE-10191: ---

Yeah, I was talking about CMS, and definitely less than 1 TB. Please do not read my comment as criticism; this is very important work. No doubt you can drive max latency down significantly by going off heap; at the same time, there are probably a lot of further improvements we can make to current HBase in the heap allocation area.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851990#comment-13851990 ] Vladimir Rodionov commented on HBASE-10191:
---
{quote}
It's abundantly clear once using heaps larger than ~8 GB that collection pauses under safepoint blow out latency SLAs at the high percentiles.
{quote}
What HBase version are you using? No bucket cache yet?

Move large arena storage off heap
Key: HBASE-10191
URL: https://issues.apache.org/jira/browse/HBASE-10191
Project: HBase
Issue Type: Umbrella
Reporter: Andrew Purtell

Umbrella issue for moving large arena storage off heap.

Even with the improved G1 GC in Java 7, Java processes that want to address large regions of memory while also providing low high-percentile latencies continue to be challenged. Fundamentally, a Java server process that has high data throughput and also tight latency SLAs will be stymied by the fact that the JVM does not provide a fully concurrent collector. There is simply not enough throughput to copy data during GC under safepoint (all application threads suspended) within available time bounds. This is increasingly an issue for HBase users operating under dual pressures: 1. tight response SLAs, 2. the increasing amount of RAM available in commodity server configurations, because GC load is roughly proportional to heap size.

We can address this using parallel strategies. We should talk with the Java platform developer community about the possibility of a fully concurrent collector appearing in OpenJDK somehow. Setting aside the question of whether this is too little, too late: if one becomes available, the benefit will be immediate (though subject to qualification for production) and transparent in terms of code changes. However, in the meantime we need an answer for Java versions already in production. This requires that we move the large arena allocations off heap, those being the blockcache and memstore.

On other JIRAs recently there has been related discussion about combining the blockcache and memstore (HBASE-9399) and about flushing memstore into blockcache (HBASE-5311), which is related work. We should build off-heap allocation for memstore and blockcache, perhaps a unified pool for both, and plumb zero-copy direct access to these allocations (via direct buffers) through the read and write I/O paths. This may require the construction of classes that provide object views over data contained within direct buffers. This is something else we could talk with the Java platform developer community about: it could be possible to provide language-level object views over off-heap memory, where on-heap objects could hold references to objects backed by off-heap memory but not vice versa, perhaps facilitated by new intrinsics in Unsafe. Again, we need an answer for today as well. We should investigate what existing libraries may be available in this regard. Key will be avoiding marshalling/unmarshalling costs. At most we should be copying primitives out of the direct buffers to register or stack locations until finally copying data to construct protobuf Messages. A related issue there is HBASE-9794, which proposes scatter-gather access to KeyValues when constructing RPC messages. We should see how far we can get with that, and also with zero-copy construction of protobuf Messages backed by direct buffer allocations. Some amount of native code may be required.

--
This message was sent by Atlassian JIRA (v6.1.4#6159)
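The "object views over data contained within direct buffers" idea in the description can be sketched as a flyweight: a single reusable on-heap object that reads primitives straight out of a direct ByteBuffer at a given offset, so no per-record object is materialized. The field layout here (8-byte timestamp, 4-byte value length, value bytes) is hypothetical, not HBase's actual KeyValue encoding.

```java
import java.nio.ByteBuffer;

// Flyweight view over records stored in off-heap (direct) memory. One view
// instance is re-pointed at successive records instead of allocating an
// object per record, which is the GC-load reduction the issue is after.
public class OffHeapRecordView {
    private ByteBuffer buf;   // direct buffer backing the arena
    private int offset;       // start of the current record

    // Re-point the flyweight rather than allocating a new object.
    public OffHeapRecordView wrap(ByteBuffer buf, int offset) {
        this.buf = buf;
        this.offset = offset;
        return this;
    }

    // Primitives are copied straight to the stack, never boxed on heap.
    public long timestamp()  { return buf.getLong(offset); }
    public int valueLength() { return buf.getInt(offset + 8); }

    // Only when a caller truly needs the bytes do we copy them on heap.
    public byte[] copyValue() {
        byte[] v = new byte[valueLength()];
        for (int i = 0; i < v.length; i++) {
            v[i] = buf.get(offset + 12 + i);
        }
        return v;
    }

    public static void main(String[] args) {
        ByteBuffer arena = ByteBuffer.allocateDirect(64); // off-heap storage
        arena.putLong(0, 42L);  // timestamp
        arena.putInt(8, 3);     // value length
        arena.put(12, (byte) 'a');
        arena.put(13, (byte) 'b');
        arena.put(14, (byte) 'c');

        OffHeapRecordView view = new OffHeapRecordView().wrap(arena, 0);
        System.out.println(view.timestamp() + " " + new String(view.copyValue()));
        // prints: 42 abc
    }
}
```

Note the design constraint the description mentions: the on-heap view may reference the off-heap buffer, but nothing off heap references on-heap objects, so the collector never traces into the arena.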
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852031#comment-13852031 ] Andrew Purtell commented on HBASE-10191:
bq. What HBase version are you using? No bucket cache yet?
Trunk, what is now 0.98. As you point out above, serialization/deserialization costs limit the bucket cache, which is why I propose the goal of direct operation on allocations backed by off-heap memory. This has to be approached in stages. The bucket cache encourages looking at this approach: although you'll see reduced throughput, it will smooth out the latency tail and allow the blockcache to address RAM without increasing heap size, which also helps smooth out the latency tail with respect to the collection pause distribution. However, with large heaps, e.g. 128+ GB, mixed generation collections exceeding the ZooKeeper heartbeat timeout are inevitable under mixed read+write load; nothing I have found mitigates that sufficiently.
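The serialization cost referred to above can be made concrete: a bucket-cache-style read copies a whole cached block back on heap before use, while direct operation reads only the primitives it needs in place. This is an illustrative sketch, not HBase's BucketCache implementation; the block layout and sizes are invented.

```java
import java.nio.ByteBuffer;

// Contrast of the two access styles discussed in the comment. copyOut()
// models the deserialization barrier (whole block copied on heap per
// access); readFieldDirect() models zero-copy direct operation.
public class CopyVsDirect {
    static final int BLOCK_SIZE = 64 * 1024;

    // Bucket-cache style: every read materializes an on-heap copy.
    static byte[] copyOut(ByteBuffer offHeapBlock) {
        byte[] onHeap = new byte[offHeapBlock.capacity()];
        ByteBuffer dup = offHeapBlock.duplicate(); // don't disturb shared position
        dup.position(0);
        dup.get(onHeap);
        return onHeap;
    }

    // Direct style: pull a single field without copying the block.
    static long readFieldDirect(ByteBuffer offHeapBlock, int offset) {
        return offHeapBlock.getLong(offset);
    }

    public static void main(String[] args) {
        ByteBuffer block = ByteBuffer.allocateDirect(BLOCK_SIZE);
        block.putLong(128, 99L);

        byte[] copy = copyOut(block);             // 64 KB copied per access
        long field = readFieldDirect(block, 128); // 8 bytes read, zero copy

        System.out.println(copy.length + " " + field); // prints: 65536 99
    }
}
```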
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852086#comment-13852086 ] Vladimir Rodionov commented on HBASE-10191:
---
{quote}
The bucket cache encourages looking at this approach: although you'll see reduced throughput, it will smooth out the latency tail and allow the blockcache to address RAM without increasing heap size, which also helps smooth out the latency tail with respect to the collection pause distribution. However, with large heaps, e.g. 128+ GB, mixed generation collections exceeding the ZooKeeper heartbeat timeout are inevitable under mixed read+write load; nothing I have found mitigates that sufficiently.
{quote}
It looks like you have done some bucket cache research and tests. Are there any numbers available? We are considering upgrading to the 0.96 release, and the bucket cache is the major attraction for us. According to you, is it not that usable, or does it not give any performance advantage? I really doubt that an 80 GB on-heap block cache is a viable alternative to an off-heap cache in a mixed read/write load scenario, even in Java 7 with G1. One thing to note: having a serialization barrier has one huge advantage over direct off-heap access. You can compress blocks in the off-heap cache. For our application the compression ratio is close to 4.
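The compression advantage behind a serialization barrier can be sketched as follows: blocks are compressed as they cross into the off-heap cache and decompressed on the way back. Deflater/Inflater stand in for whatever codec a real cache would use, and the repetitive block content stands in for encodable HBase data; this is a sketch, not the BucketCache code path.

```java
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Compress-on-store / decompress-on-load across a serialization barrier.
// Because every access already pays a copy, compression is nearly free to
// add, trading CPU for off-heap capacity (a ~4x ratio per the comment).
public class CompressedOffHeapCache {

    // Compress the block and park the compressed bytes in a direct buffer.
    static ByteBuffer store(byte[] block) {
        Deflater deflater = new Deflater();
        deflater.setInput(block);
        deflater.finish();
        byte[] scratch = new byte[block.length + 64]; // worst case for this sketch
        int n = 0;
        while (!deflater.finished()) {
            n += deflater.deflate(scratch, n, scratch.length - n);
        }
        deflater.end();
        ByteBuffer offHeap = ByteBuffer.allocateDirect(n);
        offHeap.put(scratch, 0, n);
        offHeap.flip();
        return offHeap;
    }

    // Crossing back on heap: copy out and decompress.
    static byte[] load(ByteBuffer offHeap, int originalLength) throws DataFormatException {
        byte[] compressed = new byte[offHeap.remaining()];
        offHeap.duplicate().get(compressed);
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] block = new byte[originalLength];
        inflater.inflate(block);
        inflater.end();
        return block;
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] block = new byte[64 * 1024];            // highly repetitive "block"
        ByteBuffer cached = store(block);
        byte[] roundTrip = load(cached, block.length);
        System.out.println(cached.remaining() < block.length); // prints: true
        System.out.println(roundTrip.length);                  // prints: 65536
    }
}
```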
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852093#comment-13852093 ] Andrew Purtell commented on HBASE-10191:
It is on my to-do list to produce a technical report, but my time is quite constrained and that item is not close to the top of the list. As always, you should evaluate HBase using your application and environment. You may be quite happy with 0.96, with or without the bucket cache.
bq. having serialization barrier has one huge advantage over direct off heap access. You can compress blocks in off heap
That's a great point. I would actually like to operate on an encoded block representation from disk to socket. This is a trick in-memory databases have been using for years, and it will let us push through the memory wall, but that is several steps down a long road. The scope of this JIRA is described in the 'Description' field above.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852328#comment-13852328 ] stack commented on HBASE-10191:
---
[~apurtell] There are a couple of off-heap experiments ongoing. This JIRA covers memstore and blockcache allocations. Seems like we need a larger umbrella issue than this allows? If you agree, I'll open one, because it would be useful to be able to tie all efforts together. Good on you [~apurtell]
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852333#comment-13852333 ] Andrew Purtell commented on HBASE-10191:
If you want to reparent this somewhere, that's fine with me [~stack]. We're going to start with memstore and blockcache (likely a unified pool) and go from there based on results. If there are other things going on, it would be good to put them all together so we can try to coordinate.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852436#comment-13852436 ] Matt Corgan commented on HBASE-10191:
---
Something to keep in mind is that GC pauses can be influenced as much or more by the number of live objects as by the raw size of the heap. 32 GB of block cache could be made of only ~1 million 32 KB blocks, so that particular 32 GB of memory may not stop the world for very long. It's all the small remaining objects that keep the garbage collector busy, and I bet the biggest culprit here is the individual KeyValues in the memstores. MemstoreLAB combines the backing arrays into big chunks to reduce heap fragmentation, but there is still one object per KeyValue, and each object needs to be considered by the collector. A big heap has big memstores, which have lots of KeyValues - possibly far more than the ~1 million blocks in the block cache. A big advantage of flattening the memstores into blocks of key values is that you might be reducing ~500 KeyValues to a single block object. This ~500x reduction in object count strikes me as a significant GC pause improvement that is independent of off-heap techniques. Moving blocks off heap and operating on them directly will be very cool. DataBlockEncoders should be able to read off-heap blocks similarly to how they do now, namely, copying only the modified bytes from the previous cell into an array buffer. Vladimir makes a good point that it would be tough to match the scan performance of unencoded data, so that would need some thinking.
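The flattening idea above can be sketched as a block that packs N cells into one byte array with an int offset index: the collector then traces a handful of objects (the block, the payload array, the index) instead of N KeyValue objects. The length-prefix-free, offset-indexed cell format here is invented for the sketch, not HBase's encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// A flattened cell block: ~500 cells become one object graph of constant
// size, which is the object-count reduction Matt describes.
public class FlattenedCellBlock {
    private final byte[] payload;
    private final int[] offsets;   // start of each cell within payload

    FlattenedCellBlock(List<byte[]> cells) {
        int total = 0;
        for (byte[] c : cells) total += c.length;
        payload = new byte[total];
        offsets = new int[cells.size() + 1];
        int pos = 0, i = 0;
        for (byte[] c : cells) {
            offsets[i++] = pos;
            System.arraycopy(c, 0, payload, pos, c.length);
            pos += c.length;
        }
        offsets[i] = pos; // sentinel so every cell has an end offset
    }

    int cellCount() { return offsets.length - 1; }

    // Copy-out accessor; a flyweight view over payload would avoid even this.
    byte[] cell(int i) {
        byte[] out = new byte[offsets[i + 1] - offsets[i]];
        System.arraycopy(payload, offsets[i], out, 0, out.length);
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> cells = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            cells.add(("row" + i).getBytes(StandardCharsets.UTF_8));
        }
        FlattenedCellBlock block = new FlattenedCellBlock(cells);
        System.out.println(block.cellCount());                                // prints: 500
        System.out.println(new String(block.cell(42), StandardCharsets.UTF_8)); // prints: row42
    }
}
```

Because the payload is a plain byte array, the same layout works unchanged whether it lives on heap or is copied into a direct buffer, which is why this improvement is independent of the off-heap work.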
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852645#comment-13852645 ] stack commented on HBASE-10191:
---
If we supplied DFSClient our own DBB, then maybe we could read from dfs and put into an off-heap blockcache w/o going over the heap (see HDFS-2834, "ByteBuffer-based read API for DFSInputStream").
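The shape of the zero-heap read path stack describes: hand the reader a direct ByteBuffer so data lands off heap with no intermediate byte[]. FileChannel stands in here for DFSInputStream; HDFS-2834's ByteBuffer read API follows the same fill-the-caller's-buffer contract for reads from HDFS.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Read a block straight into a caller-supplied direct buffer (an off-heap
// blockcache slot), never staging the bytes on the Java heap.
public class DirectReadIntoCache {

    static int readBlockInto(FileChannel ch, long position, ByteBuffer dst)
            throws IOException {
        int total = 0;
        while (dst.hasRemaining()) {
            int n = ch.read(dst, position + total); // positional read fills dst
            if (n < 0) break; // EOF before the slot was full
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("block", ".dat");
        Files.write(file, new byte[] {1, 2, 3, 4});

        ByteBuffer cacheSlot = ByteBuffer.allocateDirect(4); // off-heap slot
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            System.out.println(readBlockInto(ch, 0, cacheSlot)); // prints: 4
        }
        System.out.println(cacheSlot.get(3));                    // prints: 4
        Files.delete(file);
    }
}
```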
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851145#comment-13851145 ] Nick Dimiduk commented on HBASE-10191: -- Apparently you're reading my mind :) Nicely articulated [~apurtell]. I'd like to see a body of evidence pointing to the specific components for which moving off heap makes meaningful sense. Memstore and BlockCache are commonly cited as the offending components, but I've not seen anyone present conclusive profiling results making this clear. Nor is there clear advice on the point at which a heap becomes too large. I've started work to track down hard data on both of these points before pressing forward with recommendations. See also [~nkeywal]'s recent profiling work reducing the GC burden imposed by the protobuf RPC implementation; that's an example of a major offender that isn't on the above short-list. I am excited to work toward and experiment with an entirely off-heap data flow, at least for the read path (HDFS -> BlockCache -> RPC send buffer)!
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851148#comment-13851148 ] Andrew Purtell commented on HBASE-10191: bq. Memstore and BlockCache are commonly cited as the offending components, but I've not seen anyone present conclusive profiling results making this clear
It's abundantly clear once you are using heaps larger than ~8 GB: collection pauses under safepoint blow out latency SLAs at the high percentiles. I've observed this directly under mixed read+write load. (Read-only loads work OK with G1 even with very large heaps, e.g. 192 GB.) Why would we need heaps larger than this? To take direct advantage of large server RAM. Memstore and blockcache are then the largest allocators of heap memory. If we move them off heap, they can soak up most of the available RAM, leaving the remaining heap demand relatively small. That is the idea.
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851154#comment-13851154 ] Vladimir Rodionov commented on HBASE-10191: --- This will require a redesign of the whole data flow in HBase. Currently, the minimum (and maximum) unit of data exchange in HBase's internal pipeline is the KeyValue, which is a heavy, on-heap (byte-array-backed) data structure. Moving data allocations off heap is half the problem; the other half is avoiding copy-on-read and copy-on-write between off-heap and on-heap memory. Serialization is quite expensive.
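One way around the copy-on-read cost raised above is a flyweight view that decodes fields in place. The sketch below is hypothetical, not the real KeyValue class, and uses an invented toy layout ([int rowLen][row bytes][long timestamp][int valueLen][value bytes]): only primitives are copied out of the buffer, to registers or stack slots, and the cell bytes are never materialized as an on-heap byte[].

```java
import java.nio.ByteBuffer;

// Hypothetical sketch (not the real KeyValue): a flyweight view decoding
// fields directly from an off-heap buffer with absolute reads. Assumed toy
// layout: [int rowLen][row bytes][long timestamp][int valueLen][value bytes].
final class CellView {
    private final ByteBuffer buf; // shared, typically direct
    private final int offset;     // start of this cell within buf

    CellView(ByteBuffer buf, int offset) {
        this.buf = buf;
        this.offset = offset;
    }

    int rowLength()   { return buf.getInt(offset); }
    long timestamp()  { return buf.getLong(offset + 4 + rowLength()); }
    int valueLength() { return buf.getInt(offset + 4 + rowLength() + 8); }

    /** Compares the row to a probe without copying the row onto the heap. */
    boolean rowEquals(byte[] probe) {
        int len = rowLength();
        if (len != probe.length) return false;
        for (int i = 0; i < len; i++) {
            if (buf.get(offset + 4 + i) != probe[i]) return false;
        }
        return true;
    }
}
```

The point of the design is that comparisons and field reads happen against the off-heap bytes directly; the only heap allocation per cell is the small view object, which could itself be reused across a scan.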