[jira] [Created] (HBASE-20542) Better heap utilization for IMC with MSLABs
Eshcar Hillel created HBASE-20542: - Summary: Better heap utilization for IMC with MSLABs Key: HBASE-20542 URL: https://issues.apache.org/jira/browse/HBASE-20542 Project: HBase Issue Type: Task Reporter: Eshcar Hillel Following HBASE-20188 we realized that in-memory compaction combined with MSLABs may suffer from heap under-utilization due to internal fragmentation. This jira presents a solution to circumvent this problem. The main idea is to have each update operation check whether it would cause the active segment to overflow *before* writing the new value (instead of checking the size after the write completes). If so, the active segment is atomically swapped with a new empty segment, and the full-yet-not-overflowed segment is pushed to the compaction pipeline. Later on, the IMC daemon runs its compaction operation (flatten index/merge indices/data compaction) in the background. Some subtle concurrency issues should be handled with care; we elaborate on them next. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
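The check-before-write flow described in the jira can be sketched roughly as follows. This is a minimal illustration, not HBase's actual MemStore code: the segment, pipeline and size-accounting names are all made up, and real code must coordinate with concurrent in-memory flushes rather than rely on a single lock.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of "check before write": the writer checks whether the
// new cell would overflow the active segment, and if so swaps in a fresh
// segment and pushes the full one to the compaction pipeline.
class CheckBeforeWrite {
    static final long SEGMENT_LIMIT = 64;          // bytes, illustrative only

    static class Segment {
        long size = 0;                             // bytes currently held
        boolean fits(long cellSize) { return size + cellSize <= SEGMENT_LIMIT; }
        void add(long cellSize) { size += cellSize; }
    }

    final AtomicReference<Segment> active = new AtomicReference<>(new Segment());
    final Queue<Segment> pipeline = new ArrayDeque<>();

    // Simplified: serialized here with a monitor; the jira's point is that the
    // overflow check happens BEFORE the write, so pushed segments never overflow.
    synchronized void write(long cellSize) {
        Segment seg = active.get();
        if (!seg.fits(cellSize)) {
            pipeline.add(seg);                     // full yet not overflowed
            seg = new Segment();
            active.set(seg);                       // atomic swap of the active segment
        }
        seg.add(cellSize);
    }
}
```

The swapped-out segment would then be processed by the background IMC work (index flattening, merging, or data compaction), which is omitted here.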
Re: [ANNOUNCE] Please welcome Francis Liu to the HBase PMC
Congrats Francis and good luck! On Wednesday, April 11, 2018, 11:04:17 PM GMT+3, Andrew Purtell wrote: On behalf of the Apache HBase PMC I am pleased to announce that Francis Liu has accepted our invitation to become a PMC member on the Apache HBase project. We appreciate Francis stepping up to take more responsibility in the HBase project. He has been an active contributor to HBase for many years and recently took over responsibilities as branch RM for branch-1.3. Please join me in welcoming Francis to the HBase PMC! -- Best regards, Andrew
[jira] [Created] (HBASE-20390) IMC Default Parameters for 2.0.0
Eshcar Hillel created HBASE-20390: - Summary: IMC Default Parameters for 2.0.0 Key: HBASE-20390 URL: https://issues.apache.org/jira/browse/HBASE-20390 Project: HBase Issue Type: Task Reporter: Eshcar Hillel Setting new default parameters for in-memory compaction based on performance tests done in HBASE-20188 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Questions about synchronization in compaction pipeline
I agree with Chia-Ping Tsai: the read-only copy can be changed from LinkedList to ArrayList, with getLast replaced by get(size() - 1). However, I am dubious about any of the suggested changes having a measurable effect on system performance. It would be very interesting to see a workload that can differentiate between these two implementations. On Sunday, March 11, 2018, 4:33:05 PM GMT+2, Chia-Ping Tsai wrote: > LL copy is fairly slow and likely loses us any gains. Also, I'm a little > dubious on the use of LL given that we support a replaceAtIndex which will > be much faster in an array. > > Can we improve by using an ArrayDeque? Does ArrayDeque support replacing an element at an arbitrary index? If not, it is not easy to change the impl in one line, since replaceAtIndex() needs to change an element in the pipeline. BTW, could we change the impl of "readOnlyCopy" from LinkedList to ArrayList? Most ops on readOnlyCopy are iteration, and getLast can be replaced by size() and get(index). On 2018/03/10 20:12:01, Mike Drob wrote: > Hi devs, > > I was reading through HBASE-17434 trying to understand why we have two > linked lists in compaction pipeline and I'm having trouble following the > conversation there, especially since it seems intertwined with HBASE-17379 > and jumps back and forth a few times. > > It looks like we are implementing our own copy-on-write list, and there is > a claim that addFirst is faster on a LinkedList than an array based list. I > am concerned about the LL copy in pushHead - even if addFirst is faster, a > LL copy is fairly slow and likely loses us any gains. Also, I'm a little > dubious on the use of LL given that we support a replaceAtIndex which will > be much faster in an array. > > Can we improve by using an ArrayDeque? > > Eshcar, Anastasia, WDYT? > > Thanks, > Mike > > Some observations about performance - > https://stuartmarks.wordpress.com/2015/12/18/some-java-list-benchmarks/ >
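The copy-on-write scheme under discussion can be sketched as below. This is an illustration of the pattern, not HBase's CompactionPipeline: mutators synchronize and republish an immutable array-backed snapshot, while readers take the snapshot without locking, and getLast becomes get(size() - 1) on the copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative copy-on-write pipeline: writes republish a snapshot; the
// read path (get/scan) does a single volatile read and no locking.
class CowPipeline<T> {
    private final List<T> pipeline = new ArrayList<>();
    private volatile List<T> readOnlyCopy = Collections.emptyList();

    public synchronized void pushHead(T segment) {
        pipeline.add(0, segment);                  // addFirst equivalent
        readOnlyCopy = List.copyOf(pipeline);      // republish immutable snapshot
    }

    public synchronized void replaceAtIndex(int i, T segment) {
        pipeline.set(i, segment);                  // O(1) on an array-backed list
        readOnlyCopy = List.copyOf(pipeline);
    }

    // Lock-free read path: one volatile read, then plain list access.
    public List<T> getSegments() { return readOnlyCopy; }

    public T getLast() {
        List<T> snap = readOnlyCopy;
        return snap.get(snap.size() - 1);          // array-backed getLast
    }
}
```

Whether ArrayList beats LinkedList here would, as the thread notes, need a workload that actually stresses pushHead/replaceAtIndex rather than the read path.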
Re: [DISCUSS] Performance degradation in master (compared to 2-alpha-1)
I'm currently using my cluster for other purposes. When it is available I can run the same experiment against alpha-3. On Tuesday, November 28, 2017, 7:17:08 PM GMT+2, Mike Drob <md...@apache.org> wrote: Eshcar - do you have time to try the other alpha releases and see where exactly we introduced the regressions? Also, I'm worried that the performance regression may be related to an important bug-fix, where before we may have had fast writes but also risked incorrect behavior somehow. Mike On Tue, Nov 28, 2017 at 2:48 AM, Eshcar Hillel <esh...@oath.com.invalid> wrote: > I agree, so will wait till we focus on performance. > Just one more update, I also ran the same experiment (write-only) with > branch-2 beta-1. Here is a summary of the throughput I see in each tag/branch:
>           | BASIC | NONE |
> 2-alpha-1 | 110K  | 80K  |
> 2-beta-1  | 81K   | 62K  |
> master    | 60K   | 55K  |
> This means there are multiple sources for the regression. > > Thanks > > On Saturday, November 25, 2017, 7:44:01 AM GMT+2, 张铎(Duo Zhang) < > palomino...@gmail.com> wrote: > > I think first we need a release plan on when we will begin to focus on the > performance issue? > > I do not think it is a good time to focus on the performance issue now as we > haven’t stabilized our build yet. The performance regression may come back > again after some bug fixes, and maybe we use a wrong way to increase > performance and finally we find that it is just a bug... > > Of course I do not mean we cannot do any performance related issues now; > for example, HBASE-19338 is a good catch and can be fixed right now. > > And also, for AsyncFSWAL and in-memory compaction, we need to consider the > performance right now as they are born for performance, but let’s focus on the > comparison to other policies, not a previous release, so we can find the > correct things to fix. 
> > Of course, if there is a big performance degradation compared to the > previous release and we find it then we should tell others, just like this > email. An earlier notification is always welcomed. > > Thanks. > > Stack <st...@duboce.net> wrote on Saturday, November 25, 2017 at 13:22: > > > On Thu, Nov 23, 2017 at 7:35 AM, Eshcar Hillel <esh...@oath.com.invalid> > > wrote: > > > > > Happy Thanksgiving all, > > > > > > > And to you Eshcar. > > > > > > > > > In recent benchmarks I ran in HBASE-18294 I discovered major > performance > > > degradation of master code w.r.t 2-alpha-1 code. I am running a write-only > > > workload (similar to the one reported in HBASE-16417). I am using the > > same > > > hardware and same configuration settings (specifically, I tested both > > basic > > > memstore compaction with optimal parameters, and no memstore > > > compaction). While in 2-alpha-1 code I see throughput of ~110Kops for > > basic > > > compaction and ~80Kops for no compaction, in the master code I get only > > > 60Kops and 55Kops, respectively. *This is almost 50% reduction in > > > performance*. > > > (1) Did anyone else notice such degradation? (2) Do we have any > > systematic > > > automatic/semi-automatic method to track the sources of this > performance > > > issue? > > > Thanks, Eshcar > > > > > > > > > On #1, no. I've not done a perf compare. I wonder if later alpha versions > > include the regression (I'll have to check and see). > > > > On #2, again no. I intend to do a bit of perf tuning and compare before > > release. > > > > If you don't file an issue, I will do so later for myself as a task to > > compare at least to alpha-1. > > > > Thanks Eshcar, > > > > St.Ack > > >
Re: [DISCUSS] Performance degradation in master (compared to 2-alpha-1)
I agree, so will wait till we focus on performance. Just one more update: I also ran the same experiment (write-only) with branch-2 beta-1. Here is a summary of the throughput I see in each tag/branch:

          | BASIC | NONE |
2-alpha-1 | 110K  | 80K  |
2-beta-1  | 81K   | 62K  |
master    | 60K   | 55K  |

This means there are multiple sources for the regression. Thanks On Saturday, November 25, 2017, 7:44:01 AM GMT+2, 张铎(Duo Zhang) <palomino...@gmail.com> wrote: I think first we need a release plan on when we will begin to focus on the performance issue? I do not think it is a good time to focus on the performance issue now as we haven’t stabilized our build yet. The performance regression may come back again after some bug fixes, and maybe we use a wrong way to increase performance and finally we find that it is just a bug... Of course I do not mean we cannot do any performance related issues now; for example, HBASE-19338 is a good catch and can be fixed right now. And also, for AsyncFSWAL and in-memory compaction, we need to consider the performance right now as they are born for performance, but let’s focus on the comparison to other policies, not a previous release, so we can find the correct things to fix. Of course, if there is a big performance degradation compared to the previous release and we find it then we should tell others, just like this email. An earlier notification is always welcomed. Thanks. Stack <st...@duboce.net> wrote on Saturday, November 25, 2017 at 13:22: > On Thu, Nov 23, 2017 at 7:35 AM, Eshcar Hillel <esh...@oath.com.invalid> > wrote: > > > Happy Thanksgiving all, > > > > And to you Eshcar. > > > > > In recent benchmarks I ran in HBASE-18294 I discovered major performance > > degradation of master code w.r.t 2-alpha-1 code. I am running a write-only > > workload (similar to the one reported in HBASE-16417). 
I am using the > same > > hardware and same configuration settings (specifically, I tested both > basic > > memstore compaction with optimal parameters, and no memstore > > compaction). While in 2-alpha-1 code I see throughput of ~110Kops for > basic > > compaction and ~80Kops for no compaction, in the master code I get only > > 60Kops and 55Kops, respectively. *This is almost 50% reduction in > > performance*. > > (1) Did anyone else notice such degradation? (2) Do we have any > systematic > > automatic/semi-automatic method to track the sources of this performance > > issue? > > Thanks, Eshcar > > > > > On #1, no. I've not done a perf compare. I wonder if later alpha versions > include the regression (I'll have to check and see). > > On #2, again no. I intend to do a bit of perf tuning and compare before > release. > > If you don't file an issue, I will do so later for myself as a task to > compare at least to alpha-1. > > Thanks Eshcar, > > St.Ack >
Re: [DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?
We can change the AtomicLong to an AtomicReference which is updated atomically. MemStoreSize can be changed to hold on-heap memory, off-heap memory and data size, or just on-heap memory and off-heap memory if data size is not required anymore. On Monday, August 7, 2017, 10:23:12 AM GMT+3, Anoop John <anoop.hb...@gmail.com> wrote: Sorry for being late to reply. So you mean we should track both sizes even at Region level? This was considered at the time but not done, as it would add more overhead: we would have to deal with 2 AtomicLongs in every Region. Right now we handle this double check at RS level only, so that added just one more variable to deal with. -Anoop- On Mon, Jul 10, 2017 at 7:34 PM, Eshcar Hillel <esh...@yahoo-inc.com.invalid> wrote: > Here is a suggestion: We can track both heap and off-heap sizes and have 2 > thresholds, one for limiting heap size and one for limiting off-heap size. And > in all decision-making junctions we check whether one of the thresholds is > exceeded, and if so we trigger a flush. We can choose which entity to flush > based on the cause. For example, if we decided to flush since the heap size > exceeds the heap threshold then we flush the region/store with the greatest heap > size, and likewise for an off-heap flush. > > I can prepare a patch. > > This is not rolling back HBASE-18294, simply refining it to have different > decision making for the on- and off-heap cases. > > On Monday, July 10, 2017, 8:25:12 AM GMT+3, Anoop John > <anoop.hb...@gmail.com> wrote: > > Stack and others.. > We won't have any OOM or FullGC issues, because globally at RS level we > will track both the data size (of all the memstores) and the heap > size. The decision there accounts for both. In fact, in the case of normal > on-heap memstores, the accounting is like the old heap-size-based way. > > At region level (and at Segments level) we track data size only. The > decisions are based on data size. 
> > So in the past a region flush size of 128 MB meant we would flush when > the heap size of that region crossed 128 MB. But now it is data size > alone. What I feel is that this is more aligned with a normal user's > thinking: they say flush size of 128 MB and then the thinking can be > 128 MB of data. > > The background of this change is the off-heap memstores, where we need > separate tracking of both data and heap overhead sizes. But at > region level this behavior change was done thinking that it is more user > oriented. > > I agree with Yu that it is a surprising behavior change. Yeah, if not tuned > accordingly one might see more blocked writes, because the per-region > flushes are more delayed now and so the chances of reaching the global > memstore upper barrier are higher. And then we will block > writes and force flushes. (But off-heap memstores will do a better job > here.) But this would NOT cause any OOME or FullGC. > > I guess we should have reduced the 128 MB default flush size then? I > asked this Q in that jira and then we did not discuss further. > > I hope I explained the background and the change and the impacts. Thanks. > > -Anoop- > > On Thu, Jul 6, 2017 at 11:43 AM, 宾莉金(binlijin) <binli...@gmail.com> wrote: >> I like to use the former, heap occupancy, so we do not need to worry about >> OOM and FullGC, and change configuration to adapt to the new policy. >> >> 2017-07-06 14:03 GMT+08:00 Stack <st...@duboce.net>: >> >>> On Wed, Jul 5, 2017 at 9:59 PM, ramkrishna vasudevan < >>> ramkrishna.s.vasude...@gmail.com> wrote: >>> >>> > >>> > >>Sounds like we should be doing the former, heap occupancy >>> > Stack, so do you mean we need to roll back this new change in trunk? The >>> > background is https://issues.apache.org/jira/browse/HBASE-16747. >>> > >>> > >>> I remember that issue. 
It seems good to me (as it did then) where we have >>> the global tracking in RS of all data and overhead so we shouldn't OOME and >>> we keep accounting of overhead and data distinct because now data can be >>> onheap or offheap. >>> >>> We shouldn't be doing blocking updates -- not when there is probably loads >>> of memory still available -- but that is a different (critical) issue. >>> Sounds like current configs can 'surprise' -- see Yu Li note -- given the >>> new accounting. >>> >>> Looks like I need to read HBASE-18294 >>> <https://issues.apache.org/jira/browse/HBASE-18294> to figure what the >>> pivot/problem w/ the new policy is. >>> >>> Thanks, >>> St.Ack >>> >>> >>> >>> >>> >>> > Reg
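The AtomicReference suggestion at the top of this thread can be sketched as follows. The value-object and method names are illustrative, not HBase's actual MemStoreSize API: the point is that an immutable triple of on-heap, off-heap and data sizes is swapped in with a CAS loop, so the three counters always change together.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of "AtomicLong -> AtomicReference": sizes live in one immutable
// value object updated atomically, instead of several independent longs.
class StoreSizing {
    // Immutable triple of sizes.
    static final class Size {
        final long onHeap, offHeap, data;
        Size(long onHeap, long offHeap, long data) {
            this.onHeap = onHeap; this.offHeap = offHeap; this.data = data;
        }
    }

    private final AtomicReference<Size> size = new AtomicReference<>(new Size(0, 0, 0));

    // Atomically add deltas to all three counters at once.
    public void add(long onHeapDelta, long offHeapDelta, long dataDelta) {
        Size cur, next;
        do {
            cur = size.get();
            next = new Size(cur.onHeap + onHeapDelta,
                            cur.offHeap + offHeapDelta,
                            cur.data + dataDelta);
        } while (!size.compareAndSet(cur, next));
    }

    // Readers get a consistent snapshot of all three sizes.
    public Size snapshot() { return size.get(); }
}
```

Compared with two AtomicLongs per region, this keeps the sizes mutually consistent at the cost of allocating a small object per update.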
Re: [DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?
6:11:54,724 INFO >> > > [B.defaultRpcServer.handler=182,queue=11,port=16020] >> > > regionserver.MemStoreFlusher: Blocking updates on >> > > hadoop0528.et2.tbsite.net,16020,1497336978160: >> > > global memstore heapsize 7.2 G is >= than blocking 7.2 G size >> > > 2017-07-03 16:11:54,754 INFO >> > > [B.defaultRpcServer.handler=186,queue=15,port=16020] >> > > regionserver.MemStoreFlusher: Blocking updates on >> > > hadoop0528.et2.tbsite.net,16020,1497336978160: >> > > global memstore heapsize 7.2 G is >= than blocking 7.2 G size >> > > 2017-07-03 16:11:57,571 INFO [MemStoreFlusher.0] >> > > regionserver.MemStoreFlusher: Flush of region >> > > mainv7_main_result_c,1496,1499062935573.02adfa7cbdc606dce5b79a516e16492a. >> > > due to global heap pressure. Total Memstore size=3.2 G, Region memstore >> > > size=331.4 M >> > > 2017-07-03 16:11:57,571 WARN >> > > [B.defaultRpcServer.handler=49,queue=11,port=16020] >> > > regionserver.MemStoreFlusher: Memstore is above high water mark and >> block >> > > 2892ms >> > > >> > > Best Regards, >> > > Yu >> > > >> > > On 6 July 2017 at 00:56, Stack <st...@duboce.net> wrote: >> > > >> > > > On Wed, Jul 5, 2017 at 6:30 AM, Eshcar Hillel >> > > <esh...@yahoo-inc.com.invalid >> > > > > >> > > > wrote: >> > > > >> > > > > Hi All, >> > > > > I opened a new Jira https://issues.apache.org/jira/browse/HBASE-18294 >> > > to >> > > > > discuss this question. 
>> > > > > Flush decisions are taken at the region level and also at the >> region >> > > > server level - there is the question of when to trigger a flush and >> > > then >> > > > which region/store to flush. Regions track both their data size >> > > (key-value >> > > > size only) and their total heap occupancy (including index and >> > > additional >> > > > metadata). One option (which was the past policy) is to trigger >> flushes >> > > > and >> > > > choose flush subjects based on a region's heap size - this gives a >> better >> > > > estimation for a sysadmin of how many regions a RS can carry. Another >> > > option >> > > > (which is the current policy) is to look at the data size - this >> gives >> > > a >> > > > better estimation of the size of the files that are created by the >> > > flush. >> > > > > >> > > > >> > > > Sounds like we should be doing the former, heap occupancy. An >> > > > OutOfMemoryException puts a nail in any benefit other accountings >> might >> > > > have. >> > > > >> > > > St.Ack >> > > > >> > > > >> > > > >> > > > > I see this as critical to HBase performance and usability, >> namely >> > > > > meeting the user expectation from the system, hence I would like to >> > > hear >> > > > as >> > > > > many voices as possible. Please join the discussion in the Jira and >> > let >> > > us >> > > > > know what you think. >> > > > > Thanks, Eshcar >> > > > > >> > > > > >> > > > >> > > >> > >> > > > > -- > *Best Regards,* > lijin bin
[DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?
Hi All, I opened a new Jira https://issues.apache.org/jira/browse/HBASE-18294 to discuss this question. Flush decisions are taken at the region level and also at the region server level - there is the question of when to trigger a flush and then which region/store to flush. Regions track both their data size (key-value size only) and their total heap occupancy (including index and additional metadata). One option (which was the past policy) is to trigger flushes and choose flush subjects based on a region's heap size - this gives a better estimation for a sysadmin of how many regions a RS can carry. Another option (which is the current policy) is to look at the data size - this gives a better estimation of the size of the files that are created by the flush. I see this as critical to HBase performance and usability, namely meeting the user's expectations of the system, hence I would like to hear as many voices as possible. Please join the discussion in the Jira and let us know what you think. Thanks, Eshcar
[jira] [Created] (HBASE-18294) Flush policy checks data size instead of heap size
Eshcar Hillel created HBASE-18294: - Summary: Flush policy checks data size instead of heap size Key: HBASE-18294 URL: https://issues.apache.org/jira/browse/HBASE-18294 Project: HBase Issue Type: Bug Reporter: Eshcar Hillel Assignee: Eshcar Hillel A flush policy decides whether to flush a store by comparing the size of the store to a threshold (which can be configured with hbase.hregion.percolumnfamilyflush.size.lower.bound). Currently the implementation compares the data size (key-value only) to the threshold, where it should compare the heap size (which includes index size and metadata). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
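The gap between the two accountings can be made concrete with a small sketch. The threshold and overhead figures below are made up for illustration: the same store can sit below the per-family lower bound when only key-value data is counted, yet above it once index and metadata overhead are included.

```java
// Illustrative flush-policy check contrasting data-size accounting
// (the behavior this jira reports) with heap-size accounting
// (the behavior it argues for). Numbers are not HBase defaults.
class FlushPolicySketch {
    static final long LOWER_BOUND = 16L * 1024 * 1024;    // illustrative 16 MB threshold

    static boolean shouldFlushByData(long dataSize) {
        return dataSize > LOWER_BOUND;                    // data size only
    }

    static boolean shouldFlushByHeap(long dataSize, long overhead) {
        return dataSize + overhead > LOWER_BOUND;         // data + index/metadata
    }
}
```

With, say, 15 MB of data and 3 MB of overhead, the data-size policy skips the store while the heap-size policy would flush it; that delta is the discrepancy the jira describes.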
[jira] [Created] (HBASE-17575) Towards Making BASIC the Default In-Memory Compaction Policy
Eshcar Hillel created HBASE-17575: - Summary: Towards Making BASIC the Default In-Memory Compaction Policy Key: HBASE-17575 URL: https://issues.apache.org/jira/browse/HBASE-17575 Project: HBase Issue Type: Sub-task Reporter: Eshcar Hillel We remove the NONE configuration setting added to tests in HBASE-17294 and HBASE-17316, and run these tests with all 3 memory compaction policies. For each test: (1) if all 3 pass, we remove the configuration from the test; (2) if some fail, we add tests for all 3 configurations, e.g., via parameterized tests. Where needed we update the expected results. One test failure identified a small bug, which is also fixed in the patch. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17434) New Synchronization Scheme for Compaction Pipeline
Eshcar Hillel created HBASE-17434: - Summary: New Synchronization Scheme for Compaction Pipeline Key: HBASE-17434 URL: https://issues.apache.org/jira/browse/HBASE-17434 Project: HBase Issue Type: Bug Reporter: Eshcar Hillel A new copy-on-write synchronization scheme is introduced for the compaction pipeline. The new scheme is better since it removes the lock from getSegments(), which is invoked in every get and scan operation, and it reduces the number of LinkedList objects created at runtime, which can reduce GC pressure (not by much, but still...). In addition, it fixes the method getTailSize() in the compaction pipeline. This method creates a MemstoreSize object, which comprises the data size and the overhead size of the segment, and needs to be atomic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17407) Correct update of maxFlushedSeqId in HRegion
Eshcar Hillel created HBASE-17407: - Summary: Correct update of maxFlushedSeqId in HRegion Key: HBASE-17407 URL: https://issues.apache.org/jira/browse/HBASE-17407 Project: HBase Issue Type: Bug Reporter: Eshcar Hillel The attribute maxFlushedSeqId in HRegion is used to track the max sequence id in the store files and is reported to HMaster. When flushing only part of the memstore content this value might be incorrect and may cause data loss. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
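The safety rule behind this fix can be stated in a few lines. This is a sketch of the invariant, not the actual HRegion logic: after a partial flush, the reported maxFlushedSeqId must stay below the smallest sequence id still held only in memory, otherwise WAL entries covering unflushed edits could be discarded and lost on crash.

```java
import java.util.Collections;
import java.util.List;

// Sketch of the correctness requirement: report at most
// (smallest unflushed sequence id - 1) after a partial flush.
class MaxFlushedSeqId {
    static long safeMaxFlushedSeqId(List<Long> unflushedSeqIds, long flushedUpTo) {
        if (unflushedSeqIds.isEmpty()) {
            return flushedUpTo;                      // everything reached disk
        }
        long lowestUnflushed = Collections.min(unflushedSeqIds);
        // Never claim a seq id that still has in-memory-only edits below it.
        return Math.min(flushedUpTo, lowestUnflushed - 1);
    }
}
```

Reporting flushedUpTo directly after a partial flush (ignoring the unflushed ids) is exactly the kind of overstatement the jira describes as a potential data-loss bug.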
[jira] [Created] (HBASE-17339) Scan-Memory-First Optimization
Eshcar Hillel created HBASE-17339: - Summary: Scan-Memory-First Optimization Key: HBASE-17339 URL: https://issues.apache.org/jira/browse/HBASE-17339 Project: HBase Issue Type: Improvement Reporter: Eshcar Hillel The current implementation of a get operation (to retrieve values for a specific key) scans through all relevant stores of the region; for each store both memory components (memstore segments) and disk components (hfiles) are scanned in parallel. We suggest applying an optimization that speculatively scans the memory-only components first, and only if the result is incomplete scans both memory and disk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
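The speculative read path can be sketched as below, with plain maps standing in for memstore segments and hfiles. This is only the shape of the optimization: a real implementation must decide "completeness" from versions, deletes and time ranges, not mere key presence as this toy does.

```java
import java.util.Map;
import java.util.Optional;

// Sketch of scan-memory-first: try the cheap in-memory pass, and fall
// back to the full memory+disk scan only when the result is incomplete.
class MemoryFirstGet {
    final Map<String, String> memstore;      // stand-in for memstore segments
    final Map<String, String> hfiles;        // stand-in for on-disk stores

    MemoryFirstGet(Map<String, String> mem, Map<String, String> disk) {
        this.memstore = mem;
        this.hfiles = disk;
    }

    Optional<String> get(String key) {
        String v = memstore.get(key);        // speculative memory-only pass
        if (v != null) {
            return Optional.of(v);           // complete in memory: skip disk
        }
        // Incomplete: fall back to scanning memory and disk together
        // (here reduced to the disk lookup alone).
        return Optional.ofNullable(hfiles.get(key));
    }
}
```

The win comes from hot keys whose latest version is still in memory; cold keys pay one extra (cheap) memory probe before the normal path.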
[jira] [Created] (HBASE-17316) Addendum to HBASE-17294
Eshcar Hillel created HBASE-17316: - Summary: Addendum to HBASE-17294 Key: HBASE-17316 URL: https://issues.apache.org/jira/browse/HBASE-17316 Project: HBase Issue Type: Sub-task Reporter: Eshcar Hillel Assignee: Eshcar Hillel Updating 2 tests that failed during the commit of HBASE-17294 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17294) External Configuration for Memory Compaction
Eshcar Hillel created HBASE-17294: - Summary: External Configuration for Memory Compaction Key: HBASE-17294 URL: https://issues.apache.org/jira/browse/HBASE-17294 Project: HBase Issue Type: Sub-task Reporter: Eshcar Hillel We would like to have a single external knob to control memstore compaction. Possible memstore compaction policies are none, basic, and eager. This sub-task allows setting this property at the column family level at table creation time: {code} create ‘’, {NAME => ‘’, IN_MEMORY_COMPACTION => ‘<NONE|BASIC|EAGER>’} {code} or setting it at the global configuration level via a property in hbase-site.xml, with BASIC being the default value: {code} <property> <name>hbase.hregion.compacting.memstore.type</name> <value><NONE|BASIC|EAGER></value> </property> {code} The values used in this property can change as memstore compaction policies evolve over time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15359) Simplifying Segment hierarchy
Eshcar Hillel created HBASE-15359: - Summary: Simplifying Segment hierarchy Key: HBASE-15359 URL: https://issues.apache.org/jira/browse/HBASE-15359 Project: HBase Issue Type: Sub-task Reporter: Eshcar Hillel Assignee: Eshcar Hillel Now that it is clear that no memstore segment will be implemented as an HFile, and that all segments store their data in some representation of a CellSet (skip-list or flat), the segment hierarchy can be much simplified. The attached patch includes only 3 classes in the hierarchy: Segment - comprises most of the state and implementation; MutableSegment - extends the API with add and rollback functionality; ImmutableSegment - extends the API with a key-value scanner for snapshots. SegmentScanner is the scanner for all types of segments. In addition, the option to roll back an immutable segment in the memstore is disabled. This code allows us to make progress independently in the compaction subtask (HBASE-14920) and the flat index representation subtask (HBASE-14921). It also means that the new immutable segment can reuse the existing SegmentScanner, instead of implementing a new scanner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
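The three-class hierarchy described above can be outlined as a skeleton. Method bodies and members here are placeholders standing in for the real state (cell set, time range, comparator, scanner plumbing), not the actual HBase classes:

```java
// Skeleton of the simplified hierarchy: one base class plus a mutable
// and an immutable specialization. All members are illustrative.
abstract class Segment {
    // Shared state and implementation live here in the real design.
    abstract long size();
}

class MutableSegment extends Segment {
    long size;
    @Override long size() { return size; }
    void add(long cellSize) { size += cellSize; }       // extends API with add
    void rollback(long cellSize) { size -= cellSize; }  // ...and rollback
}

class ImmutableSegment extends Segment {
    final long frozenSize;
    ImmutableSegment(long frozenSize) { this.frozenSize = frozenSize; }
    @Override long size() { return frozenSize; }
    // Extends the API with a key-value scanner for snapshots (omitted).
}
```

A single SegmentScanner over the base class is what lets new immutable segment variants avoid writing their own scanner, as the jira notes.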
[jira] [Created] (HBASE-15016) StoreServices facility in Region
Eshcar Hillel created HBASE-15016: - Summary: StoreServices facility in Region Key: HBASE-15016 URL: https://issues.apache.org/jira/browse/HBASE-15016 Project: HBase Issue Type: Sub-task Reporter: Eshcar Hillel The default implementation of a memstore ensures that between two flushes the memstore size increases monotonically. Supporting new memstores that store data in different formats (specifically, compressed), or that allows to eliminate data redundancies in memory (e.g., via compaction), means that the size of the data stored in memory can decrease even between two flushes. This requires memstores to have access to facilities that manipulate region counters and synchronization. This subtasks introduces a new region interface -- StoreServices, through which store components can access these facilities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
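The shape of the proposed interface might look like the sketch below. The names and methods are loosely modeled on the jira's description and should be read as illustrative, not as HBase's actual API; the key point is that the delta may be negative, since in-memory compaction can shrink the accounted size between flushes.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical StoreServices shape: a narrow region-level interface
// through which store components adjust shared counters.
interface StoreServices {
    void addMemStoreSize(long delta);   // delta may be negative after compaction
    long getMemStoreSize();
}

// Illustrative region-side implementation backing the interface.
class RegionStoreServices implements StoreServices {
    private final AtomicLong memStoreSize = new AtomicLong();

    @Override public void addMemStoreSize(long delta) {
        memStoreSize.addAndGet(delta);
    }

    @Override public long getMemStoreSize() {
        return memStoreSize.get();
    }
}
```

Routing all size changes through one interface is what makes non-monotonic memstores (compressed or compacted) compatible with the region's flush accounting.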
[jira] [Created] (HBASE-14918) In-Memory MemStore Flush and Compaction
Eshcar Hillel created HBASE-14918: - Summary: In-Memory MemStore Flush and Compaction Key: HBASE-14918 URL: https://issues.apache.org/jira/browse/HBASE-14918 Project: HBase Issue Type: Umbrella Affects Versions: 2.0.0 Reporter: Eshcar Hillel A memstore serves as the in-memory component of a store unit, absorbing all updates to the store. From time to time these updates are flushed to a file on disk, where they are compacted (by eliminating redundancies) and compressed (i.e., written in a compressed format to reduce their storage size). We aim to speed up data access, and therefore suggest applying an in-memory memstore flush: that is, to flush the active in-memory segment into an intermediate buffer where it can still be accessed by the application. Data in the buffer is subject to compaction and can be stored in any format that allows it to take up less space in RAM. The less space the buffer consumes, the longer it can reside in memory before data is flushed to disk, resulting in better performance. Specifically, the optimization is beneficial for workloads with medium-to-high key churn which incur many redundant cells, like persistent messaging. We suggest structuring the solution as 3 subtasks (respectively, patches): (1) Infrastructure - refactoring of the MemStore hierarchy, introducing segment (StoreSegment) as a first-class citizen, and decoupling the memstore scanner from the memstore implementation; (2) Implementation of a new memstore (CompactingMemstore) with a non-optimized immutable segment representation; and (3) Memory optimization, including compressed format representation and off-heap allocations. This Jira continues the discussion in HBASE-13408. Design documents, evaluation results and previous patches can be found in HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
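The in-memory flush and compaction lifecycle can be sketched conceptually as below. Cells are reduced to plain strings and the "compaction" simply deduplicates, so this is only the control flow, not the real CompactingMemstore: the active segment is pushed aside into an in-memory pipeline where it stays readable, and a background step can later merge pipeline segments to reclaim heap before any disk flush.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of an in-memory flush: active segment -> in-memory
// pipeline -> (background) compaction. All names are illustrative.
class InMemoryFlush {
    final List<List<String>> pipeline = new ArrayList<>();   // immutable segments
    List<String> active = new ArrayList<>();                 // current mutable segment

    void put(String cell) { active.add(cell); }

    // "Flush" in memory: push the active segment aside, start a fresh one.
    void inMemoryFlush() {
        pipeline.add(active);
        active = new ArrayList<>();
    }

    // Background compaction: merge pipeline segments, newest first,
    // dropping duplicate cells to reclaim heap.
    void compactPipeline() {
        List<String> merged = new ArrayList<>();
        for (int i = pipeline.size() - 1; i >= 0; i--) {
            for (String cell : pipeline.get(i)) {
                if (!merged.contains(cell)) merged.add(cell);
            }
        }
        pipeline.clear();
        pipeline.add(merged);
    }
}
```

For high key-churn workloads the duplicate-elimination step is where the memory savings come from, which is what delays the eventual flush to disk.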
[jira] [Created] (HBASE-14921) Memory optimizations
Eshcar Hillel created HBASE-14921: - Summary: Memory optimizations Key: HBASE-14921 URL: https://issues.apache.org/jira/browse/HBASE-14921 Project: HBase Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Eshcar Hillel Memory optimizations including compressed format representation and offheap allocations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14919) Infrastructure refactoring
Eshcar Hillel created HBASE-14919: - Summary: Infrastructure refactoring Key: HBASE-14919 URL: https://issues.apache.org/jira/browse/HBASE-14919 Project: HBase Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Eshcar Hillel Assignee: Eshcar Hillel Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as first-class citizen and decoupling memstore scanner from the memstore implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14920) Compacting Memstore
Eshcar Hillel created HBASE-14920: - Summary: Compacting Memstore Key: HBASE-14920 URL: https://issues.apache.org/jira/browse/HBASE-14920 Project: HBase Issue Type: Sub-task Reporter: Eshcar Hillel Assignee: Eshcar Hillel Implementation of a new compacting memstore with non-optimized immutable segment representation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13719) Asynchronous scanner -- cache size-in-bytes bug fix
Eshcar Hillel created HBASE-13719: - Summary: Asynchronous scanner -- cache size-in-bytes bug fix Key: HBASE-13719 URL: https://issues.apache.org/jira/browse/HBASE-13719 Project: HBase Issue Type: Bug Reporter: Eshcar Hillel Hbase Streaming Scan is a feature recently added to trunk. In this feature, an asynchronous scanner pre-loads data to the cache based on its size (both row count and size in bytes). In one of the locations where the scanner polls an item from the cache, the variable holding the estimated byte size of the cache is not updated. This affects the decision of when to load the next batch of data. A bug fix patch is attached - it comprises only local changes to the ClientAsyncPrefetchScanner.java file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
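The class of bug described above can be illustrated with a small sketch. The names are made up (the real fix lives in ClientAsyncPrefetchScanner): the cache's estimated byte size must be decremented at every poll site, or prefetch decisions are made against a stale, inflated estimate and the next batch is loaded too late.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the accounting invariant: every add AND every poll must
// adjust the estimated byte size, or shouldPrefetch() lies.
class PrefetchCache<T> {
    private final ConcurrentLinkedQueue<T> cache = new ConcurrentLinkedQueue<>();
    private final AtomicLong cacheSizeInBytes = new AtomicLong();

    public void add(T result, long bytes) {
        cache.add(result);
        cacheSizeInBytes.addAndGet(bytes);
    }

    public T poll(long bytes) {
        T r = cache.poll();
        if (r != null) {
            cacheSizeInBytes.addAndGet(-bytes);   // the easy-to-miss decrement
        }
        return r;
    }

    // Prefetch the next batch once the estimate drops below a threshold.
    public boolean shouldPrefetch(long thresholdBytes) {
        return cacheSizeInBytes.get() < thresholdBytes;
    }
}
```

If one poll path skips the decrement, the estimate only grows, shouldPrefetch() stays false, and the consumer eventually blocks on an empty cache, which matches the symptom the jira describes.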
[jira] [Created] (HBASE-13408) HBase In-Memory Memstore Compaction
Eshcar Hillel created HBASE-13408: - Summary: HBase In-Memory Memstore Compaction Key: HBASE-13408 URL: https://issues.apache.org/jira/browse/HBASE-13408 Project: HBase Issue Type: New Feature Reporter: Eshcar Hillel A store unit holds a column family in a region, where the memstore is its in-memory component. The memstore absorbs all updates to the store; from time to time these updates are flushed to a file on disk, where they are compacted. Unlike disk components, the memstore is not compacted until it is written to the filesystem and optionally to the block cache. This may result in underutilization of the memory due to duplicate entries per row, for example, when hot data is continuously updated. Generally, the faster data accumulates in memory, the more flushes are triggered and the more frequently data sinks to disk, slowing down retrieval of data, even if very recent. In high-churn workloads, compacting the memstore can help maintain the data in memory, and thereby speed up data retrieval. We suggest a new compacting memstore with the following principles: 1. The data is kept in memory for as long as possible. 2. Memstore data is either compacted or in the process of being compacted. 3. Allow a panic mode, which may interrupt an in-progress compaction and force a flush of part of the memstore. We suggest applying this optimization only to in-memory column families. A design document is attached. This feature was previously discussed in HBASE-5311. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13071) Hbase Streaming Scan Feature
Eshcar Hillel created HBASE-13071: - Summary: Hbase Streaming Scan Feature Key: HBASE-13071 URL: https://issues.apache.org/jira/browse/HBASE-13071 Project: HBase Issue Type: New Feature Reporter: Eshcar Hillel A scan operation iterates over all rows of a table or a subrange of the table. The synchronous nature in which the data is served at the client side hinders the speed at which the application traverses the data: it increases the overall processing time, and may cause great variance in the times the application waits for the next piece of data. The scanner next() method at the client side invokes an RPC to the regionserver and then stores the results in a cache. The application can specify how many rows will be transmitted per RPC; by default this is set to 100 rows. The cache can be considered a producer-consumer queue, where the hbase client pushes the data to the queue and the application consumes it. Currently this queue is synchronous, i.e., blocking. More specifically, when the application has consumed all the data from the cache---so the cache is empty---the hbase client retrieves additional data from the server and re-fills the cache with new data. During this time the application is blocked. Under the assumption that the application processing time can be balanced by the time it takes to retrieve the data, an asynchronous approach can reduce the time the application waits for data. We attach a design document. We also have a patch that is based on a private branch, and some evaluation results of this code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
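The producer-consumer structure described above can be reduced to a minimal sketch. Row strings stand in for RPC result batches and all names are illustrative, not the actual client code: a background thread keeps refilling the bounded cache while the application consumes it, instead of the client issuing an RPC only after the cache runs empty.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal producer-consumer sketch of the streaming-scan idea: a
// background prefetcher overlaps data retrieval with consumption.
class AsyncPrefetchSketch {
    static final String POISON = "<<end>>";    // end-of-scan sentinel

    static BlockingQueue<String> startScan(int totalRows) {
        BlockingQueue<String> cache = new LinkedBlockingQueue<>(100); // ~100-row cache
        Thread prefetcher = new Thread(() -> {
            try {
                for (int i = 0; i < totalRows; i++) {
                    cache.put("row-" + i);     // stands in for RPC batches
                }
                cache.put(POISON);             // signal end of scan
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        prefetcher.setDaemon(true);
        prefetcher.start();
        return cache;
    }

    static int consume(BlockingQueue<String> cache) {
        int n = 0;
        try {
            for (String row = cache.take(); !row.equals(POISON); row = cache.take()) {
                n++;                           // application processes the row here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return n;
    }
}
```

The overlap only pays off when application processing time is comparable to retrieval time, which is exactly the assumption the jira states.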