[jira] [Created] (HBASE-20480) Multiple one-time cell objects are allocated and de-allocated when working with CCM
Anastasia Braginsky created HBASE-20480: --- Summary: Multiple one-time cell objects are allocated and de-allocated when working with CCM Key: HBASE-20480 URL: https://issues.apache.org/jira/browse/HBASE-20480 Project: HBase Issue Type: Sub-task Reporter: Anastasia Braginsky Multiple one-time cell objects are allocated and de-allocated when performing multiple reads against a CellChunkMap (CCM) MemStore. We believe this is the cause of some read performance degradation while working with CCM. We have a couple of ideas for a solution. More details will follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Re: [ANNOUNCE] Please welcome Francis Liu to the HBase PMC
Congratulations!! On Thursday, April 12, 2018, 7:44:17 PM GMT+3, Jerry He wrote: Congratulations, Francis! Jerry On Wed, Apr 11, 2018 at 10:03 PM, Andrew Purtell > wrote: > > On behalf of the Apache HBase PMC I am pleased to announce that Francis > > Liu has accepted our invitation to become a PMC member on the Apache > > HBase project. We appreciate Francis stepping up to take more > > responsibility in the HBase project. He has been an active contributor to > > HBase for many years and recently took over responsibilities as branch RM > > for branch-1.3. > > > > Please join me in welcoming Francis to the HBase PMC! > > > > -- > > Best regards, > > Andrew > > >
Re: Questions about synchronization in compaction pipeline
Hi Mike, Thanks for your interest in the CompactionPipeline implementation. It is a pleasure to discuss it with you. Regarding the claim that we are implementing our own copy-on-write (COW) list: maybe it is close, but in classic COW everybody shares the same read-only copy, and when someone tries to write to this copy, they get their own personal copy updated according to that write. This is not what happens in the pipeline. In the pipeline we let everyone read the same read-only copy, because read accesses are more frequent. When a rare update to the pipeline happens, it is synchronized on the pipeline itself (writable) and then the read-only copy is updated (quickly). So all this is done for faster synchronization. Anyway, I am not aware of any off-the-shelf Java list that gives me the synchronization I want. Please correct me if I am wrong. Regarding "I am concerned about the LL copy in pushHead - even if addFirst is faster, a LL copy is fairly slow and likely loses us any gains": as you can see, the read-only copy is recreated any time the background pipeline changes (addFirst, swap, replaceAtIndex); these are rare operations that happen on snapshot, compaction, and flattening, respectively. The copy, after all, copies only the references to the segments, not the data itself. We previously had a different type of synchronization (without the read-only copy) and it was slower. So if you believe that read-only copy creation is the key to some performance problem, please provide measurements. Regarding "Also, I'm a little dubious on the use of LL given that we support a replaceAtIndex which will be much faster in an array": generally I agree that changing the implementation of "readOnlyCopy" from LinkedList to ArrayList might be beneficial here, especially for the replaceAtIndex case. I don't see how ArrayDeque helps us.
Thanks, Anastasia On Sunday, March 11, 2018, 8:06:05 AM GMT+2, 张铎(Duo Zhang) wrote: I believe the comments there are mainly about concurrency problem, not for linked list vs. array list, at least for me... 2018-03-11 4:12 GMT+08:00 Mike Drob : > Hi devs, > > I was reading through HBASE-17434 trying to understand why we have two > linked lists in compaction pipeline and I'm having trouble following the > conversation there, especially since it seems intertwined with HBASE-17379 > and jumps back and forth a few times. > > It looks like we are implementing our own copy-on-write list, and there is > a claim that addFirst is faster on a LinkedList than an array based list. I > am concerned about the LL copy in pushHead - even if addFirst is faster, a > LL copy is fairly slow and likely loses us any gains. Also, I'm a little > dubious on the use of LL given that we support a replaceAtIndex which will > be much faster in an array. > > Can we improve by using an ArrayDeque? > > Eschar, Anastasia, WDYT? > > Thanks, > Mike > > Some observations about performance - > https://stuartmarks.wordpress.com/2015/12/18/some-java-list-benchmarks/ >
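For readers following this thread, the synchronization scheme described in the reply can be sketched roughly as below. This is a minimal illustration and not the actual CompactionPipeline code: the class name `PipelineSketch`, the generic segment type `S`, and the method `getSegments` are hypothetical, and the read-only copy is an ArrayList here only because the reply agrees that may be beneficial.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch; not the actual HBase CompactionPipeline class.
class PipelineSketch<S> {
    // Writable list, only touched inside synchronized methods.
    private final LinkedList<S> pipeline = new LinkedList<>();
    // Immutable snapshot that readers use without taking any lock.
    private volatile List<S> readOnlyCopy = Collections.emptyList();

    // Rare update (e.g. on snapshot): add a segment at the head.
    synchronized void pushHead(S segment) {
        pipeline.addFirst(segment);
        refreshReadOnlyCopy();
    }

    // Rare update (e.g. on flattening): replace one segment in place.
    synchronized void replaceAtIndex(int index, S segment) {
        pipeline.set(index, segment);
        refreshReadOnlyCopy();
    }

    // Frequent read: a single volatile read of the current snapshot.
    List<S> getSegments() {
        return readOnlyCopy;
    }

    // Copies references only, never segment data; caller holds the lock.
    private void refreshReadOnlyCopy() {
        readOnlyCopy = Collections.unmodifiableList(new ArrayList<>(pipeline));
    }
}
```

A reader that obtained the list before an update keeps seeing the old, consistent snapshot, while later readers see the new one; the cost of each update is one copy of the segment references.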
[jira] [Created] (HBASE-19506) Support variable sized chunks from ChunkCreator
Anastasia Braginsky created HBASE-19506: --- Summary: Support variable sized chunks from ChunkCreator Key: HBASE-19506 URL: https://issues.apache.org/jira/browse/HBASE-19506 Project: HBase Issue Type: Sub-task Reporter: Anastasia Braginsky When a CellChunkMap is created, it allocates a special index chunk (or chunks) where the array of cell-representations is stored. When the number of cell-representations is small, it is preferable to allocate a chunk smaller than the default size of 2MB. On the other hand, those "non-standard size" chunks cannot be used in the pool, and on-demand off-heap allocations are costly. So this JIRA is to investigate the trade-off between memory usage and the final performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-19282) CellChunkMap Benchmarking and User Interface
Anastasia Braginsky created HBASE-19282: --- Summary: CellChunkMap Benchmarking and User Interface Key: HBASE-19282 URL: https://issues.apache.org/jira/browse/HBASE-19282 Project: HBase Issue Type: Sub-task Reporter: Anastasia Braginsky We have run some experiments on how working with CellChunkMap (CCM) influences performance when running on-heap and off-heap. Based on those results, it is suggested to tie the MSLAB usage (off-heap or on-heap) to the CCM index usage. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-19133) Transfer big cells or upserted/appended cells into MSLAB upon flattening to CellChunkMap
Anastasia Braginsky created HBASE-19133: --- Summary: Transfer big cells or upserted/appended cells into MSLAB upon flattening to CellChunkMap Key: HBASE-19133 URL: https://issues.apache.org/jira/browse/HBASE-19133 Project: HBase Issue Type: Sub-task Reporter: Anastasia Braginsky The CellChunkMap Segment index requires all cell data to be written in the MSLAB Chunks. Even though MSLAB is enabled, cells bigger than the chunk size and upserted/incremented/appended cells are still allocated on the JVM heap. If such cells are found in the process of flattening into CellChunkMap (in-memory flush), they need to be copied into MSLAB. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
Re: Welcome Chia-Ping Tsai to the HBase PMC
Congratulations! On Saturday, September 30, 2017, 4:52:21 AM GMT+3, Mike Drob wrote: Well deserved, Chia-Ping! On Fri, Sep 29, 2017 at 6:04 PM, Esteban Gutierrez wrote: > Congrats Chia-Ping! and Welcome! > > -- > Cloudera, Inc. > > > On Fri, Sep 29, 2017 at 3:52 PM, Guanghao Zhang > wrote: > > > Congratulations! > > > > 2017-09-30 6:38 GMT+08:00 Andrew Purtell : > > > > > Congratulations, Chia-Ping! Welcome to the PMC. > > > > > > On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones > > > > wrote: > > > > > > > The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed > > to > > > > join > > > > the HBase PMC, and help to make the project run smoothly. Chia-Ping > > > became > > > > an > > > > HBase committer over 6 months ago, based on long-running participate > in > > > the > > > > HBase project, a consistent record of resolving HBase issues, and > > > > contributions > > > > to testing and performance. > > > > > > > > Thank you for stepping up to serve, Chia-Ping! > > > > > > > > As a reminder, if anyone would like to nominate another person as a > > > > committer or PMC member, even if you are not currently a committer or > > PMC > > > > member, you can always drop a note to priv...@hbase.apache.org to > let > > us > > > > know! > > > > > > > > Thanks, > > > > Misty (on behalf of the HBase PMC) > > > > > > > > > > > > > > > > -- > > > Best regards, > > > Andrew > > > > > > Words like orphans lost among the crosstalk, meaning torn from truth's > > > decrepit hands > > > - A23, Crosstalk > > > > > >
[jira] [Created] (HBASE-18748) Cache pre-warming upon replication
Anastasia Braginsky created HBASE-18748: --- Summary: Cache pre-warming upon replication Key: HBASE-18748 URL: https://issues.apache.org/jira/browse/HBASE-18748 Project: HBase Issue Type: New Feature Reporter: Anastasia Braginsky HBase's cluster replication is a very important and widely used feature. Let's assume a primary cluster is replicated to a secondary (backup) cluster, using the WAL of the primary cluster to propagate the changes. Let's also assume the secondary cluster is the failover target and should become primary when needed. We suggest improving the way HBase cluster failover works today. Namely, upon failover, the backup RS's cache is cold, and warming it up to the right working set takes many minutes. The suggested solution is to selectively replay read requests at the backup, namely those reads that caused cache-ins at the primary. We intend to use WAL replication as the transport protocol (hopefully as a black box) and, of course, add custom replay callbacks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18375) The pool chunks from ChunkCreator are deallocated while in pool because there is no reference to them
Anastasia Braginsky created HBASE-18375: --- Summary: The pool chunks from ChunkCreator are deallocated while in pool because there is no reference to them Key: HBASE-18375 URL: https://issues.apache.org/jira/browse/HBASE-18375 Project: HBase Issue Type: Sub-task Reporter: Anastasia Braginsky Because the MSLAB list of chunks was changed to a list of chunk IDs, chunks returned to the pool can be deallocated by the JVM, since there is no longer any reference to them. The solution is to protect pool chunks from GC via the strong map of ChunkCreator introduced by HBASE-18010. Will prepare the patch today. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
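The idea of protecting pooled chunks with a strong map can be illustrated with the sketch below. It is an assumption-laden toy and not the ChunkCreator code: the names `ChunkCreatorSketch`, `Chunk`, `allocate`, `returnToPool` and `getChunk` are hypothetical, and all chunks are assumed to have one fixed size. The point is only that the pool stores IDs, while a map from ID to chunk holds the strong reference that keeps a pooled chunk reachable.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not the actual HBase ChunkCreator class.
class ChunkCreatorSketch {
    static final class Chunk {
        final int id;
        final byte[] data;
        Chunk(int id, int size) { this.id = id; this.data = new byte[size]; }
    }

    private final int chunkSize;
    private final AtomicInteger nextId = new AtomicInteger(1);
    // Strong references: a chunk stays reachable while its ID is mapped.
    private final Map<Integer, Chunk> chunkIdMap = new ConcurrentHashMap<>();
    // The pool itself holds chunk IDs only, not chunk objects.
    private final Queue<Integer> pooledIds = new ConcurrentLinkedQueue<>();

    ChunkCreatorSketch(int chunkSize) { this.chunkSize = chunkSize; }

    // Reuse a pooled chunk if one exists, otherwise create and register one.
    int allocate() {
        Integer pooled = pooledIds.poll();
        if (pooled != null) {
            return pooled;
        }
        Chunk chunk = new Chunk(nextId.getAndIncrement(), chunkSize);
        chunkIdMap.put(chunk.id, chunk);
        return chunk.id;
    }

    // Returning a chunk keeps its map entry, so the GC cannot reclaim it.
    void returnToPool(int id) {
        pooledIds.add(id);
    }

    Chunk getChunk(int id) {
        return chunkIdMap.get(id);
    }
}
```

Without the map entry, a chunk whose ID sits in the pool would have no referent left and could be collected before it is reused.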
[jira] [Created] (HBASE-18251) Remove unnecessary traversing to the first and last keys in the CellSet
Anastasia Braginsky created HBASE-18251: --- Summary: Remove unnecessary traversing to the first and last keys in the CellSet Key: HBASE-18251 URL: https://issues.apache.org/jira/browse/HBASE-18251 Project: HBase Issue Type: Bug Reporter: Anastasia Braginsky The implementation of finding the first and last keys in the CellSet is as follows: {code} public Cell first() { return this.delegatee.get(this.delegatee.firstKey()); } public Cell last() { return this.delegatee.get(this.delegatee.lastKey()); } {code} Recall that we have a Cell-to-Cell mapping, so the methods that fetch the first/last key already return a Cell. Thus there is no need to waste time on the get() method for the same Cell. Fix: return just firstKey()/lastKey(), which should be at least twice as efficient. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
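The proposed fix can be sketched as follows. This is an illustration rather than the actual patch: `CellSetSketch` is a hypothetical stand-in for CellSet, the generic type `C` stands in for Cell, and the sketch assumes, as the issue states, that every key maps to the same object as its value.

```java
import java.util.NavigableMap;

// Hypothetical stand-in for CellSet; C plays the role of Cell.
class CellSetSketch<C> {
    // Assumption from the issue: each key is mapped to itself.
    private final NavigableMap<C, C> delegatee;

    CellSetSketch(NavigableMap<C, C> delegatee) {
        this.delegatee = delegatee;
    }

    // One traversal instead of firstKey() followed by get().
    C first() {
        return delegatee.firstKey();
    }

    // One traversal instead of lastKey() followed by get().
    C last() {
        return delegatee.lastKey();
    }
}
```

As with the original code, both methods throw NoSuchElementException on an empty map, because firstKey() and lastKey() do so in either version.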
[jira] [Created] (HBASE-18232) Add variable size chunks to the MSLAB
Anastasia Braginsky created HBASE-18232: --- Summary: Add variable size chunks to the MSLAB Key: HBASE-18232 URL: https://issues.apache.org/jira/browse/HBASE-18232 Project: HBase Issue Type: Sub-task Reporter: Anastasia Braginsky Add the possibility to create variable-size chunks of memory, so that any cell (of any size) can reside on a chunk. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18056) Change CompactingMemStore in BASIC mode to merge multiple segments in pipeline
Anastasia Braginsky created HBASE-18056: --- Summary: Change CompactingMemStore in BASIC mode to merge multiple segments in pipeline Key: HBASE-18056 URL: https://issues.apache.org/jira/browse/HBASE-18056 Project: HBase Issue Type: Sub-task Reporter: Anastasia Braginsky Under HBASE-16417 it was decided that CompactingMemStore in BASIC mode should merge multiple ImmutableSegments in CompactionPipeline. Basic+Merge actually demonstrated reduction in GC, alongside improvement in other metrics. However, the limit on the number of segments in pipeline is still set to 30. Under this JIRA it should be changed to 1, as it was tested under HBASE-16417. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-18010) Connect CellChunkMap to be used for flattening in CompactingMemStore
Anastasia Braginsky created HBASE-18010: --- Summary: Connect CellChunkMap to be used for flattening in CompactingMemStore Key: HBASE-18010 URL: https://issues.apache.org/jira/browse/HBASE-18010 Project: HBase Issue Type: Sub-task Reporter: Anastasia Braginsky The CellChunkMap helps to create a new type of ImmutableSegment, where the index (CellSet's delegatee) is going to be CellChunkMap. No big cells or upserted cells are going to be supported here. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[ANNOUNCE] - MemStore default implementation change
Hi All, Following the performance benefits demonstrated by the new in-memory compaction algorithm in HBase, please note the anticipated change to the MemStore default implementation. The default in-memory compaction level now becomes BASIC (the previous default translates to NONE). The post on the Apache HBase blog sketches the algorithm and fleshes out the configuration details. This change is tracked under HBASE-17343. Regards, Anastasia
[jira] [Resolved] (HBASE-17377) MemStoreChunkAllocator
[ https://issues.apache.org/jira/browse/HBASE-17377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anastasia Braginsky resolved HBASE-17377. - Resolution: Duplicate Duplicate of HBASE-16438 > MemStoreChunkAllocator > -- > > Key: HBASE-17377 > URL: https://issues.apache.org/jira/browse/HBASE-17377 > Project: HBase > Issue Type: Sub-task > Reporter: Anastasia Braginsky > Attachments: MemStoreChunkAllocator.pdf > > > Refactoring the separation between MemStoreChunkPool and (new) > MemStoreChunkAllocator. The latter allocates chunks either from heap or from > pool and assigns them the Chunk IDs. MemStoreChunkAllocator stores the > mapping between chunks and their IDs. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17765) Reviving the merge possibility in the CompactingMemStore
Anastasia Braginsky created HBASE-17765: --- Summary: Reviving the merge possibility in the CompactingMemStore Key: HBASE-17765 URL: https://issues.apache.org/jira/browse/HBASE-17765 Project: HBase Issue Type: Sub-task Reporter: Anastasia Braginsky According to the new performance results presented in the HBASE-16417 we see that the read latency of the 90th percentile of the BASIC policy is too big due to the need to traverse through too many segments in the pipeline. In this JIRA we correct the bug in the merge sizing calculations and allow pipeline size threshold to be a configurable parameter. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17662) Disable in-memory flush when replaying from WAL
Anastasia Braginsky created HBASE-17662: --- Summary: Disable in-memory flush when replaying from WAL Key: HBASE-17662 URL: https://issues.apache.org/jira/browse/HBASE-17662 Project: HBase Issue Type: Sub-task Reporter: Anastasia Braginsky When replaying the edits from the WAL, the region's updateLock is not taken, because a single-threaded action is assumed. However, the thread safety of the in-memory flush of CompactingMemStore is based on taking the region's updateLock. The in-memory flush can be skipped during replay (everything is flushed to disk just after the replay anyway). Therefore it is acceptable to just skip the in-memory flush action while the updates come as part of replay from the WAL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
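One way to picture the proposal is a flag that suppresses the in-memory flush while replay is in progress. The sketch below is only an illustration under stated assumptions: the class `ReplayAwareMemStoreSketch`, its method names, and the size-threshold trigger are hypothetical and not taken from the HBase code, and all locking is omitted because replay is assumed to be single-threaded.

```java
// Hypothetical sketch; not the actual CompactingMemStore implementation.
class ReplayAwareMemStoreSketch {
    private final long inMemoryFlushThreshold;
    // True while edits are being replayed from the WAL.
    private volatile boolean replayingFromWal = false;
    private long activeSize = 0;
    private int inMemoryFlushCount = 0;

    ReplayAwareMemStoreSketch(long inMemoryFlushThreshold) {
        this.inMemoryFlushThreshold = inMemoryFlushThreshold;
    }

    void startReplayingFromWAL() { replayingFromWal = true; }

    void stopReplayingFromWAL() { replayingFromWal = false; }

    // Adds a cell of the given size; flushes in memory only outside replay.
    void add(long cellSize) {
        activeSize += cellSize;
        if (!replayingFromWal && activeSize > inMemoryFlushThreshold) {
            inMemoryFlush();
        }
    }

    // Stand-in for moving the active segment into the pipeline.
    private void inMemoryFlush() {
        inMemoryFlushCount++;
        activeSize = 0;
    }

    int inMemoryFlushCount() { return inMemoryFlushCount; }

    long activeSize() { return activeSize; }
}
```

During replay the active segment simply grows past the threshold; the first add after replay ends triggers the deferred in-memory flush, unless the data has already been flushed to disk.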
[jira] [Created] (HBASE-17492) Fix the compacting memstore part in hbase shell ruby script
Anastasia Braginsky created HBASE-17492: --- Summary: Fix the compacting memstore part in hbase shell ruby script Key: HBASE-17492 URL: https://issues.apache.org/jira/browse/HBASE-17492 Project: HBase Issue Type: Sub-task Reporter: Anastasia Braginsky Make the MemoryCompaction enum, not an internal class of HColumnDescriptor, but an external class. This enum is later used in the ruby script and the ruby script doesn't accept this internal class and proceeds with an error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17377) MemStoreChunkAllocator
Anastasia Braginsky created HBASE-17377: --- Summary: MemStoreChunkAllocator Key: HBASE-17377 URL: https://issues.apache.org/jira/browse/HBASE-17377 Project: HBase Issue Type: Sub-task Reporter: Anastasia Braginsky Refactoring the separation between MemStoreChunkPool and (new) MemStoreChunkAllocator. The latter allocates chunks either from heap or from pool and assigns them the Chunk IDs. MemStoreChunkAllocator stores the mapping between chunks and their IDs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17373) Reverse the order of snapshot creation in the CompactingMemStore
Anastasia Braginsky created HBASE-17373: --- Summary: Reverse the order of snapshot creation in the CompactingMemStore Key: HBASE-17373 URL: https://issues.apache.org/jira/browse/HBASE-17373 Project: HBase Issue Type: Bug Reporter: Anastasia Braginsky In CompactingMemStore, in both the BASIC and EAGER cases, when a snapshot is created the segments are first removed from the pipeline and then added to the snapshot. This is the opposite of what is done in the DefaultMemStore, where the snapshot is first created with the active segment and only afterwards is the active segment refreshed. This JIRA is to reverse the order in CompactingMemStore and make all MemStores behave the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17081) Flush the entire CompactingMemStore content to disk
Anastasia Braginsky created HBASE-17081: --- Summary: Flush the entire CompactingMemStore content to disk Key: HBASE-17081 URL: https://issues.apache.org/jira/browse/HBASE-17081 Project: HBase Issue Type: Sub-task Reporter: Anastasia Braginsky Assignee: Anastasia Braginsky Part of CompactingMemStore's memory is held by an active segment, and another part is divided between immutable segments in the compacting pipeline. Upon flush-to-disk request we want to flush all of it to disk, in contrast to flushing only tail of the compacting pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-16608) Introducing the ability to merge ImmutableSegments without copy-compaction or SQM usage
[ https://issues.apache.org/jira/browse/HBASE-16608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anastasia Braginsky resolved HBASE-16608. - Resolution: Fixed > Introducing the ability to merge ImmutableSegments without copy-compaction or > SQM usage > --- > > Key: HBASE-16608 > URL: https://issues.apache.org/jira/browse/HBASE-16608 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 > Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Fix For: 2.0.0 > > Attachments: HBASE-16417-V02.patch, HBASE-16417-V04.patch, > HBASE-16417-V06.patch, HBASE-16417-V07.patch, HBASE-16417-V08.patch, > HBASE-16417-V10.patch, HBASE-16608-Final.patch, HBASE-16608-Final.patch, > HBASE-16608-V01.patch, HBASE-16608-V03.patch, HBASE-16608-V04.patch, > HBASE-16608-V08.patch, HBASE-16608-V09.patch, HBASE-16608-V09.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16608) Introducing the ability to merge ImmutableSegments without copy-compaction or SQM usage
Anastasia Braginsky created HBASE-16608: --- Summary: Introducing the ability to merge ImmutableSegments without copy-compaction or SQM usage Key: HBASE-16608 URL: https://issues.apache.org/jira/browse/HBASE-16608 Project: HBase Issue Type: Sub-task Reporter: Anastasia Braginsky -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16421) Introducing the CellChunkMap as a new additional index in the MemStore
Anastasia Braginsky created HBASE-16421: --- Summary: Introducing the CellChunkMap as a new additional index in the MemStore Key: HBASE-16421 URL: https://issues.apache.org/jira/browse/HBASE-16421 Project: HBase Issue Type: Umbrella Reporter: Anastasia Braginsky -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions
Anastasia Braginsky created HBASE-16417: --- Summary: In-Memory MemStore Policy for Flattening and Compactions Key: HBASE-16417 URL: https://issues.apache.org/jira/browse/HBASE-16417 Project: HBase Issue Type: Improvement Reporter: Anastasia Braginsky -- This message was sent by Atlassian JIRA (v6.3.4#6332)