[jira] [Created] (HBASE-20480) Multiple one-time cell objects are allocated and de-allocated when working with CCM

2018-04-24 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-20480:
---

 Summary: Multiple one-time cell objects are allocated and 
de-allocated when working with CCM
 Key: HBASE-20480
 URL: https://issues.apache.org/jira/browse/HBASE-20480
 Project: HBase
  Issue Type: Sub-task
Reporter: Anastasia Braginsky


We believe we have found the cause of some read performance degradation when 
working with CellChunkMap (CCM): multiple one-time cell objects are allocated 
and de-allocated when performing multiple reads against a CCM MemStore. We 
have a couple of ideas for a solution. More details will follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Re: [ANNOUNCE] Please welcome Francis Liu to the HBase PMC

2018-04-12 Thread Anastasia Braginsky
 Congratulations!! 
On Thursday, April 12, 2018, 7:44:17 PM GMT+3, Jerry He 
 wrote:  
 
 Congratulations, Francis!

Jerry

On Wed, Apr 11, 2018 at 10:03 PM, Andrew Purtell 
> wrote:
> > On behalf of the Apache HBase PMC I am pleased to announce that Francis
> > Liu has accepted our invitation to become a PMC member on the Apache
> > HBase project. We appreciate Francis stepping up to take more
> > responsibility in the HBase project. He has been an active contributor to
> > HBase for many years and recently took over responsibilities as branch RM
> > for branch-1.3.
> >
> > Please join me in welcoming Francis to the HBase PMC!
> >
> > --
> > Best regards,
> > Andrew
> >
>
  

Re: Questions about synchronization in compaction pipeline

2018-03-11 Thread Anastasia Braginsky
 Hi Mike,
Thanks for your interest in the CompactionPipeline implementation. It is a 
pleasure to discuss it with you.
Regarding the claim that we are implementing our own copy-on-write (COW) list: 
it may be close, but in classic COW everybody shares the same read-only copy, 
and when someone tries to write to this copy, the writer gets its own personal 
copy, updated according to this write. This is not what happens in the 
pipeline. In the pipeline we let everyone read the same read-only copy, because 
read accesses are more frequent. When a rare update to the pipeline happens, it 
is synchronized on the pipeline itself (writable) and then the read-only copy 
is updated (quickly). All this is done for faster synchronization. Anyway, I am 
not aware of an off-the-shelf Java list that gives me the synchronization I 
want. Please correct me if I am wrong.
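
For illustration only (this is not the actual CompactionPipeline code), the 
scheme described above can be sketched as follows. The class name 
ReadOnlyCopyPipeline and its method bodies are hypothetical; the sketch assumes 
updates lock the writable list and then republish a fresh copy of the 
references, while readers take the published copy without locking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Hypothetical illustration; names do not come from the HBase code base.
class ReadOnlyCopyPipeline<S> {
  // Writable list, guarded by synchronized (pipeline).
  private final LinkedList<S> pipeline = new LinkedList<>();
  // Copy of the references published to readers; replaced on every update.
  private volatile List<S> readOnlyCopy = Collections.emptyList();

  // Rare update: add a new segment at the head, then republish the copy.
  void pushHead(S segment) {
    synchronized (pipeline) {
      pipeline.addFirst(segment);
      publish();
    }
  }

  // Rare update: replace the segment at the given index, then republish.
  void replaceAtIndex(int index, S segment) {
    synchronized (pipeline) {
      pipeline.set(index, segment);
      publish();
    }
  }

  // Frequent read: no locking, only a volatile read of the current copy.
  List<S> getSegments() {
    return readOnlyCopy;
  }

  // Copies references only; the segment data itself is never copied.
  private void publish() {
    readOnlyCopy = Collections.unmodifiableList(new ArrayList<>(pipeline));
  }
}
```

A reader that obtained the list before an update keeps seeing the old, 
unchanged copy. The sketch stores the published copy in an ArrayList; that is a 
choice made for this illustration only.
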
Regarding "I am concerned about the LL copy in pushHead - even if addFirst is 
faster, a LL copy is fairly slow and likely loses us any gains". As you can 
see, the read-only copy is recreated any time the background pipeline changes 
(addFirst, swap, replaceAtIndex); these are rare operations that happen on 
snapshot, compaction, and flattening, respectively. After all, copying the 
segments copies only the references, not the data itself. We previously had a 
different type of synchronization (without the read-only copy) and it was 
slower. So if you believe that creating the read-only copy is the key to some 
performance problem, please provide measurements.
Regarding "Also, I'm a little dubious on the use of LL given that we support a 
replaceAtIndex which will be much faster in an array". Generally, I agree that 
changing the implementation of "readOnlyCopy" from LinkedList to ArrayList 
might be beneficial here, especially for the replaceAtIndex case. I don't see 
how ArrayDeque helps us.
Thanks,
Anastasia
On Sunday, March 11, 2018, 8:06:05 AM GMT+2, 张铎(Duo Zhang) 
 wrote:  
 
 I believe the comments there are mainly about concurrency problem, not for
linked list vs. array list, at least for me...

2018-03-11 4:12 GMT+08:00 Mike Drob :

> Hi devs,
>
> I was reading through HBASE-17434 trying to understand why we have two
> linked lists in compaction pipeline and I'm having trouble following the
> conversation there, especially since it seems intertwined with HBASE-17379
> and jumps back and forth a few times.
>
> It looks like we are implementing our own copy-on-write list, and there is
> a claim that addFirst is faster on a LinkedList than an array based list. I
> am concerned about the LL copy in pushHead - even if addFirst is faster, a
> LL copy is fairly slow and likely loses us any gains. Also, I'm a little
> dubious on the use of LL given that we support a replaceAtIndex which will
> be much faster in an array.
>
> Can we improve by using an ArrayDeque?
>
> Eschar, Anastasia, WDYT?
>
> Thanks,
> Mike
>
> Some observations about performance -
> https://stuartmarks.wordpress.com/2015/12/18/some-java-list-benchmarks/
>
  

[jira] [Created] (HBASE-19506) Support variable sized chunks from ChunkCreator

2017-12-13 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-19506:
---

 Summary: Support variable sized chunks from ChunkCreator
 Key: HBASE-19506
 URL: https://issues.apache.org/jira/browse/HBASE-19506
 Project: HBase
  Issue Type: Sub-task
Reporter: Anastasia Braginsky


When a CellChunkMap is created, it allocates a special index chunk (or chunks) 
where the array of cell-representations is stored. When the number of 
cell-representations is small, it is preferable to allocate a chunk smaller 
than the default size, which is 2MB.

On the other hand, those "non-standard size" chunks cannot be used in the pool, 
and on-demand off-heap allocations are costly. So this JIRA is to investigate 
the trade-off between memory usage and the final performance. 
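
A rough sketch of the sizing decision described above. It assumes a 
hypothetical fixed per-entry size for a cell-representation; the class 
IndexChunkSizerSketch and its methods are illustrative and are not part of 
ChunkCreator.

```java
// Hypothetical sketch; names and the per-entry size are assumptions,
// not taken from ChunkCreator.
class IndexChunkSizerSketch {
  // Default chunk size of 2MB, as stated in the description above.
  static final int DEFAULT_CHUNK_SIZE = 2 * 1024 * 1024;

  // Returns the size to request for one index chunk: just enough for the
  // entries when they need less than a default chunk, otherwise the default.
  static int chooseIndexChunkSize(int numCells, int entrySizeBytes) {
    long needed = (long) numCells * entrySizeBytes;
    return (int) Math.min(needed, (long) DEFAULT_CHUNK_SIZE);
  }

  // Only default-size chunks can be recycled through the pool.
  static boolean isPoolable(int chunkSize) {
    return chunkSize == DEFAULT_CHUNK_SIZE;
  }
}
```

A small index gets a small chunk that has to be allocated on demand, while a 
large index falls back to default-size chunks that can come from the pool.
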





[jira] [Created] (HBASE-19282) CellChunkMap Benchmarking and User Interface

2017-11-16 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-19282:
---

 Summary: CellChunkMap Benchmarking and User Interface
 Key: HBASE-19282
 URL: https://issues.apache.org/jira/browse/HBASE-19282
 Project: HBase
  Issue Type: Sub-task
Reporter: Anastasia Braginsky


We have run some experiments on how working with CellChunkMap (CCM) influences 
performance when running on-heap and off-heap. Based on those results, it is 
suggested to tie the MSLAB usage (off-heap or on-heap) to the CCM index usage.





[jira] [Created] (HBASE-19133) Transfer big cells or upserted/appended cells into MSLAB upon flattening to CellChunkMap

2017-10-31 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-19133:
---

 Summary: Transfer big cells or upserted/appended cells into MSLAB 
upon flattening to CellChunkMap
 Key: HBASE-19133
 URL: https://issues.apache.org/jira/browse/HBASE-19133
 Project: HBase
  Issue Type: Sub-task
Reporter: Anastasia Braginsky


The CellChunkMap Segment index requires all cell data to be written in the 
MSLAB Chunks. Even though MSLAB is enabled, cells bigger than the chunk size or 
upserted/incremented/appended cells are still allocated on the JVM heap. If 
such cells are found in the process of flattening into CellChunkMap 
(in-memory-flush), they need to be copied into MSLAB.





Re: Welcome Chia-Ping Tsai to the HBase PMC

2017-10-02 Thread Anastasia Braginsky
Congratulations! 

On Saturday, September 30, 2017, 4:52:21 AM GMT+3, Mike Drob 
 wrote:  
 
 Well deserved, Chia-Ping!

On Fri, Sep 29, 2017 at 6:04 PM, Esteban Gutierrez 
wrote:

> Congrats  Chia-Ping! and Welcome!
>
> --
> Cloudera, Inc.
>
>
> On Fri, Sep 29, 2017 at 3:52 PM, Guanghao Zhang 
> wrote:
>
> > Congratulations!
> >
> > 2017-09-30 6:38 GMT+08:00 Andrew Purtell :
> >
> > > Congratulations, Chia-Ping! Welcome to the PMC.
> > >
> > > On Fri, Sep 29, 2017 at 3:19 PM, Misty Stanley-Jones  >
> > > wrote:
> > >
> > > > The HBase PMC is delighted to announce that Chia-Ping Tsai has agreed
> > to
> > > > join
> > > > the HBase PMC, and help to make the project run smoothly. Chia-Ping
> > > became
> > > > an
> > > > HBase committer over 6 months ago, based on long-running participate
> in
> > > the
> > > > HBase project, a consistent record of resolving HBase issues, and
> > > > contributions
> > > > to testing and performance.
> > > >
> > > > Thank you for stepping up to serve, Chia-Ping!
> > > >
> > > > As a reminder, if anyone would like to nominate another person as a
> > > > committer or PMC member, even if you are not currently a committer or
> > PMC
> > > > member, you can always drop a note to priv...@hbase.apache.org to
> let
> > us
> > > > know!
> > > >
> > > > Thanks,
> > > > Misty (on behalf of the HBase PMC)
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrew
> > >
> > > Words like orphans lost among the crosstalk, meaning torn from truth's
> > > decrepit hands
> > >    - A23, Crosstalk
> > >
> >
>


[jira] [Created] (HBASE-18748) Cache pre-warming upon replication

2017-09-03 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-18748:
---

 Summary: Cache pre-warming upon replication
 Key: HBASE-18748
 URL: https://issues.apache.org/jira/browse/HBASE-18748
 Project: HBase
  Issue Type: New Feature
Reporter: Anastasia Braginsky


HBase's cluster replication is a very important and widely used feature. Let's 
assume a primary cluster is replicated to a secondary (backup) cluster using 
the WAL of the primary cluster to propagate the changes. Let's also assume the 
secondary cluster is a failover target and should become primary when needed.

We suggest improving the way HBase cluster failover works today. Namely, 
upon failover, the backup RS's cache is cold. Warming it up to the right 
working set takes many minutes. The suggested solution is to selectively replay 
read requests at the backup - namely, those reads that caused cache-ins at the 
primary. We intend to use WAL replication as the transport protocol (hopefully, 
as a black box), and of course add custom replay callbacks. 





[jira] [Created] (HBASE-18375) The pool chunks from ChunkCreator are deallocated while in pool because there is no reference to them

2017-07-13 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-18375:
---

 Summary: The pool chunks from ChunkCreator are deallocated while 
in pool because there is no reference to them
 Key: HBASE-18375
 URL: https://issues.apache.org/jira/browse/HBASE-18375
 Project: HBase
  Issue Type: Sub-task
Reporter: Anastasia Braginsky


Because the MSLAB list of chunks was changed to a list of chunk IDs, chunks 
returned to the pool can be garbage-collected by the JVM, since no reference to 
them remains. The solution is to protect pool chunks from GC with the strong 
map of ChunkCreator introduced by HBASE-18010. Will prepare the patch today.
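
The idea of a strong map can be sketched as below. This is a simplified 
illustration, not the ChunkCreator code: the class ChunkRegistrySketch, its 
methods, and the use of byte[] as a chunk are all assumptions. The point is 
that holders keep only an integer ID, so the map entry is the reference that 
keeps the chunk reachable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; class and method names are illustrative only.
class ChunkRegistrySketch {
  private final AtomicInteger nextId = new AtomicInteger(1);
  // Strong references: a chunk stays reachable for as long as its ID is
  // mapped, even if callers keep only the integer ID.
  private final Map<Integer, byte[]> chunksById = new ConcurrentHashMap<>();

  // Allocates a chunk and returns its ID; the map holds the only reference.
  int allocate(int sizeBytes) {
    int id = nextId.getAndIncrement();
    chunksById.put(id, new byte[sizeBytes]);
    return id;
  }

  // Translates an ID back to its chunk, or null if it was released.
  byte[] getChunk(int id) {
    return chunksById.get(id);
  }

  // Explicit removal drops the registry's reference to the chunk.
  void release(int id) {
    chunksById.remove(id);
  }
}
```
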





[jira] [Created] (HBASE-18251) Remove unnecessary traversing to the first and last keys in the CellSet

2017-06-21 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-18251:
---

 Summary: Remove unnecessary traversing to the first and last keys 
in the CellSet
 Key: HBASE-18251
 URL: https://issues.apache.org/jira/browse/HBASE-18251
 Project: HBase
  Issue Type: Bug
Reporter: Anastasia Braginsky


The implementation of finding the first and last keys in the CellSet is as 
follows:

{code}
  public Cell first() {
    return this.delegatee.get(this.delegatee.firstKey());
  }

  public Cell last() {
    return this.delegatee.get(this.delegatee.lastKey());
  }
{code}

Recall that we have a Cell to Cell mapping, so the methods that fetch the 
first/last key already return the Cell. Thus there is no need to waste time on 
the get() method for the same Cell.
Fix: return just the first/lastKey(), which should be at least twice as 
efficient.
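
A minimal, self-contained sketch of the proposed fix. It is not the actual 
patch: CellSetSketch is a hypothetical stand-in for CellSet, and String 
replaces Cell only so that the example compiles on its own.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Simplified stand-in for CellSet: key and value are the same object,
// mirroring the Cell to Cell mapping described above.
class CellSetSketch {
  private final NavigableMap<String, String> delegatee = new TreeMap<>();

  void add(String cell) {
    delegatee.put(cell, cell);
  }

  // The first key already is the cell, so no extra get() lookup is needed.
  String first() {
    return delegatee.firstKey();
  }

  // Same reasoning for the last key.
  String last() {
    return delegatee.lastKey();
  }
}
```

Each call now performs one traversal of the map instead of two.
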





[jira] [Created] (HBASE-18232) Add variable size chunks to the MSLAB

2017-06-18 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-18232:
---

 Summary: Add variable size chunks to the MSLAB
 Key: HBASE-18232
 URL: https://issues.apache.org/jira/browse/HBASE-18232
 Project: HBase
  Issue Type: Sub-task
Reporter: Anastasia Braginsky


Add the possibility to create variable-size chunks of memory, so that any cell 
(of any size) can reside on a chunk.





[jira] [Created] (HBASE-18056) Change CompactingMemStore in BASIC mode to merge multiple segments in pipeline

2017-05-16 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-18056:
---

 Summary: Change CompactingMemStore in BASIC mode to merge multiple 
segments in pipeline
 Key: HBASE-18056
 URL: https://issues.apache.org/jira/browse/HBASE-18056
 Project: HBase
  Issue Type: Sub-task
Reporter: Anastasia Braginsky


Under HBASE-16417 it was decided that CompactingMemStore in BASIC mode should 
merge multiple ImmutableSegments in CompactionPipeline. Basic+Merge actually 
demonstrated reduction in GC, alongside improvement in other metrics.

However, the limit on the number of segments in the pipeline is still set to 
30. Under this JIRA it should be changed to 1, as was tested under HBASE-16417.





[jira] [Created] (HBASE-18010) Connect CellChunkMap to be used for flattening in CompactingMemStore

2017-05-08 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-18010:
---

 Summary: Connect CellChunkMap to be used for flattening in 
CompactingMemStore
 Key: HBASE-18010
 URL: https://issues.apache.org/jira/browse/HBASE-18010
 Project: HBase
  Issue Type: Sub-task
Reporter: Anastasia Braginsky


The CellChunkMap helps to create a new type of ImmutableSegment, where the 
index (CellSet's delegatee) is going to be CellChunkMap. No big cells or 
upserted cells are going to be supported here.





[ANNOUNCE] - MemStore default implementation change

2017-05-04 Thread Anastasia Braginsky
Hi All


Following the performance benefits demonstrated by the new in-memory compaction 
algorithm in HBase, please note the anticipated change to the MemStore default 
implementation.
The default in-memory compaction level now becomes BASIC (the previous default 
translates to NONE). The post on the Apache HBase blog sketches the algorithm 
and fleshes out the configuration details. 
This change is tracked under HBASE-17343. 

Regards,
Anastasia


[jira] [Resolved] (HBASE-17377) MemStoreChunkAllocator

2017-03-30 Thread Anastasia Braginsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anastasia Braginsky resolved HBASE-17377.
-
Resolution: Duplicate

Duplicate of HBASE-16438

> MemStoreChunkAllocator
> --
>
> Key: HBASE-17377
> URL: https://issues.apache.org/jira/browse/HBASE-17377
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Anastasia Braginsky
> Attachments: MemStoreChunkAllocator.pdf
>
>
> Refactoring the separation between MemStoreChunkPool and (new) 
> MemStoreChunkAllocator. The latter allocates chunks either from heap or from 
> pool and assigns them the Chunk IDs. MemStoreChunkAllocator stores the 
> mapping between chunks and their IDs.





[jira] [Created] (HBASE-17765) Reviving the merge possibility in the CompactingMemStore

2017-03-09 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-17765:
---

 Summary: Reviving the merge possibility in the CompactingMemStore
 Key: HBASE-17765
 URL: https://issues.apache.org/jira/browse/HBASE-17765
 Project: HBase
  Issue Type: Sub-task
Reporter: Anastasia Braginsky


According to the new performance results presented in HBASE-16417, the 
90th-percentile read latency of the BASIC policy is too high because too many 
segments in the pipeline must be traversed. In this JIRA we correct the bug in 
the merge sizing calculations and make the pipeline size threshold a 
configurable parameter.





[jira] [Created] (HBASE-17662) Disable in-memory flush when replaying from WAL

2017-02-19 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-17662:
---

 Summary: Disable in-memory flush when replaying from WAL
 Key: HBASE-17662
 URL: https://issues.apache.org/jira/browse/HBASE-17662
 Project: HBase
  Issue Type: Sub-task
Reporter: Anastasia Braginsky


When replaying the edits from the WAL, the region's updateLock is not taken, 
because a single-threaded action is assumed. However, the thread safety of 
the in-memory flush of CompactingMemStore relies on taking the region's 
updateLock. 

The in-memory flush can be skipped during replay (everything is flushed to disk 
right after the replay anyway). Therefore it is acceptable to simply skip the 
in-memory flush action while the updates arrive as part of a replay from the WAL.
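
The proposed behavior can be sketched as follows. This is a hypothetical 
illustration, not the CompactingMemStore API: the class 
ReplayAwareMemStoreSketch, the replay flag, and the size-based trigger are all 
assumptions made to show the skip.

```java
// Hypothetical sketch; not the CompactingMemStore API. Sizes are in bytes.
class ReplayAwareMemStoreSketch {
  private final long inMemoryFlushThreshold;
  private long activeSize = 0;
  private int inMemoryFlushCount = 0;
  // Set while edits are being replayed from the WAL.
  private boolean replaying = false;

  ReplayAwareMemStoreSketch(long inMemoryFlushThreshold) {
    this.inMemoryFlushThreshold = inMemoryFlushThreshold;
  }

  void startReplay() { replaying = true; }

  void stopReplay() { replaying = false; }

  // Adds a cell of the given size; triggers an in-memory flush only when
  // not replaying, since replay runs without the region's updateLock.
  void add(long cellSize) {
    activeSize += cellSize;
    if (!replaying && activeSize >= inMemoryFlushThreshold) {
      inMemoryFlush();
    }
  }

  private void inMemoryFlush() {
    inMemoryFlushCount++;
    activeSize = 0; // stand-in for moving the active segment to the pipeline
  }

  int getInMemoryFlushCount() { return inMemoryFlushCount; }

  long getActiveSize() { return activeSize; }
}
```
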





[jira] [Created] (HBASE-17492) Fix the compacting memstore part in hbase shell ruby script

2017-01-19 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-17492:
---

 Summary: Fix the compacting memstore part in hbase shell ruby 
script 
 Key: HBASE-17492
 URL: https://issues.apache.org/jira/browse/HBASE-17492
 Project: HBase
  Issue Type: Sub-task
Reporter: Anastasia Braginsky


Make the MemoryCompaction enum an external class rather than an inner class of 
HColumnDescriptor. This enum is later used in the Ruby script, which does not 
accept the inner class and reports an error.





[jira] [Created] (HBASE-17377) MemStoreChunkAllocator

2016-12-26 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-17377:
---

 Summary: MemStoreChunkAllocator
 Key: HBASE-17377
 URL: https://issues.apache.org/jira/browse/HBASE-17377
 Project: HBase
  Issue Type: Sub-task
Reporter: Anastasia Braginsky


Refactoring the separation between MemStoreChunkPool and (new) 
MemStoreChunkAllocator. The latter allocates chunks either from heap or from 
pool and assigns them the Chunk IDs. MemStoreChunkAllocator stores the mapping 
between chunks and their IDs.





[jira] [Created] (HBASE-17373) Reverse the order of snapshot creation in the CompactingMemStore

2016-12-26 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-17373:
---

 Summary: Reverse the order of snapshot creation in the 
CompactingMemStore
 Key: HBASE-17373
 URL: https://issues.apache.org/jira/browse/HBASE-17373
 Project: HBase
  Issue Type: Bug
Reporter: Anastasia Braginsky


In CompactingMemStore, in both the BASIC and EAGER cases, when a snapshot is 
created the segments are first removed from the pipeline and then added to the 
snapshot. This is the opposite of what is done in the DefaultMemStore, where 
the snapshot is first created with the active segment and only afterwards is 
the active segment refreshed. This JIRA is to reverse the order in 
CompactingMemStore so that all MemStores behave the same.





[jira] [Created] (HBASE-17081) Flush the entire CompactingMemStore content to disk

2016-11-12 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-17081:
---

 Summary: Flush the entire CompactingMemStore content to disk
 Key: HBASE-17081
 URL: https://issues.apache.org/jira/browse/HBASE-17081
 Project: HBase
  Issue Type: Sub-task
Reporter: Anastasia Braginsky
Assignee: Anastasia Braginsky


Part of CompactingMemStore's memory is held by an active segment, and another 
part is divided between immutable segments in the compacting pipeline. Upon a 
flush-to-disk request we want to flush all of it to disk, in contrast to 
flushing only the tail of the compacting pipeline.





[jira] [Resolved] (HBASE-16608) Introducing the ability to merge ImmutableSegments without copy-compaction or SQM usage

2016-11-01 Thread Anastasia Braginsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anastasia Braginsky resolved HBASE-16608.
-
Resolution: Fixed

> Introducing the ability to merge ImmutableSegments without copy-compaction or 
> SQM usage
> ---
>
> Key: HBASE-16608
> URL: https://issues.apache.org/jira/browse/HBASE-16608
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>    Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Fix For: 2.0.0
>
> Attachments: HBASE-16417-V02.patch, HBASE-16417-V04.patch, 
> HBASE-16417-V06.patch, HBASE-16417-V07.patch, HBASE-16417-V08.patch, 
> HBASE-16417-V10.patch, HBASE-16608-Final.patch, HBASE-16608-Final.patch, 
> HBASE-16608-V01.patch, HBASE-16608-V03.patch, HBASE-16608-V04.patch, 
> HBASE-16608-V08.patch, HBASE-16608-V09.patch, HBASE-16608-V09.patch
>
>






[jira] [Created] (HBASE-16608) Introducing the ability to merge ImmutableSegments without copy-compaction or SQM usage

2016-09-10 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-16608:
---

 Summary: Introducing the ability to merge ImmutableSegments 
without copy-compaction or SQM usage
 Key: HBASE-16608
 URL: https://issues.apache.org/jira/browse/HBASE-16608
 Project: HBase
  Issue Type: Sub-task
Reporter: Anastasia Braginsky








[jira] [Created] (HBASE-16421) Introducing the CellChunkMap as a new additional index in the MemStore

2016-08-16 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-16421:
---

 Summary: Introducing the CellChunkMap as a new additional index in 
the MemStore
 Key: HBASE-16421
 URL: https://issues.apache.org/jira/browse/HBASE-16421
 Project: HBase
  Issue Type: Umbrella
Reporter: Anastasia Braginsky








[jira] [Created] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions

2016-08-15 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-16417:
---

 Summary: In-Memory MemStore Policy for Flattening and Compactions
 Key: HBASE-16417
 URL: https://issues.apache.org/jira/browse/HBASE-16417
 Project: HBase
  Issue Type: Improvement
Reporter: Anastasia Braginsky





