> On Oct. 16, 2016, 10:43 p.m., Anoop Sam John wrote:
> > hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HeapMemStoreLAB.java,
> > line 179
> > <https://reviews.apache.org/r/51785/diff/7-8/?file=1511963#file1511963line179>
> > I can see what you are trying to do here, but it seems very complex now.
> > So can we take a different path?
> > Merging N segments into one here means we have to merge the N MSLABs
> > referred to by those segments. What this patch does is make a new
> > HeapMSLAB object and add all the pooled chunks from the other MSLABs into
> > the queue of the new one, plus the other bookkeeping of referring to the
> > next chunk and so on.
> > Can we make a new MSLAB impl instead? It would act as a collection of N
> > HeapMSLABs, and we would only have to implement 4 methods:
> > incScannerCount() and decScannerCount(), based on an openScannerCount in
> > the new class; close(), as in HeapMSLAB, which also has to call close() on
> > each of the N MSLABs it wraps; and copyCellInto(). This is an
> > ImmutableSegment, so no one is going to call copyCellInto() anyway. Make
> > this impl throw IllegalStateException or so. Will that look good?
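For concreteness, Anoop's suggested wrapper could be sketched roughly as below. This is a minimal illustration of the idea only: the interface shape, class names, and method signatures here are assumptions for the sketch, not the actual HBase MemStoreLAB API, and it ignores some close()/decScannerCount() races a real implementation must handle.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Assumed, simplified stand-in for the real MSLAB interface.
interface MemStoreLAB {
  void incScannerCount();
  void decScannerCount();
  void close();
}

// Hypothetical "collection of N HeapMSLABs" implementation.
class CompositeMemStoreLAB implements MemStoreLAB {
  private final List<MemStoreLAB> mslabs;
  private final AtomicInteger openScannerCount = new AtomicInteger();
  private volatile boolean closed = false;

  CompositeMemStoreLAB(List<MemStoreLAB> mslabs) { this.mslabs = mslabs; }

  @Override public void incScannerCount() { openScannerCount.incrementAndGet(); }

  @Override public void decScannerCount() {
    // If close() was already requested and this was the last scanner,
    // propagate the close to the wrapped MSLABs now.
    if (openScannerCount.decrementAndGet() == 0 && closed) {
      closeAll();
    }
  }

  @Override public void close() {
    closed = true;
    if (openScannerCount.get() == 0) {
      closeAll();
    }
  }

  private void closeAll() {
    for (MemStoreLAB m : mslabs) {
      m.close();  // each wrapped MSLAB still applies its own scanner-count gate
    }
  }

  // The composite backs an ImmutableSegment: nobody should copy new cells in.
  public byte[] copyCellInto(byte[] cell) {
    throw new IllegalStateException("copyCellInto() on an immutable segment's MSLAB");
  }
}
```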
> Michael Stack wrote:
> No copying? That sounds good.
> Anoop Sam John wrote:
> So this fix, and so this comment, applies to the merge op alone. There
> is still an open point on which Ram and I are not in agreement: whether
> to do this in-memory merge at all. The merge exists to limit the number of
> segments in the pipeline and to get a bigger segment at the tail of the
> pipeline, since we flush the tail of the pipeline. But our point is: let us
> flush all segments (I mean flush to disk). In the case of in-memory
> compaction (this action is OFF by default, and one can enable it depending
> on the data usage) it is worth flushing only the tail. In the normal use
> case let us flush all segments, so only flattening of segments comes in,
> and so there is no bug like this one. There is absolutely no gain from
> delaying the flush of the other segments in the pipeline (I mean in
> merge-style usages). So I suggest we discuss that before going to work on
> this; otherwise it will be a waste of time.
> Anastasia Braginsky wrote:
> In this answer I am referring to Anoop's suggestion to create a new
> MSLAB implementation that acts as a collection of N old HeapMSLABs. Please
> note that old scans (opened on the N old HeapMSLABs) are not aware of the
> existence of the new MSLAB. In addition, close() on the new MSLAB does not
> automatically mean the old MSLABs can be closed, as there might still be
> ongoing scans. Thus I do not think this idea works (at least not as it is
> presented now).
> Anastasia Braginsky wrote:
> Anoop, I do not understand what exactly you disagree with. Are you
> saying that merging the segments is a waste of time? Do you prefer to do
> the merge upon the flush to disk? As we have seen in stress benchmarks,
> that behavior delays flushes and blocks writes, which is undesirable.
> Please explain your position more clearly, or we can talk about it if we
> have a meeting this
First, about why the collection of N HeapMSLABs supposedly won't work:
You say the problem is with old scans that were working on the old HeapMSLABs
and don't know about the new MSLAB. There won't be an issue. We only really
close an MSLAB (I mean, return its chunks to the pool) when a close call has
happened on it AND its scanner count is zero. So in the case you mentioned,
since there are old scans, the scanner count on those MSLABs is > 0. Remember
we are not going to call close() on them directly. We just have a new
collection-based MSLAB impl which points to the N old MSLABs. So the old
scanners still use the old MSLABs, and once done they decrement the counter.
Now on the new one there might be new scan requests, so we have a scanner
count in the new collection-based MSLAB impl. When we are at a point where
this new MSLAB should be closed, we call close() on it. It will call close()
on all the old MSLABs (if its own scanner count is zero; otherwise the close
on the old MSLABs is delayed until a decrement op brings the count to 0).
Then within each old MSLAB the normal close flow runs again: only if its
scanner count is 0 does it really get closed. So even in the rare case where
the new MSLAB has been made, it is at the point of getting closed, and a very
old scanner still uses an old MSLAB, there is still no issue: that one old
MSLAB will really get closed only when the long-pending scan ends.
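The deferred-close rule described above (chunks are really returned to the pool only once close() has been requested AND the scanner count has dropped to zero) can be sketched as follows. All names here are illustrative, not the actual HBase code, and the sketch only guards the final release against a simple close()/decrement race.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative reference-counted close, mirroring the rule described above.
class RefCountedLab {
  private final AtomicInteger openScannerCount = new AtomicInteger();
  private volatile boolean closeRequested = false;
  private boolean chunksReturned = false;

  void incScannerCount() { openScannerCount.incrementAndGet(); }

  void decScannerCount() {
    // A decrement that brings the count to 0 after close() was requested
    // performs the real close.
    if (openScannerCount.decrementAndGet() == 0 && closeRequested) {
      returnChunksToPool();
    }
  }

  void close() {
    closeRequested = true;
    if (openScannerCount.get() == 0) {
      returnChunksToPool();
    }
  }

  private synchronized void returnChunksToPool() {
    if (!chunksReturned) {   // guard against a close()/decScannerCount() race
      chunksReturned = true;
      // a real MSLAB would hand its pooled chunks back to the chunk pool here
    }
  }

  boolean isReallyClosed() { return chunksReturned; }
}
```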
Second, about the disagreement on merge:
Yes, I feel that there should NOT be an in-memory merge op at all. We need
the compaction op anyway. Only in compaction mode should the on-disk flush
flush only the tail of the pipeline. In the other case (the default case)
there are in-memory flushes (the flatten op) and we keep adding segments to
the pipeline; when a flush-to-disk request comes, we flush all the segments,
not just the tail of the pipeline. We try to do this merge only to keep the
number of segments in the pipeline small and to have a bigger tail segment
for the pipeline, but there is no need to do the merge at any point in time:
the flush will use a scanner (with SQM and all) anyway, so let us make that
scanner run over all the segments.
Am I making it clear? Because of this 2nd point, I suggested that we not work
on fixing the comments yet.
- Anoop Sam John
This is an automatically generated e-mail. To reply, visit:
On Oct. 14, 2016, 12:49 p.m., Anastasia Braginsky wrote:
> (Updated Oct. 14, 2016, 12:49 p.m.)
> Review request for hbase.
> Repository: hbase-git
> This is a step toward the final compacting memstore, which allows two modes
> of work: index-compaction and data-compaction.
> Index-compaction means that when a new segment is pushed into the
> pipeline, it is flattened and possibly merged with old segments in the
> pipeline. The new merge "feature" induces no data-copy compaction and no
> speculative SQM scan.
> The compacting memstore of the data-compaction type means the usage of the
> Diff: https://reviews.apache.org/r/51785/diff/
> Anastasia Braginsky