[jira] [Created] (HBASE-20542) Better heap utilization for IMC with MSLABs

2018-05-08 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-20542:
-

 Summary: Better heap utilization for IMC with MSLABs
 Key: HBASE-20542
 URL: https://issues.apache.org/jira/browse/HBASE-20542
 Project: HBase
  Issue Type: Task
Reporter: Eshcar Hillel


Following HBASE-20188 we realized that in-memory compaction combined with MSLABs 
may suffer from heap under-utilization due to internal fragmentation. This jira 
presents a solution to circumvent this problem. The main idea is to have each 
update operation check whether it would cause an overflow in the active segment 
*before* writing the new value (instead of checking the size after the write has 
completed). If it would, the active segment is atomically swapped with a new 
empty segment and is pushed (full-yet-not-overflowed) to the compaction 
pipeline. Later on the IMC daemon runs its compaction operation (flatten 
index/merge indices/data compaction) in the background. Some subtle concurrency 
issues must be handled with care; we elaborate on them next.
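The check-before-write idea can be sketched roughly as follows. This is a minimal illustrative sketch, not the actual HBase code: all class and method names are hypothetical, the size limit is arbitrary, and most of the subtle concurrency issues mentioned above are elided.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch: check for overflow *before* writing, and atomically
// swap in a fresh active segment so the pushed segment is full but never
// overflowed. All names are hypothetical, not the real HBase classes.
public class ActiveSegmentSketch {
    static final long SEGMENT_LIMIT = 1024; // bytes; illustrative only

    static class Segment {
        long size;
        long size() { return size; }
        void add(long cellSize) { size += cellSize; }
    }

    private final AtomicReference<Segment> active = new AtomicReference<>(new Segment());

    // Returns the full segment to push to the compaction pipeline, or null.
    Segment preUpdate(long cellSize) {
        Segment s = active.get();
        if (s.size() + cellSize > SEGMENT_LIMIT) {
            // Swap before the write; only one racing writer wins the CAS.
            if (active.compareAndSet(s, new Segment())) {
                return s; // caller hands s to the IMC daemon's pipeline
            }
        }
        return null;
    }

    void add(long cellSize) {
        preUpdate(cellSize); // a non-null result would be pushed to the pipeline
        active.get().add(cellSize);
    }
}
```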



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] Please welcome Francis Liu to the HBase PMC

2018-04-12 Thread Eshcar Hillel
Congrats Francis and good luck!
 

On Wednesday, April 11, 2018, 11:04:17 PM GMT+3, Andrew Purtell 
 wrote:  
 
 On behalf of the Apache HBase PMC I am pleased to announce that Francis
Liu has accepted our invitation to become a PMC member on the Apache
HBase project. We appreciate Francis stepping up to take more
responsibility in the HBase project. He has been an active contributor to
HBase for many years and recently took over responsibilities as branch RM
for branch-1.3.

Please join me in welcoming Francis to the HBase PMC!

-- 
Best regards,
Andrew
  

[jira] [Created] (HBASE-20390) IMC Default Parameters for 2.0.0

2018-04-11 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-20390:
-

 Summary: IMC Default Parameters for 2.0.0
 Key: HBASE-20390
 URL: https://issues.apache.org/jira/browse/HBASE-20390
 Project: HBase
  Issue Type: Task
Reporter: Eshcar Hillel


Setting new default parameters for in-memory compaction based on performance 
tests done in HBASE-20188 





Re: Questions about synchronization in compaction pipeline

2018-03-12 Thread Eshcar Hillel
I agree with Chia-Ping Tsai: the read-only copy can be changed from a LinkedList 
to an ArrayList, with getLast replaced by get(size() - 1). However, I am dubious 
that any of the suggested changes will have a measurable effect on system 
performance. It would be very interesting to see a workload that can 
differentiate between these two implementations.
On Sunday, March 11, 2018, 4:33:05 PM GMT+2, Chia-Ping Tsai 
 wrote:  
 
 > LL copy is fairly slow and likely loses us any gains. Also, I'm a little
> dubious on the use of LL given that we support a replaceAtIndex which will
> be much faster in an array.
> 
> Can we improve by using an ArrayDeque?
Does ArrayDeque support changing an element at an arbitrary index? If not, it is 
not easy to change the impl in one line, since replaceAtIndex() needs to change 
an element in the pipeline.

BTW, could we change the impl of "readOnlyCopy" from LinkedList to ArrayList? 
Most ops on readOnlyCopy are iteration, and getLast can be replaced by 
get(size() - 1). 

On 2018/03/10 20:12:01, Mike Drob  wrote: 
> Hi devs,
> 
> I was reading through HBASE-17434 trying to understand why we have two
> linked lists in compaction pipeline and I'm having trouble following the
> conversation there, especially since it seems intertwined with HBASE-17379
> and jumps back and forth a few times.
> 
> It looks like we are implementing our own copy-on-write list, and there is
> a claim that addFirst is faster on a LinkedList than an array based list. I
> am concerned about the LL copy in pushHead - even if addFirst is faster, a
> LL copy is fairly slow and likely loses us any gains. Also, I'm a little
> dubious on the use of LL given that we support a replaceAtIndex which will
> be much faster in an array.
> 
> Can we improve by using an ArrayDeque?
> 
> Eschar, Anastasia, WDYT?
> 
> Thanks,
> Mike
> 
> Some observations about performance -
> https://stuartmarks.wordpress.com/2015/12/18/some-java-list-benchmarks/
> 
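The copy-on-write scheme being debated can be sketched as follows. This is a simplified illustrative stand-in, not the actual CompactionPipeline: mutators are synchronized and publish a fresh ArrayList-backed snapshot, while readers take the snapshot without locking, and getLast becomes get(size() - 1).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified copy-on-write pipeline sketch (hypothetical names): writers
// mutate under a lock and publish an immutable snapshot; the read path
// (as in getSegments()) is lock-free.
public class PipelineSketch<T> {
    private final ArrayList<T> pipeline = new ArrayList<>();
    private volatile List<T> readOnlyCopy = Collections.emptyList();

    public synchronized void pushHead(T segment) {
        pipeline.add(0, segment);          // addFirst
        readOnlyCopy = List.copyOf(pipeline);
    }

    public synchronized void replaceAtIndex(int i, T segment) {
        pipeline.set(i, segment);          // O(1) in an array-backed list
        readOnlyCopy = List.copyOf(pipeline);
    }

    // Lock-free read path: just hand out the latest snapshot.
    public List<T> getSegments() {
        return readOnlyCopy;
    }

    public T getLast() {
        List<T> snap = readOnlyCopy;
        return snap.get(snap.size() - 1);  // replaces LinkedList.getLast()
    }
}
```

Whether this measurably beats the LinkedList version is exactly the open question in the thread; the sketch only shows the structure under discussion.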
  

Re: [DISCUSS] Performance degradation in master (compared to 2-alpha-1)

2017-11-29 Thread Eshcar Hillel
I'm currently using my cluster for other purposes. When it is available I can 
run the same experiment against alpha-3.
 

On Tuesday, November 28, 2017, 7:17:08 PM GMT+2, Mike Drob 
<md...@apache.org> wrote:  
 
 Eshcar - do you have time to try the other alpha releases and see where
exactly we introduced the regressions?

Also, I'm worried that the performance regression may be related to an
important bug-fix, where before we may have had fast writes but also risked
incorrect behavior somehow.

Mike

On Tue, Nov 28, 2017 at 2:48 AM, Eshcar Hillel <esh...@oath.com.invalid>
wrote:

> I agree, so will wait till we focus on performance.
> Just one more update: I also ran the same experiment (write-only) with
> branch-2 beta-1. Here is a summary of the throughput I see in each tag/branch:
>
> |           | BASIC | NONE |
> | 2-alpha-1 | 110K  | 80K  |
> | 2-beta-1  |  81K  | 62K  |
> | master    |  60K  | 55K  |
>
> This means there are multiple sources for the regression.
>
> Thanks
>
>    On Saturday, November 25, 2017, 7:44:01 AM GMT+2, 张铎(Duo Zhang) <
> palomino...@gmail.com> wrote:
>
>  I think first we need a release plan on when we will begin to focus on the
> performance issue?
>
> I do not think it is a good time to focus on performance issue now as we
> haven’t stabilized our build yet. The performance regression may come back
> again after some bug fixes and maybe we use a wrong way to increase
> performance and finally we find that it is just a bug...
>
> Of course I do not mean we can not do any performance related issues now,
> for example, HBASE-19338 is a good catch and can be fixed right now.
>
> And also, for AsyncFSWAL and in memory compaction, we need to consider the
> performance right now as they are born for performance, but let’s focus on
> the comparison to other policies, not a previous release so we can find the
> correct things to fix.
>
> Of course, if there is a big performance downgrading comparing to the
> previous release and we find it then we should tell others, just like this
> email. An earlier notification is always welcomed.
>
> Thanks.
>
> Stack <st...@duboce.net>于2017年11月25日 周六13:22写道:
>
> > On Thu, Nov 23, 2017 at 7:35 AM, Eshcar Hillel <esh...@oath.com.invalid>
> > wrote:
> >
> > > Happy Thanksgiving all,
> > >
> >
> > And to you Eshcar.
> >
> >
> >
> > > In recent benchmarks I ran in HBASE-18294 I discovered major
> > > performance degradation of master code w.r.t 2-alpha-1 code. I am
> > > running a write-only workload (similar to the one reported in
> > > HBASE-16417). I am using the same hardware and same configuration
> > > settings (specifically, I tested both basic memstore compaction with
> > > optimal parameters, and no memstore compaction). While in 2-alpha-1
> > > code I see throughput of ~110 Kops for basic compaction and ~80 Kops
> > > for no compaction, in the master code I get only 60 Kops and 55 Kops,
> > > respectively. *This is almost a 50% reduction in performance*.
> > > (1) Did anyone else notice such degradation? (2) Do we have any
> > > systematic automatic/semi-automatic method to track the sources of
> > > this performance issue?
> > > Thanks, Eshcar
> > >
> >
> >
> > On #1, no. I've not done perf compare. I wonder if later alpha versions
> > include the regression (I'll have to check and see).
> >
> > On #2, again no. I intend to do a bit of perf tuning and compare before
> > release.
> >
> > If you don't file an issue, I will do so later for myself as a task to
> > compare at least to alpha-1.
> >
> > Thanks Eshcar,
> >
> > St.Ack
> >
>  

Re: [DISCUSS] Performance degradation in master (compared to 2-alpha-1)

2017-11-28 Thread Eshcar Hillel
I agree, so will wait till we focus on performance.
Just one more update: I also ran the same experiment (write-only) with branch-2 
beta-1. Here is a summary of the throughput I see in each tag/branch:

|           | BASIC | NONE |
| 2-alpha-1 | 110K  | 80K  |
| 2-beta-1  |  81K  | 62K  |
| master    |  60K  | 55K  |

This means there are multiple sources for the regression.
 
Thanks

On Saturday, November 25, 2017, 7:44:01 AM GMT+2, 张铎(Duo Zhang) 
<palomino...@gmail.com> wrote:  
 
 I think first we need a release plan on when we will begin to focus on the
performance issue?

I do not think it is a good time to focus on performance issue now as we
haven’t stabilized our build yet. The performance regression may come back
again after some bug fixes and maybe we use a wrong way to increase
performance and finally we find that it is just a bug...

Of course I do not mean we can not do any performance related issues now,
for example, HBASE-19338 is a good catch and can be fixed right now.

And also, for AsyncFSWAL and in memory compaction, we need to consider the
performance right now as they are born for performance, but let’s focus on
the comparison to other policies, not a previous release so we can find the
correct things to fix.

Of course, if there is a big performance downgrading comparing to the
previous release and we find it then we should tell others, just like this
email. An earlier notification is always welcomed.

Thanks.

Stack <st...@duboce.net>于2017年11月25日 周六13:22写道:

> On Thu, Nov 23, 2017 at 7:35 AM, Eshcar Hillel <esh...@oath.com.invalid>
> wrote:
>
> > Happy Thanksgiving all,
> >
>
> And to you Eshcar.
>
>
>
> > In recent benchmarks I ran in HBASE-18294 I discovered major performance
> > degradation of master code w.r.t 2-alpha-1 code. I am running a
> > write-only workload (similar to the one reported in HBASE-16417). I am
> > using the same hardware and same configuration settings (specifically, I
> > tested both basic memstore compaction with optimal parameters, and no
> > memstore compaction). While in 2-alpha-1 code I see throughput of ~110
> > Kops for basic compaction and ~80 Kops for no compaction, in the master
> > code I get only 60 Kops and 55 Kops, respectively. *This is almost a 50%
> > reduction in performance*.
> > (1) Did anyone else notice such degradation? (2) Do we have any
> > systematic automatic/semi-automatic method to track the sources of this
> > performance issue?
> > Thanks, Eshcar
> >
>
>
> On #1, no. I've not done perf compare. I wonder if later alpha versions
> include the regression (I'll have to check and see).
>
> On #2, again no. I intend to do a bit of perf tuning and compare before
> release.
>
> If you don't file an issue, I will do so later for myself as a task to
> compare at least to alpha-1.
>
> Thanks Eshcar,
>
> St.Ack
>  

Re: [DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?

2017-08-08 Thread Eshcar Hillel
We can change the AtomicLong to an AtomicReference which is updated atomically. 
MemStoreSize can be changed to hold on-heap memory, off-heap memory, and data 
size, or just on-heap memory and off-heap memory if data size is not required 
anymore.
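The AtomicReference idea can be sketched like this. MemStoreSizeSketch is an illustrative immutable stand-in (assuming Java 16+ records), not the real MemStoreSize class; the point is that all three sizes change together in one atomic publish.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch: replace separate AtomicLongs with one AtomicReference to an
// immutable size record, so heap size, off-heap size, and data size are
// always updated atomically as a unit. Names are hypothetical.
public class SizeTracker {
    // Immutable value object; a new instance is published on every update.
    record MemStoreSizeSketch(long onHeap, long offHeap, long data) {}

    private final AtomicReference<MemStoreSizeSketch> size =
        new AtomicReference<>(new MemStoreSizeSketch(0, 0, 0));

    void add(long onHeapDelta, long offHeapDelta, long dataDelta) {
        // updateAndGet retries the CAS loop internally until it succeeds.
        size.updateAndGet(s -> new MemStoreSizeSketch(
            s.onHeap() + onHeapDelta,
            s.offHeap() + offHeapDelta,
            s.data() + dataDelta));
    }

    MemStoreSizeSketch get() { return size.get(); }
}
```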



On Monday, August 7, 2017, 10:23:12 AM GMT+3, Anoop John 
<anoop.hb...@gmail.com> wrote:

Sorry for being late to reply.

So you mean we should track both sizes even at Region level?  This was
considered at that time but we did not do it, as it would add more overhead:
we would have to deal with 2 AtomicLongs in every Region.  Right now we
handle this double check at the RS level only, so it added just one more
variable to deal with.

-Anoop-

On Mon, Jul 10, 2017 at 7:34 PM, Eshcar Hillel
<esh...@yahoo-inc.com.invalid> wrote:
> Here is a suggestion: We can track both heap and off-heap sizes and have 2 
> thresholds, one for limiting heap size and one for limiting off-heap size. 
> In all decision-making junctions we check whether one of the thresholds is 
> exceeded, and if it is we trigger a flush. We can choose which entity to 
> flush based on the cause. For example, if we decided to flush since the heap 
> size exceeds the heap threshold then we flush the region/store with the 
> greatest heap size, and likewise for an off-heap flush.
>
> I can prepare a patch.
>
> This is not rolling back HBASE-18294, simply refining it to have different 
> decision making for the on- and off-heap cases.
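The dual-threshold suggestion can be sketched roughly as follows (a hypothetical illustration; the names, limits, and structure are not the actual HBase code):

```java
// Illustrative dual-threshold flush check: each threshold is tested
// independently, and the triggering cause decides which region/store
// to pick for flushing. All names and limits are hypothetical.
public class FlushDecision {
    static final long ON_HEAP_LIMIT  = 4L << 30; // 4 GB, illustrative
    static final long OFF_HEAP_LIMIT = 8L << 30; // 8 GB, illustrative

    enum Cause { NONE, ON_HEAP, OFF_HEAP }

    static Cause shouldFlush(long onHeapSize, long offHeapSize) {
        if (onHeapSize  > ON_HEAP_LIMIT)  return Cause.ON_HEAP;  // flush region with greatest heap size
        if (offHeapSize > OFF_HEAP_LIMIT) return Cause.OFF_HEAP; // flush region with greatest off-heap size
        return Cause.NONE;
    }
}
```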
>
> On Monday, July 10, 2017, 8:25:12 AM GMT+3, Anoop John 
> <anoop.hb...@gmail.com> wrote:
>
> Stack and others..
> We won't have any OOM or FullGC issues, because globally at RS level we
> will track both the data size (of all the memstores) and the heap
> size.  The decision there accounts both. In fact in case of normal on
> heap memstores, the accounting is like the old way of heap size based.
>
> At region level (and at Segments level)  we track data size only.  The
> decisions are based on data size.
>
> So in the past region flush size of 128 MB means we will flush when
> the heap size of that region crosses 128 MB.  But now it is data size
> alone.  What I feel is that this is more in line with a normal user's
> thinking: they say a flush size of 128 MB, and the expectation is
> 128 MB of data.
>
> The background of this change is the off-heap memstores, where we need
> separate tracking of both data and heap overhead sizes.  But at the
> region level this behavior change was done thinking that it is more
> user oriented.
>
> I agree with Yu that it is a surprising behavior change. If not tuned
> accordingly one might see more blocked writes, because the per-region
> flushes are more delayed now, so the chances of reaching the global
> memstore upper barrier are higher.  And then we will block writes and
> force flushes.  (But off-heap memstores will do a better job here.)
> But this would NOT cause any OOME or FullGC.
>
> I guess we should have reduced the 128 MB default flush size then?  I
> asked this Q in that jira and then we did not discuss further.
>
> I hope I explained the background and the change and the impacts.  Thanks.
>
> -Anoop-
>
> On Thu, Jul 6, 2017 at 11:43 AM, 宾莉金(binlijin) <binli...@gmail.com> wrote:
>> I would like to use the former, heap occupancy, so we need not worry about
>> OOM and FullGC, nor change configuration to adapt to the new policy.
>>
>> 2017-07-06 14:03 GMT+08:00 Stack <st...@duboce.net>:
>>
>>> On Wed, Jul 5, 2017 at 9:59 PM, ramkrishna vasudevan <
>>> ramkrishna.s.vasude...@gmail.com> wrote:
>>>
>>> >
>>> > >>Sounds like we should be doing the former, heap occupancy
>>> > Stack, so do you mean we need to roll back this new change in trunk? The
>>> > background is https://issues.apache.org/jira/browse/HBASE-16747.
>>> >
>>> >
>>> I remember that issue. It seems good to me (as it did then) where we have
>>> the global tracking in RS of all data and overhead so we shouldn't OOME and
>>> we keep accounting of overhead and data distinct because now data can be
>>> onheap or offheap.
>>>
>>> We shouldn't be doing blocking updates -- not when there is probably loads
>>> of memory still available -- but that is a different (critical) issue.
>>> Sounds like current configs can 'surprise' -- see Yu Li note -- given the
>>> new accounting.
>>>
>>> Looks like I need to read HBASE-18294
>>> <https://issues.apache.org/jira/browse/HBASE-18294> to figure what the
>>> pivot/problem w/ the new policy is.
>>>
>>> Thanks,
>>> St.Ack
>>>
>>>
>>>
>>>
>>>
>>> > Reg

Re: [DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?

2017-07-10 Thread Eshcar Hillel
>> > > 2017-07-03 16:11:54,724 INFO
>> > >  [B.defaultRpcServer.handler=182,queue=11,port=16020]
>> > > regionserver.MemStoreFlusher: Blocking updates on
>> > > hadoop0528.et2.tbsite.net,16020,1497336978160:
>> > > global memstore heapsize 7.2 G is >= than blocking 7.2 G size
>> > > 2017-07-03 16:11:54,754 INFO
>> > >  [B.defaultRpcServer.handler=186,queue=15,port=16020]
>> > > regionserver.MemStoreFlusher: Blocking updates on
>> > > hadoop0528.et2.tbsite.net,16020,1497336978160:
>> > > global memstore heapsize 7.2 G is >= than blocking 7.2 G size
>> > > 2017-07-03 16:11:57,571 INFO  [MemStoreFlusher.0]
>> > > regionserver.MemStoreFlusher: Flush of region
>> > > mainv7_main_result_c,1496,1499062935573.02adfa7cbdc606dce5b79a516e16492a.
>> > > due to global heap pressure. Total Memstore size=3.2 G, Region memstore
>> > > size=331.4 M
>> > > 2017-07-03 16:11:57,571 WARN
>> > >  [B.defaultRpcServer.handler=49,queue=11,port=16020]
>> > > regionserver.MemStoreFlusher: Memstore is above high water mark and
>> block
>> > > 2892ms
>> > >
>> > > Best Regards,
>> > > Yu
>> > >
>> > > On 6 July 2017 at 00:56, Stack <st...@duboce.net> wrote:
>> > >
>> > > > On Wed, Jul 5, 2017 at 6:30 AM, Eshcar Hillel
>> > > <esh...@yahoo-inc.com.invalid
>> > > > >
>> > > > wrote:
>> > > >
>> > > > > Hi All,
>> > > > > I opened a new Jira https://issues.apache.org/
>> > jira/browse/HBASE-18294
>> > > to
>> > > > > discuss this question.
>> > > > > Flush decisions are taken at the region level and also at the
>> region
>> > > > > server level - there is the question of when to trigger a flush and
>> > > then
>> > > > > which region/store to flush.Regions track both their data size
>> > > (key-value
>> > > > > size only) and their total heap occupancy (including index and
>> > > additional
>> > > > > metadata).One option (which was the past policy) is to trigger
>> > flushes
>> > > > and
>> > > > > choose flush subjects based on regions heap size - this gives a
>> > better
>> > > > > estimation for sysadmin of how many regions can a RS carry.Another
>> > > option
>> > > > > (which is the current policy) is to look at the data size - this
>> > gives
>> > > a
>> > > > > better estimation of the size of the files that are created by the
>> > > flush.
>> > > > >
>> > > >
>> > > >
>> > > > Sounds like we should be doing the former, heap occupancy. An
>> > > > OutOfMemoryException puts a nail in any benefit other accountings
>> might
>> > > > have.
>> > > >
>> > > > St.Ack
>> > > >
>> > > >
>> > > >
>> > > > > I see this is as critical to HBase performance and usability,
>> namely
>> > > > > meeting the user expectation from the system, hence I would like to
>> > > hear
>> > > > as
>> > > > > many voices as possible.Please join the discussion in the Jira and
>> > let
>> > > us
>> > > > > know what you think.
>> > > > > Thanks,Eshcar
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
>
> --
> *Best Regards,*
>  lijin bin

[DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?

2017-07-05 Thread Eshcar Hillel
Hi All,
I opened a new Jira https://issues.apache.org/jira/browse/HBASE-18294 to 
discuss this question.
Flush decisions are taken at the region level and also at the region server 
level - there is the question of when to trigger a flush, and then which 
region/store to flush. Regions track both their data size (key-value size 
only) and their total heap occupancy (including index and additional 
metadata). One option (which was the past policy) is to trigger flushes and 
choose flush subjects based on the regions' heap size - this gives a better 
estimation for a sysadmin of how many regions a RS can carry. Another option 
(which is the current policy) is to look at the data size - this gives a 
better estimation of the size of the files that are created by the flush.
I see this as critical to HBase performance and usability, namely meeting 
the user expectation from the system, hence I would like to hear as many 
voices as possible. Please join the discussion in the Jira and let us know 
what you think.
Thanks, Eshcar



[jira] [Created] (HBASE-18294) Flush policy checks data size instead of heap size

2017-06-29 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-18294:
-

 Summary: Flush policy checks data size instead of heap size
 Key: HBASE-18294
 URL: https://issues.apache.org/jira/browse/HBASE-18294
 Project: HBase
  Issue Type: Bug
Reporter: Eshcar Hillel
Assignee: Eshcar Hillel


A flush policy decides whether to flush a store by comparing the size of the 
store to a threshold (that can be configured with 
hbase.hregion.percolumnfamilyflush.size.lower.bound).
Currently the implementation compares the data size (key-value only) to the 
threshold, whereas it should compare the heap size (which includes index size 
and metadata).
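The intended comparison can be sketched as follows. This is an illustrative sketch, not the actual flush-policy code; the class, field names, and default value are hypothetical stand-ins for the real implementation.

```java
// Illustrative flush-policy check. The bug: comparing key-value data size
// to the threshold; the fix: comparing heap size (data plus index and
// metadata overhead). Names and the default value are hypothetical.
public class FlushPolicySketch {
    static class StoreSize {
        final long dataSize; // key-value bytes only
        final long heapSize; // data plus index and metadata overhead
        StoreSize(long dataSize, long heapSize) {
            this.dataSize = dataSize;
            this.heapSize = heapSize;
        }
    }

    // stands in for hbase.hregion.percolumnfamilyflush.size.lower.bound
    static final long LOWER_BOUND = 16L * 1024 * 1024;

    static boolean shouldFlushBuggy(StoreSize s) { return s.dataSize > LOWER_BOUND; }
    static boolean shouldFlushFixed(StoreSize s) { return s.heapSize > LOWER_BOUND; }
}
```

A store whose heap footprint has crossed the bound while its raw data has not would be skipped by the buggy check but correctly selected by the fixed one.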






[jira] [Created] (HBASE-17575) Towards Making BASIC the Default In-Memory Compaction Policy

2017-02-01 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-17575:
-

 Summary: Towards Making BASIC the Default In-Memory Compaction 
Policy
 Key: HBASE-17575
 URL: https://issues.apache.org/jira/browse/HBASE-17575
 Project: HBase
  Issue Type: Sub-task
Reporter: Eshcar Hillel


We remove the NONE configuration setting added to tests in HBASE-17294 and 
HBASE-17316.
We run these tests with all 3 memory compaction policies.
For each test 
(1) if all 3 pass -- we remove the configuration from the test.
(2) if some fail we add tests of all 3 configurations, e.g., by parameterized 
tests. When needed we update expected results.

One test failure identified a small bug which is also fixed in the patch.





[jira] [Created] (HBASE-17434) New Synchronization Scheme for Compaction Pipeline

2017-01-06 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-17434:
-

 Summary: New Synchronization Scheme for Compaction Pipeline
 Key: HBASE-17434
 URL: https://issues.apache.org/jira/browse/HBASE-17434
 Project: HBase
  Issue Type: Bug
Reporter: Eshcar Hillel


A new copyOnWrite synchronization scheme is introduced for the compaction 
pipeline.
The new scheme is better since it removes the lock from getSegments() which is 
invoked in every get and scan operation, and it reduces the number of 
LinkedList objects that are created at runtime, and thus can reduce GC (not by 
much, but still...).

In addition, it fixes the method getTailSize() in compaction pipeline. This 
method creates a MemstoreSize object which comprises the data size and the 
overhead size of the segment and needs to be atomic.





[jira] [Created] (HBASE-17407) Correct update of maxFlushedSeqId in HRegion

2017-01-03 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-17407:
-

 Summary: Correct update of maxFlushedSeqId in HRegion
 Key: HBASE-17407
 URL: https://issues.apache.org/jira/browse/HBASE-17407
 Project: HBase
  Issue Type: Bug
Reporter: Eshcar Hillel


The attribute maxFlushedSeqId in HRegion is used to track the max sequence id 
in the store files and is reported to HMaster. When flushing only part of the 
memstore content this value might be incorrect and may cause data loss.





[jira] [Created] (HBASE-17339) Scan-Memory-First Optimization

2016-12-19 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-17339:
-

 Summary: Scan-Memory-First Optimization
 Key: HBASE-17339
 URL: https://issues.apache.org/jira/browse/HBASE-17339
 Project: HBase
  Issue Type: Improvement
Reporter: Eshcar Hillel


The current implementation of a get operation (to retrieve values for a 
specific key) scans through all relevant stores of the region; for each store 
both memory components (memstores segments) and disk components (hfiles) are 
scanned in parallel.
We suggest applying an optimization that speculatively scans memory-only 
components first, and only if the result is incomplete scans both memory and 
disk.
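The speculative flow can be sketched as follows. This is a hypothetical illustration of the control flow only; the interface, method names, and completeness check are assumptions, not the actual HBase scan code.

```java
import java.util.function.Function;

// Illustrative scan-memory-first get: speculatively scan only the in-memory
// segments; fall back to a full memory-plus-disk scan if the result may be
// incomplete. All names are hypothetical.
public class MemoryFirstGet {
    interface Result { boolean isComplete(); }

    static Result get(String key,
                      Function<String, Result> memoryOnlyScan,
                      Function<String, Result> fullScan) {
        Result r = memoryOnlyScan.apply(key);
        if (r != null && r.isComplete()) {
            return r;               // served entirely from memstore segments
        }
        return fullScan.apply(key); // otherwise scan memory and hfiles together
    }
}
```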





[jira] [Created] (HBASE-17316) Addendum to HBASE-17294

2016-12-14 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-17316:
-

 Summary: Addendum to HBASE-17294
 Key: HBASE-17316
 URL: https://issues.apache.org/jira/browse/HBASE-17316
 Project: HBase
  Issue Type: Sub-task
Reporter: Eshcar Hillel
Assignee: Eshcar Hillel


Updating 2 tests that failed during the commit of HBASE-17294





[jira] [Created] (HBASE-17294) External Configuration for Memory Compaction

2016-12-12 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-17294:
-

 Summary: External Configuration for Memory Compaction 
 Key: HBASE-17294
 URL: https://issues.apache.org/jira/browse/HBASE-17294
 Project: HBase
  Issue Type: Sub-task
Reporter: Eshcar Hillel


We would like to have a single external knob to control memstore compaction.
Possible memstore compaction policies are none, basic, and eager.
This sub-task allows setting this property at the column-family level at table 
creation time:
{code}
create '<table_name>',
    {NAME => '<cf_name>',
     IN_MEMORY_COMPACTION => '<NONE|BASIC|EAGER>'}
{code}
or to set this at the global configuration level by setting the property in 
hbase-site.xml, with BASIC being the default value:
{code}
<property>
  <name>hbase.hregion.compacting.memstore.type</name>
  <value><NONE|BASIC|EAGER></value>
</property>
{code}
The values used in this property can change as memstore compaction policies 
evolve over time.





[jira] [Created] (HBASE-15359) Simplifying Segment hierarchy

2016-02-29 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-15359:
-

 Summary: Simplifying Segment hierarchy
 Key: HBASE-15359
 URL: https://issues.apache.org/jira/browse/HBASE-15359
 Project: HBase
  Issue Type: Sub-task
Reporter: Eshcar Hillel
Assignee: Eshcar Hillel


Now that it is clear that no memstore segment will be implemented as an HFile, 
and that all segments store their data in some representation of CellSet 
(skip-list or flat), the segment hierarchy can be much simplified.

The attached patch includes only 3 classes in the hierarchy:
Segment - comprises most of the state and implementation
MutableSegment - extends API with add and rollback functionality
ImmutableSegment - extends API with key-value scanner for snapshot

SegmentScanner is the scanner for all types of segments. 

In addition, the option to rollback immutable segment in the memstore is 
disabled.

This code would allow us to make progress independently in the compaction 
subtask (HBASE-14920) and the flat index representation subtask (HBASE-14921). 
It also means that the new immutable segment can reuse the existing 
SegmentScanner, instead of implementing a new scanner.






[jira] [Created] (HBASE-15016) StoreServices facility in Region

2015-12-19 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-15016:
-

 Summary: StoreServices facility in Region
 Key: HBASE-15016
 URL: https://issues.apache.org/jira/browse/HBASE-15016
 Project: HBase
  Issue Type: Sub-task
Reporter: Eshcar Hillel


The default implementation of a memstore ensures that between two flushes the 
memstore size increases monotonically. Supporting new memstores that store data 
in different formats (specifically, compressed), or that allow eliminating 
data redundancies in memory (e.g., via compaction), means that the size of the 
data stored in memory can decrease even between two flushes. This requires 
memstores to have access to facilities that manipulate region counters and 
synchronization.
This subtask introduces a new region interface -- StoreServices -- through which 
store components can access these facilities.
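The shape of such an interface could look roughly like this. This is purely an illustrative sketch; the method names are hypothetical and not the actual StoreServices API.

```java
// Illustrative sketch of a region-level StoreServices-style interface
// through which memstore components can adjust region counters and reach
// region synchronization facilities. Method names are hypothetical.
public interface StoreServicesSketch {
    // Adjust the region's tracked memstore size. The delta may be negative,
    // e.g. after in-memory compaction shrinks the data between flushes.
    void addMemStoreSize(long delta);

    // Example region-level synchronization facility a store might need.
    long getReadPoint();
}
```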






[jira] [Created] (HBASE-14918) In-Memory MemStore Flush and Compaction

2015-12-03 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-14918:
-

 Summary: In-Memory MemStore Flush and Compaction
 Key: HBASE-14918
 URL: https://issues.apache.org/jira/browse/HBASE-14918
 Project: HBase
  Issue Type: Umbrella
Affects Versions: 2.0.0
Reporter: Eshcar Hillel


A memstore serves as the in-memory component of a store unit, absorbing all 
updates to the store. From time to time these updates are flushed to a file on 
disk, where they are compacted (by eliminating redundancies) and compressed 
(i.e., written in a compressed format to reduce their storage size).

We aim to speed up data access, and therefore suggest applying an in-memory 
memstore flush: that is, flushing the active in-memory segment into an 
intermediate buffer where it can be accessed by the application. Data in the 
buffer is subject to compaction and can be stored in any format that allows it 
to take up smaller space in RAM. The less space the buffer consumes the longer 
it can reside in memory before data is flushed to disk, resulting in better 
performance.
Specifically, the optimization is beneficial for workloads with medium-to-high 
key churn which incur many redundant cells, like persistent messaging. 

We suggest structuring the solution as 3 subtasks (respectively, patches). 
(1) Infrastructure - refactoring of the MemStore hierarchy, introducing segment 
(StoreSegment) as first-class citizen, and decoupling memstore scanner from the 
memstore implementation;
(2) Implementation of a new memstore (CompactingMemstore) with non-optimized 
immutable segment representation, and 
(3) Memory optimization including compressed format representation and offheap 
allocations.

This Jira continues the discussion in HBASE-13408.
Design documents, evaluation results and previous patches can be found in 
HBASE-13408. 





[jira] [Created] (HBASE-14921) Memory optimizations

2015-12-03 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-14921:
-

 Summary: Memory optimizations
 Key: HBASE-14921
 URL: https://issues.apache.org/jira/browse/HBASE-14921
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0
Reporter: Eshcar Hillel


Memory optimizations including compressed format representation and offheap 
allocations





[jira] [Created] (HBASE-14919) Infrastructure refactoring

2015-12-03 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-14919:
-

 Summary: Infrastructure refactoring
 Key: HBASE-14919
 URL: https://issues.apache.org/jira/browse/HBASE-14919
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0
Reporter: Eshcar Hillel
Assignee: Eshcar Hillel


Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
first-class citizen and decoupling memstore scanner from the memstore 
implementation.





[jira] [Created] (HBASE-14920) Compacting Memstore

2015-12-03 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-14920:
-

 Summary: Compacting Memstore
 Key: HBASE-14920
 URL: https://issues.apache.org/jira/browse/HBASE-14920
 Project: HBase
  Issue Type: Sub-task
Reporter: Eshcar Hillel
Assignee: Eshcar Hillel


Implementation of a new compacting memstore with non-optimized immutable 
segment representation





[jira] [Created] (HBASE-13719) Asynchronous scanner -- cache size-in-bytes bug fix

2015-05-20 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-13719:
-

 Summary: Asynchronous scanner -- cache size-in-bytes bug fix
 Key: HBASE-13719
 URL: https://issues.apache.org/jira/browse/HBASE-13719
 Project: HBase
  Issue Type: Bug
Reporter: Eshcar Hillel


Hbase Streaming Scan is a feature recently added to trunk.
In this feature, an asynchronous scanner pre-loads data to the cache based on 
its size (both row count and size in bytes). In one of the locations where the 
scanner polls an item from the cache, the variable holding the estimated byte 
size of the cache is not updated. This affects the decision of when to load the 
next batch of data.

A bug fix patch is attached - it comprises only local changes to the 
ClientAsyncPrefetchScanner.java file.
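The class of bug described above can be sketched as follows. This is an illustrative stand-in, not the actual ClientAsyncPrefetchScanner code; the names are hypothetical. The point is that every poll path must decrement the estimated byte size, or the next-batch trigger fires at the wrong time.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative prefetch cache: the bug class is polling an item without
// updating the estimated byte size of the cache. Names are hypothetical.
public class PrefetchCacheSketch<T> {
    interface Sizer<T> { long sizeOf(T item); }

    private final ConcurrentLinkedQueue<T> cache = new ConcurrentLinkedQueue<>();
    private final AtomicLong cacheSizeInBytes = new AtomicLong();
    private final Sizer<T> sizer;

    PrefetchCacheSketch(Sizer<T> sizer) { this.sizer = sizer; }

    void add(T item) {
        cache.add(item);
        cacheSizeInBytes.addAndGet(sizer.sizeOf(item));
    }

    T poll() {
        T item = cache.poll();
        if (item != null) {
            // The fix: keep the byte estimate in sync on every poll path,
            // so the decision of when to load the next batch stays correct.
            cacheSizeInBytes.addAndGet(-sizer.sizeOf(item));
        }
        return item;
    }

    long estimatedBytes() { return cacheSizeInBytes.get(); }
}
```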





[jira] [Created] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-04-05 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-13408:
-

 Summary: HBase In-Memory Memstore Compaction
 Key: HBASE-13408
 URL: https://issues.apache.org/jira/browse/HBASE-13408
 Project: HBase
  Issue Type: New Feature
Reporter: Eshcar Hillel


A store unit holds a column family in a region, where the memstore is its 
in-memory component. The memstore absorbs all updates to the store; from time 
to time these updates are flushed to a file on disk, where they are compacted. 
Unlike disk components, the memstore is not compacted until it is written to 
the filesystem and optionally to block-cache. This may result in 
underutilization of the memory due to duplicate entries per row, for example, 
when hot data is continuously updated. 
Generally, the faster the data is accumulated in memory, the more flushes are 
triggered and the data sinks to disk more frequently, slowing down retrieval 
of data, even very recent data.

In high-churn workloads, compacting the memstore can help maintain the data in 
memory, and thereby speed up data retrieval. 
We suggest a new compacted memstore with the following principles:
1.  The data is kept in memory for as long as possible
2.  Memstore data is either compacted or in process of being compacted 
3.  Allow a panic mode, which may interrupt an in-progress compaction and 
force a flush of part of the memstore.

We suggest applying this optimization only to in-memory column families.

A design document is attached.
This feature was previously discussed in HBASE-5311.





[jira] [Created] (HBASE-13071) Hbase Streaming Scan Feature

2015-02-19 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-13071:
-

 Summary: Hbase Streaming Scan Feature
 Key: HBASE-13071
 URL: https://issues.apache.org/jira/browse/HBASE-13071
 Project: HBase
  Issue Type: New Feature
Reporter: Eshcar Hillel


A scan operation iterates over all rows of a table or a subrange of the table. 
The synchronous nature in which the data is served at the client side hinders 
the speed at which the application traverses the data: it increases the overall 
processing time, and may cause a great variance in the times the application 
waits for the next piece of data.

The scanner next() method at the client side invokes an RPC to the regionserver 
and then stores the results in a cache. The application can specify how many 
rows will be transmitted per RPC; by default this is set to 100 rows. 
The cache can be considered as a producer-consumer queue, where the hbase 
client pushes the data to the queue and the application consumes it. Currently 
this queue is synchronous, i.e., blocking. More specifically, when the 
application has consumed all the data from the cache---so the cache is empty---the 
hbase client retrieves additional data from the server and re-fills the cache 
with new data. During this time the application is blocked.

Under the assumption that the application processing time can be balanced by 
the time it takes to retrieve the data, an asynchronous approach can reduce the 
time the application is waiting for data.
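The asynchronous producer-consumer idea can be sketched as follows. This is a deliberately tiny illustration, not the actual client code: the class, the simulated RPC supplier, and the fixed batch count are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

// Illustrative asynchronous prefetch: a background thread refills the cache
// while the application consumes rows, instead of the application blocking
// on an RPC only once the cache is empty. Names are hypothetical.
public class AsyncScannerSketch implements AutoCloseable {
    private final BlockingQueue<String> cache = new LinkedBlockingQueue<>();
    private final Thread prefetcher;

    AsyncScannerSketch(Supplier<List<String>> rpcBatch, int batches) {
        prefetcher = new Thread(() -> {
            for (int i = 0; i < batches; i++) {
                cache.addAll(rpcBatch.get()); // simulated scanner RPC
            }
        });
        prefetcher.start();
    }

    // next() blocks only if the prefetcher has not yet produced a row.
    String next() throws InterruptedException {
        return cache.take();
    }

    @Override public void close() throws InterruptedException {
        prefetcher.join();
    }
}
```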

We attach a design document.
We also have a patch that is based on a private branch, and some evaluation 
results of this code.



