[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2018-04-05 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427663#comment-16427663
 ] 

Edward Bortnikov commented on HBASE-16851:
--

[~stack] whatever you find to be the right procedure. 


Sent from Yahoo Mail for iPhone


On Thursday, April 5, 2018, 8:05 PM, stack (JIRA)  wrote:


    [ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427259#comment-16427259
 ] 

stack commented on HBASE-16851:
---

Not sure why this was resolved. I see you posted a bit of doc for the refguide 
[~ebortnik] and it didn't get any love. It looks good. There is also 
HBASE-20259 "Doc configs for in-memory-compaction and add detail to 
in-memory-compaction logging" which went in but this should have gone in before 
it. Should we reopen this to get your doc in on top of HBASE-20259 sir?





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
>Assignee: Edward Bortnikov
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20188) [TESTING] Performance

2018-04-05 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426569#comment-16426569
 ] 

Edward Bortnikov commented on HBASE-20188:
--

Michael, thanks for all the diligence, apparently you are a step ahead of us 
with 8G. 

Could you please post the complete results for 8G so that we can see the 
difference between the reads and the writes? The workloada result is weird - 
the writes are skewed, so IMC should really shine. Apparently, the read and 
the write paths have very different (and independent) issues. With workloadc, 
there is no reason IMC would work faster (multiple segments to look up), but 
let's understand workloada first. 

Thanks again. 

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> flamegraph-1072.1.svg, flamegraph-1072.2.svg, hbase-env.sh, hbase-site.xml, 
> lock.127.workloadc.20180402T200918Z.svg, 
> lock.2.memsize2.c.20180403T160257Z.svg, run_ycsb.sh, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is a 
> rumor that it is much slower, and that the problem is the asyncwal writing. 
> Does in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20188) [TESTING] Performance

2018-04-04 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425861#comment-16425861
 ] 

Edward Bortnikov commented on HBASE-20188:
--

[~stack] just making sure we're on the same page: the "2 all defaults" column 
(col I) does not include FastPath (which is included in col F) - is this 
intentional?

One other thing that puzzles me is the discrepancy between your and [~eshcar]'s 
results for workloadA - her results show a +27% upside for IMC. Curious what's 
going on here? 

Last question - do you intend to start looking at off-heap configurations? We 
are working on them now, too. 

Thanks 

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> flamegraph-1072.1.svg, flamegraph-1072.2.svg, hbase-env.sh, hbase-site.xml, 
> lock.127.workloadc.20180402T200918Z.svg, 
> lock.2.memsize2.c.20180403T160257Z.svg, run_ycsb.sh, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is a 
> rumor that it is much slower, and that the problem is the asyncwal writing. 
> Does in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20188) [TESTING] Performance

2018-04-04 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425804#comment-16425804
 ] 

Edward Bortnikov commented on HBASE-20188:
--

[~eshcar] could you please post your YCSB 100%W benchmark code? 

Thanks

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> flamegraph-1072.1.svg, flamegraph-1072.2.svg, hbase-site.xml, 
> lock.127.workloadc.20180402T200918Z.svg, 
> lock.2.memsize2.c.20180403T160257Z.svg, run_ycsb.sh, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is a 
> rumor that it is much slower, and that the problem is the asyncwal writing. 
> Does in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HBASE-20234) Expose in-memory compaction metrics

2018-04-04 Thread Edward Bortnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Bortnikov reassigned HBASE-20234:


Assignee: Anastasia Braginsky

> Expose in-memory compaction metrics
> ---
>
> Key: HBASE-20234
> URL: https://issues.apache.org/jira/browse/HBASE-20234
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: Anastasia Braginsky
>Priority: Major
>
> Hard to glean insight from how well in-memory compaction is doing currently. 
> It dumps stats into the logs, but it would be better if they were available 
> to a dashboard. This issue is about exposing a couple of helpful counts. 
> There are already by-region metrics. We can add a few for in-memory 
> compaction (help me out [~anastas]... what counts would be best to expose).
> Flush related metrics include
> {code}
> Namespace_default_table_tsdb-tree_region_cfbf23e7330a1a2bbde031f9583d3415_metric_flushesQueuedCount: {
>     description: "Number flushes requested/queued for this region",
>     value: 0
> },
> {
>     description: "The number of cells flushed to disk",
>     value: 0
> },
> {
>     description: "The total amount of data flushed to disk, in bytes",
>     value: 0
> },
> ...
> {code}
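
For context, hypothetical in-memory compaction counters could follow the same 
by-region pattern; the names and descriptions below are illustrative 
suggestions only, not existing HBase metrics:
{code}
Namespace_default_table_tsdb-tree_region_cfbf23e7330a1a2bbde031f9583d3415_metric_inMemoryCompactionCount: {
    description: "Number of in-memory compactions of the segment pipeline for this region",
    value: 0
},
Namespace_default_table_tsdb-tree_region_cfbf23e7330a1a2bbde031f9583d3415_metric_inMemoryFlattenCount: {
    description: "Number of in-memory flattenings (index-only merges) for this region",
    value: 0
}
{code}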



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-20259) Doc configs for in-memory-compaction and add detail to in-memory-compaction logging

2018-04-02 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421993#comment-16421993
 ] 

Edward Bortnikov edited comment on HBASE-20259 at 4/2/18 7:29 AM:
--

[~stack] let's make the decision in the broader context of the overall 
performance evaluation methodology. Tons of effort has been invested in 
exploring CompactingMemstore, including the off-heap path. The demonstrated 
performance benefits were decisive. Let's avoid haste just now. 

We're now testing the system in settings that were not anticipated before 
(the CMS+MSLAB on-heap combination), and still need some time to figure out 
what's going on. Stay tuned sir ..


was (Author: ebortnik):
[~stack] let's make the decision in the broader context of the overall 
performance evaluation methodology. Tons of effort has been invested in 
exploring CompactingMemstore, and the demonstrated performance benefits were 
decisive. Let's avoid haste just now. 

We're now testing the system in settings that were not anticipated before 
(the CMS+MSLAB on-heap combination), and still need some time to figure out 
what's going on. Stay tuned sir ..

> Doc configs for in-memory-compaction and add detail to in-memory-compaction 
> logging
> ---
>
> Key: HBASE-20259
> URL: https://issues.apache.org/jira/browse/HBASE-20259
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-20259.master.001.patch, 
> HBASE-20259.master.002.patch, HBASE-20259.master.003.patch
>
>
> I set {{hbase.systemtables.compacting.memstore.type}} to NONE but it seems 
> like in-memory is still on. My table looks like this:
> {code}
> Table ycsb is ENABLED
> ycsb
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'family', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', 
> NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', 
> CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 
> 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER =
> > 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', 
> > CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', 
> > COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
> {code}
> Looks like the table doesn't have it on either (IN_MEMORY_COMPACTION doesn't 
> show in the above).
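
For reference, the per-family policy can be set explicitly through the 2.0 
client API; a minimal sketch (the table and family names are taken from the 
listing above, the method wrapper is illustrative):
{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.MemoryCompactionPolicy;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: explicitly disable in-memory compaction for the 'family' CF of
// 'ycsb'. Once set, IN_MEMORY_COMPACTION => 'NONE' should appear in the
// table description.
void disableInMemoryCompaction(Admin admin) throws IOException {
  admin.modifyColumnFamily(TableName.valueOf("ycsb"),
      ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("family"))
          .setInMemoryCompaction(MemoryCompactionPolicy.NONE)
          .build());
}
{code}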



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20259) Doc configs for in-memory-compaction and add detail to in-memory-compaction logging

2018-04-02 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421993#comment-16421993
 ] 

Edward Bortnikov commented on HBASE-20259:
--

[~stack] let's make the decision in the broader context of the overall 
performance evaluation methodology. Tons of effort has been invested in 
exploring CompactingMemstore, and the demonstrated performance benefits were 
decisive. Let's avoid haste just now. 

We're now testing the system in settings that were not anticipated before 
(the CMS+MSLAB on-heap combination), and still need some time to figure out 
what's going on. Stay tuned sir ..

> Doc configs for in-memory-compaction and add detail to in-memory-compaction 
> logging
> ---
>
> Key: HBASE-20259
> URL: https://issues.apache.org/jira/browse/HBASE-20259
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-20259.master.001.patch, 
> HBASE-20259.master.002.patch, HBASE-20259.master.003.patch
>
>
> I set {{hbase.systemtables.compacting.memstore.type}} to NONE but it seems 
> like in-memory is still on. My table looks like this:
> {code}
> Table ycsb is ENABLED
> ycsb
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'family', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', 
> NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', 
> CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 
> 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER =
> > 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', 
> > CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', 
> > COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
> {code}
> Looks like the table doesn't have it on either (IN_MEMORY_COMPACTION doesn't 
> show in the above).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-18294) Reduce global heap pressure: flush based on heap occupancy

2018-02-06 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353979#comment-16353979
 ] 

Edward Bortnikov commented on HBASE-18294:
--

[~eshcar], [~anoop.hbase], [~ram_krish], maybe we can take the following 
approach to your dispute on RB. 

The code defines the following config variables: 
{code:java}
public static final String HREGION_MEMSTORE_FLUSH_SIZE =
  "hbase.hregion.memstore.flush.size";

public static final String HREGION_MEMSTORE_OFFHEAP_FLUSH_SIZE =
  "hbase.hregion.memstore.offheap.flush.size";{code}
The former is the legacy flush size threshold, whereas the latter is new. 
However, their further treatment differs - HREGION_MEMSTORE_FLUSH_SIZE is 
actually treated as the *on-heap* threshold. This is confusing, I guess - 
especially for admins. 

Having said that, we do need separate accounting for on-heap and off-heap 
memory, as [~eshcar] explained above. Let me suggest a change that is more 
digestible for users, imo. Let HREGION_MEMSTORE_FLUSH_SIZE retain its legacy 
meaning (and the 128M default) - namely, the overall max memory the system is 
willing to allocate for a store. Furthermore, let's define a new variable, 
HREGION_MEMSTORE_OFFHEAP_SIZE_RATIO, to define the fraction of the former that 
can be allocated off-heap (0 by default). 

How about that? 
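
To make the suggestion concrete, a minimal sketch of how the two knobs could 
interact; the ratio key's spelling and default are part of the proposal above, 
not committed code:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class MemStoreFlushThresholds {
  public static final String HREGION_MEMSTORE_FLUSH_SIZE =
      "hbase.hregion.memstore.flush.size";       // legacy: total memstore budget
  // Proposed knob (hypothetical key spelling).
  public static final String HREGION_MEMSTORE_OFFHEAP_SIZE_RATIO =
      "hbase.hregion.memstore.offheap.size.ratio";

  // Derive both thresholds from the single legacy budget.
  static long[] thresholds(Configuration conf) {
    long flushSize = conf.getLong(HREGION_MEMSTORE_FLUSH_SIZE, 128L * 1024 * 1024);
    double offheapRatio = conf.getDouble(HREGION_MEMSTORE_OFFHEAP_SIZE_RATIO, 0.0);
    long offheapThreshold = (long) (flushSize * offheapRatio); // off-heap share
    long onheapThreshold = flushSize - offheapThreshold;       // remainder on-heap
    return new long[] { onheapThreshold, offheapThreshold };
  }
}
{code}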

 

 

> Reduce global heap pressure: flush based on heap occupancy
> --
>
> Key: HBASE-18294
> URL: https://issues.apache.org/jira/browse/HBASE-18294
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-18294.01.patch, HBASE-18294.01.patch, 
> HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.02.patch, 
> HBASE-18294.03.patch, HBASE-18294.04.patch, HBASE-18294.05.patch, 
> HBASE-18294.06.patch, HBASE-18294.07.patch, HBASE-18294.07.patch, 
> HBASE-18294.08.patch, HBASE-18294.09.patch, HBASE-18294.10.patch, 
> HBASE-18294.11.patch, HBASE-18294.11.patch, HBASE-18294.12.patch, 
> HBASE-18294.13.patch, HBASE-18294.15.patch, HBASE-18294.16.patch, 
> HBASE-18294.master.01.patch
>
>
> A region is flushed if its memory component exceeds a threshold (default size 
> is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the 
> store to another threshold (that can be configured with 
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size 
> (key-value only) to the threshold, whereas it should compare the heap size 
> (which includes index size and metadata).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-18294) Reduce global heap pressure: flush based on heap occupancy

2018-02-04 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351714#comment-16351714
 ] 

Edward Bortnikov commented on HBASE-18294:
--

I second [~eshcar]. Off-heap and on-heap memory are different resources, with 
potentially very different allocations within the same machine. The code 
already addresses them separately throughout. The user does need this 
(optional) design knob.  

> Reduce global heap pressure: flush based on heap occupancy
> --
>
> Key: HBASE-18294
> URL: https://issues.apache.org/jira/browse/HBASE-18294
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-18294.01.patch, HBASE-18294.01.patch, 
> HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.02.patch, 
> HBASE-18294.03.patch, HBASE-18294.04.patch, HBASE-18294.05.patch, 
> HBASE-18294.06.patch, HBASE-18294.07.patch, HBASE-18294.07.patch, 
> HBASE-18294.08.patch, HBASE-18294.09.patch, HBASE-18294.10.patch, 
> HBASE-18294.11.patch, HBASE-18294.11.patch, HBASE-18294.12.patch, 
> HBASE-18294.13.patch, HBASE-18294.15.patch, HBASE-18294.16.patch, 
> HBASE-18294.master.01.patch
>
>
> A region is flushed if its memory component exceeds a threshold (default size 
> is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the 
> store to another threshold (that can be configured with 
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size 
> (key-value only) to the threshold, whereas it should compare the heap size 
> (which includes index size and metadata).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-18294) Reduce global heap pressure: flush based on heap occupancy

2017-12-28 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305644#comment-16305644
 ] 

Edward Bortnikov commented on HBASE-18294:
--

Hallelujah! Thanks, all, for the fruitful discussion. 


Sent from Yahoo Mail for iPhone


On Thursday, December 28, 2017, 8:24 PM, Eshcar Hillel (JIRA)  
wrote:


    [ 
https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305641#comment-16305641
 ] 

Eshcar Hillel commented on HBASE-18294:
---

OK let me prepare the patch.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)





> Reduce global heap pressure: flush based on heap occupancy
> --
>
> Key: HBASE-18294
> URL: https://issues.apache.org/jira/browse/HBASE-18294
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-18294.01.patch, HBASE-18294.02.patch, 
> HBASE-18294.03.patch, HBASE-18294.04.patch, HBASE-18294.05.patch, 
> HBASE-18294.06.patch, HBASE-18294.07.patch, HBASE-18294.07.patch, 
> HBASE-18294.08.patch, HBASE-18294.09.patch, HBASE-18294.10.patch, 
> HBASE-18294.11.patch, HBASE-18294.11.patch, HBASE-18294.12.patch, 
> HBASE-18294.13.patch
>
>
> A region is flushed if its memory component exceeds a threshold (default size 
> is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the 
> store to another threshold (that can be configured with 
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size 
> (key-value only) to the threshold, whereas it should compare the heap size 
> (which includes index size and metadata).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18294) Reduce global heap pressure: flush based on heap occupancy

2017-12-28 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305514#comment-16305514
 ] 

Edward Bortnikov commented on HBASE-18294:
--

Agree with [~eshcar]. This design introduces an abstraction that nicely 
separates allocation accounting from flush triggering. The two should be 
separate - this way things become simple again. 

> Reduce global heap pressure: flush based on heap occupancy
> --
>
> Key: HBASE-18294
> URL: https://issues.apache.org/jira/browse/HBASE-18294
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-18294.01.patch, HBASE-18294.02.patch, 
> HBASE-18294.03.patch, HBASE-18294.04.patch, HBASE-18294.05.patch, 
> HBASE-18294.06.patch, HBASE-18294.07.patch, HBASE-18294.07.patch, 
> HBASE-18294.08.patch, HBASE-18294.09.patch, HBASE-18294.10.patch, 
> HBASE-18294.11.patch, HBASE-18294.11.patch, HBASE-18294.12.patch, 
> HBASE-18294.13.patch
>
>
> A region is flushed if its memory component exceeds a threshold (default size 
> is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the 
> store to another threshold (that can be configured with 
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size 
> (key-value only) to the threshold, whereas it should compare the heap size 
> (which includes index size and metadata).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HBASE-19282) CellChunkMap Benchmarking and User Interface

2017-12-27 Thread Edward Bortnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Bortnikov reassigned HBASE-19282:


Assignee: Anastasia Braginsky

> CellChunkMap Benchmarking and User Interface
> 
>
> Key: HBASE-19282
> URL: https://issues.apache.org/jira/browse/HBASE-19282
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Fix For: 2.0.0-beta-2
>
> Attachments: CCM Benchmarking.pdf, HBASE-19282-V03.patch, 
> HBASE-19282-V05.patch, HBASE-19282-V06.patch, HBASE-19282-V06.patch, 
> HBASE-19282.patch
>
>
> We have run some experiments on how working with CellChunkMap (CCM) 
> influences performance when running on-heap and off-heap. Based on those 
> results, it is suggested to tie the MSLAB usage (off-heap or on-heap) to the 
> CCM index usage.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19282) CellChunkMap Benchmarking and User Interface

2017-12-27 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16304804#comment-16304804
 ] 

Edward Bortnikov commented on HBASE-19282:
--

[~anastas], could you please issue a release note to document that MSLAB == 
CCM. 

> CellChunkMap Benchmarking and User Interface
> 
>
> Key: HBASE-19282
> URL: https://issues.apache.org/jira/browse/HBASE-19282
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
> Fix For: 2.0.0-beta-2
>
> Attachments: CCM Benchmarking.pdf, HBASE-19282-V03.patch, 
> HBASE-19282-V05.patch, HBASE-19282-V06.patch, HBASE-19282-V06.patch, 
> HBASE-19282.patch
>
>
> We have run some experiments on how working with CellChunkMap (CCM) 
> influences performance when running on-heap and off-heap. Based on those 
> results, it is suggested to tie the MSLAB usage (off-heap or on-heap) to the 
> CCM index usage.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-19282) CellChunkMap Benchmarking and User Interface

2017-12-27 Thread Edward Bortnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Bortnikov updated HBASE-19282:
-
Fix Version/s: 2.0.0-beta-2

> CellChunkMap Benchmarking and User Interface
> 
>
> Key: HBASE-19282
> URL: https://issues.apache.org/jira/browse/HBASE-19282
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
> Fix For: 2.0.0-beta-2
>
> Attachments: CCM Benchmarking.pdf, HBASE-19282-V03.patch, 
> HBASE-19282-V05.patch, HBASE-19282-V06.patch, HBASE-19282-V06.patch, 
> HBASE-19282.patch
>
>
> We have run some experiments on how working with CellChunkMap (CCM) 
> influences performance when running on-heap and off-heap. Based on those 
> results, it is suggested to tie the MSLAB usage (off-heap or on-heap) to the 
> CCM index usage.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19282) CellChunkMap Benchmarking and User Interface

2017-12-27 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16304801#comment-16304801
 ] 

Edward Bortnikov commented on HBASE-19282:
--

[~anastas] - could you please split benchmarking into a separate jira. 

> CellChunkMap Benchmarking and User Interface
> 
>
> Key: HBASE-19282
> URL: https://issues.apache.org/jira/browse/HBASE-19282
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
> Attachments: CCM Benchmarking.pdf, HBASE-19282-V03.patch, 
> HBASE-19282-V05.patch, HBASE-19282-V06.patch, HBASE-19282-V06.patch, 
> HBASE-19282.patch
>
>
> We have run some experiments on how working with CellChunkMap (CCM) 
> influences performance when running on-heap and off-heap. Based on those 
> results, it is suggested to tie the MSLAB usage (off-heap or on-heap) to the 
> CCM index usage.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-19133) Transfer big cells or upserted/appended cells into MSLAB upon flattening to CellChunkMap

2017-12-27 Thread Edward Bortnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Bortnikov updated HBASE-19133:
-
Fix Version/s: 2.0.0-beta-2

> Transfer big cells or upserted/appended cells into MSLAB upon flattening to 
> CellChunkMap
> 
>
> Key: HBASE-19133
> URL: https://issues.apache.org/jira/browse/HBASE-19133
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Gali Sheffi
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19133-V01.patch, HBASE-19133-V02.patch, 
> HBASE-19133-V03.patch, HBASE-19133.01.patch, HBASE-19133.02.patch, 
> HBASE-19133.03.patch, HBASE-19133.04.patch, HBASE-19133.05.patch, 
> HBASE-19133.06.patch, HBASE-19133.07.patch, HBASE-19133.08.patch, 
> HBASE-19133.09.patch, HBASE-19133.10.patch, HBASE-19133.11.patch
>
>
> The CellChunkMap segment index requires all cell data to be written in MSLAB 
> chunks. Even though MSLAB is enabled, cells bigger than the chunk size, or 
> upserted/incremented/appended cells, are still allocated on the JVM heap. If 
> such cells are found in the process of flattening into CellChunkMap 
> (in-memory flush), they need to be copied into MSLAB.
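
A minimal sketch of the copy step described above; MemStoreLAB.copyCellInto is 
the existing interface method, while the wrapper and helper names are 
assumptions for illustration:
{code:java}
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.regionserver.MemStoreLAB;

// During flattening to CellChunkMap, relocate any cell that is not already
// backed by an MSLAB chunk (big cells, upsert/append allocations).
void relocateLooseCells(Iterable<Cell> segmentCells, MemStoreLAB memStoreLAB) {
  for (Cell c : segmentCells) {
    if (!isBackedByChunk(c)) {                    // hypothetical helper
      Cell copied = memStoreLAB.copyCellInto(c);  // copy into an MSLAB chunk
      replaceInIndex(c, copied);                  // hypothetical helper: the CCM
    }                                             // can now reference the chunk
  }
}
{code}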



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18294) Reduce global heap pressure: flush based on heap occupancy

2017-12-27 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16304800#comment-16304800
 ] 

Edward Bortnikov commented on HBASE-18294:
--

My 2 cents - after reviewing the code to the best of my ability. 

IMO the confusion stems from the unfortunate name - heapSize - that the code 
historically uses for the overall allocated memory, such that it is not clear 
whether we mean the Java heap or the OS heap. Might be good to replace it 
globally - maybe in a different jira. 

Regarding the per-store flush trigger ... Still not sure what the reasoning is 
behind the non-uniform handling of on-(Java)heap and off-(Java)heap 
allocations. Could someone please reiterate why we can't just monitor the 
overall allocated memory (data + overhead), no matter where it lives, and 
flush when the threshold is crossed? Obviously there are all kinds of 
concerns, but the only experiment on the table is the one by [~eshcar], which 
demonstrates that regions flush too early, at least with on-heap data, due to 
conservative accounting. 

We intend to benchmark the off-heap write path thoroughly, in particular to 
evaluate the benefits of the CCM index (HBASE-16421). If something unexpected 
comes up there, we'll all re-convene and re-discuss. Until then, may I suggest 
keeping things simple and running the accounting context-free, along the lines 
of [~eshcar]'s patch. 
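
A minimal sketch of the context-free accounting suggested here (class and 
method names are illustrative, not the patch's actual code):
{code:java}
import java.util.concurrent.atomic.AtomicLong;

// One running total of all allocated memstore memory - data plus index and
// metadata overhead, regardless of whether it lives on-heap or off-heap -
// and a single flush trigger against it.
class UniformMemStoreAccounting {
  private final AtomicLong totalAllocated = new AtomicLong();

  void onAllocation(long dataBytes, long overheadBytes) {
    totalAllocated.addAndGet(dataBytes + overheadBytes);
  }

  boolean shouldFlush(long flushThreshold) {
    return totalAllocated.get() > flushThreshold; // no on-/off-heap distinction
  }
}
{code}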

> Reduce global heap pressure: flush based on heap occupancy
> --
>
> Key: HBASE-18294
> URL: https://issues.apache.org/jira/browse/HBASE-18294
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-18294.01.patch, HBASE-18294.02.patch, 
> HBASE-18294.03.patch, HBASE-18294.04.patch, HBASE-18294.05.patch, 
> HBASE-18294.06.patch, HBASE-18294.07.patch, HBASE-18294.07.patch, 
> HBASE-18294.08.patch, HBASE-18294.09.patch, HBASE-18294.10.patch, 
> HBASE-18294.11.patch, HBASE-18294.11.patch, HBASE-18294.12.patch, 
> HBASE-18294.13.patch
>
>
> A region is flushed if its memory component exceeds a threshold (default size 
> is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the 
> store to another threshold (that can be configured with 
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size 
> (key-value only) to the threshold, whereas it should compare the heap size 
> (which includes index size and metadata).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18294) Reduce global heap pressure: flush based on heap occupancy

2017-12-05 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279774#comment-16279774
 ] 

Edward Bortnikov commented on HBASE-18294:
--

Chiming in ...

This question seems to be orthogonal to whether MSLAB use is a per-table or a 
global flag. Agreed that we should avoid adding new configurations whenever 
possible. 

Let's try to remain factual in the decisions we make. The goal is to get the 
best possible performance from a machine with given RAM resources, on-heap or 
not. [~eshcar], could you please publish some numbers that validate the 
solution's value? [~anoop.hbase], mind sharing any data that proves the 
opposite? 

Thanks!


> Reduce global heap pressure: flush based on heap occupancy
> --
>
> Key: HBASE-18294
> URL: https://issues.apache.org/jira/browse/HBASE-18294
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-18294.01.patch, HBASE-18294.02.patch, 
> HBASE-18294.03.patch, HBASE-18294.04.patch, HBASE-18294.05.patch, 
> HBASE-18294.06.patch
>
>
> A region is flushed if its memory component exceeds a threshold (default size 
> is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the 
> store to another threshold (that can be configured with 
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size 
> (key-value only) to the threshold, whereas it should compare the heap size 
> (which includes index size and metadata).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2017-06-04 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036219#comment-16036219
 ] 

Edward Bortnikov commented on HBASE-17339:
--

Thanks [~eshcar]. Maybe it makes sense to describe the experiment we used to 
figure out the current implementation, to provide the community with the full 
picture (smile). 

We looked at a workload with temporal (rather than spatial) locality, namely 
writes closely followed by reads. This pattern is quite frequent in pub-sub 
scenarios. Instead of seeing a performance benefit in reading from MemStore 
first, we saw a nearly 100% cache hit rate, and could not explain it for a 
while. The lazy evaluation procedure described by [~eshcar] sheds light on it. 

Obviously, explicitly prioritizing reading from MemStore first, rather than 
simply deferring the data fetch from disk, could help avoid some accesses to 
Bloom filters that are made just to figure out whether the key has earlier 
versions on disk. The main practical impact is when the BF itself is not in 
memory, and accessing it triggers I/O. Is that a realistic scenario? We assume 
that normally, BFs are permanently cached for all HFiles managed by the RS. 

Dear community - please speak up. Thanks. 
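
For readers following the thread, a hedged sketch of the speculative scheme 
under discussion (helper names are illustrative, not the patch itself):
{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;

// Memory-first get: scan only the memstore segments first, then fall back to
// the regular parallel scan of memstore + HFiles when the result may be
// incomplete - the case where Bloom filters would otherwise be consulted.
Result get(Get get) throws IOException {
  Result fromMemory = scanMemStoreOnly(get);   // hypothetical helper
  if (isComplete(fromMemory, get)) {           // hypothetical helper
    return fromMemory;  // served without touching HFiles or Bloom filters
  }
  return scanMemoryAndDisk(get);               // hypothetical helper
}
{code}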

> Scan-Memory-First Optimization for Get Operations
> -
>
> Key: HBASE-17339
> URL: https://issues.apache.org/jira/browse/HBASE-17339
> Project: HBase
>  Issue Type: Improvement
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch, 
> HBASE-17339-V03.patch, HBASE-17339-V03.patch, HBASE-17339-V04.patch, 
> HBASE-17339-V05.patch, HBASE-17339-V06.patch, read-latency-mixed-workload.jpg
>
>
> The current implementation of a get operation (to retrieve values for a 
> specific key) scans through all relevant stores of the region; for each 
> store both memory components (memstore segments) and disk components 
> (hfiles) are scanned in parallel.
> We suggest applying an optimization that speculatively scans memory-only 
> components first, and only if the result is incomplete scans both memory and 
> disk.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HBASE-18056) Change CompactingMemStore in BASIC mode to merge multiple segments in pipeline

2017-05-29 Thread Edward Bortnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Bortnikov reassigned HBASE-18056:


Assignee: Anastasia Braginsky

> Change CompactingMemStore in BASIC mode to merge multiple segments in pipeline
> --
>
> Key: HBASE-18056
> URL: https://issues.apache.org/jira/browse/HBASE-18056
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Attachments: HBASE-18056-V01.patch
>
>
> Under HBASE-16417 it was decided that CompactingMemStore in BASIC mode should 
> merge multiple ImmutableSegments in CompactionPipeline. Basic+Merge actually 
> demonstrated reduction in GC, alongside improvement in other metrics.
> However, the limit on the number of segments in the pipeline is still set to 
> 30. Under this JIRA it should be changed to 1, as was tested under 
> HBASE-16417.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18056) Change CompactingMemStore in BASIC mode to merge multiple segments in pipeline

2017-05-21 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16018956#comment-16018956
 ] 

Edward Bortnikov commented on HBASE-18056:
--

Friends, 

I'm having a hard time understanding why this commit became a big deal :) We 
made a mistake. This parameter change should have been committed together with 
BASIC compaction becoming the default configuration. BASIC does not make sense 
without it. We presented a very extensive perf evaluation exactly with this 
parameter value. It demonstrated improvement in all the operational metrics, GC 
included. The parameter should not be accessible to users; it is not documented 
in the reference manual; its sole purpose is developer flexibility. 

It is perfectly okay to re-open the discussion (and also revert the setting) 
once there is solid proof that something is broken. But we haven't seen any 
such proof yet. Delaying without reason jeopardizes the feature, especially in 
anticipation of the release.

Just saying it again - we made a technical mistake, and we are fixing it now. 
There is no new data.  

What is it that I get wrong? Thanks. 

> Change CompactingMemStore in BASIC mode to merge multiple segments in pipeline
> --
>
> Key: HBASE-18056
> URL: https://issues.apache.org/jira/browse/HBASE-18056
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
> Attachments: HBASE-18056-V01.patch
>
>
> Under HBASE-16417 it was decided that CompactingMemStore in BASIC mode should 
> merge multiple ImmutableSegments in CompactionPipeline. Basic+Merge actually 
> demonstrated reduction in GC, alongside improvement in other metrics.
> However, the limit on the number of segments in the pipeline is still set to 
> 30. Under this JIRA it should be changed to 1, as was tested under 
> HBASE-16417.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2017-05-15 Thread Edward Bortnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Bortnikov resolved HBASE-16851.
--
   Resolution: Fixed
Fix Version/s: 2.0.0
 Release Note: 
Two blog posts on Apache HBase blog: user manual and programmer manual. 
Ref. guide draft published: 
https://docs.google.com/document/d/1Xi1jh_30NKnjE3wSR-XF5JQixtyT6H_CdFTaVi78LKw/edit
 Tags: documentation

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
>Assignee: Edward Bortnikov
> Fix For: 2.0.0
>
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17343) Make Compacting Memstore default in 2.0 with BASIC as the default type

2017-05-15 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16010622#comment-16010622
 ] 

Edward Bortnikov commented on HBASE-17343:
--

Ref. guide (published on HBASE-16851): 
https://docs.google.com/document/d/1Xi1jh_30NKnjE3wSR-XF5JQixtyT6H_CdFTaVi78LKw/edit.
 

> Make Compacting Memstore default in 2.0 with BASIC as the default type
> --
>
> Key: HBASE-17343
> URL: https://issues.apache.org/jira/browse/HBASE-17343
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: Anastasia Braginsky
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-17343-V01.patch, HBASE-17343-V02.patch, 
> HBASE-17343-V04.patch, HBASE-17343-V05.patch, HBASE-17343-V06.patch, 
> HBASE-17343-V07.patch, HBASE-17343-V08.patch, HBASE-17343-V09.patch, 
> ut.v1.patch
>
>
> FYI [~anastas], [~eshcar] and [~ebortnik].



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17343) Make Compacting Memstore default in 2.0 with BASIC as the default type

2017-05-09 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16002698#comment-16002698
 ] 

Edward Bortnikov commented on HBASE-17343:
--

Obviously, the failures are related to the instability of the master branch 
rather than to CompactingMemstore. 
Strong +1 for commit :)

> Make Compacting Memstore default in 2.0 with BASIC as the default type
> --
>
> Key: HBASE-17343
> URL: https://issues.apache.org/jira/browse/HBASE-17343
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: Anastasia Braginsky
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-17343-V01.patch, HBASE-17343-V02.patch, 
> HBASE-17343-V04.patch, HBASE-17343-V05.patch, HBASE-17343-V06.patch, 
> HBASE-17343-V07.patch, HBASE-17343-V08.patch
>
>
> FYI [~anastas], [~eshcar] and [~ebortnik].



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17343) Make Compacting Memstore default in 2.0 with BASIC as the default type

2017-04-26 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984432#comment-15984432
 ] 

Edward Bortnikov commented on HBASE-17343:
--

[~anoop.hbase], thanks for the update, great to receive more evidence from a 
different tool that the method is working.
Let's flip the default ASAP. Thanks again. 

> Make Compacting Memstore default in 2.0 with BASIC as the default type
> --
>
> Key: HBASE-17343
> URL: https://issues.apache.org/jira/browse/HBASE-17343
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-17343-V01.patch, HBASE-17343-V02.patch, 
> HBASE-17343-V04.patch, HBASE-17343-V05.patch
>
>
> FYI [~anastas], [~eshcar] and [~ebortnik].



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17343) Make Compacting Memstore default in 2.0 with BASIC as the default type

2017-04-25 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982979#comment-15982979
 ] 

Edward Bortnikov commented on HBASE-17343:
--

+1 on my side. 

> Make Compacting Memstore default in 2.0 with BASIC as the default type
> --
>
> Key: HBASE-17343
> URL: https://issues.apache.org/jira/browse/HBASE-17343
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-17343-V01.patch, HBASE-17343-V02.patch, 
> HBASE-17343-V04.patch
>
>
> FYI [~anastas], [~eshcar] and [~ebortnik].



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2017-04-20 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976828#comment-15976828
 ] 

Edward Bortnikov commented on HBASE-16851:
--

Reference guide draft available in 
https://docs.google.com/document/d/1Xi1jh_30NKnjE3wSR-XF5JQixtyT6H_CdFTaVi78LKw/edit.
 
Please review. Thanks. 

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
>Assignee: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2017-04-10 Thread Edward Bortnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Bortnikov updated HBASE-16851:
-

Absolutely. Will work on it early next week. Let's not close it yet.

Thanks. 

Sent from Yahoo Mail for iPhone


On Monday, April 10, 2017, 8:02 AM, stack (JIRA)  wrote:


    [ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962428#comment-15962428
 ] 

stack commented on HBASE-16851:
---

[~ebortnik] goodstuff. Posted.

We might as well use this issue to figure what to put in the refguide? Want to 
cut a piece from blogs or just do pointers from refguide to blog? Thanks 
[~ebortnik]





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)





> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
>Assignee: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2017-04-09 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962391#comment-15962391
 ] 

Edward Bortnikov commented on HBASE-16851:
--

Thanks a lot, [~stack]!

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
>Assignee: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2017-04-09 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962099#comment-15962099
 ] 

Edward Bortnikov commented on HBASE-16851:
--

Thanks much [~stack] for all the great feedback inline. Applied most of your 
changes - better quality now. 
 
Guess we're good for publishing. 
High level/User manual: 
https://docs.google.com/document/d/1K_8plLz0K3pmV20dsgSWwRPn1qUNMRbLmi8aJkhB7z0
Dev manual: 
https://docs.google.com/document/d/1z1R-MdAxRvTC2NazxUmN3FOCFIknkxL2TFqVUhYBVbE

Thanks. 

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
>Assignee: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16438) Create a cell type so that chunk id is embedded in it

2017-04-06 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959577#comment-15959577
 ] 

Edward Bortnikov commented on HBASE-16438:
--

[~anastas] - is this a +1? 

> Create a cell type so that chunk id is embedded in it
> -
>
> Key: HBASE-16438
> URL: https://issues.apache.org/jira/browse/HBASE-16438
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: 
> HBASE-16438_10_ChunkCreatorwrappingChunkPool_withchunkRef.patch, 
> HBASE-16438_11_ChunkCreatorwrappingChunkPool_withchunkRef.patch, 
> HBASE-16438_1.patch, HBASE-16438_3_ChunkCreatorwrappingChunkPool.patch, 
> HBASE-16438_4_ChunkCreatorwrappingChunkPool.patch, 
> HBASE-16438_8_ChunkCreatorwrappingChunkPool_withchunkRef.patch, 
> HBASE-16438_9_ChunkCreatorwrappingChunkPool_withchunkRef.patch, 
> HBASE-16438.patch, MemstoreChunkCell_memstoreChunkCreator_oldversion.patch, 
> MemstoreChunkCell_trunk.patch
>
>
> For CellChunkMap we may need a cell type that embeds the id of the chunk out 
> of which it was created, so that when doing flattening we can use the chunk 
> id as metadata. More details will follow once the initial tasks are 
> completed. 
> Why we need to embed the chunkid in the Cell is described by [~anastas] in 
> this remark over in parent issue 
> https://issues.apache.org/jira/browse/HBASE-14921?focusedCommentId=15244119=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15244119
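
A rough illustration of the idea (the class, constructor, and accessor are 
hypothetical, for the example only):
{code:java}
import org.apache.hadoop.hbase.KeyValue;

// Hypothetical cell that remembers the MSLAB chunk it was written into, so
// flattening to CellChunkMap can record (chunkId, offset) pairs instead of
// holding object references.
class ChunkBackedCell extends KeyValue {
  private final int chunkId; // id of the backing MSLAB chunk

  ChunkBackedCell(byte[] chunkData, int offset, int length, int chunkId) {
    super(chunkData, offset, length); // the cell bytes live inside the chunk
    this.chunkId = chunkId;
  }

  int getChunkId() {
    return chunkId;
  }
}
{code}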



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17343) Make Compacting Memstore default in 2.0 with BASIC as the default type

2017-03-30 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949277#comment-15949277
 ] 

Edward Bortnikov commented on HBASE-17343:
--

Are we good to commit? :) 

> Make Compacting Memstore default in 2.0 with BASIC as the default type
> --
>
> Key: HBASE-17343
> URL: https://issues.apache.org/jira/browse/HBASE-17343
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-17343-V01.patch, HBASE-17343-V02.patch
>
>
> FYI [~anastas], [~eshcar] and [~ebortnik].



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2017-03-30 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949098#comment-15949098
 ] 

Edward Bortnikov commented on HBASE-16851:
--

Updated the developer documentation in 
https://docs.google.com/document/d/1z1R-MdAxRvTC2NazxUmN3FOCFIknkxL2TFqVUhYBVbE.
 The modified parts are highlighted in yellow. Feel free to comment. 

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
>Assignee: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2017-03-27 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943134#comment-15943134
 ] 

Edward Bortnikov commented on HBASE-17339:
--

Can't see how TinyLFU can do a better job with stationary distributions (in 
which item popularity does not change over time). I'd imagine it being good 
under bursty workloads. 

> Scan-Memory-First Optimization for Get Operations
> -
>
> Key: HBASE-17339
> URL: https://issues.apache.org/jira/browse/HBASE-17339
> Project: HBase
>  Issue Type: Improvement
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch, 
> HBASE-17339-V03.patch, HBASE-17339-V03.patch, HBASE-17339-V04.patch, 
> HBASE-17339-V05.patch, HBASE-17339-V06.patch, read-latency-mixed-workload.jpg
>
>
> The current implementation of a get operation (to retrieve values for a 
> specific key) scans through all relevant stores of the region; for each 
> store both memory components (memstore segments) and disk components 
> (hfiles) are scanned in parallel.
> We suggest applying an optimization that speculatively scans memory-only 
> components first, and only if the result is incomplete scans both memory and 
> disk.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2017-03-26 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942272#comment-15942272
 ] 

Edward Bortnikov commented on HBASE-16851:
--

Updated the user documentation, following the change in the definition of BASIC 
(see HBASE-16417).
Added a short summary of performance results (or, why the user should care).

New shared doc: 
https://docs.google.com/document/d/1K_8plLz0K3pmV20dsgSWwRPn1qUNMRbLmi8aJkhB7z0.
 
Please comment. We are interested in publishing on the Apache blog as soon as 
the default change is committed. 

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
>Assignee: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17765) Reviving the merge possibility in the CompactingMemStore

2017-03-21 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935203#comment-15935203
 ] 

Edward Bortnikov commented on HBASE-17765:
--

Are we good to commit this patch? 

> Reviving the merge possibility in the CompactingMemStore
> 
>
> Key: HBASE-17765
> URL: https://issues.apache.org/jira/browse/HBASE-17765
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Fix For: 2.0.0
>
> Attachments: HBASE-17765-V01.patch, HBASE-17765-V02.patch
>
>
> According to the new performance results presented in HBASE-16417, we see 
> that the 90th-percentile read latency of the BASIC policy is too high, due 
> to the need to traverse too many segments in the pipeline. In this JIRA we 
> correct the bug in the merge sizing calculations and allow the pipeline size 
> threshold to be a configurable parameter.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions

2017-03-20 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933509#comment-15933509
 ] 

Edward Bortnikov commented on HBASE-16417:
--

Okay agreed - BASIC will include merge. We'll update the docs, too. 

Regarding parallelism - a promising direction, but we should be careful here. 
More threads might come at someone else's expense (write throughput, maybe), 
so this needs more scrutiny. If all we do is run a bunch of binary searches in 
parallel, it might not be worth the synchronization. Worth checking. 

> In-Memory MemStore Policy for Flattening and Compactions
> 
>
> Key: HBASE-16417
> URL: https://issues.apache.org/jira/browse/HBASE-16417
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-16417-benchmarkresults-20161101.pdf, 
> HBASE-16417-benchmarkresults-20161110.pdf, 
> HBASE-16417-benchmarkresults-20161123.pdf, 
> HBASE-16417-benchmarkresults-20161205.pdf, 
> HBASE-16417-benchmarkresults-20170309.pdf, 
> HBASE-16417-benchmarkresults-20170317.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions

2017-03-19 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931616#comment-15931616
 ] 

Edward Bortnikov commented on HBASE-16417:
--

[~eshcar], thanks for the thorough report, great stuff.

Question to all - do these results suggest that we change the default to 
BASIC+MERGE? It seems that this method does not have any material overhead, even 
under the uniform workload. If the answer is "yes", we could go one of two 
ways: (1) say that BASIC+MERGE is the new BASIC (my favorite :)), or (2) 
introduce a new compaction level (MODERATE?). Let's converge fast - then we can 
update the documentation and finalize the code. 

This work notwithstanding, it is still appealing to come up with an automatic 
policy that tunes hands-free (which was the original intent behind this JIRA). With 
the 2.0 release on our heels, we might not be able to make it by then. But 
let's have all the building blocks in place, at least (smile).  

> In-Memory MemStore Policy for Flattening and Compactions
> 
>
> Key: HBASE-16417
> URL: https://issues.apache.org/jira/browse/HBASE-16417
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-16417-benchmarkresults-20161101.pdf, 
> HBASE-16417-benchmarkresults-20161110.pdf, 
> HBASE-16417-benchmarkresults-20161123.pdf, 
> HBASE-16417-benchmarkresults-20161205.pdf, 
> HBASE-16417-benchmarkresults-20170309.pdf, 
> HBASE-16417-benchmarkresults-20170317.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17765) Reviving the merge possibility in the CompactingMemStore

2017-03-13 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15907030#comment-15907030
 ] 

Edward Bortnikov commented on HBASE-17765:
--

Merge means that only the index data is restructured: we create a larger 
segment with a single index, but no data is copied. Also, we avoid the (more 
expensive) SQM scan, so duplicate data versions are not eliminated. Bottom 
line: (1) the overhead and the space savings are both between BASIC and EAGER, 
and (2) the tail read latency problem is solved. 
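
For illustration, here is a minimal sketch of what index merge amounts to, 
assuming two already-sorted per-segment indexes (the types and names are 
hypothetical, not the actual CompactingMemStore code):
{code}
import java.util.ArrayList;
import java.util.List;

class IndexMergeSketch {
  // Merge two sorted segment indexes into one sorted index. Only references
  // move; the cell data itself stays in place, and duplicate versions are
  // kept because no SQM-style scan filters them out.
  static <T extends Comparable<T>> List<T> mergeIndexes(List<T> a, List<T> b) {
    List<T> merged = new ArrayList<>(a.size() + b.size());
    int i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
      merged.add(a.get(i).compareTo(b.get(j)) <= 0 ? a.get(i++) : b.get(j++));
    }
    while (i < a.size()) merged.add(a.get(i++));
    while (j < b.size()) merged.add(b.get(j++));
    return merged;
  }
}
{code}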

We'll be publishing the perf results shortly. Following that, let's 
collectively decide whether MERGE should be a level between BASIC and EAGER, or 
maybe just become the new BASIC, for simplicity. 

Thanks. 

> Reviving the merge possibility in the CompactingMemStore
> 
>
> Key: HBASE-17765
> URL: https://issues.apache.org/jira/browse/HBASE-17765
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Fix For: 2.0.0
>
> Attachments: HBASE-17765-V01.patch
>
>
> According to the new performance results presented in the HBASE-16417 we see 
> that the read latency of the 90th percentile of the BASIC policy is too big 
> due to the need to traverse through too many segments in the pipeline. In 
> this JIRA we correct the bug in the merge sizing calculations and allow 
> pipeline size threshold to be a configurable parameter.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions

2017-03-10 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906123#comment-15906123
 ] 

Edward Bortnikov commented on HBASE-16417:
--

.. So [~eshcar] answered nearly all of it here .. A couple of small remarks. 

The expected number of 2 segments in the pipeline follows from the fact that 
a disk flush normally happens when there are 4. Since the pipeline grows from 0 
to 4 between disk flushes, its size averaged over a cycle is (0 + 4) / 2 = 2. 

The varying WAL size with Async WAL indeed introduces much noise. However, 
please note that the overall volume of WAL writes differs between Sync and 
Async without a single line of Accordion involved - why does this happen with the 
same workload? (Note that with Sync, the WAL volume is the same no matter what 
type of in-memory compaction is used.) Looking forward to some help here :)

> In-Memory MemStore Policy for Flattening and Compactions
> 
>
> Key: HBASE-16417
> URL: https://issues.apache.org/jira/browse/HBASE-16417
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-16417-benchmarkresults-20161101.pdf, 
> HBASE-16417-benchmarkresults-20161110.pdf, 
> HBASE-16417-benchmarkresults-20161123.pdf, 
> HBASE-16417-benchmarkresults-20161205.pdf, 
> HBASE-16417-benchmarkresults-20170309.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions

2017-03-09 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903831#comment-15903831
 ] 

Edward Bortnikov commented on HBASE-16417:
--

bq. On the 90th percentile degradation when BASIC, how many segments we talking 
 2 or 3 or more than this?

Taking the liberty of answering for [~eshcar]. The current default active 
segment size cap for in-memory flush is 1/4 of the memstore size cap for disk 
flush, which means that the expected number of segments in the pipeline is 
4/2 = 2. However, since disk flush is not immediate, new segments can sometimes 
pile up, especially under a very high write rate such as the one exercised in our 
test. We don't have easily trackable metrics installed (maybe we should), but 
we're probably speaking about many more segments here. The number can't exceed 
30 - at that point, a forceful merge happens. We guess that looking up the key in 
every single segment (to initialize the scan) is what leads to the high tail latency. 
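
To see why a deep pipeline hurts the tail, consider this hedged sketch of scan 
initialization (hypothetical interfaces, not the actual scanner code): every 
segment requires its own seek before the scan can return a single cell, so the 
setup cost grows linearly with the number of segments.
{code}
import java.util.ArrayList;
import java.util.List;

class ScanInitSketch {
  interface Segment {
    long seek(byte[] startKey);  // one binary search / skip-list seek, O(log n)
  }

  // Position a sub-scanner in every segment; with k segments this is k seeks
  // up front, which dominates the latency of gets and short scans.
  static List<Long> openScanners(List<Segment> segments, byte[] startKey) {
    List<Long> positions = new ArrayList<>(segments.size());
    for (Segment s : segments) {
      positions.add(s.seek(startKey));
    }
    return positions;
  }
}
{code}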

We're taking a closer look at merge (index compaction only, no data copy); 
hopefully we'll show there's no material damage from it. Even EAGER does not 
look too bad. A matter of a few more days of experimentation. Thanks. 

> In-Memory MemStore Policy for Flattening and Compactions
> 
>
> Key: HBASE-16417
> URL: https://issues.apache.org/jira/browse/HBASE-16417
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-16417-benchmarkresults-20161101.pdf, 
> HBASE-16417-benchmarkresults-20161110.pdf, 
> HBASE-16417-benchmarkresults-20161123.pdf, 
> HBASE-16417-benchmarkresults-20161205.pdf, 
> HBASE-16417-benchmarkresults-20170309.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore

2017-03-08 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901984#comment-15901984
 ] 

Edward Bortnikov commented on HBASE-16421:
--

That's great - let's follow that path (via HBASE-16438). We are right around the 
corner, ready to assist :) 

> Introducing the CellChunkMap as a new additional index variant in the MemStore
> --
>
> Key: HBASE-16421
> URL: https://issues.apache.org/jira/browse/HBASE-16421
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Anastasia Braginsky
> Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, 
> IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Follow up for HBASE-14921. This is going to be the umbrella JIRA to include 
> all the parts of integration of the CellChunkMap to the MemStore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore

2017-03-08 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901163#comment-15901163
 ] 

Edward Bortnikov edited comment on HBASE-16421 at 3/8/17 12:26 PM:
---

Friends, 

What are you saying about [~anastas]'s suggestion in HBASE-16438? Looks like 
the patch is getting in better shape. How about you guys further improving and 
committing it, so that [~anastas] can pick up on solid ground? We'll keep 
helping with the reviews. Our assessment is that this patch covers 1/3 to 1/2 
of the original work plan. 

Thanks. 


was (Author: ebortnik):
Friends, 

What are you saying about [~anastas]'s suggestion in HBASE-16438? Looks like 
the patch is getting in better shape. How about further improving and 
committing it, so that [~anastas] can pick up on solid ground? Our assessment 
is that this patch covers 1/3 to 1/2 of the original work plan. 

Thanks. 

> Introducing the CellChunkMap as a new additional index variant in the MemStore
> --
>
> Key: HBASE-16421
> URL: https://issues.apache.org/jira/browse/HBASE-16421
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Anastasia Braginsky
> Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, 
> IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Follow up for HBASE-14921. This is going to be the umbrella JIRA to include 
> all the parts of integration of the CellChunkMap to the MemStore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore

2017-03-08 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901163#comment-15901163
 ] 

Edward Bortnikov commented on HBASE-16421:
--

Friends, 

What are you saying about [~anastas]'s suggestion in HBASE-16438? Looks like 
the patch is getting in better shape. How about further improving and 
committing it, so that [~anastas] can pick up on solid ground? Our assessment 
is that this patch covers 1/3 to 1/2 of the original work plan. 

Thanks. 

> Introducing the CellChunkMap as a new additional index variant in the MemStore
> --
>
> Key: HBASE-16421
> URL: https://issues.apache.org/jira/browse/HBASE-16421
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Anastasia Braginsky
> Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, 
> IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Follow up for HBASE-14921. This is going to be the umbrella JIRA to include 
> all the parts of integration of the CellChunkMap to the MemStore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore

2017-03-07 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899084#comment-15899084
 ] 

Edward Bortnikov commented on HBASE-16421:
--

Okay, we are making progress ... What I read from [~ram_krish]'s proposal is 
that you guys are actually comfortable with being the driving force behind the 
CellChunkMap project (in fact, after having implemented a critical mass of code), 
all the way to production code. We are comfortable with this approach too. In that 
case, we can just switch to the assistant/reviewer role - and provide all the 
help we can in that capacity. This question is actually independent of whether 
the feature is part of 2.0 or not. 

Let's reach consensus ... Appreciate your fast response. 

> Introducing the CellChunkMap as a new additional index variant in the MemStore
> --
>
> Key: HBASE-16421
> URL: https://issues.apache.org/jira/browse/HBASE-16421
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Anastasia Braginsky
> Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, 
> IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Follow up for HBASE-14921. This is going to be the umbrella JIRA to include 
> all the parts of integration of the CellChunkMap to the MemStore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore

2017-03-06 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897806#comment-15897806
 ] 

Edward Bortnikov commented on HBASE-16421:
--

Friends, 

We are still unclear on the issue of "who does what". The subtask 
JIRAs seem to indicate that [~ram_krish] and [~anoop.hbase] made substantial 
progress lately based on [~anastas]'s experimental code - including some steps 
in the work plan published last December. What is the status of these patches? 
Are they candidates for commit? 

Risking preaching to the choir but trying to be efficient. Let's align our 
efforts. Looking forward to your comments. Thanks. 

> Introducing the CellChunkMap as a new additional index variant in the MemStore
> --
>
> Key: HBASE-16421
> URL: https://issues.apache.org/jira/browse/HBASE-16421
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Anastasia Braginsky
> Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, 
> IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Follow up for HBASE-14921. This is going to be the umbrella JIRA to include 
> all the parts of integration of the CellChunkMap to the MemStore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore

2017-03-02 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892370#comment-15892370
 ] 

Edward Bortnikov commented on HBASE-16421:
--

Also, we have to define the KPIs for this feature. In what first-class-citizen 
metrics should it manifest? Write/read throughput/latency? Please speak up :)

> Introducing the CellChunkMap as a new additional index variant in the MemStore
> --
>
> Key: HBASE-16421
> URL: https://issues.apache.org/jira/browse/HBASE-16421
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Anastasia Braginsky
> Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, 
> IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Follow up for HBASE-14921. This is going to be the umbrella JIRA to include 
> all the parts of integration of the CellChunkMap to the MemStore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2017-03-02 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892304#comment-15892304
 ] 

Edward Bortnikov commented on HBASE-16851:
--

Programmer manual version 1.0 complete: 
https://docs.google.com/document/d/1z1R-MdAxRvTC2NazxUmN3FOCFIknkxL2TFqVUhYBVbE.
 
Many thanks to [~anastas] for the UML diagrams. Please take a look. 

The document summarizing the performance benchmark results is WIP - we'll 
publish in a week or so.  

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
>Assignee: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore

2017-02-28 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889597#comment-15889597
 ] 

Edward Bortnikov commented on HBASE-16421:
--

For in-memory compaction per se it is not, but for write-path off-heaping it 
might be. Up to [~ram_krish] and [~anoop.hbase] to define. 

> Introducing the CellChunkMap as a new additional index variant in the MemStore
> --
>
> Key: HBASE-16421
> URL: https://issues.apache.org/jira/browse/HBASE-16421
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Anastasia Braginsky
> Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, 
> IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Follow up for HBASE-14921. This is going to be the umbrella JIRA to include 
> all the parts of integration of the CellChunkMap to the MemStore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore

2017-02-28 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888120#comment-15888120
 ] 

Edward Bortnikov commented on HBASE-16421:
--

More precisely, the question is about the deadline. With the 2.0 release in 
Apr/May, it's going to be tight ... What would be the process if we don't make 
it by then? Would there be follow-up releases? [~saint@gmail.com], could 
you please chime in? 

> Introducing the CellChunkMap as a new additional index variant in the MemStore
> --
>
> Key: HBASE-16421
> URL: https://issues.apache.org/jira/browse/HBASE-16421
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Anastasia Braginsky
> Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, 
> IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Follow up for HBASE-14921. This is going to be the umbrella JIRA to include 
> all the parts of integration of the CellChunkMap to the MemStore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore

2017-02-27 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886426#comment-15886426
 ] 

Edward Bortnikov commented on HBASE-16421:
--

Hi all, reviving this thread - it's been silent for a while ... 

As we are completing the In-Memory Compaction work for the 2.0 release, we'd 
like to reiterate the mutual commitment to this project. We'll probably need 
help with parts of the implementation in order to make the release cutoff in time. 
[~ram_krish], [~anoop.hbase] - are you on board? 

Thanks. 

> Introducing the CellChunkMap as a new additional index variant in the MemStore
> --
>
> Key: HBASE-16421
> URL: https://issues.apache.org/jira/browse/HBASE-16421
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Anastasia Braginsky
> Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, 
> IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Follow up for HBASE-14921. This is going to be the umbrella JIRA to include 
> all the parts of integration of the CellChunkMap to the MemStore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17662) Disable in-memory flush when replaying from WAL

2017-02-27 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886409#comment-15886409
 ] 

Edward Bortnikov commented on HBASE-17662:
--

[~anastas], [~anoop.hbase] - do we have a resolution here? 

> Disable in-memory flush when replaying from WAL
> ---
>
> Key: HBASE-17662
> URL: https://issues.apache.org/jira/browse/HBASE-17662
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Attachments: HBASE-17662-V02.patch, HBASE-17662-V03.patch, 
> HBASE-17662-V04.patch, HBASE-17662-V05.patch, HBASE-17662-V06.patch
>
>
> When replaying the edits from WAL, the region's updateLock is not taken, 
> because a single threaded action is assumed. However, the thread-safeness of 
> the in-memory flush of CompactingMemStore is based on taking the region's 
> updateLock. 
> The in-memory flush can be skipped in the replay time (anyway everything is 
> flushed to disk just after the replay). Therefore it is acceptable to just 
> skip the in-memory flush action while the updates come as part of replay from 
> WAL.
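
A minimal sketch of the guard described above (field and method names are 
hypothetical, not the actual patch):
{code}
class CompactingMemStoreSketch {
  private volatile boolean inWalReplay;  // set while replaying edits from WAL
  private long activeSize;
  private long inMemoryFlushThreshold;

  void checkActiveSize() {
    if (inWalReplay) {
      return;  // skip the in-memory flush: the region updateLock is not held,
               // and everything is flushed to disk right after replay anyway
    }
    if (activeSize > inMemoryFlushThreshold) {
      flushInMemory();  // safe on the normal path: callers hold the updateLock
    }
  }

  void flushInMemory() { /* push the active segment into the pipeline */ }
}
{code}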



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17662) Disable in-memory flush when replaying from WAL

2017-02-26 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884893#comment-15884893
 ] 

Edward Bortnikov commented on HBASE-17662:
--

Folks, 

Apologies for pushing again, but please help us light a fire under this 
JIRA and the others remaining in this project ... This one is the last exposed 
bug that prevents us from making BASIC compaction the default. Seems like 
a small patch - can we commit it? 

Thanks. 

> Disable in-memory flush when replaying from WAL
> ---
>
> Key: HBASE-17662
> URL: https://issues.apache.org/jira/browse/HBASE-17662
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Attachments: HBASE-17662-V02.patch, HBASE-17662-V03.patch, 
> HBASE-17662-V04.patch, HBASE-17662-V05.patch, HBASE-17662-V06.patch
>
>
> When replaying the edits from WAL, the region's updateLock is not taken, 
> because a single threaded action is assumed. However, the thread-safeness of 
> the in-memory flush of CompactingMemStore is based on taking the region's 
> updateLock. 
> The in-memory flush can be skipped in the replay time (anyway everything is 
> flushed to disk just after the replay). Therefore it is acceptable to just 
> skip the in-memory flush action while the updates come as part of replay from 
> WAL.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16850) Run large scale correctness tests for HBASE-14918 (in-memory flushes/compactions)

2017-02-16 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870672#comment-15870672
 ] 

Edward Bortnikov commented on HBASE-16850:
--

Large-scale benchmark results are reported in HBASE-16417. Does it make sense to 
redirect/retire this JIRA? Thanks. 

> Run large scale correctness tests for HBASE-14918 (in-memory 
> flushes/compactions)
> -
>
> Key: HBASE-16850
> URL: https://issues.apache.org/jira/browse/HBASE-16850
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Devaraj Das
>Assignee: Devaraj Das
>Priority: Blocker
>
> As discussed here - 
> https://issues.apache.org/jira/browse/HBASE-16608?focusedCommentId=15577213=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15577213
> [~stack] [~anastas] [~ram_krish] [~anoop.hbase]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2017-02-15 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868436#comment-15868436
 ] 

Edward Bortnikov commented on HBASE-16851:
--

Programmer manual, version 0.9. 
https://docs.google.com/document/d/1z1R-MdAxRvTC2NazxUmN3FOCFIknkxL2TFqVUhYBVbE
WIP with UML diagrams. 

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
>Assignee: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17407) Correct update of maxFlushedSeqId in HRegion

2017-01-19 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830575#comment-15830575
 ] 

Edward Bortnikov commented on HBASE-17407:
--

Mind rebasing and resubmitting the patch?

> Correct update of maxFlushedSeqId in HRegion
> 
>
> Key: HBASE-17407
> URL: https://issues.apache.org/jira/browse/HBASE-17407
> Project: HBase
>  Issue Type: Bug
>Reporter: Eshcar Hillel
> Attachments: HBASE-17407-V01.patch, HBASE-17407-V01.patch, 
> HBASE-17407-V02.patch
>
>
> The attribute maxFlushedSeqId in HRegion is used to track the max sequence id 
> in the store files and is reported to HMaster. When flushing only part of the 
> memstore content this value might be incorrect and may cause data loss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2017-01-18 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828032#comment-15828032
 ] 

Edward Bortnikov edited comment on HBASE-16851 at 1/18/17 1:17 PM:
---

Programmer manual (developer view) - initial write-up: 
https://docs.google.com/document/d/1z1R-MdAxRvTC2NazxUmN3FOCFIknkxL2TFqVUhYBVbE.
 Thanks [~anastas] for the class diagram. 


was (Author: ebortnik):
Programmer manual (developer view) - initial write-up: 
https://docs.google.com/document/d/1z1R-MdAxRvTC2NazxUmN3FOCFIknkxL2TFqVUhYBVbE.
 

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
>Assignee: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2017-01-18 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828032#comment-15828032
 ] 

Edward Bortnikov commented on HBASE-16851:
--

Programmer manual (developer view) - initial write-up: 
https://docs.google.com/document/d/1z1R-MdAxRvTC2NazxUmN3FOCFIknkxL2TFqVUhYBVbE.
 

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
>Assignee: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk

2017-01-16 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824307#comment-15824307
 ] 

Edward Bortnikov commented on HBASE-17081:
--

V13 was out of sync with trunk (QA ran one day after submission). Rebasing solved 
the problem. 

> Flush the entire CompactingMemStore content to disk
> ---
>
> Key: HBASE-17081
> URL: https://issues.apache.org/jira/browse/HBASE-17081
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Fix For: 2.0.0
>
> Attachments: HBASE-15787_8.patch, HBASE-17081-V01.patch, 
> HBASE-17081-V02.patch, HBASE-17081-V03.patch, HBASE-17081-V04.patch, 
> HBASE-17081-V05.patch, HBASE-17081-V06.patch, HBASE-17081-V06.patch, 
> HBASE-17081-V07.patch, HBASE-17081-V10.patch, HBASE-17081-V13.patch, 
> HBASE-17081-V14.patch, HBaseMeetupDecember2016-V02.pptx, 
> Pipelinememstore_fortrunk_3.patch
>
>
> Part of CompactingMemStore's memory is held by an active segment, and another 
> part is divided between immutable segments in the compacting pipeline. Upon 
> flush-to-disk request we want to flush all of it to disk, in contrast to 
> flushing only tail of the compacting pipeline.
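
For illustration, a hedged sketch of the intended behavior (hypothetical names, 
not the actual patch): on a flush-to-disk request, the snapshot covers the 
active segment plus every immutable segment, instead of only the pipeline tail.
{code}
import java.util.ArrayList;
import java.util.List;

class FullFlushSketch<Segment> {
  // Collect everything the memstore holds: the mutable active segment and
  // all immutable segments in the compacting pipeline.
  List<Segment> segmentsToFlush(Segment active, List<Segment> pipeline) {
    List<Segment> toFlush = new ArrayList<>();
    toFlush.add(active);       // the active (mutable) segment
    toFlush.addAll(pipeline);  // the whole pipeline, not just its tail
    return toFlush;
  }
}
{code}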



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17434) New Synchronization Scheme for Compaction Pipeline

2017-01-09 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812611#comment-15812611
 ] 

Edward Bortnikov commented on HBASE-17434:
--

I suggest we commit. This patch has been well discussed and verified. It would be 
much more convenient to fix and re-submit HBASE-17081 with a solid 
synchronization scheme in place. We are trying to solve a bunch of issues that 
have piled up in the CompactingMemstore implementation, and this one is a roadblock. 

Once again, thanks to all who contributed to improving the solution's quality. 

> New Synchronization Scheme for Compaction Pipeline
> --
>
> Key: HBASE-17434
> URL: https://issues.apache.org/jira/browse/HBASE-17434
> Project: HBase
>  Issue Type: Bug
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-17434-V01.patch, HBASE-17434-V02.patch, 
> HBASE-17434-V03.patch, HBASE-17434.master.001.patch
>
>
> A new copyOnWrite synchronization scheme is introduced for the compaction 
> pipeline.
> The new scheme is better since it removes the lock from getSegments() which 
> is invoked in every get and scan operation, and it reduces the number of 
> LinkedList objects that are created at runtime, thus can reduce GC (not by 
> much, but still...).
> In addition, it fixes the method getTailSize() in compaction pipeline. This 
> method creates a MemstoreSize object which comprises the data size and the 
> overhead size of the segment and needs to be atomic.
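
A hedged sketch of such a copy-on-write scheme (hypothetical, not the committed 
CompactionPipeline code): readers take a lock-free volatile snapshot, while 
mutators clone, modify, and atomically publish a new list.
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CopyOnWritePipelineSketch<Segment> {
  private volatile List<Segment> readOnlyCopy = Collections.emptyList();

  // Invoked on every get and scan - no lock taken.
  List<Segment> getSegments() {
    return readOnlyCopy;
  }

  // Mutations are rare (in-memory flush/compaction), so cloning the list is
  // cheap relative to the read traffic it unblocks.
  synchronized void replaceTail(List<Segment> compacted, Segment merged) {
    List<Segment> next = new ArrayList<>(readOnlyCopy);
    next.removeAll(compacted);                          // drop compacted segments
    next.add(merged);                                   // add their replacement
    readOnlyCopy = Collections.unmodifiableList(next);  // atomic publish
  }
}
{code}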



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17379) Lack of synchronization in CompactionPipeline#getScanners()

2017-01-05 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15800756#comment-15800756
 ] 

Edward Bortnikov commented on HBASE-17379:
--

[~eshcar]. Elegant code in RB. Please add high-level comments about the new 
synchronization scheme. 

> Lack of synchronization in CompactionPipeline#getScanners()
> ---
>
> Key: HBASE-17379
> URL: https://issues.apache.org/jira/browse/HBASE-17379
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 17379.v1.txt, 17379.v14.txt, 17379.v2.txt, 17379.v3.txt, 
> 17379.v4.txt, 17379.v5.txt, 17379.v6.txt, 17379.v8.txt
>
>
> From 
> https://builds.apache.org/job/PreCommit-HBASE-Build/5053/testReport/org.apache.hadoop.hbase.regionserver/TestHRegionWithInMemoryFlush/testWritesWhileGetting/
>  :
> {code}
> java.io.IOException: java.util.ConcurrentModificationException
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleException(HRegion.java:5886)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:5856)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.(HRegion.java:5819)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2786)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2766)
>   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:7036)
>   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:7015)
>   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6994)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting(TestHRegion.java:4141)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException: null
>   at 
> java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:966)
>   at java.util.LinkedList$ListItr.next(LinkedList.java:888)
>   at 
> org.apache.hadoop.hbase.regionserver.CompactionPipeline.getScanners(CompactionPipeline.java:220)
>   at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore.getScanners(CompactingMemStore.java:298)
>   at 
> org.apache.hadoop.hbase.regionserver.HStore.getScanners(HStore.java:1154)
>   at org.apache.hadoop.hbase.regionserver.Store.getScanners(Store.java:97)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.getScannersNoCompaction(StoreScanner.java:353)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:210)
>   at 
> org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:1892)
>   at 
> org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1880)
>   at 
> 

[jira] [Commented] (HBASE-17407) Correct update of maxFlushedSeqId in HRegion

2017-01-04 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15798198#comment-15798198
 ] 

Edward Bortnikov commented on HBASE-17407:
--

[~Apache9] - what is the invariant you are looking for, and how does it affect 
correctness? "Strange intermediate states" are exactly what transactions are 
about - they are perfectly fine. Can you substantiate the data loss case? 

Independently of that, you have a point that maxFlushedSeqId is managed in 
too many places - maybe this is a call to action. 

> Correct update of maxFlushedSeqId in HRegion
> 
>
> Key: HBASE-17407
> URL: https://issues.apache.org/jira/browse/HBASE-17407
> Project: HBase
>  Issue Type: Bug
>Reporter: Eshcar Hillel
>
> The attribute maxFlushedSeqId in HRegion is used to track the max sequence id 
> in the store files and is reported to HMaster. When flushing only part of the 
> memstore content this value might be incorrect and may cause data loss.
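
For what it's worth, a hedged sketch of the safety invariant under discussion 
(the accessor names are hypothetical): after a partial flush, the reported 
maxFlushedSeqId must stay below every sequence id still held only in memory, 
or a post-crash replay could skip unflushed edits.
{code}
class MaxFlushedSeqIdSketch {
  interface Store {
    long smallestSeqIdStillInMemstore();  // hypothetical accessor
  }

  static long safeMaxFlushedSeqId(Iterable<Store> stores) {
    long smallestUnflushed = Long.MAX_VALUE;
    for (Store s : stores) {
      smallestUnflushed = Math.min(smallestUnflushed, s.smallestSeqIdStillInMemstore());
    }
    return smallestUnflushed - 1;  // everything at or below this id is durable
  }
}
{code}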



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17373) Reverse the order of snapshot creation in the CompactingMemStore

2017-01-03 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15797288#comment-15797288
 ] 

Edward Bortnikov commented on HBASE-17373:
--

... And once again, this delicate case should be described as part of the 
synchronization scheme in the programmer's manual ... Our job, too. 

> Reverse the order of snapshot creation in the CompactingMemStore
> 
>
> Key: HBASE-17373
> URL: https://issues.apache.org/jira/browse/HBASE-17373
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-17373-V01.patch, HBASE-17373-V02.patch, 
> HBASE-17373-V03.patch, HBASE-17373-V04.patch, HBASE-17373-V04.patch, 
> HBASE-17373-V05.patch
>
>
> In CompactingMemStore both in BASIC and EAGER cases when snapshot is created 
> the segments are first removed from the pipeline then added to the snapshot. 
> This is the opposite to what is done in the DefaultMemStore where the 
> snapshot is firstly created with the active segment and only after the active 
> segment is refreshed. This JIRA is about to reverse the order in 
> CompactingMemStore and to make all MemStores to behave the same.
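
A minimal before/after sketch of the ordering change (hypothetical calls, not 
the actual patch):
{code}
import java.util.List;

class SnapshotOrderSketch<Segment> {
  // Before the fix, CompactingMemStore effectively did:
  //   pipeline.removeAll(tail); snapshot.addAll(tail);
  // leaving a window in which the segments belong to neither structure.
  void snapshotTail(List<Segment> tail, List<Segment> snapshot,
                    List<Segment> pipeline) {
    snapshot.addAll(tail);     // the snapshot references the segments first...
    pipeline.removeAll(tail);  // ...and only then do they leave the pipeline
  }
}
{code}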



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17373) Reverse the order of snapshot creation in the CompactingMemStore

2017-01-03 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15797283#comment-15797283
 ] 

Edward Bortnikov commented on HBASE-17373:
--

Mind not reverting, please? This issue seems related to HBASE-17081 
(currently reverted) and should be addressed there. HBASE-17379, which is also 
quite pressing, depends on this commit. Thanks. 

> Reverse the order of snapshot creation in the CompactingMemStore
> 
>
> Key: HBASE-17373
> URL: https://issues.apache.org/jira/browse/HBASE-17373
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-17373-V01.patch, HBASE-17373-V02.patch, 
> HBASE-17373-V03.patch, HBASE-17373-V04.patch, HBASE-17373-V04.patch, 
> HBASE-17373-V05.patch
>
>
> In CompactingMemStore both in BASIC and EAGER cases when snapshot is created 
> the segments are first removed from the pipeline then added to the snapshot. 
> This is the opposite to what is done in the DefaultMemStore where the 
> snapshot is firstly created with the active segment and only after the active 
> segment is refreshed. This JIRA is about to reverse the order in 
> CompactingMemStore and to make all MemStores to behave the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17373) Reverse the order of snapshot creation in the CompactingMemStore

2017-01-02 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15793279#comment-15793279
 ] 

Edward Bortnikov commented on HBASE-17373:
--

Mind committing, folks (smile)? 

> Reverse the order of snapshot creation in the CompactingMemStore
> 
>
> Key: HBASE-17373
> URL: https://issues.apache.org/jira/browse/HBASE-17373
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Eshcar Hillel
> Attachments: HBASE-17373-V01.patch, HBASE-17373-V02.patch, 
> HBASE-17373-V03.patch, HBASE-17373-V04.patch, HBASE-17373-V04.patch
>
>
> In CompactingMemStore both in BASIC and EAGER cases when snapshot is created 
> the segments are first removed from the pipeline then added to the snapshot. 
> This is the opposite to what is done in the DefaultMemStore where the 
> snapshot is firstly created with the active segment and only after the active 
> segment is refreshed. This JIRA is about to reverse the order in 
> CompactingMemStore and to make all MemStores to behave the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17379) Lack of synchronization in CompactionPipeline#getScanners()

2016-12-30 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787776#comment-15787776
 ] 

Edward Bortnikov commented on HBASE-17379:
--

Thanks, all, for the comments, suggestions, and patches. I second [~stack] in 
his suggestion to let [~eshcar] and [~anastas] finish the job. We'll publish 
the precise synchronization scheme with the patch, and will also make it part 
of the programmer's manual/blog post (WIP in HBASE-16851). We also have a very 
comprehensive benchmark in HBASE-16417 - once the patch is ready we'll run it 
to make sure it does not hamper performance. 

> Lack of synchronization in CompactionPipeline#getScanners()
> ---
>
> Key: HBASE-17379
> URL: https://issues.apache.org/jira/browse/HBASE-17379
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 17379.v1.txt, 17379.v2.txt, 17379.v3.txt, 17379.v4.txt, 
> 17379.v5.txt, 17379.v6.txt, 17379.v8.txt
>
>
> From 
> https://builds.apache.org/job/PreCommit-HBASE-Build/5053/testReport/org.apache.hadoop.hbase.regionserver/TestHRegionWithInMemoryFlush/testWritesWhileGetting/
>  :
> {code}
> java.io.IOException: java.util.ConcurrentModificationException
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleException(HRegion.java:5886)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:5856)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.(HRegion.java:5819)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2786)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2766)
>   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:7036)
>   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:7015)
>   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6994)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting(TestHRegion.java:4141)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException: null
>   at 
> java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:966)
>   at java.util.LinkedList$ListItr.next(LinkedList.java:888)
>   at 
> org.apache.hadoop.hbase.regionserver.CompactionPipeline.getScanners(CompactionPipeline.java:220)
>   at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore.getScanners(CompactingMemStore.java:298)
>   at 
> org.apache.hadoop.hbase.regionserver.HStore.getScanners(HStore.java:1154)
>   at org.apache.hadoop.hbase.regionserver.Store.getScanners(Store.java:97)
>   at 
> 

[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2016-12-28 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15783747#comment-15783747
 ] 

Edward Bortnikov commented on HBASE-16851:
--

Thanks Michael. Unlocked the doc for commenting. 

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
>Assignee: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2016-12-28 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15782955#comment-15782955
 ] 

Edward Bortnikov commented on HBASE-16851:
--

Updated shared doc: 
https://docs.google.com/document/d/1lsDv8mmw3Daz9Rw9zySEI7zXOlLYYy2dyhQoB6gNMcI

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
>Assignee: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore

2016-12-28 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15782856#comment-15782856
 ] 

Edward Bortnikov commented on HBASE-16421:
--

Question unrelated to the unfolding technical discussion :) 

Our estimate for getting the whole feature done properly is about 2 months. Are 
you guys targeting it for 2.0 or beyond? 

Apologies for possibly asking twice - I do not remember the answer to this 
question.  

> Introducing the CellChunkMap as a new additional index variant in the MemStore
> --
>
> Key: HBASE-16421
> URL: https://issues.apache.org/jira/browse/HBASE-16421
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Anastasia Braginsky
> Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, 
> IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Follow up for HBASE-14921. This is going to be the umbrella JIRA to include 
> all the parts of integration of the CellChunkMap to the MemStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operation

2016-12-27 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15782330#comment-15782330
 ] 

Edward Bortnikov commented on HBASE-17339:
--

[~davelatham], [~yangzhe1991] - thanks for pointing out the historical context. 
Indeed, the idea will not work in peer clusters with concurrent updates. 
However, it seems that there are enough interesting use cases to deserve 
treatment. 

This optimization is complementary to in-memory flush & compaction (see 
HBASE-14918). The latter brings its own value, but in conjunction the two 
produce a very impressive reduction in read latency. [~eshcar], maybe you could 
attach some perf results? Thanks.  

> Scan-Memory-First Optimization for Get Operation
> 
>
> Key: HBASE-17339
> URL: https://issues.apache.org/jira/browse/HBASE-17339
> Project: HBase
>  Issue Type: Improvement
>Reporter: Eshcar Hillel
> Attachments: HBASE-17339-V01.patch
>
>
> The current implementation of a get operation (to retrieve values for a 
> specific key) scans through all relevant stores of the region; for each store 
> both memory components (memstores segments) and disk components (hfiles) are 
> scanned in parallel.
> We suggest to apply an optimization that speculatively scans memory-only 
> components first and only if the result is incomplete scans both memory and 
> disk.
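
For concreteness, a hedged sketch of the speculative path (the helper methods 
are hypothetical, not the actual HRegion API):
{code}
import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;

abstract class SpeculativeGetSketch {
  abstract Result scanMemstoreOnly(Get get) throws IOException;   // hypothetical
  abstract Result scanMemoryAndDisk(Get get) throws IOException;  // regular path
  abstract boolean isComplete(Result r, Get get);                 // hypothetical

  Result get(Get get) throws IOException {
    Result inMemory = scanMemstoreOnly(get);  // memory-only components first
    if (isComplete(inMemory, get)) {
      return inMemory;                        // fast path: served from memory
    }
    return scanMemoryAndDisk(get);            // fall back: memory + hfiles
  }
}
{code}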



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operation

2016-12-27 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780318#comment-15780318
 ] 

Edward Bortnikov commented on HBASE-17339:
--

[~yangzhe1991], that is plausible - however, we need to check how the timestamp 
monotonicity check can be done efficiently. We thought of going even 
further and passing the flag with every single Get instead of using a CF-level 
configuration. 
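
A hypothetical illustration of the per-Get variant (the attribute key is made 
up; Get does extend OperationWithAttributes, so a request-level hint is 
mechanically possible):
{code}
import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

class PerGetHintSketch {
  static Result memoryFirstGet(Table table, byte[] row) throws IOException {
    Get get = new Get(row);
    // Hypothetical attribute key - not an existing HBase configuration.
    get.setAttribute("x.memory.first.hint", Bytes.toBytes(true));
    return table.get(get);  // the server side would try the memstore first
  }
}
{code}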

> Scan-Memory-First Optimization for Get Operation
> 
>
> Key: HBASE-17339
> URL: https://issues.apache.org/jira/browse/HBASE-17339
> Project: HBase
>  Issue Type: Improvement
>Reporter: Eshcar Hillel
> Attachments: HBASE-17339-V01.patch
>
>
> The current implementation of a get operation (to retrieve values for a 
> specific key) scans through all relevant stores of the region; for each store 
> both memory components (memstores segments) and disk components (hfiles) are 
> scanned in parallel.
> We suggest to apply an optimization that speculatively scans memory-only 
> components first and only if the result is incomplete scans both memory and 
> disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operation

2016-12-27 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780064#comment-15780064
 ] 

Edward Bortnikov commented on HBASE-17339:
--

Use cases in which this optimization might (and will) be useful:
- Pub-Sub on top of HBase storage.
- Shared counters on top of HBase storage.
- E-commerce - editing and checking out a purchase cart. 

In these cases, the churn is high but the working set is small - it can fit in 
memory. 

> Scan-Memory-First Optimization for Get Operation
> 
>
> Key: HBASE-17339
> URL: https://issues.apache.org/jira/browse/HBASE-17339
> Project: HBase
>  Issue Type: Improvement
>Reporter: Eshcar Hillel
> Attachments: HBASE-17339-V01.patch
>
>
> The current implementation of a get operation (to retrieve values for a 
> specific key) scans through all relevant stores of the region; for each store 
> both memory components (memstores segments) and disk components (hfiles) are 
> scanned in parallel.
> We suggest to apply an optimization that speculatively scans memory-only 
> components first and only if the result is incomplete scans both memory and 
> disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2016-12-27 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15779874#comment-15779874
 ] 

Edward Bortnikov commented on HBASE-16851:
--

WIP change: 
- Separating the description and benchmarking of CellChunkMap into a standalone 
post. (Experimental project, currently at a different maturity level than 
CellArrayMap, see HBASE-16421). 

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
>Assignee: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17294) External Configuration for Memory Compaction

2016-12-19 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762453#comment-15762453
 ] 

Edward Bortnikov commented on HBASE-17294:
--

[~devaraj], please see the performance benchmark results reported in 
HBASE-16417. We started writing a separate blog post with a clearer and more 
concise description. Thanks. 

> External Configuration for Memory Compaction 
> -
>
> Key: HBASE-17294
> URL: https://issues.apache.org/jira/browse/HBASE-17294
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-17294-V01.patch, HBASE-17294-V02.patch, 
> HBASE-17294-V03.patch
>
>
> We would like to have a single external knob to control memstore compaction.
> Possible memstore compaction policies are none, basic, and eager.
> This sub-task allows to set this property at the column family level at table 
> creation time:
> {code}
> create '<tablename>',
>    {NAME => '<cfname>', 
>     IN_MEMORY_COMPACTION => '<NONE|BASIC|EAGER>'}
> {code}
> or to set this at the global configuration level by setting the property in 
> hbase-site.xml, with BASIC being the default value:
> {code}
> <property>
>   <name>hbase.hregion.compacting.memstore.type</name>
>   <value><none|basic|eager></value>
> </property>
> {code}
> The values used in this property can change as memstore compaction policies 
> evolve over time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore

2016-12-19 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762435#comment-15762435
 ] 

Edward Bortnikov commented on HBASE-16421:
--

Thanks [~anoop.hbase]. Still, I kind of don't get the difference between 
creating Cells when scanning the BucketCache BBs and the Segment ChunkMaps :) 
In both cases, new temp objects are created. Did you observe GC pressure in 
the former case? Thanks. 

> Introducing the CellChunkMap as a new additional index variant in the MemStore
> --
>
> Key: HBASE-16421
> URL: https://issues.apache.org/jira/browse/HBASE-16421
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Anastasia Braginsky
> Attachments: CellChunkMapRevived.pdf, 
> IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Follow up for HBASE-14921. This is going to be the umbrella JIRA to include 
> all the parts of integration of the CellChunkMap to the MemStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk

2016-12-19 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762151#comment-15762151
 ] 

Edward Bortnikov commented on HBASE-17081:
--

All - 

Thanks for chiming in. Apologies for any misunderstanding - distributed 
dev processes take their toll :) We are starting to suffer from the fact 
that the Compacting Memstore project has split into many small jiras, and it's 
hard to track the full picture. 

No problem at all with reverting specific patches if potential destabilization 
is suspected. My concern was scrapping or delaying the whole project without a 
good reason, hence the suggestion to improve the discussion process and manage 
it in a well-defined space. 

It might be that the instability follows from the reverse order in which this 
Jira and HBASE-17294 were checked in. The latter was supposed to be the 
concluding chord, finalizing the configuration syntax and setting the new 
default. Although we cannot reproduce the failures in the problematic tests 
locally, how about the following plan: 
1. Revert both HBASE-17081 and HBASE-17294, and see if the regression run is 
stable. 
2. Rebase and check in HBASE-17081. 
3. Rebase and check in HBASE-17294. 
4. Move the external documentation to HBASE-14918 (the top-level JIRA), to 
improve the visibility of the new definitions. 

Thanks, again, for all the assistance identifying the problems so far. 

> Flush the entire CompactingMemStore content to disk
> ---
>
> Key: HBASE-17081
> URL: https://issues.apache.org/jira/browse/HBASE-17081
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Attachments: HBASE-15787_8.patch, HBASE-17081-V01.patch, 
> HBASE-17081-V02.patch, HBASE-17081-V03.patch, HBASE-17081-V04.patch, 
> HBASE-17081-V05.patch, HBASE-17081-V06.patch, HBASE-17081-V06.patch, 
> HBASE-17081-V07.patch, HBaseMeetupDecember2016-V02.pptx, 
> Pipelinememstore_fortrunk_3.patch
>
>
> Part of CompactingMemStore's memory is held by an active segment, and another 
> part is divided between immutable segments in the compacting pipeline. Upon 
> flush-to-disk request we want to flush all of it to disk, in contrast to 
> flushing only tail of the compacting pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk

2016-12-19 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15760912#comment-15760912
 ] 

Edward Bortnikov commented on HBASE-17081:
--

[~anoop.hbase], indeed, neither the current jira nor HBASE-17294 was intended 
to discuss the configuration; it has been discussed extensively in HBASE-16851. 
The current jira is about flushing the full pipeline to disk, which is a basic 
mechanism, and IMHO there is no reason to revert it. 

If you are suggesting we re-open the decision to set the default for in-memory 
compaction, please substantiate your concerns and how you intend to resolve 
them. We conducted a very thorough and transparent benchmarking process, and 
published the results. BASIC compaction showed no side effects, only 
advantages. EAGER compaction can indeed pose tradeoffs alongside larger gains; 
that is why it is not the default. In any case, I would appreciate it if we 
could run that discussion at HBASE-16851. It's very hard to track discussions 
when the jira is changing all the time. Definitely, we are -1 on reverting the 
change in HBASE-17294 without discussing the implications. 

The intent behind introducing the default is that otherwise nobody would use 
the option, as [~stack] rightfully noted. That's why we invested so much in 
testing, benchmarking, and simplicity of configuration. We are prepared to 
handle the issues that arise from this change in behavior. We value your 
perspective a lot; however, let's build the discussion around what gaps exist 
on the ground, and how they can be mitigated without killing the feature. 
Thanks [~anoop.hbase].  

> Flush the entire CompactingMemStore content to disk
> ---
>
> Key: HBASE-17081
> URL: https://issues.apache.org/jira/browse/HBASE-17081
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Attachments: HBASE-15787_8.patch, HBASE-17081-V01.patch, 
> HBASE-17081-V02.patch, HBASE-17081-V03.patch, HBASE-17081-V04.patch, 
> HBASE-17081-V05.patch, HBASE-17081-V06.patch, HBASE-17081-V06.patch, 
> HBASE-17081-V07.patch, HBaseMeetupDecember2016-V02.pptx, 
> Pipelinememstore_fortrunk_3.patch
>
>
> Part of CompactingMemStore's memory is held by an active segment, and another 
> part is divided between immutable segments in the compacting pipeline. Upon 
> flush-to-disk request we want to flush all of it to disk, in contrast to 
> flushing only tail of the compacting pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk

2016-12-18 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15760388#comment-15760388
 ] 

Edward Bortnikov commented on HBASE-17081:
--

The commit should be re-applied. The problem has been exposed by the new 
configuration in HBASE-17294, as [~ram_krish] indicated. Maybe a new Jira 
should be filed. 

> Flush the entire CompactingMemStore content to disk
> ---
>
> Key: HBASE-17081
> URL: https://issues.apache.org/jira/browse/HBASE-17081
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Attachments: HBASE-15787_8.patch, HBASE-17081-V01.patch, 
> HBASE-17081-V02.patch, HBASE-17081-V03.patch, HBASE-17081-V04.patch, 
> HBASE-17081-V05.patch, HBASE-17081-V06.patch, HBASE-17081-V06.patch, 
> HBASE-17081-V07.patch, HBaseMeetupDecember2016-V02.pptx, 
> Pipelinememstore_fortrunk_3.patch
>
>
> Part of CompactingMemStore's memory is held by an active segment, and another 
> part is divided between immutable segments in the compacting pipeline. Upon 
> flush-to-disk request we want to flush all of it to disk, in contrast to 
> flushing only tail of the compacting pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk

2016-12-18 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15760383#comment-15760383
 ] 

Edward Bortnikov commented on HBASE-17081:
--

Thanks [~ram_krish] for discovering this. 

The Compacting Memstore (BASIC configuration) became the default as of 
HBASE-17294; the documentation indicates that. A family can be configured for 
a different type of in-memory compaction (NONE/EAGER) - see the sketch below. 
So I guess the issue is with the other test that the new configuration exposed. 
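
For instance, a hedged sketch of opting a whole cluster out in hbase-site.xml 
(the property name is per the HBASE-17294 description; the NONE value here is 
illustrative only): 
{code}
<property>
  <name>hbase.hregion.compacting.memstore.type</name>
  <value>NONE</value>
</property>
{code}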

> Flush the entire CompactingMemStore content to disk
> ---
>
> Key: HBASE-17081
> URL: https://issues.apache.org/jira/browse/HBASE-17081
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Attachments: HBASE-15787_8.patch, HBASE-17081-V01.patch, 
> HBASE-17081-V02.patch, HBASE-17081-V03.patch, HBASE-17081-V04.patch, 
> HBASE-17081-V05.patch, HBASE-17081-V06.patch, HBASE-17081-V06.patch, 
> HBASE-17081-V07.patch, HBaseMeetupDecember2016-V02.pptx, 
> Pipelinememstore_fortrunk_3.patch
>
>
> Part of CompactingMemStore's memory is held by an active segment, and another 
> part is divided between immutable segments in the compacting pipeline. Upon 
> flush-to-disk request we want to flush all of it to disk, in contrast to 
> flushing only tail of the compacting pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore

2016-12-18 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15760375#comment-15760375
 ] 

Edward Bortnikov commented on HBASE-16421:
--

Thanks [~anastas], [~anoop.hbase] and [~stack]. Two questions: 

1. What is the time frame for this feature? It looks like the full-fledged 
implementation + evaluation will take a few months. Can it be a candidate for 
2.0 then? 
2. I guess many of the potential performance concerns are similar to the 
read-path off-heaping. There too, Cell objects are created from off-heap block 
indexes. Theoretically, there too, it would be desirable to copy the result 
directly into the response protocol buffer. Do you think the read-path 
performance would be a good predictor of what we'll see here? What would be 
the minimum PoC?   

> Introducing the CellChunkMap as a new additional index variant in the MemStore
> --
>
> Key: HBASE-16421
> URL: https://issues.apache.org/jira/browse/HBASE-16421
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Anastasia Braginsky
> Attachments: CellChunkMapRevived.pdf, 
> IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Follow up for HBASE-14921. This is going to be the umbrella JIRA to include 
> all the parts of integration of the CellChunkMap to the MemStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk

2016-12-15 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752078#comment-15752078
 ] 

Edward Bortnikov commented on HBASE-17081:
--

[~anastas] Please take a look at the test result, seems to be related: 

Flaked tests: 
org.apache.hadoop.hbase.regionserver.TestHRegionWithInMemoryFlush.testWritesWhileScanning(org.apache.hadoop.hbase.regionserver.TestHRegionWithInMemoryFlush)
  Run 1: TestHRegionWithInMemoryFlush>TestHRegion.testWritesWhileScanning:3979 
expected null, but was:


> Flush the entire CompactingMemStore content to disk
> ---
>
> Key: HBASE-17081
> URL: https://issues.apache.org/jira/browse/HBASE-17081
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Attachments: HBASE-15787_8.patch, HBASE-17081-V01.patch, 
> HBASE-17081-V02.patch, HBASE-17081-V03.patch, HBASE-17081-V04.patch, 
> HBASE-17081-V05.patch, HBASE-17081-V06.patch, HBASE-17081-V06.patch, 
> HBASE-17081-V07.patch, HBaseMeetupDecember2016-V02.pptx, 
> Pipelinememstore_fortrunk_3.patch
>
>
> Part of CompactingMemStore's memory is held by an active segment, and another 
> part is divided between immutable segments in the compacting pipeline. Upon 
> flush-to-disk request we want to flush all of it to disk, in contrast to 
> flushing only tail of the compacting pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk

2016-12-14 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747604#comment-15747604
 ] 

Edward Bortnikov commented on HBASE-17081:
--

Re [~stack]'s question about what's next: 
1. HBASE-17294 configuration (Eshcar) - committed, thanks [~stack]. 
2. HBASE-16851 documentation (me) - need to complete 3 blog posts: (1) user 
manual - complete, give or take, (2) performance eval, and (3) programmer's 
manual. Where should we post all of those? The Apache blog? 
3. HBASE-16417 (Eshcar) - an automated policy for figuring out whether the 
BASIC or the EAGER algorithm should be used. Small refactoring of the internal 
API for future policies. 

Independent of in-memory compaction per se: 
1. HBASE-16421 CellChunkMap implementation (Anastasia) - starting now, need to 
coordinate with [~anoop.hbase] and [~ram_krish]. 
2. JIRA TBD Memstore-First Get (Eshcar) - big value demonstrated by benchmarks 
in HBASE-16851; we should try to implement & push before 2.0 closes. 

Sounds like a plan (smile) ?

> Flush the entire CompactingMemStore content to disk
> ---
>
> Key: HBASE-17081
> URL: https://issues.apache.org/jira/browse/HBASE-17081
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Attachments: HBASE-15787_8.patch, HBASE-17081-V01.patch, 
> HBASE-17081-V02.patch, HBASE-17081-V03.patch, HBASE-17081-V04.patch, 
> HBASE-17081-V05.patch, HBASE-17081-V06.patch, HBASE-17081-V06.patch, 
> HBaseMeetupDecember2016-V02.pptx, Pipelinememstore_fortrunk_3.patch
>
>
> Part of CompactingMemStore's memory is held by an active segment, and another 
> part is divided between immutable segments in the compacting pipeline. Upon 
> flush-to-disk request we want to flush all of it to disk, in contrast to 
> flushing only tail of the compacting pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk

2016-11-27 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15700230#comment-15700230
 ] 

Edward Bortnikov commented on HBASE-17081:
--

Folks, would you please consider this for commit. 

> Flush the entire CompactingMemStore content to disk
> ---
>
> Key: HBASE-17081
> URL: https://issues.apache.org/jira/browse/HBASE-17081
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Attachments: HBASE-17081-V01.patch, HBASE-17081-V02.patch, 
> HBASE-17081-V03.patch, Pipelinememstore_fortrunk_3.patch
>
>
> Part of CompactingMemStore's memory is held by an active segment, and another 
> part is divided between immutable segments in the compacting pipeline. Upon 
> flush-to-disk request we want to flush all of it to disk, in contrast to 
> flushing only tail of the compacting pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk

2016-11-27 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15699652#comment-15699652
 ] 

Edward Bortnikov commented on HBASE-17081:
--

Please note that this feature is part of both the BASIC and EAGER compaction 
policies, as described in 
https://docs.google.com/document/d/16XOiOuG9e0l6D_mD-oM5JHcSVmKC8rpIVJGH239gWsQ 
(see HBASE-16851). 
Index and data merge are both parts of the EAGER policy; only index 
flattening happens in BASIC (sketched below). 
The whole pipeline is flushed to disk under both policies. 
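
A minimal illustrative sketch of those semantics; every type and method name 
below is a stand-in for the memstore pipeline, not an HBase internal: 
{code}
// The policy values are the knob from HBASE-17294; the rest is illustrative only.
enum Policy { NONE, BASIC, EAGER }

class PipelineSketch {
  void flattenIndex() { /* replace a segment's skip-list index with a flat array */ }
  void mergeSegments() { /* merge pipeline segments, eliminating duplicate cells */ }

  void onInMemoryCompaction(Policy policy) {
    switch (policy) {
      case BASIC:
        flattenIndex();      // BASIC: index flattening only
        break;
      case EAGER:
        flattenIndex();
        mergeSegments();     // EAGER: index flattening plus index-and-data merge
        break;
      default:
        break;               // NONE: classic memstore, no pipeline work
    }
  }
  // A flush-to-disk request flushes the entire pipeline under every policy.
}
{code}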

> Flush the entire CompactingMemStore content to disk
> ---
>
> Key: HBASE-17081
> URL: https://issues.apache.org/jira/browse/HBASE-17081
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Attachments: HBASE-17081-V01.patch, HBASE-17081-V02.patch, 
> Pipelinememstore_fortrunk_3.patch
>
>
> Part of CompactingMemStore's memory is held by an active segment, and another 
> part is divided between immutable segments in the compacting pipeline. Upon 
> flush-to-disk request we want to flush all of it to disk, in contrast to 
> flushing only tail of the compacting pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2016-11-27 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15699646#comment-15699646
 ] 

Edward Bortnikov commented on HBASE-16851:
--

Dear committers - please assign this issue to me. 

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2016-11-23 Thread Edward Bortnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Bortnikov updated HBASE-16851:
-
Attachment: Accordion HBase In-Memory Compaction - Nov 23.pdf

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory 
> Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2016-11-23 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690414#comment-15690414
 ] 

Edward Bortnikov edited comment on HBASE-16851 at 11/23/16 3:30 PM:


Updated version of the blog post in 
https://docs.google.com/document/d/16XOiOuG9e0l6D_mD-oM5JHcSVmKC8rpIVJGH239gWsQ.
 

Hopefully, the final configuration syntax and technical description. 


was (Author: ebortnik):
Updated version of the blog post in 
https://docs.google.com/document/d/16XOiOuG9e0l6D_mD-oM5JHcSVmKC8rpIVJGH239gWsQ.
 

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion_ HBase In-Memory Compaction - Oct 27.pdf, 
> HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2016-11-23 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690414#comment-15690414
 ] 

Edward Bortnikov commented on HBASE-16851:
--

Updated version of the blog post in 
https://docs.google.com/document/d/16XOiOuG9e0l6D_mD-oM5JHcSVmKC8rpIVJGH239gWsQ.
 

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion_ HBase In-Memory Compaction - Oct 27.pdf, 
> HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions

2016-11-13 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15661998#comment-15661998
 ] 

Edward Bortnikov commented on HBASE-16417:
--

[~anoop.hbase], we certainly appreciate the input - feel free to fire off your 
first thoughts going forward (smile). Yes, we thought about the multi-CF case. 
We are speaking of single-row gets only. The idea is to try to fetch from the 
set of memstore scanners first (a rough sketch follows). If the data can be 
retrieved there, there is no need to go look in HFiles, is there? Am I missing 
something here? 
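
A hedged sketch of that short-circuit, assuming illustrative stand-in types 
(RowSource, Cell, and the method names below are not HBase APIs): 
{code}
import java.util.List;

interface Cell {}

interface RowSource {
  // Returns the row's cells, or null if this source cannot fully answer the Get.
  List<Cell> tryGet(byte[] row);
}

class MemstoreFirstGet {
  static List<Cell> get(byte[] row, RowSource memstore, RowSource hfiles) {
    List<Cell> fromMemstore = memstore.tryGet(row);
    if (fromMemstore != null) {
      return fromMemstore;    // freshest data found in memory; skip the HFile lookups
    }
    return hfiles.tryGet(row); // fall back to the on-disk store files
  }
}
{code}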

> In-Memory MemStore Policy for Flattening and Compactions
> 
>
> Key: HBASE-16417
> URL: https://issues.apache.org/jira/browse/HBASE-16417
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-16417-benchmarkresults-20161101.pdf, 
> HBASE-16417-benchmarkresults-20161110.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk

2016-11-12 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15660383#comment-15660383
 ] 

Edward Bortnikov commented on HBASE-17081:
--

Just clarifying the context. This code is a building block for the default 
compaction policy that has been suggested before. 

> Flush the entire CompactingMemStore content to disk
> ---
>
> Key: HBASE-17081
> URL: https://issues.apache.org/jira/browse/HBASE-17081
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
>
> Part of CompactingMemStore's memory is held by an active segment, and another 
> part is divided between immutable segments in the compacting pipeline. Upon 
> flush-to-disk request we want to flush all of it to disk, in contrast to 
> flushing only tail of the compacting pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions

2016-11-12 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15659859#comment-15659859
 ] 

Edward Bortnikov commented on HBASE-16417:
--

Just emphasizing point #4 raised by [~eshcar]. Does anyone see a problem with 
the "try-to-read-from-the-memstore-first" approach for scans? It seems pretty 
important for in-memory compaction. Please speak up (smile). 

> In-Memory MemStore Policy for Flattening and Compactions
> 
>
> Key: HBASE-16417
> URL: https://issues.apache.org/jira/browse/HBASE-16417
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-16417-benchmarkresults-20161101.pdf, 
> HBASE-16417-benchmarkresults-20161110.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16608) Introducing the ability to merge ImmutableSegments without copy-compaction or SQM usage

2016-11-01 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15626761#comment-15626761
 ] 

Edward Bortnikov commented on HBASE-16608:
--

Disclaimer: the configuration is subject to change, pending conclusions in 
HBASE-16417. In particular, the compaction policy as a global configuration 
(rather than a per-CF attribute) is temporary.  

> Introducing the ability to merge ImmutableSegments without copy-compaction or 
> SQM usage
> ---
>
> Key: HBASE-16608
> URL: https://issues.apache.org/jira/browse/HBASE-16608
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Fix For: 2.0.0
>
> Attachments: HBASE-16417-V02.patch, HBASE-16417-V04.patch, 
> HBASE-16417-V06.patch, HBASE-16417-V07.patch, HBASE-16417-V08.patch, 
> HBASE-16417-V10.patch, HBASE-16608-Final.patch, HBASE-16608-Final.patch, 
> HBASE-16608-V01.patch, HBASE-16608-V03.patch, HBASE-16608-V04.patch, 
> HBASE-16608-V08.patch, HBASE-16608-V09.patch, HBASE-16608-V09.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2016-11-01 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625712#comment-15625712
 ] 

Edward Bortnikov commented on HBASE-16851:
--

A more detailed version published, diagrams updated, perf results pending. 

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion_ HBase In-Memory Compaction - Oct 27.pdf, 
> HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2016-11-01 Thread Edward Bortnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Bortnikov updated HBASE-16851:
-
Attachment: Accordion HBase In-Memory Compaction - Nov 1 .pdf

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
> Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, 
> Accordion_ HBase In-Memory Compaction - Oct 27.pdf, 
> HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-14918) In-Memory MemStore Flush and Compaction

2016-10-28 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615304#comment-15615304
 ] 

Edward Bortnikov edited comment on HBASE-14918 at 10/28/16 12:54 PM:
-

Let's focus the discussion on HBASE-16417, that is the right context. 


was (Author: ebortnik):
Let's focus the discussion on HBASE-14617, that is the right context. 

> In-Memory MemStore Flush and Compaction
> ---
>
> Key: HBASE-14918
> URL: https://issues.apache.org/jira/browse/HBASE-14918
> Project: HBase
>  Issue Type: Umbrella
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: CellBlocksSegmentDesign.pdf, 
> HBASE-16417-benchmarkresults.pdf, MSLABMove.patch
>
>
> A memstore serves as the in-memory component of a store unit, absorbing all 
> updates to the store. From time to time these updates are flushed to a file 
> on disk, where they are compacted (by eliminating redundancies) and 
> compressed (i.e., written in a compressed format to reduce their storage 
> size).
> We aim to speed up data access, and therefore suggest to apply in-memory 
> memstore flush. That is to flush the active in-memory segment into an 
> intermediate buffer where it can be accessed by the application. Data in the 
> buffer is subject to compaction and can be stored in any format that allows 
> it to take up smaller space in RAM. The less space the buffer consumes the 
> longer it can reside in memory before data is flushed to disk, resulting in 
> better performance.
> Specifically, the optimization is beneficial for workloads with 
> medium-to-high key churn which incur many redundant cells, like persistent 
> messaging. 
> We suggest to structure the solution as 4 subtasks (respectively, patches). 
> (1) Infrastructure - refactoring of the MemStore hierarchy, introducing 
> segment (StoreSegment) as first-class citizen, and decoupling memstore 
> scanner from the memstore implementation;
> (2) Adding StoreServices facility at the region level to allow memstores 
> update region counters and access region level synchronization mechanism;
> (3) Implementation of a new memstore (CompactingMemstore) with non-optimized 
> immutable segment representation, and 
> (4) Memory optimization including compressed format representation and off 
> heap allocations.
> This Jira continues the discussion in HBASE-13408.
> Design documents, evaluation results and previous patches can be found in 
> HBASE-13408. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction

2016-10-28 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615304#comment-15615304
 ] 

Edward Bortnikov commented on HBASE-14918:
--

Let's focus the discussion on HBASE-14617, that is the right context. 

> In-Memory MemStore Flush and Compaction
> ---
>
> Key: HBASE-14918
> URL: https://issues.apache.org/jira/browse/HBASE-14918
> Project: HBase
>  Issue Type: Umbrella
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: CellBlocksSegmentDesign.pdf, 
> HBASE-16417-benchmarkresults.pdf, MSLABMove.patch
>
>
> A memstore serves as the in-memory component of a store unit, absorbing all 
> updates to the store. From time to time these updates are flushed to a file 
> on disk, where they are compacted (by eliminating redundancies) and 
> compressed (i.e., written in a compressed format to reduce their storage 
> size).
> We aim to speed up data access, and therefore suggest to apply in-memory 
> memstore flush. That is to flush the active in-memory segment into an 
> intermediate buffer where it can be accessed by the application. Data in the 
> buffer is subject to compaction and can be stored in any format that allows 
> it to take up smaller space in RAM. The less space the buffer consumes the 
> longer it can reside in memory before data is flushed to disk, resulting in 
> better performance.
> Specifically, the optimization is beneficial for workloads with 
> medium-to-high key churn which incur many redundant cells, like persistent 
> messaging. 
> We suggest to structure the solution as 4 subtasks (respectively, patches). 
> (1) Infrastructure - refactoring of the MemStore hierarchy, introducing 
> segment (StoreSegment) as first-class citizen, and decoupling memstore 
> scanner from the memstore implementation;
> (2) Adding StoreServices facility at the region level to allow memstores 
> update region counters and access region level synchronization mechanism;
> (3) Implementation of a new memstore (CompactingMemstore) with non-optimized 
> immutable segment representation, and 
> (4) Memory optimization including compressed format representation and off 
> heap allocations.
> This Jira continues the discussion in HBASE-13408.
> Design documents, evaluation results and previous patches can be found in 
> HBASE-13408. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions

2016-10-28 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615299#comment-15615299
 ] 

Edward Bortnikov commented on HBASE-16417:
--

Just to give a sense of what we've been thinking of as a possible auto-tuning 
policy (smile). It's a "war driving" approach that is actually similar to the 
opportunistic scans we had once, but a bit smarter. Suppose we do a full 
(data) compaction once in a while; a by-product is the compaction factor - how 
much space we saved. If the latter is small, schedule the next compaction 
further away, using some exponential backoff scheme. For workloads with very 
few duplicates, compactions will de facto never happen. For skewed workloads, 
compactions will consistently prove valuable, and will run at a constant pace. 
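
A minimal sketch of that backoff scheme; the thresholds and names below are 
assumptions for illustration only, not a proposed HBase API: 
{code}
class CompactionBackoff {
  static final long BASE_INTERVAL_MS = 60_000L;    // assumed starting pace
  static final long MAX_INTERVAL_MS = 3_600_000L;  // assumed backoff ceiling
  static final double MIN_USEFUL_FACTOR = 0.1;     // assumed saved-space fraction worth compacting for

  private long intervalMs = BASE_INTERVAL_MS;

  // Called after each full (data) compaction with the fraction of space it saved.
  long nextDelayMs(double compactionFactor) {
    if (compactionFactor < MIN_USEFUL_FACTOR) {
      // Few duplicates found: push the next compaction further away, exponentially.
      intervalMs = Math.min(intervalMs * 2, MAX_INTERVAL_MS);
    } else {
      // Compaction proved valuable: reset to a constant pace.
      intervalMs = BASE_INTERVAL_MS;
    }
    return intervalMs;
  }
}
{code}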

Note that the above is unrelated to whether we flush just one segment or all 
segments in the pipeline once disk-flush time comes. Personal opinion - no 
problem with flushing everything if it shows value. Let's wait for more 
benchmark results; they're just around the corner. One more personal opinion - 
we should strive for a generic policy, as independent as possible of whether 
we use MSLABs or not, run on-heap or off-heap, etc.; let's see if we can get 
there. 

Actually, this is the most fun stage - we have all the building blocks, and 
the goal is connecting them right :) Stay tuned, we'll keep sharing the 
results and the ideas.  

> In-Memory MemStore Policy for Flattening and Compactions
> 
>
> Key: HBASE-16417
> URL: https://issues.apache.org/jira/browse/HBASE-16417
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions

2016-10-27 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612232#comment-15612232
 ] 

Edward Bortnikov commented on HBASE-16417:
--

[~anoop.hbase], [~eshcar] - the end state you both want to reach is the same; 
just the ways of getting there are different. Before going into any detail - 
the holy grail is a self-tuning policy that does the right thing in EVERY use 
case of interest. We'd like to achieve it without any additional configuration 
or private solutions for specific cases. The reason is that whatever ends up 
as a non-default option will never be used. 

Anoop - you actually want two things: (1) no compaction of any kind happens 
when there is no redundancy, and (2) everything is flushed to disk when the 
memstore overflows. (Note that these two can be decoupled.) Hopefully you 
don't mind if there's a policy that magically figures out that we're in that 
use case, at practically zero cost, and does exactly (1) and (2). We just 
disagree on marking the opposite case (many duplicates) as special and going 
down a different code path there - because if we leave it to the admin as 
non-default, we know what'll happen. 

So we are after that magic policy. The quest won't take long, but it has to be 
data-driven. At the moment, we've just reproduced one microbenchmark (uniform 
writes, no reads), but there are many other cases that should be looked at. We 
have the env to run them, and we'll be producing those results over the next 
couple of weeks. We'll be very transparent in the process, publishing the 
results frequently. Once we have the data, let's decide collectively. If 
nothing universal works, we can always fall back to configs, but I'd consider 
that undesirable. 

> In-Memory MemStore Policy for Flattening and Compactions
> 
>
> Key: HBASE-16417
> URL: https://issues.apache.org/jira/browse/HBASE-16417
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions

2016-10-27 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611916#comment-15611916
 ] 

Edward Bortnikov commented on HBASE-16417:
--

I created this Jira, I think I can attach. Please share this file with me. 


Sent from Yahoo Mail for iPhone


On Thursday, October 27, 2016, 4:32 PM, Eshcar Hillel (JIRA)  
wrote:


    [ 
https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611902#comment-15611902
 ] 

Eshcar Hillel commented on HBASE-16417:
---

The report on the first round of experiments is ready; however, I cannot 
attach it here. 
Can anyone assign this subtask to me so I can attach files to it? 
Meanwhile, I will attach it to the umbrella Jira. 

The summary of the report is as follows -- 
Main differences in configuration vs. previous benchmarks: 
1. Since we run on a 48GB RAM machine, we allocate only 16GB to HBase (and not 
32GB). 
2. The saturation point was found when running 10 threads (and not 50); see 
more details in the report. 
3. We write 50GB (and not 150GB), just to keep the experiments shorter since 
we run many different settings. 

The first round of experiments compares the different options (no-, index-, 
data-compaction) under a write-only workload with a uniform key distribution, 
using PE. We see that up until the 95th percentile all options are comparable. 
At the 99th percentile, data compaction starts to lag behind -- indeed, in a 
uniform workload there is not much point in doing data compaction. The 
overhead might stem from running SQM to determine which versions to retain. 
One way to close this gap is to not run data compaction when there is no gain 
from it. A good policy should be able to identify this at no extra cost. 
At the 99.999th percentile, index compaction also exhibits significant 
overhead. This might be due to memory reclamation of temporary indices. 





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

 



> In-Memory MemStore Policy for Flattening and Compactions
> 
>
> Key: HBASE-16417
> URL: https://issues.apache.org/jira/browse/HBASE-16417
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anastasia Braginsky
>Assignee: Anastasia Braginsky
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature

2016-10-27 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611802#comment-15611802
 ] 

Edward Bortnikov commented on HBASE-16851:
--

Attached a version of the user-facing doc, with the configuration specified. 
No illustrations yet - working on it. The compaction triggering policy is 
described in general terms (pending HBASE-16417). 

The impl details are rather high-level - to let the user roughly understand 
what this feature is about. Let's wait until the policy takes shape, to figure 
out whether they actually belong in developer documentation. 

> User-facing documentation for the In-Memory Compaction feature
> --
>
> Key: HBASE-16851
> URL: https://issues.apache.org/jira/browse/HBASE-16851
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Edward Bortnikov
> Attachments: Accordion_ HBase In-Memory Compaction - Oct 27.pdf, 
> HBaseAcceleratedHbaseConf-final.pptx
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >