[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427663#comment-16427663 ]

Edward Bortnikov commented on HBASE-16851:
------------------------------------------

[~stack] whatever you find the right procedure.

On Thursday, April 5, 2018, 8:05 PM, stack (JIRA) wrote:

[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427259#comment-16427259 ]

stack commented on HBASE-16851:
-------------------------------

Not sure why this was resolved. I see you posted a bit of doc for the refguide [~ebortnik] and it didn't get any love. It looks good. There is also HBASE-20259 "Doc configs for in-memory-compaction and add detail to in-memory-compaction logging" which went in, but this should have gone in before it. Should we reopen this to get your doc in on top of HBASE-20259 sir?

> User-facing documentation for the In-Memory Compaction feature
> --------------------------------------------------------------
>
>              Key: HBASE-16851
>              URL: https://issues.apache.org/jira/browse/HBASE-16851
>          Project: HBase
>       Issue Type: Sub-task
> Affects Versions: 2.0.0
>         Reporter: Edward Bortnikov
>         Assignee: Edward Bortnikov
>         Priority: Major
>          Fix For: 2.0.0
>
>      Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HBASE-20188) [TESTING] Performance
[ https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426569#comment-16426569 ]

Edward Bortnikov commented on HBASE-20188:
------------------------------------------

Michael, thanks for all the diligence; apparently you are a step ahead of us with 8G. Could you please post the complete results for 8G, so that we can see the difference between the reads and the writes? The workloada result is weird - the writes are skewed, and IMC should really shine there. Apparently, the read and the write paths have very different (and independent) issues. With workloadc there is no reason IMC would work faster (multiple segments to look up), but let's understand workloada first. Thanks again.

> [TESTING] Performance
> ---------------------
>
>          Key: HBASE-20188
>          URL: https://issues.apache.org/jira/browse/HBASE-20188
>      Project: HBase
>   Issue Type: Umbrella
>   Components: Performance
>     Reporter: stack
>     Assignee: stack
>     Priority: Blocker
>      Fix For: 2.0.0
>
>  Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 performance evaluation - Basic vs None_ system settings.pdf, ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, ITBLL2.5B_1.2.7vs2.0.0_ops.png, ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, flamegraph-1072.1.svg, flamegraph-1072.2.svg, hbase-env.sh, hbase-site.xml, lock.127.workloadc.20180402T200918Z.svg, lock.2.memsize2.c.20180403T160257Z.svg, run_ycsb.sh, tree.txt
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor that it is much slower, that the problem is the asyncwal writing. Does in-memory compaction slow us down or speed us up? What happens when you enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something about perf when 2.0.0 ships.
[jira] [Commented] (HBASE-20188) [TESTING] Performance
[ https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425861#comment-16425861 ]

Edward Bortnikov commented on HBASE-20188:
------------------------------------------

[~stack] just making sure we're on the same page ... the "2 all defaults" column (col I) does not include FastPath (included in col F) - is this intentional? One other thing that puzzles me is the discrepancy between your and [~eshcar]'s results for workloada - her results show a +27% upside for IMC; curious what's going on here? Last question - do you intend to start looking at off-heap configurations? We are working on them now, too. Thanks
[jira] [Commented] (HBASE-20188) [TESTING] Performance
[ https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425804#comment-16425804 ]

Edward Bortnikov commented on HBASE-20188:
------------------------------------------

[~eshcar] could you please post your YCSB 100%W benchmark code? Thanks
[jira] [Assigned] (HBASE-20234) Expose in-memory compaction metrics
[ https://issues.apache.org/jira/browse/HBASE-20234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Bortnikov reassigned HBASE-20234:
----------------------------------------

Assignee: Anastasia Braginsky

> Expose in-memory compaction metrics
> -----------------------------------
>
>          Key: HBASE-20234
>          URL: https://issues.apache.org/jira/browse/HBASE-20234
>      Project: HBase
>   Issue Type: Sub-task
>     Reporter: stack
>     Assignee: Anastasia Braginsky
>     Priority: Major
>
> Hard to glean insight into how well in-memory compaction is doing currently. It dumps stats into the logs, but it would be better if they were available to a dashboard. This issue is about exposing a couple of helpful counts. There are already by-region metrics. We can add a few for in-memory compaction (help me out [~anastas] ... what counts would be best to expose?).
> Flush-related metrics include:
> {code}
> Namespace_default_table_tsdb-tree_region_cfbf23e7330a1a2bbde031f9583d3415_metric_flushesQueuedCount: {
>   description: "Number flushes requested/queued for this region",
>   value: 0
> },
> {
>   description: "The number of cells flushed to disk",
>   value: 0
> },
> {
>   description: "The total amount of data flushed to disk, in bytes",
>   value: 0
> },
> ...
> {code}
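As a sketch of the kind of by-region counters being requested - note that the metric names (inMemoryCompactionCount, inMemoryFlushCount) and the registry class are invented here for illustration, not the names that were eventually committed to HBase:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Toy per-region counter registry mirroring the by-region metric naming
// seen in the flush metrics above.  Purely illustrative.
public class InMemoryCompactionMetricsSketch {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    private static String key(String region, String metric) {
        return "Namespace_default_region_" + region + "_metric_" + metric;
    }

    /** Bump a named counter, creating it on first use. */
    long increment(String region, String metric) {
        return counters.computeIfAbsent(key(region, metric), k -> new AtomicLong())
                       .incrementAndGet();
    }

    /** Read a counter; absent counters read as zero. */
    long get(String region, String metric) {
        AtomicLong c = counters.get(key(region, metric));
        return c == null ? 0 : c.get();
    }
}
```

A dashboard would then scrape such counters the same way it scrapes the existing flushesQueuedCount-style region metrics.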
[jira] [Comment Edited] (HBASE-20259) Doc configs for in-memory-compaction and add detail to in-memory-compaction logging
[ https://issues.apache.org/jira/browse/HBASE-20259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421993#comment-16421993 ]

Edward Bortnikov edited comment on HBASE-20259 at 4/2/18 7:29 AM:
------------------------------------------------------------------

[~stack] let's take this decision in the broader context of the overall performance evaluation methodology. Tons of effort have been invested in exploring CompactingMemstore, including the off-heap path. The demonstrated performance benefits were decisive. Let's avoid haste just now. We're now testing the system in settings that were not anticipated before (the CMS+MSLAB on-heap combination), and still need some time to figure out what's going on. Stay tuned sir ..

> Doc configs for in-memory-compaction and add detail to in-memory-compaction logging
> -----------------------------------------------------------------------------------
>
>          Key: HBASE-20259
>          URL: https://issues.apache.org/jira/browse/HBASE-20259
>      Project: HBase
>   Issue Type: Bug
>     Reporter: stack
>     Assignee: stack
>     Priority: Critical
>      Fix For: 2.0.0
>
>  Attachments: HBASE-20259.master.001.patch, HBASE-20259.master.002.patch, HBASE-20259.master.003.patch
>
> I set {{hbase.systemtables.compacting.memstore.type}} to NONE but it seems like in-memory is still on. My table looks like this:
> {code}
> Table ycsb is ENABLED
> ycsb
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'family', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
> {code}
> Looks like the table doesn't have it on either (IN_MEMORY_COMPACTION doesn't show in the above).
[jira] [Commented] (HBASE-20259) Doc configs for in-memory-compaction and add detail to in-memory-compaction logging
[ https://issues.apache.org/jira/browse/HBASE-20259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421993#comment-16421993 ]

Edward Bortnikov commented on HBASE-20259:
------------------------------------------

[~stack] let's take this decision in the broader context of the overall performance evaluation methodology. Tons of effort have been invested in exploring CompactingMemstore, and the demonstrated performance benefits were decisive. Let's avoid haste just now. We're now testing the system in settings that were not anticipated before (the CMS+MSLAB on-heap combination), and still need some time to figure out what's going on. Stay tuned sir ..
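The behaviour [~stack] observed - {{hbase.systemtables.compacting.memstore.type}} set to NONE while a user table keeps compacting - is consistent with the two config keys having different scopes. A minimal, self-contained sketch of the resolution order as we read it (the precedence and the defaults shown are our assumptions for illustration, not verbatim HBase 2.0 code):

```java
import java.util.Map;

// Illustrative resolution of the effective in-memory compaction policy.
// Assumed precedence: per-family IN_MEMORY_COMPACTION attribute first,
// then the systemtables-scoped key (system tables only), then the
// hregion-scoped key for user tables.  Defaults are assumptions.
public class PolicyResolutionSketch {
    static String effectivePolicy(boolean isSystemTable,
                                  Map<String, String> familyAttrs,
                                  Map<String, String> conf) {
        // 1. An explicit family attribute wins.
        String attr = familyAttrs.get("IN_MEMORY_COMPACTION");
        if (attr != null) return attr;
        // 2. System tables consult the systemtables-scoped key ...
        if (isSystemTable) {
            return conf.getOrDefault("hbase.systemtables.compacting.memstore.type", "NONE");
        }
        // 3. ... user tables consult the hregion-scoped key instead, so
        //     setting the systemtables key does not touch them.
        return conf.getOrDefault("hbase.hregion.compacting.memstore.type", "BASIC");
    }
}
```

Under this reading, setting only the systemtables key to NONE leaves a user table like {{ycsb}} on the cluster-wide default, which is exactly what the reporter saw.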
[jira] [Commented] (HBASE-18294) Reduce global heap pressure: flush based on heap occupancy
[ https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353979#comment-16353979 ]

Edward Bortnikov commented on HBASE-18294:
------------------------------------------

[~eshcar], [~anoop.hbase], [~ram_krish], maybe we can take the following approach to your dispute on RB. The code makes the following config variable definitions:

{code:java}
public static final String HREGION_MEMSTORE_FLUSH_SIZE =
    "hbase.hregion.memstore.flush.size";
public static final String HREGION_MEMSTORE_OFFHEAP_FLUSH_SIZE =
    "hbase.hregion.memstore.offheap.flush.size";
{code}

The former is the legacy flush size threshold, whereas the latter is new. However, their further treatment differs - HREGION_MEMSTORE_FLUSH_SIZE is actually treated as an *on-heap* threshold. This is confusing, I guess - especially for admins. Having said that, we do need separate accounting for on-heap and off-heap memory, as [~eshcar] explained above. Let me suggest a change that is more digestible for users, imo. Let HREGION_MEMSTORE_FLUSH_SIZE retain its legacy meaning (and the 128M default) - namely, the overall max memory the system is willing to allocate to a store. Furthermore, let's define a new variable, HREGION_MEMSTORE_OFFHEAP_SIZE_RATIO, to define the fraction of the former that can be allocated off-heap (0, by default). How about that?
> Reduce global heap pressure: flush based on heap occupancy
> ----------------------------------------------------------
>
>              Key: HBASE-18294
>              URL: https://issues.apache.org/jira/browse/HBASE-18294
>          Project: HBase
>       Issue Type: Improvement
> Affects Versions: 3.0.0
>         Reporter: Eshcar Hillel
>         Assignee: Eshcar Hillel
>         Priority: Major
>          Fix For: 2.0.0-beta-2
>
>      Attachments: HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.02.patch, HBASE-18294.03.patch, HBASE-18294.04.patch, HBASE-18294.05.patch, HBASE-18294.06.patch, HBASE-18294.07.patch, HBASE-18294.07.patch, HBASE-18294.08.patch, HBASE-18294.09.patch, HBASE-18294.10.patch, HBASE-18294.11.patch, HBASE-18294.11.patch, HBASE-18294.12.patch, HBASE-18294.13.patch, HBASE-18294.15.patch, HBASE-18294.16.patch, HBASE-18294.master.01.patch
>
> A region is flushed if its memory component exceeds a threshold (default size is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the store to another threshold (that can be configured with hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size (key-value only) to the threshold, where it should compare the heap size (which includes index size and metadata).
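The ratio-based proposal in the comment above can be sketched in a few lines. HREGION_MEMSTORE_OFFHEAP_SIZE_RATIO is a proposed, hypothetical setting (it does not exist in HBase); the arithmetic below only illustrates how the single 128M budget would be split:

```java
// Sketch of the proposed semantics: one overall flush budget per store,
// with an optional fraction of it allocated off-heap.  Illustrative only.
public class FlushThresholdSketch {
    static final long DEFAULT_FLUSH_SIZE = 128L * 1024 * 1024; // legacy 128M default

    /** Off-heap portion of the overall flush budget. */
    static long offheapFlushSize(long flushSize, double offheapRatio) {
        return (long) (flushSize * offheapRatio);
    }

    /** On-heap portion is whatever remains of the overall budget. */
    static long onheapFlushSize(long flushSize, double offheapRatio) {
        return flushSize - offheapFlushSize(flushSize, offheapRatio);
    }
}
```

With the proposed default ratio of 0, the full budget stays on-heap and the legacy behaviour is preserved; a ratio of 0.75 would move 96M of the 128M budget off-heap.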
[jira] [Commented] (HBASE-18294) Reduce global heap pressure: flush based on heap occupancy
[ https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351714#comment-16351714 ]

Edward Bortnikov commented on HBASE-18294:
------------------------------------------

I second [~eshcar]. Off-heap and on-heap memory are different resources, with potentially very different allocations within the same machine. The code already addresses them separately throughout. The user does need this (optional) design knob.
[jira] [Commented] (HBASE-18294) Reduce global heap pressure: flush based on heap occupancy
[ https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305644#comment-16305644 ]

Edward Bortnikov commented on HBASE-18294:
------------------------------------------

Hallelujah! Thanks, all, for the fruitful discussion.

On Thursday, December 28, 2017, 8:24 PM, Eshcar Hillel (JIRA) wrote:

[ https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305641#comment-16305641 ]

Eshcar Hillel commented on HBASE-18294:
---------------------------------------

OK let me prepare the patch.
[jira] [Commented] (HBASE-18294) Reduce global heap pressure: flush based on heap occupancy
[ https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305514#comment-16305514 ]

Edward Bortnikov commented on HBASE-18294:
------------------------------------------

Agree with [~eshcar]. This design introduces an abstraction that nicely separates allocation accounting from flush triggering. The two should be separate - this way things become simple again.
[jira] [Assigned] (HBASE-19282) CellChunkMap Benchmarking and User Interface
[ https://issues.apache.org/jira/browse/HBASE-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Bortnikov reassigned HBASE-19282:
----------------------------------------

Assignee: Anastasia Braginsky

> CellChunkMap Benchmarking and User Interface
> --------------------------------------------
>
>          Key: HBASE-19282
>          URL: https://issues.apache.org/jira/browse/HBASE-19282
>      Project: HBase
>   Issue Type: Sub-task
>     Reporter: Anastasia Braginsky
>     Assignee: Anastasia Braginsky
>      Fix For: 2.0.0-beta-2
>
>  Attachments: CCM Benchmarking.pdf, HBASE-19282-V03.patch, HBASE-19282-V05.patch, HBASE-19282-V06.patch, HBASE-19282-V06.patch, HBASE-19282.patch
>
> We have made some experiments on how working with CellChunkMap (CCM) influences performance when running on-heap and off-heap. Based on those results, it is suggested to tie the MSLAB usage (off-heap or on-heap) to CCM index usage.
[jira] [Commented] (HBASE-19282) CellChunkMap Benchmarking and User Interface
[ https://issues.apache.org/jira/browse/HBASE-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16304804#comment-16304804 ]

Edward Bortnikov commented on HBASE-19282:
------------------------------------------

[~anastas], could you please issue a release note, to document that MSLAB == CCM.
[jira] [Updated] (HBASE-19282) CellChunkMap Benchmarking and User Interface
[ https://issues.apache.org/jira/browse/HBASE-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Bortnikov updated HBASE-19282:
-------------------------------------

Fix Version/s: 2.0.0-beta-2
[jira] [Commented] (HBASE-19282) CellChunkMap Benchmarking and User Interface
[ https://issues.apache.org/jira/browse/HBASE-19282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16304801#comment-16304801 ]

Edward Bortnikov commented on HBASE-19282:
------------------------------------------

[~anastas] - could you please split benchmarking into a separate jira.
[jira] [Updated] (HBASE-19133) Transfer big cells or upserted/appended cells into MSLAB upon flattening to CellChunkMap
[ https://issues.apache.org/jira/browse/HBASE-19133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Bortnikov updated HBASE-19133:
-------------------------------------

Fix Version/s: 2.0.0-beta-2

> Transfer big cells or upserted/appended cells into MSLAB upon flattening to CellChunkMap
> ----------------------------------------------------------------------------------------
>
>          Key: HBASE-19133
>          URL: https://issues.apache.org/jira/browse/HBASE-19133
>      Project: HBase
>   Issue Type: Sub-task
>     Reporter: Anastasia Braginsky
>     Assignee: Gali Sheffi
>      Fix For: 2.0.0-beta-2
>
>  Attachments: HBASE-19133-V01.patch, HBASE-19133-V02.patch, HBASE-19133-V03.patch, HBASE-19133.01.patch, HBASE-19133.02.patch, HBASE-19133.03.patch, HBASE-19133.04.patch, HBASE-19133.05.patch, HBASE-19133.06.patch, HBASE-19133.07.patch, HBASE-19133.08.patch, HBASE-19133.09.patch, HBASE-19133.10.patch, HBASE-19133.11.patch
>
> The CellChunkMap segment index requires all cell data to be written in the MSLAB chunks. Even though MSLAB is enabled, cells bigger than the chunk size, or upserted/incremented/appended cells, are still allocated on the JVM heap. If such cells are found in the process of flattening into CellChunkMap (in-memory-flush), they need to be copied into MSLAB.
[jira] [Commented] (HBASE-18294) Reduce global heap pressure: flush based on heap occupancy
[ https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16304800#comment-16304800 ]

Edward Bortnikov commented on HBASE-18294:
------------------------------------------

My 2 cents - after reviewing the code to the best of my ability. IMO the confusion stems from the unfortunate name - heapSize - the code historically uses for the overall allocated memory, such that it is not clear whether we mean the Java heap or the OS heap. Might be good to replace it globally - maybe in a different jira. Regarding the per-store flush trigger ... still not sure what the reasoning is behind the non-uniform handling of on-(Java) heap and off-(Java) heap allocations. Could someone please re-iterate why we shouldn't just monitor the overall allocated memory (data + overhead), no matter where it lives, and flush when the threshold is crossed? Obviously there are all kinds of concerns, but the only experiment on the table is the one by [~eshcar], which demonstrates that regions flush too early, at least with on-heap data, due to conservative accounting. We intend to benchmark the off-heap write path thoroughly, in particular to evaluate the benefits of the CCM index (HBASE-16421). If something unexpected comes up there, we'll all re-convene and re-discuss. Until then, may I suggest keeping things simple and running the accounting context-free, along the lines of [~eshcar]'s patch.
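The context-free accounting argued for above can be sketched in a few lines. Class and method names are illustrative inventions, not HBase API; the point is only that data and overhead are summed into one footprint regardless of where the bytes live:

```java
// Sketch of uniform flush accounting: track key-value payload and
// per-cell overhead (index entries, metadata) together, and trigger a
// flush when the combined footprint crosses the threshold.
public class UniformAccountingSketch {
    private long dataBytes;      // key-value payload, on- or off-heap
    private long overheadBytes;  // index entries, per-cell metadata

    /** Account for a newly written cell. */
    void add(long data, long overhead) {
        dataBytes += data;
        overheadBytes += overhead;
    }

    /** Flush decision looks at the overall footprint, not data size alone. */
    boolean shouldFlush(long threshold) {
        return dataBytes + overheadBytes >= threshold;
    }
}
```

Comparing only dataBytes against the threshold (the behaviour the issue description criticizes) would under-count the true memory pressure by the overhead term.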
[jira] [Commented] (HBASE-18294) Reduce global heap pressure: flush based on heap occupancy
[ https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279774#comment-16279774 ]

Edward Bortnikov commented on HBASE-18294:
------------------------------------------

Chiming in ... This question seems to be independent of whether MSLAB use is a per-table or global flag. Agreed that we should avoid adding new configurations whenever possible. Let's try to remain factual in the decisions we make. The goal is to get the best possible performance from a machine with given RAM resources, on-heap or not. [~eshcar], could you please publish some numbers that validate the solution's value? [~anoop.hbase], mind sharing any data that proves the opposite? Thanks!
[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036219#comment-16036219 ] Edward Bortnikov commented on HBASE-17339: -- Thanks [~eshcar]. Maybe it makes sense to describe the experiment we used to figure out the current implementation, to provide the community with the full picture (smile). We looked at a workload with temporal (rather than spatial) locality, namely writes closely followed by reads. This pattern is quite frequent in pub-sub scenarios. Instead of seeing a performance benefit in reading from MemStore first, we saw nearly 100% cache hit rate, and could not explain it for a while. The lazy evaluation procedure described by [~eshcar] sheds the light. Obviously, explicitly prioritizing reading from MemStore first rather than simply deferring the data fetch from disk could help avoid some access to Bloom filters, just to figure out whether the key has earlier versions on disk. Those accesses could be avoided. The main practical impact is when the BF itself is not in memory, and accessing it triggers I/O. Is that a realistic scenario? We assume that normally, BF's are permanently cached for all HFile's managed by the RS. Dear community - please speak up. Thanks. > Scan-Memory-First Optimization for Get Operations > - > > Key: HBASE-17339 > URL: https://issues.apache.org/jira/browse/HBASE-17339 > Project: HBase > Issue Type: Improvement >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch, > HBASE-17339-V03.patch, HBASE-17339-V03.patch, HBASE-17339-V04.patch, > HBASE-17339-V05.patch, HBASE-17339-V06.patch, read-latency-mixed-workload.jpg > > > The current implementation of a get operation (to retrieve values for a > specific key) scans through all relevant stores of the region; for each store > both memory components (memstores segments) and disk components (hfiles) are > scanned in parallel. 
> We suggest to apply an optimization that speculatively scans memory-only > components first and only if the result is incomplete scans both memory and > disk. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
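The speculative flow proposed in HBASE-17339 can be sketched roughly as follows. This is a minimal illustration with hypothetical types (`Store`, `scanMemory`, `scanAll`, `isComplete` are invented names, not the actual HBase API); the real implementation works on memstore segments and HFile scanners.

```java
import java.util.List;

// Sketch of the memory-first get: scan only the in-memory components first,
// and fall back to the full memory+disk scan when the speculative result
// may be incomplete. Hypothetical interface, not HBase's internal API.
public class MemoryFirstGet {
    interface Store {
        List<String> scanMemory(String key);      // memstore segments only
        List<String> scanAll(String key);         // memory + hfiles in parallel
        boolean isComplete(List<String> partial); // e.g. enough versions found
    }

    static List<String> get(Store store, String key) {
        List<String> fromMemory = store.scanMemory(key);
        if (store.isComplete(fromMemory)) {
            return fromMemory;     // disk-side scan (and its Bloom-filter
                                   // lookups) skipped entirely
        }
        return store.scanAll(key); // speculation failed: do the full scan
    }
}
```

The win discussed in the comment above comes from the first branch: for writes closely followed by reads, the freshest versions are still in the MemStore, so the disk scan and the Bloom-filter accesses it requires never happen.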
[jira] [Assigned] (HBASE-18056) Change CompactingMemStore in BASIC mode to merge multiple segments in pipeline
[ https://issues.apache.org/jira/browse/HBASE-18056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Bortnikov reassigned HBASE-18056: Assignee: Anastasia Braginsky > Change CompactingMemStore in BASIC mode to merge multiple segments in pipeline > -- > > Key: HBASE-18056 > URL: https://issues.apache.org/jira/browse/HBASE-18056 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Attachments: HBASE-18056-V01.patch > > > Under HBASE-16417 it was decided that CompactingMemStore in BASIC mode should > merge multiple ImmutableSegments in CompactionPipeline. Basic+Merge actually > demonstrated reduction in GC, alongside improvement in other metrics. > However, the limit on the number of segments in pipeline is still set to 30. > Under this JIRA it should be changed to 1, as it was tested under HBASE-16417. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-18056) Change CompactingMemStore in BASIC mode to merge multiple segments in pipeline
[ https://issues.apache.org/jira/browse/HBASE-18056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16018956#comment-16018956 ] Edward Bortnikov commented on HBASE-18056: -- Friends, I'm having a hard time understanding why this commit became a big deal :) We made a mistake. This parameter change should have been committed together with BASIC compaction becoming the default configuration. BASIC does not make sense without it. We presented a very extensive perf evaluation exactly with this parameter value. It demonstrated improvement in all the operational metrics, GC included. The parameter should not be accessible to users; it is not documented in the reference manual; its sole purpose is developer flexibility. It is perfectly okay to re-open the discussion (and also revert the setting) once there is solid proof that something is broken. But we haven't seen any such proof yet. Delaying without reason jeopardizes the feature, especially in anticipation of a release. Just saying it again - we made a technical mistake, and we are fixing it now. There is no new data. What is it that I get wrong? Thanks. > Change CompactingMemStore in BASIC mode to merge multiple segments in pipeline > -- > > Key: HBASE-18056 > URL: https://issues.apache.org/jira/browse/HBASE-18056 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky > Attachments: HBASE-18056-V01.patch > > > Under HBASE-16417 it was decided that CompactingMemStore in BASIC mode should > merge multiple ImmutableSegments in CompactionPipeline. Basic+Merge actually > demonstrated reduction in GC, alongside improvement in other metrics. > However, the limit on the number of segments in pipeline is still set to 30. > Under this JIRA it should be changed to 1, as it was tested under HBASE-16417. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Bortnikov resolved HBASE-16851. -- Resolution: Fixed Fix Version/s: 2.0.0 Release Note: Two blog posts on Apache HBase blog: user manual and programmer manual. Ref. guide draft published: https://docs.google.com/document/d/1Xi1jh_30NKnjE3wSR-XF5JQixtyT6H_CdFTaVi78LKw/edit Tags: documentation > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov >Assignee: Edward Bortnikov > Fix For: 2.0.0 > > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory > Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17343) Make Compacting Memstore default in 2.0 with BASIC as the default type
[ https://issues.apache.org/jira/browse/HBASE-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16010622#comment-16010622 ] Edward Bortnikov commented on HBASE-17343: -- Ref. guide (published on HBASE-16851): https://docs.google.com/document/d/1Xi1jh_30NKnjE3wSR-XF5JQixtyT6H_CdFTaVi78LKw/edit. > Make Compacting Memstore default in 2.0 with BASIC as the default type > -- > > Key: HBASE-17343 > URL: https://issues.apache.org/jira/browse/HBASE-17343 > Project: HBase > Issue Type: Improvement > Components: regionserver >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: Anastasia Braginsky >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-17343-V01.patch, HBASE-17343-V02.patch, > HBASE-17343-V04.patch, HBASE-17343-V05.patch, HBASE-17343-V06.patch, > HBASE-17343-V07.patch, HBASE-17343-V08.patch, HBASE-17343-V09.patch, > ut.v1.patch > > > FYI [~anastas], [~eshcar] and [~ebortnik]. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17343) Make Compacting Memstore default in 2.0 with BASIC as the default type
[ https://issues.apache.org/jira/browse/HBASE-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16002698#comment-16002698 ] Edward Bortnikov commented on HBASE-17343: -- Obviously, the failures are related to instability of the master branch rather than to CompactingMemstore. Strong +1 for commit :) > Make Compacting Memstore default in 2.0 with BASIC as the default type > -- > > Key: HBASE-17343 > URL: https://issues.apache.org/jira/browse/HBASE-17343 > Project: HBase > Issue Type: New Feature > Components: regionserver >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: Anastasia Braginsky >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-17343-V01.patch, HBASE-17343-V02.patch, > HBASE-17343-V04.patch, HBASE-17343-V05.patch, HBASE-17343-V06.patch, > HBASE-17343-V07.patch, HBASE-17343-V08.patch > > > FYI [~anastas], [~eshcar] and [~ebortnik]. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17343) Make Compacting Memstore default in 2.0 with BASIC as the default type
[ https://issues.apache.org/jira/browse/HBASE-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984432#comment-15984432 ] Edward Bortnikov commented on HBASE-17343: -- [~anoop.hbase], thanks for the update, great to receive more evidence from a different tool that the method is working. Let's flip the default ASAP. Thanks again. > Make Compacting Memstore default in 2.0 with BASIC as the default type > -- > > Key: HBASE-17343 > URL: https://issues.apache.org/jira/browse/HBASE-17343 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-17343-V01.patch, HBASE-17343-V02.patch, > HBASE-17343-V04.patch, HBASE-17343-V05.patch > > > FYI [~anastas], [~eshcar] and [~ebortnik]. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17343) Make Compacting Memstore default in 2.0 with BASIC as the default type
[ https://issues.apache.org/jira/browse/HBASE-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982979#comment-15982979 ] Edward Bortnikov commented on HBASE-17343: -- +1 on my side. > Make Compacting Memstore default in 2.0 with BASIC as the default type > -- > > Key: HBASE-17343 > URL: https://issues.apache.org/jira/browse/HBASE-17343 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-17343-V01.patch, HBASE-17343-V02.patch, > HBASE-17343-V04.patch > > > FYI [~anastas], [~eshcar] and [~ebortnik]. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976828#comment-15976828 ] Edward Bortnikov commented on HBASE-16851: -- Reference guide draft available in https://docs.google.com/document/d/1Xi1jh_30NKnjE3wSR-XF5JQixtyT6H_CdFTaVi78LKw/edit. Please review. Thanks. > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov >Assignee: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory > Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Bortnikov updated HBASE-16851: - Absolutely. Will work on it early next week. Let's not close it yet. Thanks. Sent from Yahoo Mail for iPhone On Monday, April 10, 2017, 8:02 AM, stack (JIRA) wrote: [ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962428#comment-15962428 ] stack commented on HBASE-16851: --- [~ebortnik] goodstuff. Posted. We might as well use this issue to figure what to put in the refguide? Want to cut a piece from blogs or just do pointers from refguide to blog? Thanks [~ebortnik] -- This message was sent by Atlassian JIRA (v6.3.15#6346) > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov >Assignee: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory > Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962391#comment-15962391 ] Edward Bortnikov commented on HBASE-16851: -- Thanks a lot, [~stack]! > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov >Assignee: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory > Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962099#comment-15962099 ] Edward Bortnikov commented on HBASE-16851: -- Thanks much [~stack] for all the great feedback inline. Applied most of your changes - better quality now. Guess we're good for publishing. High level/User manual: https://docs.google.com/document/d/1K_8plLz0K3pmV20dsgSWwRPn1qUNMRbLmi8aJkhB7z0 Dev manual: https://docs.google.com/document/d/1z1R-MdAxRvTC2NazxUmN3FOCFIknkxL2TFqVUhYBVbE Thanks. > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov >Assignee: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory > Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16438) Create a cell type so that chunk id is embedded in it
[ https://issues.apache.org/jira/browse/HBASE-16438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959577#comment-15959577 ] Edward Bortnikov commented on HBASE-16438: -- [~anastas] - is this a +1? > Create a cell type so that chunk id is embedded in it > - > > Key: HBASE-16438 > URL: https://issues.apache.org/jira/browse/HBASE-16438 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Attachments: > HBASE-16438_10_ChunkCreatorwrappingChunkPool_withchunkRef.patch, > HBASE-16438_11_ChunkCreatorwrappingChunkPool_withchunkRef.patch, > HBASE-16438_1.patch, HBASE-16438_3_ChunkCreatorwrappingChunkPool.patch, > HBASE-16438_4_ChunkCreatorwrappingChunkPool.patch, > HBASE-16438_8_ChunkCreatorwrappingChunkPool_withchunkRef.patch, > HBASE-16438_9_ChunkCreatorwrappingChunkPool_withchunkRef.patch, > HBASE-16438.patch, MemstoreChunkCell_memstoreChunkCreator_oldversion.patch, > MemstoreChunkCell_trunk.patch > > > For CellChunkMap we may need a cell such that the chunk out of which it was > created, the id of the chunk be embedded in it so that when doing flattening > we can use the chunk id as a meta data. More details will follow once the > initial tasks are completed. > Why we need to embed the chunkid in the Cell is described by [~anastas] in > this remark over in parent issue > https://issues.apache.org/jira/browse/HBASE-14921?focusedCommentId=15244119=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15244119 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17343) Make Compacting Memstore default in 2.0 with BASIC as the default type
[ https://issues.apache.org/jira/browse/HBASE-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949277#comment-15949277 ] Edward Bortnikov commented on HBASE-17343: -- Are we good to commit? :) > Make Compacting Memstore default in 2.0 with BASIC as the default type > -- > > Key: HBASE-17343 > URL: https://issues.apache.org/jira/browse/HBASE-17343 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-17343-V01.patch, HBASE-17343-V02.patch > > > FYI [~anastas], [~eshcar] and [~ebortnik]. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949098#comment-15949098 ] Edward Bortnikov commented on HBASE-16851: -- Updated the developer documentation in https://docs.google.com/document/d/1z1R-MdAxRvTC2NazxUmN3FOCFIknkxL2TFqVUhYBVbE. The modified parts are highlighted in yellow. Feel free to comment. > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov >Assignee: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory > Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943134#comment-15943134 ] Edward Bortnikov commented on HBASE-17339: -- Can't see how TinyLFU can do a better job with stationary distributions (in which item popularity does not change over time). I'd imagine it being good under bursty workloads. > Scan-Memory-First Optimization for Get Operations > - > > Key: HBASE-17339 > URL: https://issues.apache.org/jira/browse/HBASE-17339 > Project: HBase > Issue Type: Improvement >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch, > HBASE-17339-V03.patch, HBASE-17339-V03.patch, HBASE-17339-V04.patch, > HBASE-17339-V05.patch, HBASE-17339-V06.patch, read-latency-mixed-workload.jpg > > > The current implementation of a get operation (to retrieve values for a > specific key) scans through all relevant stores of the region; for each store > both memory components (memstores segments) and disk components (hfiles) are > scanned in parallel. > We suggest to apply an optimization that speculatively scans memory-only > components first and only if the result is incomplete scans both memory and > disk. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942272#comment-15942272 ] Edward Bortnikov commented on HBASE-16851: -- Updated the user documentation, following the change in the definition of BASIC (see HBASE-16417). Added a short summary of performance results (or, why the user should care). New shared doc: https://docs.google.com/document/d/1K_8plLz0K3pmV20dsgSWwRPn1qUNMRbLmi8aJkhB7z0. Please comment. We are interested to publish on Apache blog as soon as the default change is committed. > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov >Assignee: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory > Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17765) Reviving the merge possibility in the CompactingMemStore
[ https://issues.apache.org/jira/browse/HBASE-17765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935203#comment-15935203 ] Edward Bortnikov commented on HBASE-17765: -- Are we good to commit this patch? > Reviving the merge possibility in the CompactingMemStore > > > Key: HBASE-17765 > URL: https://issues.apache.org/jira/browse/HBASE-17765 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Fix For: 2.0.0 > > Attachments: HBASE-17765-V01.patch, HBASE-17765-V02.patch > > > According to the new performance results presented in the HBASE-16417 we see > that the read latency of the 90th percentile of the BASIC policy is too big > due to the need to traverse through too many segments in the pipeline. In > this JIRA we correct the bug in the merge sizing calculations and allow > pipeline size threshold to be a configurable parameter. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions
[ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933509#comment-15933509 ] Edward Bortnikov commented on HBASE-16417: -- Okay agreed - BASIC will include merge. We'll update the docs, too. Regarding parallelism - a promising direction, but we should be careful here. More threads might come at someone else's expense (write throughput maybe), so need more scrutiny. If all we do is run a bunch of binary searches in parallel - might not be worth the synchronization. Worth checking. > In-Memory MemStore Policy for Flattening and Compactions > > > Key: HBASE-16417 > URL: https://issues.apache.org/jira/browse/HBASE-16417 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Eshcar Hillel > Fix For: 2.0.0 > > Attachments: HBASE-16417-benchmarkresults-20161101.pdf, > HBASE-16417-benchmarkresults-20161110.pdf, > HBASE-16417-benchmarkresults-20161123.pdf, > HBASE-16417-benchmarkresults-20161205.pdf, > HBASE-16417-benchmarkresults-20170309.pdf, > HBASE-16417-benchmarkresults-20170317.pdf > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions
[ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931616#comment-15931616 ] Edward Bortnikov commented on HBASE-16417: -- [~eshcar], thanks for the thorough report, great stuff. Question to all - do these results suggest that we change the default to BASIC+MERGE? Seems that this method does not have any material overhead, even under the uniform workload. If the answer is "yes", we could take one of two ways: (1) say that BASIC+MERGE is a new BASIC (my favorite :)), or (2) introduce a new compaction level (MODERATE?). Let's converge fast - then we can update the documentation and finalize the code. This work notwithstanding, it is still appealing to come up with an automatic policy to tune handsfree (which was the original intent behind this JIRA). With the 2.0 release on our heels, we might not be able to make it until then. But let's have all the building blocks in place, at least (smile). > In-Memory MemStore Policy for Flattening and Compactions > > > Key: HBASE-16417 > URL: https://issues.apache.org/jira/browse/HBASE-16417 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Eshcar Hillel > Fix For: 2.0.0 > > Attachments: HBASE-16417-benchmarkresults-20161101.pdf, > HBASE-16417-benchmarkresults-20161110.pdf, > HBASE-16417-benchmarkresults-20161123.pdf, > HBASE-16417-benchmarkresults-20161205.pdf, > HBASE-16417-benchmarkresults-20170309.pdf, > HBASE-16417-benchmarkresults-20170317.pdf > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17765) Reviving the merge possibility in the CompactingMemStore
[ https://issues.apache.org/jira/browse/HBASE-17765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15907030#comment-15907030 ] Edward Bortnikov commented on HBASE-17765: -- Merge means that only the index data is restructured. We create a larger segment with one index - but no data is copied. Also, we avoid using the SQM scan (more expensive), so duplicate data versions are not eliminated. Bottom line - (1) the overhead and the space savings are both between BASIC and EAGER, and (2) the tail read latency problem is solved. We'll be publishing the perf results shortly. Following that, let's collectively decide whether MERGE should be a level between BASIC and EAGER, or maybe just become the new BASIC, for simplicity. Thanks. > Reviving the merge possibility in the CompactingMemStore > > > Key: HBASE-17765 > URL: https://issues.apache.org/jira/browse/HBASE-17765 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Fix For: 2.0.0 > > Attachments: HBASE-17765-V01.patch > > > According to the new performance results presented in the HBASE-16417 we see > that the read latency of the 90th percentile of the BASIC policy is too big > due to the need to traverse through too many segments in the pipeline. In > this JIRA we correct the bug in the merge sizing calculations and allow > pipeline size threshold to be a configurable parameter. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
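The merge semantics described in the comment above (index restructured, no data copied, no duplicate elimination) can be sketched as follows. This is an illustrative model only, with invented types; in HBase the index entries are cell references inside ImmutableSegments, not strings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of in-memory "merge": build one ordered index over the cell
// references of several immutable segments. No cell data is copied, and —
// unlike EAGER compaction, which runs the more expensive SQM scan —
// duplicate versions are intentionally retained. Hypothetical types.
public class SegmentMerge {
    // A segment index is modeled as an ordered list of cell references.
    static List<String> merge(List<List<String>> segmentIndexes) {
        List<String> merged = new ArrayList<>();
        for (List<String> index : segmentIndexes) {
            merged.addAll(index);                   // references only
        }
        merged.sort(Comparator.naturalOrder());     // one flat, sorted index
        return merged;                              // duplicates kept
    }
}
```

This is why the cost sits between BASIC and EAGER: a reader now consults a single index instead of one per pipeline segment (fixing the tail latency), while the space savings are smaller than EAGER's because redundant versions survive.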
[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions
[ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906123#comment-15906123 ] Edward Bortnikov commented on HBASE-16417: -- .. So [~eshcar] answered nearly all of it here .. A couple of small remarks. The expected number of 2 segments in the pipeline follows from the fact that disk flush normally happens when there are 4. Assuming we are growing from 0, the expectation is 2. The varying WAL size with Async WAL indeed introduces a lot of noise. However, please note that the overall volume of WAL writes differs between Sync and Async without a single line of Accordion involved. Why does this happen with the same workload? (Note that with Sync, the WAL volume is the same no matter what type of in-memory compaction is used.) Looking forward to some help here :) > In-Memory MemStore Policy for Flattening and Compactions > > > Key: HBASE-16417 > URL: https://issues.apache.org/jira/browse/HBASE-16417 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Eshcar Hillel > Fix For: 2.0.0 > > Attachments: HBASE-16417-benchmarkresults-20161101.pdf, > HBASE-16417-benchmarkresults-20161110.pdf, > HBASE-16417-benchmarkresults-20161123.pdf, > HBASE-16417-benchmarkresults-20161205.pdf, > HBASE-16417-benchmarkresults-20170309.pdf > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions
[ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903831#comment-15903831 ] Edward Bortnikov commented on HBASE-16417: -- bq. On the 90th percentile degradation when BASIC, how many segments we talking 2 or 3 or more than this? Taking the liberty of answering for [~eshcar]. The current default active segment size cap for in-memory flush is 1/4 of the memstore size cap for disk flush. This means that the expected number of segments in the pipeline is 4/2=2. However, since disk flush is non-immediate, new segments can sometimes pile up, especially under a very high write rate as exercised in our test. We don't have easily trackable metrics installed (maybe we should have) but probably we're speaking about many more segments here. The number can't exceed 30 - at that point, a forceful merge happens. We guess that looking up the key in every single segment (to initialize the scan) is what leads to the high tail latency. We're taking a closer look at merge (index compaction only, no data copy), hopefully we'll show there's no material damage from it .. even EAGER does not look too bad .. A matter of a few more days of experimentation. Thanks. > In-Memory MemStore Policy for Flattening and Compactions > > > Key: HBASE-16417 > URL: https://issues.apache.org/jira/browse/HBASE-16417 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Eshcar Hillel > Fix For: 2.0.0 > > Attachments: HBASE-16417-benchmarkresults-20161101.pdf, > HBASE-16417-benchmarkresults-20161110.pdf, > HBASE-16417-benchmarkresults-20161123.pdf, > HBASE-16417-benchmarkresults-20161205.pdf, > HBASE-16417-benchmarkresults-20170309.pdf > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
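The 4/2=2 back-of-the-envelope estimate from the comment above can be restated explicitly. Assuming the pipeline grows roughly linearly from 0 segments (right after a disk flush) to 4 (when the memstore cap is reached, with in-memory flushes at 1/4 of that cap), its time-average length is the midpoint:

```java
// Steady-state expectation of the pipeline length under the assumption of
// linear growth between disk flushes: the pipeline cycles from 0 segments
// up to segmentsPerDiskFlush, so on average it holds the midpoint.
// This is an idealization; as noted above, when disk flushes lag behind a
// heavy write load, segments pile up well beyond this, bounded only by the
// forced-merge cap of 30.
public class PipelineExpectation {
    static double expectedSegments(int segmentsPerDiskFlush) {
        return segmentsPerDiskFlush / 2.0;  // e.g. 4 segments -> average of 2
    }
}
```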
[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore
[ https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901984#comment-15901984 ] Edward Bortnikov commented on HBASE-16421: -- That's great - let's follow that path (via HBASE-16438). We are round the corner to assist :) > Introducing the CellChunkMap as a new additional index variant in the MemStore > -- > > Key: HBASE-16421 > URL: https://issues.apache.org/jira/browse/HBASE-16421 > Project: HBase > Issue Type: Umbrella >Reporter: Anastasia Braginsky > Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, > IntroductiontoNewFlatandCompactMemStore.pdf > > > Follow up for HBASE-14921. This is going to be the umbrella JIRA to include > all the parts of integration of the CellChunkMap to the MemStore. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore
[ https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901163#comment-15901163 ] Edward Bortnikov edited comment on HBASE-16421 at 3/8/17 12:26 PM: --- Friends, What are you saying about [~anastas]'s suggestion in HBASE-16438? Looks like the patch is getting in better shape. How about you guys further improving and committing it, so that [~anastas] can pick up on solid ground? We'll keep helping with the reviews. Our assessment is that this patch covers 1/3 to 1/2 of the original work plan. Thanks. was (Author: ebortnik): Friends, What are you saying about [~anastas]'s suggestion in HBASE-16438? Looks like the patch is getting in better shape. How about further improving and committing it, so that [~anastas] can pick up on solid ground? Our assessment is that this patch covers 1/3 to 1/2 of the original work plan. Thanks. > Introducing the CellChunkMap as a new additional index variant in the MemStore > -- > > Key: HBASE-16421 > URL: https://issues.apache.org/jira/browse/HBASE-16421 > Project: HBase > Issue Type: Umbrella >Reporter: Anastasia Braginsky > Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, > IntroductiontoNewFlatandCompactMemStore.pdf > > > Follow up for HBASE-14921. This is going to be the umbrella JIRA to include > all the parts of integration of the CellChunkMap to the MemStore. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore
[ https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901163#comment-15901163 ] Edward Bortnikov commented on HBASE-16421: -- Friends, What are you saying about [~anastas]'s suggestion in HBASE-16438? Looks like the patch is getting in better shape. How about further improving and committing it, so that [~anastas] can pick up on solid ground? Our assessment is that this patch covers 1/3 to 1/2 of the original work plan. Thanks. > Introducing the CellChunkMap as a new additional index variant in the MemStore > -- > > Key: HBASE-16421 > URL: https://issues.apache.org/jira/browse/HBASE-16421 > Project: HBase > Issue Type: Umbrella >Reporter: Anastasia Braginsky > Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, > IntroductiontoNewFlatandCompactMemStore.pdf > > > Follow up for HBASE-14921. This is going to be the umbrella JIRA to include > all the parts of integration of the CellChunkMap to the MemStore. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore
[ https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899084#comment-15899084 ] Edward Bortnikov commented on HBASE-16421: -- Okay we are making progress ... What I read from [~ram_krish]'s proposal is that you guys are actually comfortable with being the driving force behind the CellChunkMap project (in fact, after implementing a critical mass of code), all the way to production code. We are comfortable with this approach too. In that case, we can just switch to the assistant/reviewer role - and provide all the help we can in that capacity. That question is actually not related to whether this feature is part of 2.0 or not. Let's reach consensus ... Appreciate your fast response. > Introducing the CellChunkMap as a new additional index variant in the MemStore > -- > > Key: HBASE-16421 > URL: https://issues.apache.org/jira/browse/HBASE-16421 > Project: HBase > Issue Type: Umbrella >Reporter: Anastasia Braginsky > Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, > IntroductiontoNewFlatandCompactMemStore.pdf > > > Follow up for HBASE-14921. This is going to be the umbrella JIRA to include > all the parts of integration of the CellChunkMap to the MemStore. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore
[ https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897806#comment-15897806 ] Edward Bortnikov commented on HBASE-16421: -- Friends, We are currently still vague on the issue of "who-does-what". The subtask JIRA's seem to indicate that [~ram_krish] and [~anoop.hbase] made substantial progress lately based on [~anastas]'s experimental code - including some steps in the work plan published last December. What is the status of these patches? Are they candidates for commit? Risking preaching to the choir but trying to be efficient. Let's align our efforts. Looking forward to your comments. Thanks. > Introducing the CellChunkMap as a new additional index variant in the MemStore > -- > > Key: HBASE-16421 > URL: https://issues.apache.org/jira/browse/HBASE-16421 > Project: HBase > Issue Type: Umbrella >Reporter: Anastasia Braginsky > Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, > IntroductiontoNewFlatandCompactMemStore.pdf > > > Follow up for HBASE-14921. This is going to be the umbrella JIRA to include > all the parts of integration of the CellChunkMap to the MemStore. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore
[ https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892370#comment-15892370 ] Edward Bortnikov commented on HBASE-16421: -- Also, we have to define the KPI's for this feature. What first-class-citizen metrics should it manifest in? Write/read throughput/latency? Please speak up :) > Introducing the CellChunkMap as a new additional index variant in the MemStore > -- > > Key: HBASE-16421 > URL: https://issues.apache.org/jira/browse/HBASE-16421 > Project: HBase > Issue Type: Umbrella >Reporter: Anastasia Braginsky > Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, > IntroductiontoNewFlatandCompactMemStore.pdf > > > Follow up for HBASE-14921. This is going to be the umbrella JIRA to include > all the parts of integration of the CellChunkMap to the MemStore. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892304#comment-15892304 ] Edward Bortnikov commented on HBASE-16851: -- Programmer manual version 1.0 complete: https://docs.google.com/document/d/1z1R-MdAxRvTC2NazxUmN3FOCFIknkxL2TFqVUhYBVbE. Many thanks to [~anastas] for the UML diagrams. Please take a look. The document summarizing the performance benchmark results is WIP - we'll publish in a week or so. > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov >Assignee: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory > Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore
[ https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889597#comment-15889597 ] Edward Bortnikov commented on HBASE-16421: -- For in-memory compaction per se it is not but for write-path off-heaping might be. Up to [~ram_krish] and [~anoop.hbase] to define. > Introducing the CellChunkMap as a new additional index variant in the MemStore > -- > > Key: HBASE-16421 > URL: https://issues.apache.org/jira/browse/HBASE-16421 > Project: HBase > Issue Type: Umbrella >Reporter: Anastasia Braginsky > Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, > IntroductiontoNewFlatandCompactMemStore.pdf > > > Follow up for HBASE-14921. This is going to be the umbrella JIRA to include > all the parts of integration of the CellChunkMap to the MemStore. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore
[ https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888120#comment-15888120 ] Edward Bortnikov commented on HBASE-16421: -- More precisely, the question is about the deadline. With the 2.0 release in Apr/May, it's going to be tight .. What would be the process if we don't make it until then? Would there be follow-up releases? [~saint@gmail.com], could you please chime in? > Introducing the CellChunkMap as a new additional index variant in the MemStore > -- > > Key: HBASE-16421 > URL: https://issues.apache.org/jira/browse/HBASE-16421 > Project: HBase > Issue Type: Umbrella >Reporter: Anastasia Braginsky > Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, > IntroductiontoNewFlatandCompactMemStore.pdf > > > Follow up for HBASE-14921. This is going to be the umbrella JIRA to include > all the parts of integration of the CellChunkMap to the MemStore. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore
[ https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886426#comment-15886426 ] Edward Bortnikov commented on HBASE-16421: -- Hi all, reviving this thread, it's been silent for a while ... As we are completing the In-Memory Compaction stuff for the 2.0 release, we'd like to re-iterate the mutual commitment to this project. We'll probably need help with parts of implementation, to be on time before the release cutoff. [~ram_krish], [~anoop.hbase] - are you on board? Thanks. > Introducing the CellChunkMap as a new additional index variant in the MemStore > -- > > Key: HBASE-16421 > URL: https://issues.apache.org/jira/browse/HBASE-16421 > Project: HBase > Issue Type: Umbrella >Reporter: Anastasia Braginsky > Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, > IntroductiontoNewFlatandCompactMemStore.pdf > > > Follow up for HBASE-14921. This is going to be the umbrella JIRA to include > all the parts of integration of the CellChunkMap to the MemStore. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17662) Disable in-memory flush when replaying from WAL
[ https://issues.apache.org/jira/browse/HBASE-17662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886409#comment-15886409 ] Edward Bortnikov commented on HBASE-17662: -- [~anastas], [~anoop.hbase] - do we have a resolution here? > Disable in-memory flush when replaying from WAL > --- > > Key: HBASE-17662 > URL: https://issues.apache.org/jira/browse/HBASE-17662 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Attachments: HBASE-17662-V02.patch, HBASE-17662-V03.patch, > HBASE-17662-V04.patch, HBASE-17662-V05.patch, HBASE-17662-V06.patch > > > When replaying the edits from WAL, the region's updateLock is not taken, > because a single threaded action is assumed. However, the thread-safeness of > the in-memory flush of CompactingMemStore is based on taking the region's > updateLock. > The in-memory flush can be skipped in the replay time (anyway everything is > flushed to disk just after the replay). Therefore it is acceptable to just > skip the in-memory flush action while the updates come as part of replay from > WAL. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
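The fix described in the HBASE-17662 issue text above reduces to a guard on the in-memory flush trigger. The following is a minimal, hypothetical sketch of that idea; the class and member names (`CompactingMemStoreSketch`, `inWalReplay`, the byte threshold) are illustrative assumptions, not the actual HBase implementation:

```java
// Sketch of the HBASE-17662 idea: skip the in-memory flush while WAL replay
// is in progress. Replay is single-threaded and does not take the region's
// updateLock, and everything is flushed to disk right after replay anyway,
// so skipping the in-memory flush during replay is safe. Names are
// illustrative, not the real HBase code.
public class CompactingMemStoreSketch {
    private volatile boolean inWalReplay = false;
    private long activeSize = 0;
    private final long flushThreshold = 64;
    private int inMemoryFlushes = 0;

    void startReplay()  { inWalReplay = true; }
    void finishReplay() { inWalReplay = false; }

    void add(long cellSize) {
        activeSize += cellSize;
        // Normally crossing the threshold triggers an in-memory flush;
        // during replay the trigger is suppressed because the locking
        // that makes it thread-safe is not in place.
        if (!inWalReplay && activeSize > flushThreshold) {
            inMemoryFlush();
        }
    }

    private void inMemoryFlush() {
        inMemoryFlushes++;
        activeSize = 0;
    }

    int inMemoryFlushCount() { return inMemoryFlushes; }

    public static void main(String[] args) {
        CompactingMemStoreSketch m = new CompactingMemStoreSketch();
        m.startReplay();
        for (int i = 0; i < 100; i++) m.add(10); // no in-memory flush during replay
        m.finishReplay();
        m.add(10);                               // over threshold: flush fires now
        System.out.println(m.inMemoryFlushCount()); // prints 1
    }
}
```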
[jira] [Commented] (HBASE-17662) Disable in-memory flush when replaying from WAL
[ https://issues.apache.org/jira/browse/HBASE-17662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884893#comment-15884893 ] Edward Bortnikov commented on HBASE-17662: -- Folks, Apologies for pushing again, but please help us light a fire under this JIRA and the others remaining in this project ... This one is the last exposed bug that prevents us from making BASIC compaction the default. It seems like a small patch; can we commit it? Thanks. > Disable in-memory flush when replaying from WAL > --- > > Key: HBASE-17662 > URL: https://issues.apache.org/jira/browse/HBASE-17662 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Attachments: HBASE-17662-V02.patch, HBASE-17662-V03.patch, > HBASE-17662-V04.patch, HBASE-17662-V05.patch, HBASE-17662-V06.patch > > > When replaying the edits from WAL, the region's updateLock is not taken, > because a single threaded action is assumed. However, the thread-safeness of > the in-memory flush of CompactingMemStore is based on taking the region's > updateLock. > The in-memory flush can be skipped in the replay time (anyway everything is > flushed to disk just after the replay). Therefore it is acceptable to just > skip the in-memory flush action while the updates come as part of replay from > WAL. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16850) Run large scale correctness tests for HBASE-14918 (in-memory flushes/compactions)
[ https://issues.apache.org/jira/browse/HBASE-16850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870672#comment-15870672 ] Edward Bortnikov commented on HBASE-16850: -- Large-scale benchmark results are reported in HBASE-16417. Makes sense to redirect/retire this Jira? Thanks. > Run large scale correctness tests for HBASE-14918 (in-memory > flushes/compactions) > - > > Key: HBASE-16850 > URL: https://issues.apache.org/jira/browse/HBASE-16850 > Project: HBase > Issue Type: Sub-task >Reporter: Devaraj Das >Assignee: Devaraj Das >Priority: Blocker > > As discussed here - > https://issues.apache.org/jira/browse/HBASE-16608?focusedCommentId=15577213=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15577213 > [~stack] [~anastas] [~ram_krish] [~anoop.hbase] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868436#comment-15868436 ] Edward Bortnikov commented on HBASE-16851: -- Programmer manual, version 0.9. https://docs.google.com/document/d/1z1R-MdAxRvTC2NazxUmN3FOCFIknkxL2TFqVUhYBVbE WIP with UML diagrams. > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov >Assignee: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory > Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17407) Correct update of maxFlushedSeqId in HRegion
[ https://issues.apache.org/jira/browse/HBASE-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830575#comment-15830575 ] Edward Bortnikov commented on HBASE-17407: -- Mind rebasing and resubmitting the patch? > Correct update of maxFlushedSeqId in HRegion > > > Key: HBASE-17407 > URL: https://issues.apache.org/jira/browse/HBASE-17407 > Project: HBase > Issue Type: Bug >Reporter: Eshcar Hillel > Attachments: HBASE-17407-V01.patch, HBASE-17407-V01.patch, > HBASE-17407-V02.patch > > > The attribute maxFlushedSeqId in HRegion is used to track the max sequence id > in the store files and is reported to HMaster. When flushing only part of the > memstore content this value might be incorrect and may cause data loss. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828032#comment-15828032 ] Edward Bortnikov edited comment on HBASE-16851 at 1/18/17 1:17 PM: --- Programmer manual (developer view) - initial write-up: https://docs.google.com/document/d/1z1R-MdAxRvTC2NazxUmN3FOCFIknkxL2TFqVUhYBVbE. Thanks [~anastas] for the class diagram. was (Author: ebortnik): Programmer manual (developer view) - initial write-up: https://docs.google.com/document/d/1z1R-MdAxRvTC2NazxUmN3FOCFIknkxL2TFqVUhYBVbE. > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov >Assignee: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory > Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828032#comment-15828032 ] Edward Bortnikov commented on HBASE-16851: -- Programmer manual (developer view) - initial write-up: https://docs.google.com/document/d/1z1R-MdAxRvTC2NazxUmN3FOCFIknkxL2TFqVUhYBVbE. > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov >Assignee: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory > Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk
[ https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824307#comment-15824307 ] Edward Bortnikov commented on HBASE-17081: -- _v13 was out of sync with trunk (QA ran 1 day after submission). Rebase solved the problem. > Flush the entire CompactingMemStore content to disk > --- > > Key: HBASE-17081 > URL: https://issues.apache.org/jira/browse/HBASE-17081 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Fix For: 2.0.0 > > Attachments: HBASE-15787_8.patch, HBASE-17081-V01.patch, > HBASE-17081-V02.patch, HBASE-17081-V03.patch, HBASE-17081-V04.patch, > HBASE-17081-V05.patch, HBASE-17081-V06.patch, HBASE-17081-V06.patch, > HBASE-17081-V07.patch, HBASE-17081-V10.patch, HBASE-17081-V13.patch, > HBASE-17081-V14.patch, HBaseMeetupDecember2016-V02.pptx, > Pipelinememstore_fortrunk_3.patch > > > Part of CompactingMemStore's memory is held by an active segment, and another > part is divided between immutable segments in the compacting pipeline. Upon > flush-to-disk request we want to flush all of it to disk, in contrast to > flushing only tail of the compacting pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17434) New Synchronization Scheme for Compaction Pipeline
[ https://issues.apache.org/jira/browse/HBASE-17434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812611#comment-15812611 ] Edward Bortnikov commented on HBASE-17434: -- Suggest to commit. This patch has been well discussed and verified. Would be much more convenient to fix and re-submit HBASE-17081 with a solid synchronization scheme in place. We are trying to solve a bunch of issues that piled up in the CompactingMemstore implementation. This one is a roadblock. Once again, thanks to all who contributed to improving the solution's quality. > New Synchronization Scheme for Compaction Pipeline > -- > > Key: HBASE-17434 > URL: https://issues.apache.org/jira/browse/HBASE-17434 > Project: HBase > Issue Type: Bug >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: HBASE-17434-V01.patch, HBASE-17434-V02.patch, > HBASE-17434-V03.patch, HBASE-17434.master.001.patch > > > A new copyOnWrite synchronization scheme is introduced for the compaction > pipeline. > The new scheme is better since it removes the lock from getSegments() which > is invoked in every get and scan operation, and it reduces the number of > LinkedList objects that are created at runtime, thus can reduce GC (not by > much, but still...). > In addition, it fixes the method getTailSize() in compaction pipeline. This > method creates a MemstoreSize object which comprises the data size and the > overhead size of the segment and needs to be atomic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
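The copy-on-write scheme summarized in the HBASE-17434 description above follows a standard pattern: readers dereference a volatile pointer to an immutable list with no lock, while the rare mutators copy, modify, and swap under a lock. The sketch below is a hedged illustration of that general pattern, with hypothetical names, not the actual CompactionPipeline code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Copy-on-write pipeline sketch: getSegments() is lock-free (it only reads
// a volatile reference to an immutable snapshot), so get/scan operations
// pay no synchronization cost and can never hit a
// ConcurrentModificationException. Mutators serialize on the object lock
// and publish a fresh immutable list.
public class CopyOnWritePipeline<S> {
    private volatile List<S> readOnlyCopy = Collections.emptyList();

    // Hot path, called on every get and scan: no lock needed.
    public List<S> getSegments() {
        return readOnlyCopy;
    }

    // Rare path: in-memory flush pushes a new segment at the head.
    public synchronized void pushHead(S segment) {
        List<S> next = new ArrayList<>(readOnlyCopy);
        next.add(0, segment);
        readOnlyCopy = Collections.unmodifiableList(next);
    }

    // Rare path: compaction replaces the last `count` segments with one
    // merged segment.
    public synchronized void swapTail(int count, S merged) {
        List<S> next = new ArrayList<>(readOnlyCopy);
        for (int i = 0; i < count; i++) next.remove(next.size() - 1);
        next.add(merged);
        readOnlyCopy = Collections.unmodifiableList(next);
    }
}
```

Since mutations (in-memory flush, compaction) are infrequent while getSegments() runs on every read, shifting the copying cost to the write side is the right trade-off; a reader that grabbed a snapshot before a mutation simply keeps iterating its own consistent list.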
[jira] [Commented] (HBASE-17379) Lack of synchronization in CompactionPipeline#getScanners()
[ https://issues.apache.org/jira/browse/HBASE-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15800756#comment-15800756 ] Edward Bortnikov commented on HBASE-17379: -- [~eshcar]. Elegant code in RB. Please add high-level comments about the new synchronization scheme. > Lack of synchronization in CompactionPipeline#getScanners() > --- > > Key: HBASE-17379 > URL: https://issues.apache.org/jira/browse/HBASE-17379 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Ted Yu >Assignee: Ted Yu > Attachments: 17379.v1.txt, 17379.v14.txt, 17379.v2.txt, 17379.v3.txt, > 17379.v4.txt, 17379.v5.txt, 17379.v6.txt, 17379.v8.txt > > > From > https://builds.apache.org/job/PreCommit-HBASE-Build/5053/testReport/org.apache.hadoop.hbase.regionserver/TestHRegionWithInMemoryFlush/testWritesWhileGetting/ > : > {code} > java.io.IOException: java.util.ConcurrentModificationException > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleException(HRegion.java:5886) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:5856) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.(HRegion.java:5819) > at > org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2786) > at > org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2766) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:7036) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:7015) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6994) > at > org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting(TestHRegion.java:4141) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.ConcurrentModificationException: null > at > java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:966) > at java.util.LinkedList$ListItr.next(LinkedList.java:888) > at > org.apache.hadoop.hbase.regionserver.CompactionPipeline.getScanners(CompactionPipeline.java:220) > at > 
org.apache.hadoop.hbase.regionserver.CompactingMemStore.getScanners(CompactingMemStore.java:298) > at > org.apache.hadoop.hbase.regionserver.HStore.getScanners(HStore.java:1154) > at org.apache.hadoop.hbase.regionserver.Store.getScanners(Store.java:97) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.getScannersNoCompaction(StoreScanner.java:353) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:210) > at > org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:1892) > at > org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1880) > at >
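The ConcurrentModificationException in the trace above is the textbook fail-fast behavior of java.util.LinkedList: an iterator obtained before a structural modification throws on its next access. A minimal stand-alone reproduction (toy code, not HBase's) of a scanner iterating the pipeline while a concurrent in-memory flush mutates it:

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Reproduces the failure mode in the stack trace: iterate a LinkedList
// while it is structurally modified, as a scanner walking the compaction
// pipeline concurrently with an in-memory flush would.
public class CmeDemo {
    public static void main(String[] args) {
        List<String> pipeline = new LinkedList<>();
        pipeline.add("segment-1");
        pipeline.add("segment-2");
        boolean threw = false;
        Iterator<String> it = pipeline.iterator(); // scanner starts iterating
        pipeline.add("segment-3");                 // concurrent flush mutates the list
        try {
            it.next();                             // fail-fast check fires here
        } catch (ConcurrentModificationException e) {
            threw = true;
        }
        System.out.println(threw);                 // prints true
    }
}
```

Handing scanners an immutable snapshot of the pipeline, as the copy-on-write scheme of HBASE-17434 does, removes this failure mode entirely.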
[jira] [Commented] (HBASE-17407) Correct update of maxFlushedSeqId in HRegion
[ https://issues.apache.org/jira/browse/HBASE-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15798198#comment-15798198 ] Edward Bortnikov commented on HBASE-17407: -- [~Apache9] - what is the invariant you are looking for, and how does it affect correctness? "Strange intermediate states" is exactly what transactions are about - they are perfectly fine. Can you elaborate on the data loss case? Independently of that, you have a point that maxFlushedSeqId is managed in too many places; maybe this is a call to action. > Correct update of maxFlushedSeqId in HRegion > > > Key: HBASE-17407 > URL: https://issues.apache.org/jira/browse/HBASE-17407 > Project: HBase > Issue Type: Bug >Reporter: Eshcar Hillel > > The attribute maxFlushedSeqId in HRegion is used to track the max sequence id > in the store files and is reported to HMaster. When flushing only part of the > memstore content this value might be incorrect and may cause data loss. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
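The data-loss concern in the HBASE-17407 description above can be illustrated with a toy model (hypothetical names, not HRegion's actual bookkeeping): after a partial flush, the safe value to report is bounded by the lowest sequence id still held only in memory, not by the highest flushed one, because the reported id governs how far the WAL may be truncated.

```java
import java.util.Collection;
import java.util.TreeSet;

// Toy model of maxFlushedSeqId accounting under partial flushes. If edits
// with seq ids {5, 10} are pending and a partial flush persists only seq 10,
// reporting 10 as "max flushed" would allow the WAL to be truncated past
// seq 5, losing that edit on a crash. The safe bound is
// (lowest unflushed seq id) - 1.
public class MaxFlushedSeqIdSketch {
    // Sequence ids whose edits are still only in the memstore.
    private final TreeSet<Long> unflushed = new TreeSet<>();
    private long highestSeen = 0;

    void append(long seqId) {
        unflushed.add(seqId);
        highestSeen = Math.max(highestSeen, seqId);
    }

    // A partial flush persists only some of the pending edits.
    void flush(Collection<Long> seqIds) {
        unflushed.removeAll(seqIds);
    }

    // Every edit at or below this id is durably on disk, so the WAL may be
    // truncated up to here without risking data loss.
    long maxFlushedSeqId() {
        return unflushed.isEmpty() ? highestSeen : unflushed.first() - 1;
    }
}
```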
[jira] [Commented] (HBASE-17373) Reverse the order of snapshot creation in the CompactingMemStore
[ https://issues.apache.org/jira/browse/HBASE-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15797288#comment-15797288 ] Edward Bortnikov commented on HBASE-17373: -- ... And once again, this delicate case should be described as part of the synchronization scheme in the programmer's manual ... Our job, too. > Reverse the order of snapshot creation in the CompactingMemStore > > > Key: HBASE-17373 > URL: https://issues.apache.org/jira/browse/HBASE-17373 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Eshcar Hillel > Fix For: 2.0.0 > > Attachments: HBASE-17373-V01.patch, HBASE-17373-V02.patch, > HBASE-17373-V03.patch, HBASE-17373-V04.patch, HBASE-17373-V04.patch, > HBASE-17373-V05.patch > > > In CompactingMemStore both in BASIC and EAGER cases when snapshot is created > the segments are first removed from the pipeline then added to the snapshot. > This is the opposite to what is done in the DefaultMemStore where the > snapshot is firstly created with the active segment and only after the active > segment is refreshed. This JIRA is about to reverse the order in > CompactingMemStore and to make all MemStores to behave the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
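The ordering concern in the HBASE-17373 description reduces to a publish-before-remove rule: if a segment leaves the pipeline before it appears in the snapshot, a reader scanning "pipeline plus snapshot" can miss it entirely. Below is a hedged, single-threaded sketch of the safe order (hypothetical names, and deliberately ignoring the synchronization the real code needs on top of this):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of the snapshot-creation order this JIRA aligns with DefaultMemStore:
// publish the tail segment to the snapshot first, only then remove it from
// the pipeline, so there is no window in which the segment is in neither.
public class SnapshotOrderSketch {
    private final Deque<String> pipeline = new ArrayDeque<>();
    private final List<String> snapshot = new ArrayList<>();

    void push(String segment) { pipeline.addFirst(segment); }

    void snapshotTail() {
        String tail = pipeline.peekLast();
        snapshot.add(tail);    // 1. publish to the snapshot first
        pipeline.removeLast(); // 2. then remove from the pipeline
    }

    // What a reader effectively checks: is the segment visible anywhere?
    boolean visibleSomewhere(String segment) {
        return pipeline.contains(segment) || snapshot.contains(segment);
    }
}
```

Reversing the two steps inside snapshotTail() would open exactly the gap the issue describes; the sketch only shows the ordering invariant, while the real CompactingMemStore also needs the synchronization discussed in HBASE-17434.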
[jira] [Commented] (HBASE-17373) Reverse the order of snapshot creation in the CompactingMemStore
[ https://issues.apache.org/jira/browse/HBASE-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15797283#comment-15797283 ] Edward Bortnikov commented on HBASE-17373: -- Mind not reverting please? Seems that this issue is related to HBASE-17081 (currently reverted), and should be addressed there. We have a dependency on this commit with HBASE-17379, which is also quite pressing. Thanks. > Reverse the order of snapshot creation in the CompactingMemStore > > > Key: HBASE-17373 > URL: https://issues.apache.org/jira/browse/HBASE-17373 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Eshcar Hillel > Fix For: 2.0.0 > > Attachments: HBASE-17373-V01.patch, HBASE-17373-V02.patch, > HBASE-17373-V03.patch, HBASE-17373-V04.patch, HBASE-17373-V04.patch, > HBASE-17373-V05.patch > > > In CompactingMemStore both in BASIC and EAGER cases when snapshot is created > the segments are first removed from the pipeline then added to the snapshot. > This is the opposite to what is done in the DefaultMemStore where the > snapshot is firstly created with the active segment and only after the active > segment is refreshed. This JIRA is about to reverse the order in > CompactingMemStore and to make all MemStores to behave the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17373) Reverse the order of snapshot creation in the CompactingMemStore
[ https://issues.apache.org/jira/browse/HBASE-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15793279#comment-15793279 ] Edward Bortnikov commented on HBASE-17373: -- Mind committing folks (smile)? > Reverse the order of snapshot creation in the CompactingMemStore > > > Key: HBASE-17373 > URL: https://issues.apache.org/jira/browse/HBASE-17373 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Eshcar Hillel > Attachments: HBASE-17373-V01.patch, HBASE-17373-V02.patch, > HBASE-17373-V03.patch, HBASE-17373-V04.patch, HBASE-17373-V04.patch > > > In CompactingMemStore both in BASIC and EAGER cases when snapshot is created > the segments are first removed from the pipeline then added to the snapshot. > This is the opposite to what is done in the DefaultMemStore where the > snapshot is firstly created with the active segment and only after the active > segment is refreshed. This JIRA is about to reverse the order in > CompactingMemStore and to make all MemStores to behave the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17379) Lack of synchronization in CompactionPipeline#getScanners()
[ https://issues.apache.org/jira/browse/HBASE-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787776#comment-15787776 ] Edward Bortnikov commented on HBASE-17379: -- Thanks, all, for the comments, suggestions, and patches. I second [~stack] in his suggestion to let [~eshcar] and [~anastas] finish the job. We'll publish the precise synchronization scheme with the patch, and will also make it part of the programmer's manual/blog post (WIP in HBASE-16851). We also have a very comprehensive benchmark in HBASE-16417 - once the patch is ready we'll run it to make sure it does not hamper performance. > Lack of synchronization in CompactionPipeline#getScanners() > --- > > Key: HBASE-17379 > URL: https://issues.apache.org/jira/browse/HBASE-17379 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Ted Yu >Assignee: Ted Yu > Attachments: 17379.v1.txt, 17379.v2.txt, 17379.v3.txt, 17379.v4.txt, > 17379.v5.txt, 17379.v6.txt, 17379.v8.txt > > > From > https://builds.apache.org/job/PreCommit-HBASE-Build/5053/testReport/org.apache.hadoop.hbase.regionserver/TestHRegionWithInMemoryFlush/testWritesWhileGetting/ > : > {code} > java.io.IOException: java.util.ConcurrentModificationException > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleException(HRegion.java:5886) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:5856) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.(HRegion.java:5819) > at > org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2786) > at > org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2766) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:7036) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:7015) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6994) > at > 
org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting(TestHRegion.java:4141) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:745) > Caused by: 
java.util.ConcurrentModificationException: null > at > java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:966) > at java.util.LinkedList$ListItr.next(LinkedList.java:888) > at > org.apache.hadoop.hbase.regionserver.CompactionPipeline.getScanners(CompactionPipeline.java:220) > at > org.apache.hadoop.hbase.regionserver.CompactingMemStore.getScanners(CompactingMemStore.java:298) > at > org.apache.hadoop.hbase.regionserver.HStore.getScanners(HStore.java:1154) > at org.apache.hadoop.hbase.regionserver.Store.getScanners(Store.java:97) > at >
[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15783747#comment-15783747 ] Edward Bortnikov commented on HBASE-16851: -- Thanks Michael. Unlocked the doc for commenting. > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov >Assignee: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory > Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15782955#comment-15782955 ] Edward Bortnikov commented on HBASE-16851: -- Updated shared doc: https://docs.google.com/document/d/1lsDv8mmw3Daz9Rw9zySEI7zXOlLYYy2dyhQoB6gNMcI > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov >Assignee: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory > Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore
[ https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15782856#comment-15782856 ] Edward Bortnikov commented on HBASE-16421: -- Question unrelated to the unfolding technical discussion :) Our estimate for the whole feature done comme il faut is about 2 months. Are you guys targeting it for 2.0 or beyond? Apologies for possibly asking twice - do not remember the answer to this question. > Introducing the CellChunkMap as a new additional index variant in the MemStore > -- > > Key: HBASE-16421 > URL: https://issues.apache.org/jira/browse/HBASE-16421 > Project: HBase > Issue Type: Umbrella >Reporter: Anastasia Braginsky > Attachments: CellChunkMapRevived.pdf, ChunkCell_creation.png, > IntroductiontoNewFlatandCompactMemStore.pdf > > > Follow up for HBASE-14921. This is going to be the umbrella JIRA to include > all the parts of integration of the CellChunkMap to the MemStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operation
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15782330#comment-15782330 ] Edward Bortnikov commented on HBASE-17339: -- [~davelatham], [~yangzhe1991] - thanks for pointing out the historical context. Indeed, the idea will not work in peer clusters with concurrent updates. However, it seems that there are enough interesting use cases that deserve treatment. This optimization is complementary to in-memory flush & compaction (see HBASE-14918). The latter brings its own value, but in conjunction the two produce a very impressive reduction in read latency. [~eshcar], maybe you could attach some perf results? Thanks. > Scan-Memory-First Optimization for Get Operation > > > Key: HBASE-17339 > URL: https://issues.apache.org/jira/browse/HBASE-17339 > Project: HBase > Issue Type: Improvement >Reporter: Eshcar Hillel > Attachments: HBASE-17339-V01.patch > > > The current implementation of a get operation (to retrieve values for a > specific key) scans through all relevant stores of the region; for each store > both memory components (memstores segments) and disk components (hfiles) are > scanned in parallel. > We suggest to apply an optimization that speculatively scans memory-only > components first and only if the result is incomplete scans both memory and > disk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operation
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780318#comment-15780318 ] Edward Bortnikov commented on HBASE-17339: -- [~yangzhe1991], that is plausible - however, we need to check how the timestamp monotonicity check can be done efficiently. We thought of going even further, and passing the flag upon every single Get instead of using a CF-level configuration. > Scan-Memory-First Optimization for Get Operation > > > Key: HBASE-17339 > URL: https://issues.apache.org/jira/browse/HBASE-17339 > Project: HBase > Issue Type: Improvement >Reporter: Eshcar Hillel > Attachments: HBASE-17339-V01.patch > > > The current implementation of a get operation (to retrieve values for a > specific key) scans through all relevant stores of the region; for each store > both memory components (memstores segments) and disk components (hfiles) are > scanned in parallel. > We suggest to apply an optimization that speculatively scans memory-only > components first and only if the result is incomplete scans both memory and > disk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operation
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780064#comment-15780064 ] Edward Bortnikov commented on HBASE-17339: -- Use cases in which this optimization might (and will) be useful: - Pub-Sub on top of HBase storage. - Shared counters on top of HBase storage. - E-commerce - editing and checking out a purchase cart. In these cases, the churn is high but the working set is small - can fit in memory. > Scan-Memory-First Optimization for Get Operation > > > Key: HBASE-17339 > URL: https://issues.apache.org/jira/browse/HBASE-17339 > Project: HBase > Issue Type: Improvement >Reporter: Eshcar Hillel > Attachments: HBASE-17339-V01.patch > > > The current implementation of a get operation (to retrieve values for a > specific key) scans through all relevant stores of the region; for each store > both memory components (memstores segments) and disk components (hfiles) are > scanned in parallel. > We suggest to apply an optimization that speculatively scans memory-only > components first and only if the result is incomplete scans both memory and > disk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
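The speculative memory-first Get described in this issue can be sketched as follows. This is a hypothetical standalone model, not the actual HBASE-17339 patch: the class, maps, and method names are illustrative stand-ins for the memstore segments and HFiles. A Get first consults only the in-memory components; only when the in-memory answer is missing or incomplete does it fall back to the combined memory-plus-disk scan.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class MemoryFirstGet {
    // Simplified stand-ins for the memstore segments and the on-disk HFiles.
    private final Map<String, String> memstore = new HashMap<>();
    private final Map<String, String> hfiles = new HashMap<>();

    void put(String row, String value) { memstore.put(row, value); }

    void flushToDisk() { hfiles.putAll(memstore); memstore.clear(); }

    // Speculative get: scan memory-only components first; only when the
    // result is incomplete fall back to scanning memory and disk together.
    Optional<String> get(String row) {
        String inMemory = memstore.get(row);     // cheap: no disk access
        if (inMemory != null) {
            return Optional.of(inMemory);        // answer found in memory
        }
        return Optional.ofNullable(hfiles.get(row)); // full memory+disk path
    }

    public static void main(String[] args) {
        MemoryFirstGet store = new MemoryFirstGet();
        store.put("cart:1", "itemA");
        store.flushToDisk();
        store.put("cart:2", "itemB");
        System.out.println(store.get("cart:2").get()); // memory hit
        System.out.println(store.get("cart:1").get()); // disk fallback
    }
}
```

The hard part elided here is the completeness check: a real store must prove that no relevant older cell (e.g. a delete marker or an additionally requested version) exists on disk before trusting the in-memory answer, which is why the idea breaks under concurrent peer-cluster updates as noted earlier in the thread.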
[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15779874#comment-15779874 ] Edward Bortnikov commented on HBASE-16851: -- WIP change: - Separating the description and benchmarking of CellChunkMap into a standalone post. (Experimental project, currently at a different maturity level than CellArrayMap, see HBASE-16421). > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov >Assignee: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory > Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17294) External Configuration for Memory Compaction
[ https://issues.apache.org/jira/browse/HBASE-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762453#comment-15762453 ] Edward Bortnikov commented on HBASE-17294: -- [~devaraj], please see the performance benchmark results reported in HBASE-16417. We started writing a separate blog post with a clearer and more concise description. Thanks. > External Configuration for Memory Compaction > - > > Key: HBASE-17294 > URL: https://issues.apache.org/jira/browse/HBASE-17294 > Project: HBase > Issue Type: Sub-task >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 2.0.0 > > Attachments: HBASE-17294-V01.patch, HBASE-17294-V02.patch, > HBASE-17294-V03.patch > > > We would like to have a single external knob to control memstore compaction. > Possible memstore compaction policies are none, basic, and eager. > This sub-task allows to set this property at the column family level at table > creation time:
> {code}
> create '<table_name>',
>   {NAME => '<cf_name>', IN_MEMORY_COMPACTION => '<NONE|BASIC|EAGER>'}
> {code}
> or to set this at the global configuration level by setting the property in > hbase-site.xml, with BASIC being the default value:
> {code}
> <property>
>   <name>hbase.hregion.compacting.memstore.type</name>
>   <value><NONE|BASIC|EAGER></value>
> </property>
> {code}
> The values used in this property can change as memstore compaction policies > evolve over time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore
[ https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762435#comment-15762435 ] Edward Bortnikov commented on HBASE-16421: -- Thanks [~anoop.hbase]. Still, I kind of do not get the difference between creating Cells when scanning the BucketCache BBs and the Segment ChunkMaps :) In both cases, new temp objects are created. Did you observe GC pressure in the former case? Thanks. > Introducing the CellChunkMap as a new additional index variant in the MemStore > -- > > Key: HBASE-16421 > URL: https://issues.apache.org/jira/browse/HBASE-16421 > Project: HBase > Issue Type: Umbrella >Reporter: Anastasia Braginsky > Attachments: CellChunkMapRevived.pdf, > IntroductiontoNewFlatandCompactMemStore.pdf > > > Follow up for HBASE-14921. This is going to be the umbrella JIRA to include > all the parts of integration of the CellChunkMap to the MemStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk
[ https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762151#comment-15762151 ] Edward Bortnikov commented on HBASE-17081: -- All - Thanks for chiming in. Apologies for possible misunderstanding - distributed dev processes take their toll :) . We are starting to suffer from the fact that the Compacting Memstore project split into many small JIRAs, and it's hard to track the full picture. No problem at all with reverting specific patches if potential destabilization is suspected. My concern was scrapping or delaying the whole project without a good reason, hence the suggestion to improve the discussion process and manage it in a well-defined space. It might be that the instability follows from the reverse order in which this JIRA and HBASE-17294 were checked in. The latter was supposed to be the concluding chord, finalizing the configuration syntax and setting the new default. Although we cannot reproduce the failures in the problematic tests locally, how about the following plan: 1. Revert both HBASE-17081 and HBASE-17294, and see if the regression is stable. 2. Rebase and check in HBASE-17081. 3. Rebase and check in HBASE-17294. 4. Move the external documentation to HBASE-14918 (top-level JIRA), to improve the visibility of the new definitions. Thanks again for all the assistance in identifying the problems so far.
> Flush the entire CompactingMemStore content to disk > --- > > Key: HBASE-17081 > URL: https://issues.apache.org/jira/browse/HBASE-17081 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Attachments: HBASE-15787_8.patch, HBASE-17081-V01.patch, > HBASE-17081-V02.patch, HBASE-17081-V03.patch, HBASE-17081-V04.patch, > HBASE-17081-V05.patch, HBASE-17081-V06.patch, HBASE-17081-V06.patch, > HBASE-17081-V07.patch, HBaseMeetupDecember2016-V02.pptx, > Pipelinememstore_fortrunk_3.patch > > > Part of CompactingMemStore's memory is held by an active segment, and another > part is divided between immutable segments in the compacting pipeline. Upon > flush-to-disk request we want to flush all of it to disk, in contrast to > flushing only tail of the compacting pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk
[ https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15760912#comment-15760912 ] Edward Bortnikov commented on HBASE-17081: -- [~anoop.hbase], indeed, neither the current JIRA nor HBASE-17294 was intended to discuss the configuration. It has been discussed extensively in HBASE-16851. The current JIRA is about the flush of the full pipeline to disk, which is a basic mechanism, and IMHO there is no reason to revert it. If you are suggesting that we re-open the decision to set the default for in-memory compaction, please substantiate your concerns, and explain how you intend to resolve them. We conducted a very thorough and transparent benchmarking process, and published the results. BASIC compaction showed no side effects, only advantages. EAGER compaction can indeed pose tradeoffs alongside larger gains; that's why it is not the default. In any case, we would appreciate it if we could run that discussion at HBASE-16851. It's very hard to track discussions when the JIRA is changing all the time. Definitely, we are -1 for reverting the change in HBASE-17294 without discussing the implications. The intent behind introducing the default is that otherwise nobody would use the option, as [~stack] rightfully noted. That's why we invested so much in testing, benchmarking, and simplicity of configuration. We are prepared to handle the issues that arise with this change in behavior. We value your perspective a lot; however, let's build the discussion around what gaps exist on the ground, and how they can be mediated without killing the feature. Thanks [~anoop.hbase].
> Flush the entire CompactingMemStore content to disk > --- > > Key: HBASE-17081 > URL: https://issues.apache.org/jira/browse/HBASE-17081 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Attachments: HBASE-15787_8.patch, HBASE-17081-V01.patch, > HBASE-17081-V02.patch, HBASE-17081-V03.patch, HBASE-17081-V04.patch, > HBASE-17081-V05.patch, HBASE-17081-V06.patch, HBASE-17081-V06.patch, > HBASE-17081-V07.patch, HBaseMeetupDecember2016-V02.pptx, > Pipelinememstore_fortrunk_3.patch > > > Part of CompactingMemStore's memory is held by an active segment, and another > part is divided between immutable segments in the compacting pipeline. Upon > flush-to-disk request we want to flush all of it to disk, in contrast to > flushing only tail of the compacting pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk
[ https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15760388#comment-15760388 ] Edward Bortnikov commented on HBASE-17081: -- The commit should be re-applied. The problem has been exposed by the new configuration HBASE-17294 as [~ram_krish] indicated. Maybe a new Jira should be filed. > Flush the entire CompactingMemStore content to disk > --- > > Key: HBASE-17081 > URL: https://issues.apache.org/jira/browse/HBASE-17081 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Attachments: HBASE-15787_8.patch, HBASE-17081-V01.patch, > HBASE-17081-V02.patch, HBASE-17081-V03.patch, HBASE-17081-V04.patch, > HBASE-17081-V05.patch, HBASE-17081-V06.patch, HBASE-17081-V06.patch, > HBASE-17081-V07.patch, HBaseMeetupDecember2016-V02.pptx, > Pipelinememstore_fortrunk_3.patch > > > Part of CompactingMemStore's memory is held by an active segment, and another > part is divided between immutable segments in the compacting pipeline. Upon > flush-to-disk request we want to flush all of it to disk, in contrast to > flushing only tail of the compacting pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk
[ https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15760383#comment-15760383 ] Edward Bortnikov commented on HBASE-17081: -- Thanks [~ram_krish] for discovering this. Compacting Memstore (the basic configuration) became the default as of HBASE-17294, as the documentation indicates. A column family can be configured for a different type of in-memory compaction (NONE/EAGER). So I guess the issue is with the other test that the new configuration exposed. > Flush the entire CompactingMemStore content to disk > --- > > Key: HBASE-17081 > URL: https://issues.apache.org/jira/browse/HBASE-17081 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Attachments: HBASE-15787_8.patch, HBASE-17081-V01.patch, > HBASE-17081-V02.patch, HBASE-17081-V03.patch, HBASE-17081-V04.patch, > HBASE-17081-V05.patch, HBASE-17081-V06.patch, HBASE-17081-V06.patch, > HBASE-17081-V07.patch, HBaseMeetupDecember2016-V02.pptx, > Pipelinememstore_fortrunk_3.patch > > > Part of CompactingMemStore's memory is held by an active segment, and another > part is divided between immutable segments in the compacting pipeline. Upon > flush-to-disk request we want to flush all of it to disk, in contrast to > flushing only tail of the compacting pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore
[ https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15760375#comment-15760375 ] Edward Bortnikov commented on HBASE-16421: -- Thanks [~anastas], [~anoop.hbase] and [~stack]. Two questions: 1. What is the time frame of this feature? It looks like the full-fledged implementation + evaluation will take a few months. Can it be a candidate for 2.0 then? 2. I guess many of the potential performance concerns are similar to the read-path off-heaping. There too, Cell objects are created from off-heap block indexes. Theoretically, there too, it would be desirable to copy the result directly to the response protocol buffer. Do you think that the read-path performance would be a good predictor of what we'll see here? What would be the minimum PoC? > Introducing the CellChunkMap as a new additional index variant in the MemStore > -- > > Key: HBASE-16421 > URL: https://issues.apache.org/jira/browse/HBASE-16421 > Project: HBase > Issue Type: Umbrella >Reporter: Anastasia Braginsky > Attachments: CellChunkMapRevived.pdf, > IntroductiontoNewFlatandCompactMemStore.pdf > > > Follow up for HBASE-14921. This is going to be the umbrella JIRA to include > all the parts of integration of the CellChunkMap to the MemStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk
[ https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752078#comment-15752078 ] Edward Bortnikov commented on HBASE-17081: -- [~anastas] Please take a look at the test result, seems to be related: Flaked tests: org.apache.hadoop.hbase.regionserver.TestHRegionWithInMemoryFlush.testWritesWhileScanning(org.apache.hadoop.hbase.regionserver.TestHRegionWithInMemoryFlush) Run 1: TestHRegionWithInMemoryFlush>TestHRegion.testWritesWhileScanning:3979 expected null, but was: > Flush the entire CompactingMemStore content to disk > --- > > Key: HBASE-17081 > URL: https://issues.apache.org/jira/browse/HBASE-17081 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Attachments: HBASE-15787_8.patch, HBASE-17081-V01.patch, > HBASE-17081-V02.patch, HBASE-17081-V03.patch, HBASE-17081-V04.patch, > HBASE-17081-V05.patch, HBASE-17081-V06.patch, HBASE-17081-V06.patch, > HBASE-17081-V07.patch, HBaseMeetupDecember2016-V02.pptx, > Pipelinememstore_fortrunk_3.patch > > > Part of CompactingMemStore's memory is held by an active segment, and another > part is divided between immutable segments in the compacting pipeline. Upon > flush-to-disk request we want to flush all of it to disk, in contrast to > flushing only tail of the compacting pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk
[ https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747604#comment-15747604 ] Edward Bortnikov commented on HBASE-17081: -- Re/ [~stack]'s question about what's next: 1. HBASE-17294 configuration (Eshcar) - committed, thanks [~stack]. 2. HBASE-16851 documentation (me) - need to complete 3 blog posts: (1) user manual - complete, give or take, (2) performance eval, and (3) programmer's manual. Where should we post all those? The Apache blog? 3. HBASE-16417 (Eshcar) - an automated policy for figuring out whether the BASIC or the EAGER algorithm is to be used. A small refactoring of the internal API for future policies. Independent of in-memory compaction per se: 1. HBASE-16421 CellChunkMap implementation (Anastasia) - starting now, need to coordinate with [~anoop.hbase] and [~ram_krish]. 2. JIRA TBD Memstore-First Get (Eshcar) - big value was demonstrated by the benchmarks in HBASE-16851; we should try to implement & push before 2.0 closes. Sounds like a plan (smile)? > Flush the entire CompactingMemStore content to disk > --- > > Key: HBASE-17081 > URL: https://issues.apache.org/jira/browse/HBASE-17081 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Attachments: HBASE-15787_8.patch, HBASE-17081-V01.patch, > HBASE-17081-V02.patch, HBASE-17081-V03.patch, HBASE-17081-V04.patch, > HBASE-17081-V05.patch, HBASE-17081-V06.patch, HBASE-17081-V06.patch, > HBaseMeetupDecember2016-V02.pptx, Pipelinememstore_fortrunk_3.patch > > > Part of CompactingMemStore's memory is held by an active segment, and another > part is divided between immutable segments in the compacting pipeline. Upon > flush-to-disk request we want to flush all of it to disk, in contrast to > flushing only tail of the compacting pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk
[ https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15700230#comment-15700230 ] Edward Bortnikov commented on HBASE-17081: -- Folks, would you please consider this for commit? > Flush the entire CompactingMemStore content to disk > --- > > Key: HBASE-17081 > URL: https://issues.apache.org/jira/browse/HBASE-17081 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Attachments: HBASE-17081-V01.patch, HBASE-17081-V02.patch, > HBASE-17081-V03.patch, Pipelinememstore_fortrunk_3.patch > > > Part of CompactingMemStore's memory is held by an active segment, and another > part is divided between immutable segments in the compacting pipeline. Upon > flush-to-disk request we want to flush all of it to disk, in contrast to > flushing only tail of the compacting pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk
[ https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15699652#comment-15699652 ] Edward Bortnikov commented on HBASE-17081: -- Please note that this feature is part of both BASIC and EAGER compaction policies, as described in https://docs.google.com/document/d/16XOiOuG9e0l6D_mD-oM5JHcSVmKC8rpIVJGH239gWsQ (see HBASE-16851). As such, Index and Data merge are both parts of the EAGER policy; only index flattening happens in BASIC. The whole pipeline is flushed to disk, in both policies. > Flush the entire CompactingMemStore content to disk > --- > > Key: HBASE-17081 > URL: https://issues.apache.org/jira/browse/HBASE-17081 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Attachments: HBASE-17081-V01.patch, HBASE-17081-V02.patch, > Pipelinememstore_fortrunk_3.patch > > > Part of CompactingMemStore's memory is held by an active segment, and another > part is divided between immutable segments in the compacting pipeline. Upon > flush-to-disk request we want to flush all of it to disk, in contrast to > flushing only tail of the compacting pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
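The "flush the entire pipeline" behavior discussed in this issue can be sketched as follows. This is a hypothetical standalone model, not the actual CompactingMemStore code: before this change a flush-to-disk drained only the tail segment of the compacting pipeline, while with it the active segment and every pipeline segment are drained together.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class WholePipelineFlush {
    // Newest segments at the front, the oldest (the tail) at the back.
    private final Deque<List<String>> pipeline = new ArrayDeque<>();
    private List<String> active = new ArrayList<>();

    void write(String cell) { active.add(cell); }

    // In-memory flush: push the active segment onto the pipeline.
    void inMemoryFlush() {
        pipeline.addFirst(active);
        active = new ArrayList<>();
    }

    // Old behavior: only the oldest pipeline segment reached disk.
    List<String> flushTailOnly() {
        List<String> tail = pipeline.pollLast();
        return tail == null ? List.of() : tail;
    }

    // HBASE-17081 behavior: drain the active segment plus the whole pipeline.
    List<String> flushAll() {
        List<String> toDisk = new ArrayList<>(active);
        active = new ArrayList<>();
        while (!pipeline.isEmpty()) {
            toDisk.addAll(pipeline.pollLast()); // drain pipeline, oldest segment first
        }
        return toDisk;
    }

    public static void main(String[] args) {
        WholePipelineFlush memstore = new WholePipelineFlush();
        memstore.write("a"); memstore.inMemoryFlush();
        memstore.write("b"); memstore.inMemoryFlush();
        memstore.write("c");
        System.out.println(memstore.flushAll()); // all three cells reach disk at once
    }
}
```

As the comment above notes, this draining step is policy-independent: BASIC and EAGER differ in how pipeline segments are flattened or merged in memory, but a flush-to-disk empties the whole structure either way.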
[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15699646#comment-15699646 ] Edward Bortnikov commented on HBASE-16851: -- Dear committers - please assign this issue to me. > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory > Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Bortnikov updated HBASE-16851: - Attachment: Accordion HBase In-Memory Compaction - Nov 23.pdf > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion HBase In-Memory Compaction - Nov 23.pdf, Accordion_ HBase In-Memory > Compaction - Oct 27.pdf, HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690414#comment-15690414 ] Edward Bortnikov edited comment on HBASE-16851 at 11/23/16 3:30 PM: Updated version of the blog post in https://docs.google.com/document/d/16XOiOuG9e0l6D_mD-oM5JHcSVmKC8rpIVJGH239gWsQ. Hopefully, the final configuration syntax and technical description. was (Author: ebortnik): Updated version of the blog post in https://docs.google.com/document/d/16XOiOuG9e0l6D_mD-oM5JHcSVmKC8rpIVJGH239gWsQ. > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion_ HBase In-Memory Compaction - Oct 27.pdf, > HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690414#comment-15690414 ] Edward Bortnikov commented on HBASE-16851: -- Updated version of the blog post in https://docs.google.com/document/d/16XOiOuG9e0l6D_mD-oM5JHcSVmKC8rpIVJGH239gWsQ. > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion_ HBase In-Memory Compaction - Oct 27.pdf, > HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions
[ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15661998#comment-15661998 ] Edward Bortnikov commented on HBASE-16417: -- [~anoop.hbase], we certainly appreciate the input, feel free to fire off your first thoughts going forward (smile). Yes, we thought about the multi-CF case. We are speaking of single-row get only. The idea is to try to fetch from the set of memstore scanners first. If the data can be retrieved there, there is no need to go look in the HFiles, is there? Am I missing something here? > In-Memory MemStore Policy for Flattening and Compactions > > > Key: HBASE-16417 > URL: https://issues.apache.org/jira/browse/HBASE-16417 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Eshcar Hillel > Fix For: 2.0.0 > > Attachments: HBASE-16417-benchmarkresults-20161101.pdf, > HBASE-16417-benchmarkresults-20161110.pdf > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk
[ https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15660383#comment-15660383 ] Edward Bortnikov commented on HBASE-17081: -- Just clarifying the context. This code is a building block for the default compaction policy that has been suggested before. > Flush the entire CompactingMemStore content to disk > --- > > Key: HBASE-17081 > URL: https://issues.apache.org/jira/browse/HBASE-17081 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > > Part of CompactingMemStore's memory is held by an active segment, and another > part is divided between immutable segments in the compacting pipeline. Upon > flush-to-disk request we want to flush all of it to disk, in contrast to > flushing only tail of the compacting pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions
[ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15659859#comment-15659859 ] Edward Bortnikov commented on HBASE-16417: -- Just emphasizing the #4 point raised by [~eshcar], it looks pretty important. Does anyone see a problem with the "try-to-read-from-the-memstore-first" approach for scans? It seems to be pretty important for in-memory compaction. Please speak up (smile). > In-Memory MemStore Policy for Flattening and Compactions > > > Key: HBASE-16417 > URL: https://issues.apache.org/jira/browse/HBASE-16417 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Eshcar Hillel > Fix For: 2.0.0 > > Attachments: HBASE-16417-benchmarkresults-20161101.pdf, > HBASE-16417-benchmarkresults-20161110.pdf > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16608) Introducing the ability to merge ImmutableSegments without copy-compaction or SQM usage
[ https://issues.apache.org/jira/browse/HBASE-16608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15626761#comment-15626761 ] Edward Bortnikov commented on HBASE-16608: -- Disclaimer. Configuration is subject to changes, pending conclusions in HBASE-16417. In particular, compaction policy as global configuration (rather than per-CF attribute) is temporary. > Introducing the ability to merge ImmutableSegments without copy-compaction or > SQM usage > --- > > Key: HBASE-16608 > URL: https://issues.apache.org/jira/browse/HBASE-16608 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Fix For: 2.0.0 > > Attachments: HBASE-16417-V02.patch, HBASE-16417-V04.patch, > HBASE-16417-V06.patch, HBASE-16417-V07.patch, HBASE-16417-V08.patch, > HBASE-16417-V10.patch, HBASE-16608-Final.patch, HBASE-16608-Final.patch, > HBASE-16608-V01.patch, HBASE-16608-V03.patch, HBASE-16608-V04.patch, > HBASE-16608-V08.patch, HBASE-16608-V09.patch, HBASE-16608-V09.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625712#comment-15625712 ] Edward Bortnikov commented on HBASE-16851: -- A more detailed version published, diagrams updated, perf results pending. > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion_ HBase In-Memory Compaction - Oct 27.pdf, > HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Bortnikov updated HBASE-16851: - Attachment: Accordion HBase In-Memory Compaction - Nov 1 .pdf > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov > Attachments: Accordion HBase In-Memory Compaction - Nov 1 .pdf, > Accordion_ HBase In-Memory Compaction - Oct 27.pdf, > HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615304#comment-15615304 ] Edward Bortnikov edited comment on HBASE-14918 at 10/28/16 12:54 PM: - Let's focus the discussion on HBASE-16417, that is the right context. was (Author: ebortnik): Let's focus the discussion on HBASE-14617, that is the right context. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: CellBlocksSegmentDesign.pdf, > HBASE-16417-benchmarkresults.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). 
> (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
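The flush-then-compact cycle the issue describes — the active segment flushed into an in-memory pipeline, and pipeline segments later merged with redundancies eliminated — can be sketched as follows. This is a minimal toy under invented names, not the actual CompactingMemstore implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy sketch of the described cycle: in-memory flush moves the active
// segment into a pipeline; in-memory compaction merges pipeline segments,
// keeping only the newest value per key (redundancy elimination).
public class CompactingMemstoreSketch {
    NavigableMap<String, String> active = new TreeMap<>();
    final List<NavigableMap<String, String>> pipeline = new ArrayList<>(); // newest first

    // (1) In-memory flush: make the active segment immutable, start a fresh one.
    void flushInMemory() {
        pipeline.add(0, active);
        active = new TreeMap<>();
    }

    // (2) In-memory compaction: merge segments newest-to-oldest, so
    // putIfAbsent retains the newest version of each key.
    void compactPipeline() {
        NavigableMap<String, String> merged = new TreeMap<>();
        for (NavigableMap<String, String> seg : pipeline) {
            for (var e : seg.entrySet()) merged.putIfAbsent(e.getKey(), e.getValue());
        }
        pipeline.clear();
        pipeline.add(merged);
    }

    int pipelineCellCount() {
        return pipeline.stream().mapToInt(NavigableMap::size).sum();
    }
}
```

The smaller the merged segment, the longer data can stay in RAM before a disk flush — which is the performance argument the issue makes for medium-to-high key churn workloads.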
[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions
[ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615299#comment-15615299 ] Edward Bortnikov commented on HBASE-16417: -- Just to give a sense of what we've been thinking of as a possible auto-tuning policy (smile). It's a "war driving" approach that is actually similar to the opportunistic scans we once had, but a bit smarter. Suppose we do a full (data) compaction once in a while; a by-product is the compaction factor - how much space we saved. If the latter is small, schedule the next compaction further away, using some exponential backoff scheme. For workloads with very few duplicates, compactions will de facto never happen. For skewed workloads, compactions will consistently prove valuable and will run at a constant pace. Note that the above is unrelated to whether we flush just one segment or all segments in the pipeline once disk-flush time comes. Personal opinion - no problem with flushing everything if this shows value. Let's wait for more benchmark results; they're just around the corner. One more personal opinion - we should strive for a generic policy, as independent as possible of whether we use MSLABs or not, run on-heap or off-heap, etc.; let's see if we can get there. Actually it's the most fun stage now - we have all the building blocks, and the goal is connecting them right :) Stay tuned, we'll keep sharing the results and the ideas. > In-Memory MemStore Policy for Flattening and Compactions > > > Key: HBASE-16417 > URL: https://issues.apache.org/jira/browse/HBASE-16417 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Eshcar Hillel > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
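The backoff scheme in the comment above reduces to a small rule: if the last compaction saved little space, double the interval until the next one; if it proved valuable, return to the base pace. A minimal sketch, with thresholds and constants invented purely for illustration:

```java
// Toy scheduler for the proposed auto-tuning policy: exponential backoff
// when the compaction factor is low, constant pace when it is high.
// All thresholds/intervals below are invented, not measured defaults.
public class CompactionScheduler {
    static final long BASE_INTERVAL_MS = 60_000;        // assumed base pace
    static final long MAX_INTERVAL_MS = 3_600_000;      // backoff cap
    static final double WORTHWHILE_FRACTION = 0.10;     // >=10% space saved

    public static long nextInterval(double savedFraction, long currentIntervalMs) {
        if (savedFraction >= WORTHWHILE_FRACTION) {
            return BASE_INTERVAL_MS;                              // compaction paid off: keep pace
        }
        return Math.min(currentIntervalMs * 2, MAX_INTERVAL_MS);  // little gain: back off
    }
}
```

Under this rule, a workload with no duplicates converges to the capped interval (compactions effectively stop), while a skewed workload keeps resetting to the base interval — matching the two regimes described in the comment.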
[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions
[ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612232#comment-15612232 ] Edward Bortnikov commented on HBASE-16417: -- [~anoop.hbase], [~eshcar] - the end state you both want to reach is the same, just the ways of getting there are different. Before going into any detail - the holy grail is a self-tuning policy that does the right thing in EVERY use case of interest. We'd like to achieve it without any additional configuration or private solutions for specific cases. The reason is that whatever ends up as a non-default option will never be used. Anoop - you actually want two things: (1) no compaction of any kind happening when there is no redundancy, and (2) everything flushed to disk when the memstore overflows. (Note that these two can be decoupled.) Hopefully you don't mind if there's a policy that magically figures out that we're in that use case, at practically zero cost, and does exactly (1) and (2). We just disagree on marking the opposite case (many duplicates) as special and going down a different code path there - because if we leave it to the admin as non-default we know what'll happen. So we are after that magic policy. The quest won't take long, but it has to be data-driven. At the moment, we've just reproduced one microbenchmark (uniform writes, no reads), but there are many other cases that should be looked at. We have the env to run them, and we'll be producing those results over the next couple of weeks. We'll be very transparent in the process, publishing the results frequently. Once we have the data, let's decide collectively. If nothing universal will work, we can always back off to configs, but I'd consider that undesirable. 
> In-Memory MemStore Policy for Flattening and Compactions > > > Key: HBASE-16417 > URL: https://issues.apache.org/jira/browse/HBASE-16417 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions
[ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611916#comment-15611916 ] Edward Bortnikov commented on HBASE-16417: -- I created this Jira, I think I can attach. Please share this file with me. Sent from Yahoo Mail for iPhone On Thursday, October 27, 2016, 4:32 PM, Eshcar Hillel (JIRA) wrote: [ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611902#comment-15611902 ] Eshcar Hillel commented on HBASE-16417: --- The report on the first round of experiments is ready; however, I cannot attach it here. Can anyone assign this subtask to me so I can attach files in it? Meanwhile, I will attach it in the umbrella Jira. The summary of the report is as follows -- Main differences in configuration vs previous benchmarks: 1. Since we run on a 48GB RAM machine we allocate only 16GB to HBase (and not 32GB). 2. Saturation point was found when running 10 threads (and not 50); see more details in the report. 3. We write 50GB (and not 150GB) just to keep the experiments shorter, since we run many different settings. The first round of experiments compares different options (no-, index-, data-compaction) under a write-only workload with uniform key distribution using PE. We see that up until the 95th percentile all options are comparable. At the 99th percentile data compaction starts to lag behind -- indeed, in a uniform workload there is not much point in doing data compaction. The overhead might stem from running SQM to determine which versions to retain. One way to close this gap is to not run data compaction when there is no gain in it. A good policy should be able to identify this with no extra cost. At the 99.999th percentile index compaction also exhibits significant overhead. 
This might be due to memory reclamation of temporary indices. -- This message was sent by Atlassian JIRA (v6.3.4#6332) > In-Memory MemStore Policy for Flattening and Compactions > > > Key: HBASE-16417 > URL: https://issues.apache.org/jira/browse/HBASE-16417 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Anastasia Braginsky > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
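The "don't run data compaction when there is no gain" conclusion implies the policy needs a cheap redundancy estimate before choosing data compaction over index-only flattening. A toy sketch of such an estimate — the 10% threshold is invented for illustration, not taken from the benchmarks:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy redundancy estimator: measure the fraction of duplicate versions in a
// segment's keys, and only pick data compaction when the gain looks real.
// The 10% threshold is an invented illustration, not a measured value.
public class CompactionGainEstimator {
    public static double duplicateFraction(List<String> cellKeys) {
        if (cellKeys.isEmpty()) return 0.0;
        Set<String> unique = new HashSet<>(cellKeys);
        return 1.0 - (double) unique.size() / cellKeys.size();
    }

    public static boolean dataCompactionWorthwhile(List<String> cellKeys) {
        return duplicateFraction(cellKeys) >= 0.10;
    }
}
```

On the uniform write-only workload from the report, nearly every key is unique, so such a check would choose to skip data compaction — avoiding exactly the SQM overhead observed at the 99th percentile.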
[jira] [Commented] (HBASE-16851) User-facing documentation for the In-Memory Compaction feature
[ https://issues.apache.org/jira/browse/HBASE-16851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611802#comment-15611802 ] Edward Bortnikov commented on HBASE-16851: -- Attached a version of the user-facing doc, with configuration specified. No illustrations yet - working on it. The compaction triggering policy is described in general terms (pending HBASE-16417). The impl details are rather high-level - to let the user roughly understand what this feature is about. Let's wait until the policy takes shape to figure out whether they actually belong in the developer documentation. > User-facing documentation for the In-Memory Compaction feature > -- > > Key: HBASE-16851 > URL: https://issues.apache.org/jira/browse/HBASE-16851 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Edward Bortnikov > Attachments: Accordion_ HBase In-Memory Compaction - Oct 27.pdf, > HBaseAcceleratedHbaseConf-final.pptx > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)