[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615304#comment-15615304 ] Edward Bortnikov commented on HBASE-14918: -- Let's focus the discussion on HBASE-14617, that is the right context. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: CellBlocksSegmentDesign.pdf, > HBASE-16417-benchmarkresults.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614715#comment-15614715 ] Anoop Sam John commented on HBASE-14918: bq."For data compaction we do not use MSLABs to avoid the inherent space and computation overhead of copying data during compaction." No fully getting. During writes the cells data might have been copied to MSLAB. So when u do say 2 segments data compaction, u will not do any data copy? Or u mean u will do copy of the surviving cells but not to a chunk got from MSLAB. (?) WHy so? ANy way u r releasing the old segments MSLAB chunks. So temp there will be duplicate of the data (while copying) so u dont want to overuse the MSLAB pool chunks? > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: CellBlocksSegmentDesign.pdf, > HBASE-16417-benchmarkresults.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614309#comment-15614309 ] ramkrishna.s.vasudevan commented on HBASE-14918: Thanks for the results. Looks great. bq.We run index compaction with varying number of segments in the pipeline before merging the index: greater than 1 (ic1), greater than 2 (ic2), greater than 3 (ic3). So some where you have ensured that every segment while moving into the pipeline you do flattening and then merge them when the count is 3. Can you just try what happens when you don't merge it? > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: CellBlocksSegmentDesign.pdf, > HBASE-16417-benchmarkresults.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614268#comment-15614268 ] stack commented on HBASE-14918: --- 12 virtual or physical cores? bq. "For data compaction we do not use MSLABs to avoid the inherent space and computation overhead of copying data during compaction." [~eshcar] We avoid copying for the data case? Is that unreal? Thanks for running the compare. Interesting that you can saturate with 10 threads only. I should look into that. What do you conclude [~eshcar]? Or this is just exploratory work? Thanks. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: CellBlocksSegmentDesign.pdf, > HBASE-16417-benchmarkresults.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612079#comment-15612079 ] Anoop Sam John commented on HBASE-14918: So no tests with NO index merge at all? When the # merges less, we will end up flushing smaller sized files which might impact compaction I think. So we are working on a change which allows to flush whole segments to be flushed together to disk. With that we can better test the with diff policy. ie. #segments needed for a merge. What we want to reach is that Compacting memstore is not degraded from Default memstore. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: CellBlocksSegmentDesign.pdf, > HBASE-16417-benchmarkresults.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612053#comment-15612053 ] Eshcar Hillel commented on HBASE-14918: --- It is default memstore. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: CellBlocksSegmentDesign.pdf, > HBASE-16417-benchmarkresults.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611957#comment-15611957 ] Anoop Sam John commented on HBASE-14918: Good work. In last fig, 'default' means Default memstore? Or is this case with compacting memstore with *NO* index merge at all? > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: CellBlocksSegmentDesign.pdf, > HBASE-16417-benchmarkresults.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451584#comment-15451584 ] ramkrishna.s.vasudevan commented on HBASE-14918: I think it was already raised as another umbrella JIRa. Not sure if that can be moved under this. It was raised by [~anastas] only. I just added subtasks under that so that we could attach smaller patches. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451505#comment-15451505 ] Edward Bortnikov commented on HBASE-14918: -- HBASE-16421 > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451506#comment-15451506 ] Edward Bortnikov commented on HBASE-14918: -- HBASE-16421 > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451498#comment-15451498 ] Anoop Sam John commented on HBASE-14918: Which jira# tracks the CellChunkMap and related stuff (sub tasks) - [~ram_krish]? Pls move all related jiras under this umbrella. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429728#comment-15429728 ] Edward Bortnikov commented on HBASE-14918: -- We've just attached a proposed simplified spec for in-memory flush configuration on HBASE-16417, please take a look and speak up (smile). > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424488#comment-15424488 ] Edward Bortnikov commented on HBASE-14918: -- Let's agree on who-does-what in this Jira. We are open to suggestions. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424487#comment-15424487 ] Edward Bortnikov commented on HBASE-14918: -- Let's agree on who-does-what in this Jira. We are open to suggestions. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203266#comment-15203266 ] Eshcar Hillel commented on HBASE-14918: --- New patch is attached to task HBASE-14920 - new compacting memstore implementation. The patch is not small ;) please review. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170410#comment-15170410 ] ramkrishna.s.vasudevan commented on HBASE-14918: Yes we could move that to PRefixtree module once we complete the write path offheaping work so that we are sure that we got rid of all the ByteRange ref in the core areas. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169406#comment-15169406 ] stack commented on HBASE-14918: --- Move ByteRange into the prefix-tree module? If prefix-tree is enabled, offheaping will not work? Add warnings? > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168669#comment-15168669 ] ramkrishna.s.vasudevan commented on HBASE-14918: Since BR is used heavily in the prefix-tree area, that is one reason why still Prefix-Tree read path does not work completely with offheap. We have to rewrite the logic in prefix-tree replacing BR with BBs. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167514#comment-15167514 ] Anoop Sam John commented on HBASE-14918: Ya as BR can only handle on heap, we may have to move away from it. It is heavily used in prefix-tree area. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167497#comment-15167497 ] stack commented on HBASE-14918: --- Thanks [~ram_krish] See my comment above to [~anoop.hbase] on campaign to purge/deprecate other types. What you think? > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167496#comment-15167496 ] stack commented on HBASE-14918: --- bq. Our ByteBuff adds one unwanted wrap. We dont want multiple BB backing for return for each of the allocate call to MSLAB [~anoop.hbase] Ok. ByteBuff type is ONLY for case where we need to span BBs as in spanning BucketCache buckets? Or you see other uses for it Anoop? What about the fate f ByteRange et al. Seems like we want to move away from ByteRange since it only knows of onheap. If so, lets start a campaign to purge or at least post an edict that ByteRange and subclasses are deprecated. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167347#comment-15167347 ] Anastasia Braginsky commented on HBASE-14918: - [~stack], [~anoop.hbase], [~ram_krish] and everybody, I have just replied on HBASE-14921 because the discussion is about task#4. Please take a look there. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166882#comment-15166882 ] ramkrishna.s.vasudevan commented on HBASE-14918: bq.Yeah, lets align what you are doing here with the offheaping of the write path work +1 here. Was waiting for Anoop to reply over here. The MSLAB should not be working with ByteRange or its forms rather it should be with a datastructure that can work with offheap also. So ByteBuffer is the ideal choice here. bq.Should base type be ByteBuff so can do onheap/offheap? Bytebuffs are wrappers on Bytebbuffers so unless we need something like multiple buffers we need not go with ByteBuffs. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166843#comment-15166843 ] Anoop Sam John commented on HBASE-14918: So the underlying data structure is array of Cells or array of PBRs? HBASE-10713, I considered the in memory flushed CellBlock to have a plain byte[] representation as we can see in HFile data blocks. An array's overhead seems not that much compared to as CSLM. So array is ok. Ya it helps with a binary search and things looks much simpler. Regarding PBR return type from MSLAB, this will create issue with off heap MSLAB. So we try to change this to BB. Java ByteBuffer type is enough not our ByteBuff. Our ByteBuff adds one unwanted wrap. We dont want multiple BB backing for return for each of the allocate call to MSLAB, FYI [~saint@gmail.com] > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163424#comment-15163424 ] stack commented on HBASE-14918: --- bq. the CellBlocks suggests to use ArrayList of PositionedByteRange as the underlying data structure. You thinking this still the case [~anoop.hbase] given where offheaping of read path is going? Should base type be ByteBuff so can do onheap/offheap? bq, Thus we suggest CellBlocksSegment, which fits into new Segments structure of MemStore and inherits from ImmutableSegment. High-level, sounds good. bq. Underneath, CellBlocksSegment has the same idea of CellBlock. One question; what happens when a CellBlockSegment runs into a HFileBlock? How will the marshalling from CBS to HFB run? bq. Just striving to use an array of arrays, instead of list of arrays, in order to enjoy the binary search and less memory overhead. A noble goal. So, an array of CellBlocks? You'd allocate CellBlocks with MSLAB? bq. As far as for now MSLAB doesn't support off-heap allocation, the PositionedByteRange can be replaced by ByteRange/Chunk currently returned by MSLAB. Also little more tuning is required. Ok. Sorry for the plethora of types. We seem to be settling on a few now we know more. There also means of allocation. MSLAB, BucketCache allocator. We can move BBBP no problem. Yeah, lets align what you are doing here with the offheaping of the write path work @anastasia. bq. Sorry for this long monolog Keep going. It is good stuff. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158983#comment-15158983 ] Anastasia Braginsky commented on HBASE-14918: - Thank you for your immediate attention [~stack]! Of course, we looked on CellBlock from HBASE-10713 The code there is very well written with comments and thus possible to understand from just reading the patch. Kudos [~anoop.hbase] :) ! (At least I hope that I understand it :) and [~anoop.hbase] please correct me if I am wrong.) Alongside with some restructuring and refactoring (partially issued by HBASE-14919), the CellBlocks suggests to use ArrayList of PositionedByteRange as the underlying data structure. PositionedByteRange and SimplePositionedByteRange are allocated simply from JVM heap. The code treats many details and also provides a very important CellBlockScanner to scan the new data structure. In light of the recent MemStore refactoring, the CellBlock patch clearly can not be used as is. However, the most important and deep parts of the code are very valuable and definitely can be reused. Thus we suggest CellBlocksSegment, which fits into new Segments structure of MemStore and inherits from ImmutableSegment. Underneath, CellBlocksSegment has the same idea of CellBlock. Just striving to use an array of arrays, instead of list of arrays, in order to enjoy the binary search and less memory overhead. Taking in consideration the earlier [~anoop.hbase]'s comments about MSLAB (and a simple common sense) we suggest to use MSLAB for allocating any sequence of bytes. Please note that MSLAB is very suitable also because it issues the reference counting for chunk scans and thus the deallocation of the chunks per segment. As far as for now MSLAB doesn't support off-heap allocation, the PositionedByteRange can be replaced by ByteRange/Chunk currently returned by MSLAB. Also little more tuning is required. As completely orthogonal, but related issue we also see a possibility of enhancing the MSLAB and adding it an ability to allocate its Chunks on- and off-heap. It is probably issue for sub-task number 5 of HBASE-14918 :) Obviously, this requires some redesign of MemStoreLAB, HeapMemStoreLab.Chunk, and some other classes around the memory allocation. In particular, the implementation of HeapMemStoreLab.Chunk with "byte[] field" and the usage of ByteRange, can be replaced with (for example) ByteBuffer. (ByteBufferArray from hbase-common/org.apache.hadoop.hbase.util also looks very interesting :)) I agree that it is better to pre-allocate the off-heap Chunks, for that we can probably enhance the MemStoreChunkPool. I took a look on the BoundedByteBufferPool, which I found only in hbase-client code. It also looks very suitable, however in different component. Sorry for this long monolog :) [~anoop.hbase], [~stack], everybody, what do you think? I am thrilled to hear your insightful comments! :)) Thanks! > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and >
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157510#comment-15157510 ] stack commented on HBASE-14918: --- Sounds great Anastasia. bq. The HFile instance (with all its great possibilities) seams like something too complex for this simple task of storing the data flat. Agree. You looked at CellBlocks? Allocations offheap take time. You looked at bytebufferpool? Could allocate a bunch up front and then do reuse? > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156756#comment-15156756 ] Anastasia Braginsky commented on HBASE-14918: - Hi, We are now starting to progress with the CellBlocksSegment implementation for the CompactedMemStore. As explained in the attached design, CellBlocksSegment is a flat layout for the immutable segments (those which were flushed-in-memory). The suggestion is to implement CellBlocksSegment as long ordered array and to use binary search for the navigation inside the array. The array is the data structure that suits us best, because (1) the data is immutable (no insertions/deletions), (2) the data is already ordered before being written to array, (3) minimal memory overhead for any pointers, (4) most easily serializable. The HFile instance (with all its great possibilities) seams like something too complex for this simple task of storing the data flat. Clearly, a long array need to be partitioned into sub-array with bounded size. So down to earth, we will have an array of arrays. After once again looking on MSLAB and the memory management around, it looks like those arrays can be the chunks from MSLAB. So in an elegant way all memory allocations remains through MSLAB. Even more than that, it appears that MSLAB may be arranged to allocate chunks off-heap (with some little adjustment, of course). This can be used later if needed. Another discussion is required to understand the off-heap possibilities in MemStore. [~anoop.hbase] and everybody, what do you think? Thanks, Anastasia > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121121#comment-15121121 ] Anoop Sam John commented on HBASE-14918: As per the present trunk code, we can move it out.. I have done also.. But am not sure whether we need it inside the new Memstore impl, (with internal flush to pipeline and flush as CellBlock).. So I did not raise a Jira. Why I say to move out is this work of copying the Cell data into a MSLAB area is not a Memstore impl detail. Whatever be the Memstore impl (current or new) we need this. Also I have done a patch for avoiding garbage what we create in write path (See HBASE015180) when MSLAB is on. That is why I thought to make it an upper layer work than at the Memstore impl. I need to see how my patch can satisfy the need of new memstore impls > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121094#comment-15121094 ] Eshcar Hillel commented on HBASE-14918: --- Thanks [~anoop.hbase]. I don't see how you can move MSLAB to the HStore level. In the first patch MSLAB is used in the segment to allocate the byte range (in maybeCloneWithAllocator()), and it also does bookkeeping of scanners which access the MSLAB (with inc/decScannersCount()) so it can manage the deallocation of buffers when no scanners can access them. This is also the case in master but there the methods are in the scope of DefaultMemStore and the MemStoreScanner. How would you suggest to move it to HStore? Why do you think it is better there and not inside the segment? > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121385#comment-15121385 ] Eshcar Hillel commented on HBASE-14918: --- I reviewed the mslab-move patch. Software-engineering-wise I am not at all convinced that the right place for mslab is in HStore level. The compacting memstore is an example in which cells are allocated at the memstore level and not the store level. But more important is what you say about off-heap memory. I have no experience with off-heaping. Can you please elaborate why the suggested design cannot be off-heap, and what is needed to allow it be off-heap? In addition, you refer to the write-path, but actually the write-path goes through mutable-segment that stores the data in a CSLM format. Only reads and scans access the cell block. It is good we have this discussion at this point since it relates to the design of task #4, and can also affect task #3. However, [~stack], is there anything that prevents committing the patch of task #1. Is it not committed due to the MSLAB issue? IMO, the mslab is orthogonal to task #1. If it is decided that it needs to move, then it is possible to do so even after the patch. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121402#comment-15121402 ] Anoop Sam John commented on HBASE-14918: Ya I said it already, with the new Memstore impls it might be really possible for the move. My only point was that the copy to MSLAB is not a memstore impl thing at all..Or else it has to be a duty of all of the memstore impls. And I strongly think that for the new memstore impl, this movement may be problematic. (I mean the flush to cellblock one). Once the flush to cellblock area happens, we dont need this allocator and can have a new one. (It is like normal flush).. And this in memory flush happens within memstore impl.. So we might not be able to handle these if allocator is moved out.. That patch I attached just for ref. Commit of task-1 has nothing to do with this movement. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121231#comment-15121231 ] Anoop Sam John commented on HBASE-14918: bq.Each block is a PositionedByteRange (essentially encapsulating byte array), and the list is manifested as an array of PositionedByteRange That means we can not keep these Cells (in CellBlock) in an off heap memory area? We are trying to make the write flow also to support off heap {quote} Cell maybeCloneWithAllocator(Cell cell) If the segment has a memory allocator the cell is being cloned to this space, and returned; otherwise the given cell is returned {quote} I think doing this in these lower layers of memstore impl is not good.. That is one more reason why the thinking on moving the MSLAB copy. Can we do the copy stuff in HStore and only pass the allocator ref to Memstore for doing the inc/dec scanner things etc? Again I did not do any deep study on that. You know better. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121203#comment-15121203 ] Eshcar Hillel commented on HBASE-14918: --- ok. But even in the HBASE-15180 patch DefaultMemStore still have the attributes {code} volatile MemStoreLAB allocator; volatile MemStoreLAB snapshotAllocator; {code} and MemStoreScanner still have the attributes {code} volatile MemStoreLAB allocatorAtCreation; volatile MemStoreLAB snapshotAllocatorAtCreation; {code} So either I'm missing something or we talk on two different things. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121216#comment-15121216 ] Anoop Sam John commented on HBASE-14918: No in those patch I have not done this move. I was saying that, working on that made me think more strongly towards that. Get me? I have that moving patch.. Can just attach here for your ref may be. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119208#comment-15119208 ] Anoop Sam John commented on HBASE-14918: [~eshcar] In one of the subtask (infrastructure refactor I guess) , I have added a comment that may be we should move the MSLAB copy stuff out of Memstore impl. Now in the patch it is moved to the Abstract base impl of Memstore and Allocator stuff passed through the Segment also. (I think I read it that way.. not remembering)..actually this MSLAB stuff should be moved to HStore level. (This is not a Memstore impl detail).. Not your patch issue. It was this way from begin and when Memstore is made interface impl way, I missed that too..wdyt? I can see how we can me it. Let me have a go at the attached pdf > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > Attachments: CellBlocksSegmentDesign.pdf > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112950#comment-15112950 ] stack commented on HBASE-14918: --- [~anoop.hbase] You see above sir? > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113555#comment-15113555 ] Anoop Sam John commented on HBASE-14918: I dont know why when the name is being referred, am not getting any mail notify from Jira! I used to get that.. Let me get to this. Also doing a pass over the other 2 patches. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110274#comment-15110274 ] Eshcar Hillel commented on HBASE-14918: --- I submitted a patch in task 1 two days ago but didn't receive any QA report since. Any problems with the QA system? > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110790#comment-15110790 ] Eshcar Hillel commented on HBASE-14918: --- I went through the code in HBASE-10713, and it seems we can come up with a design for task #4 of a compacted memstore which stores the data in a flat format (in the issue they are called CellBlocks) instead of in java skip-list. [~anoop.hbase] would you be interested to collaborate on this? If you are, we can schedule an off-list chat to discuss the details of the design. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072127#comment-15072127 ] Eshcar Hillel commented on HBASE-14918: --- Both patches got +1 overall in QA. Happy Holidays to those who celebrate - waiting to make progress when you return. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070913#comment-15070913 ] Eshcar Hillel commented on HBASE-14918: --- Patches are available for task 1 and task 2. > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 0.98.18 > > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 4 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Adding StoreServices facility at the region level to allow memstores > update region counters and access region level synchronization mechanism; > (3) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (4) Memory optimization including compressed format representation and off > heap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037817#comment-15037817 ] Eshcar Hillel commented on HBASE-14918: --- Submitted patch for first sub-task > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 3 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (3) Memory optimization including compressed format representation and > offheap allocations. > This Jira continues the discussion in HBASE-13408. > Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)