[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2016-10-21 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-13408:
---
Resolution: Duplicate
  Assignee: (was: Eshcar Hillel)
Status: Resolved  (was: Patch Available)

Dup of HBASE-14918.  

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, HBASE-13408-trunk-v09.patch, 
> HBASE-13408-trunk-v10.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver04.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-11-24 Thread Anastasia Braginsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anastasia Braginsky updated HBASE-13408:

Attachment: HBaseIn-MemoryMemstoreCompactionDesignDocument-ver04.pdf

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, HBASE-13408-trunk-v09.patch, 
> HBASE-13408-trunk-v10.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver04.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-11-10 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: HBASE-13408-trunk-v10.patch

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, HBASE-13408-trunk-v09.patch, 
> HBASE-13408-trunk-v10.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-11-10 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: HBASE-13408-trunk-v09.patch

We attach a new patch which includes the changes required by the recent 
discussion.
Specifically, we removed (undo) some of the changes to the HRegion and 
FlushPolicy classes. We moved the code for triggering in memory flush into the 
compacting memstore implementation.
We excluded two changes:
(1) we did not remove the StoreSegmentScanner tier from the KeyValueScanner 
hierarchy as this would result in empty implementation (of the two methods we 
define here) in the other 5 concrete classes implementing the KeyValueScanner 
interface, which seems unnecessary.
(2) we did not remove the snapshot - this needs to be discussed in a different 
Jira; there are pros and cons, and it shouldn’t be decided without thorough 
discussion.


> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, HBASE-13408-trunk-v09.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-11-03 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf

Hi All,
We compiled a new design document (attached) capturing all changes (we noticed 
there are many changes since the original design suggestion).
In this new design document the behavior of the compacted memstore is confined 
mainly to the scope of the memstore, however some minimal changes are done at 
the scope of the region level, in order to give compacted memstore some slack 
to manage the in-memory flushes and in-memory compaction.
Next we plan to prepare the patch; main changes with respect to current patch 
would be to remove most of the code changes at the region level, and allow 
compacted memstore have access to the region lock to apply in-memory flushes.

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-10-27 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: HBASE-13408-trunk-v08.patch

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-10-26 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: InMemoryMemstoreCompactionMasterEvaluationResults.pdf
HBASE-13408-trunk-v07.patch

Attaching a new patch after rebase and code review changes.
One of the changes in the code is aligning the initialization of the memstore 
with the memstore class name configuration setting. To create a compacted 
memstore one needs to configure the hbase with

hbase.regionserver.memstore.class=org.apache.hadoop.hbase.regionserver.CompactedMemStore


In addition, we reproduced the results of the benchmarks for the master code 
(new and original) measured in different settings and workloads. Report is 
attached.

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-09-25 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: HBASE-13408-trunk-v06.patch

Rebased - again!!

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-09-24 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: HBASE-13408-trunk-v04.patch
StoreSegmentandStoreSegmentScannerClassHierarchies.pdf

Attaching a new patch for supporting different formats of store segments.
Also attaching a low level design document to explain the class hierarchy for 
supporting existing format and adding other formats in the future.

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-09-24 Thread Edward Bortnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Bortnikov updated HBASE-13408:
-
Assignee: Eshcar Hillel

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-09-24 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: HBASE-13408-trunk-v05.patch

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-09-02 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13408:
---
Fix Version/s: 2.0.0

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-09-02 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: HBASE-13408-trunk-v03.patch

New patch after rebase and code review changes.

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-08-20 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: HBASE-13408-trunk-v02.patch
InMemoryMemstoreCompactionScansEvaluationResults.pdf

We attach a new patch which covers wal truncation.
We also attach evaluation results for scans. The trend is very similar to the 
improvement we see for read operation.

Following the approach suggested in HBASE-10713, we now divide flushed stores 
into two groups: one doing the traditional flush to disk, and the other group 
does in-memory flush into an inactive (read-only) memstore segment, which is 
subject to compaction. By default, an in-memory column family has compacted 
memstore which does in-memory flush, while all other column families have a 
default memstore which flush to disk. However, in some use cases, e.g. upon 
region split/merge/close, even in-memory columns flush their content to disk.

Therefore, flush policy selects *two* sets of stores: one to flush to disk, and 
one to do in-memory flush. The first set invokes snapshot(), and the second set 
invokes flushInMemory() during the prepare phase.
The main changes to support wal truncation are threefold:
(1) upon in-memory compaction the wal is updated with a sequence number which 
is a lower approximation of the lowest-unflushed-sequence-id
(2) When the number of log files exceed a certain threshold the store is forced 
to flush to disk even if it is an in-memory column.
(3) upon flush to disk lowest-unflushed-sequence-id is cleared (like it used to 
be). Stores with in-memory segments, update this with a lower approximation of 
the lowest sequence id still in memory. Other stores update this sequence id 
with the first insert after the flush (like it used to be)

While (1) should help in prolonging the time an item can stay in memory, (2) 
and (3) are there to ensure the wal size is maintainable and cannot explode.


 HBase In-Memory Memstore Compaction
 ---

 Key: HBASE-13408
 URL: https://issues.apache.org/jira/browse/HBASE-13408
 Project: HBase
  Issue Type: New Feature
Reporter: Eshcar Hillel
 Attachments: HBASE-13408-trunk-v01.patch, 
 HBASE-13408-trunk-v02.patch, 
 HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
 HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
 InMemoryMemstoreCompactionEvaluationResults.pdf, 
 InMemoryMemstoreCompactionScansEvaluationResults.pdf


 A store unit holds a column family in a region, where the memstore is its 
 in-memory component. The memstore absorbs all updates to the store; from time 
 to time these updates are flushed to a file on disk, where they are 
 compacted. Unlike disk components, the memstore is not compacted until it is 
 written to the filesystem and optionally to block-cache. This may result in 
 underutilization of the memory due to duplicate entries per row, for example, 
 when hot data is continuously updated. 
 Generally, the faster the data is accumulated in memory, more flushes are 
 triggered, the data sinks to disk more frequently, slowing down retrieval of 
 data, even if very recent.
 In high-churn workloads, compacting the memstore can help maintain the data 
 in memory, and thereby speed up data retrieval. 
 We suggest a new compacted memstore with the following principles:
 1.The data is kept in memory for as long as possible
 2.Memstore data is either compacted or in process of being compacted 
 3.Allow a panic mode, which may interrupt an in-progress compaction and 
 force a flush of part of the memstore.
 We suggest applying this optimization only to in-memory column families.
 A design document is attached.
 This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-08-11 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Status: Patch Available  (was: Open)

 HBase In-Memory Memstore Compaction
 ---

 Key: HBASE-13408
 URL: https://issues.apache.org/jira/browse/HBASE-13408
 Project: HBase
  Issue Type: New Feature
Reporter: Eshcar Hillel
 Attachments: HBASE-13408-trunk-v01.patch, 
 HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
 HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
 InMemoryMemstoreCompactionEvaluationResults.pdf


 A store unit holds a column family in a region, where the memstore is its 
 in-memory component. The memstore absorbs all updates to the store; from time 
 to time these updates are flushed to a file on disk, where they are 
 compacted. Unlike disk components, the memstore is not compacted until it is 
 written to the filesystem and optionally to block-cache. This may result in 
 underutilization of the memory due to duplicate entries per row, for example, 
 when hot data is continuously updated. 
 Generally, the faster the data is accumulated in memory, more flushes are 
 triggered, the data sinks to disk more frequently, slowing down retrieval of 
 data, even if very recent.
 In high-churn workloads, compacting the memstore can help maintain the data 
 in memory, and thereby speed up data retrieval. 
 We suggest a new compacted memstore with the following principles:
 1.The data is kept in memory for as long as possible
 2.Memstore data is either compacted or in process of being compacted 
 3.Allow a panic mode, which may interrupt an in-progress compaction and 
 force a flush of part of the memstore.
 We suggest applying this optimization only to in-memory column families.
 A design document is attached.
 This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-08-11 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: HBASE-13408-trunk-v01.patch

 HBase In-Memory Memstore Compaction
 ---

 Key: HBASE-13408
 URL: https://issues.apache.org/jira/browse/HBASE-13408
 Project: HBase
  Issue Type: New Feature
Reporter: Eshcar Hillel
 Attachments: HBASE-13408-trunk-v01.patch, 
 HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
 HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
 InMemoryMemstoreCompactionEvaluationResults.pdf


 A store unit holds a column family in a region, where the memstore is its 
 in-memory component. The memstore absorbs all updates to the store; from time 
 to time these updates are flushed to a file on disk, where they are 
 compacted. Unlike disk components, the memstore is not compacted until it is 
 written to the filesystem and optionally to block-cache. This may result in 
 underutilization of the memory due to duplicate entries per row, for example, 
 when hot data is continuously updated. 
 Generally, the faster the data is accumulated in memory, more flushes are 
 triggered, the data sinks to disk more frequently, slowing down retrieval of 
 data, even if very recent.
 In high-churn workloads, compacting the memstore can help maintain the data 
 in memory, and thereby speed up data retrieval. 
 We suggest a new compacted memstore with the following principles:
 1.The data is kept in memory for as long as possible
 2.Memstore data is either compacted or in process of being compacted 
 3.Allow a panic mode, which may interrupt an in-progress compaction and 
 force a flush of part of the memstore.
 We suggest applying this optimization only to in-memory column families.
 A design document is attached.
 This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-07-14 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: InMemoryMemstoreCompactionEvaluationResults.pdf
HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf

 HBase In-Memory Memstore Compaction
 ---

 Key: HBASE-13408
 URL: https://issues.apache.org/jira/browse/HBASE-13408
 Project: HBase
  Issue Type: New Feature
Reporter: Eshcar Hillel
 Attachments: 
 HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
 HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
 InMemoryMemstoreCompactionEvaluationResults.pdf


 A store unit holds a column family in a region, where the memstore is its 
 in-memory component. The memstore absorbs all updates to the store; from time 
 to time these updates are flushed to a file on disk, where they are 
 compacted. Unlike disk components, the memstore is not compacted until it is 
 written to the filesystem and optionally to block-cache. This may result in 
 underutilization of the memory due to duplicate entries per row, for example, 
 when hot data is continuously updated. 
 Generally, the faster the data is accumulated in memory, more flushes are 
 triggered, the data sinks to disk more frequently, slowing down retrieval of 
 data, even if very recent.
 In high-churn workloads, compacting the memstore can help maintain the data 
 in memory, and thereby speed up data retrieval. 
 We suggest a new compacted memstore with the following principles:
 1.The data is kept in memory for as long as possible
 2.Memstore data is either compacted or in process of being compacted 
 3.Allow a panic mode, which may interrupt an in-progress compaction and 
 force a flush of part of the memstore.
 We suggest applying this optimization only to in-memory column families.
 A design document is attached.
 This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-04-05 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf

 HBase In-Memory Memstore Compaction
 ---

 Key: HBASE-13408
 URL: https://issues.apache.org/jira/browse/HBASE-13408
 Project: HBase
  Issue Type: New Feature
Reporter: Eshcar Hillel
 Attachments: HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf


 A store unit holds a column family in a region, where the memstore is its 
 in-memory component. The memstore absorbs all updates to the store; from time 
 to time these updates are flushed to a file on disk, where they are 
 compacted. Unlike disk components, the memstore is not compacted until it is 
 written to the filesystem and optionally to block-cache. This may result in 
 underutilization of the memory due to duplicate entries per row, for example, 
 when hot data is continuously updated. 
 Generally, the faster the data is accumulated in memory, more flushes are 
 triggered, the data sinks to disk more frequently, slowing down retrieval of 
 data, even if very recent.
 In high-churn workloads, compacting the memstore can help maintain the data 
 in memory, and thereby speed up data retrieval. 
 We suggest a new compacted memstore with the following principles:
 1.The data is kept in memory for as long as possible
 2.Memstore data is either compacted or in process of being compacted 
 3.Allow a panic mode, which may interrupt an in-progress compaction and 
 force a flush of part of the memstore.
 We suggest applying this optimization only to in-memory column families.
 A design document is attached.
 This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)