[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-09-02 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: HBASE-13408-trunk-v03.patch

New patch after rebase and code review changes.

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-09-24 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: HBASE-13408-trunk-v04.patch
StoreSegmentandStoreSegmentScannerClassHierarchies.pdf

Attaching a new patch for supporting different formats of store segments.
Also attaching a low level design document to explain the class hierarchy for 
supporting existing format and adding other formats in the future.

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-09-25 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: HBASE-13408-trunk-v06.patch

Rebased - again!!

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-09-25 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908550#comment-14908550
 ] 

Eshcar Hillel commented on HBASE-13408:
---

bq. Under which scenario(s) would this method be inefficient ?
When the memstore needs to build the Cell Set by traversing all cells in the 
segment.

bq. I thought the compaction pipeline would generate immutable segments ?
You are right, this is a typo in the document. The compaction pipeline can 
generate different type of *immutable* segments like HFile format.

bq. Would the above be done in a future patch or different JIRA ?
Yes, we believe it should be implemented as a different task.

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-09-24 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: HBASE-13408-trunk-v05.patch

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14918) In-Memory MemStore Flush and Compaction

2015-12-03 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-14918:
-

 Summary: In-Memory MemStore Flush and Compaction
 Key: HBASE-14918
 URL: https://issues.apache.org/jira/browse/HBASE-14918
 Project: HBase
  Issue Type: Umbrella
Affects Versions: 2.0.0
Reporter: Eshcar Hillel


A memstore serves as the in-memory component of a store unit, absorbing all 
updates to the store. From time to time these updates are flushed to a file on 
disk, where they are compacted (by eliminating redundancies) and compressed 
(i.e., written in a compressed format to reduce their storage size).

We aim to speed up data access, and therefore suggest to apply in-memory 
memstore flush. That is to flush the active in-memory segment into an 
intermediate buffer where it can be accessed by the application. Data in the 
buffer is subject to compaction and can be stored in any format that allows it 
to take up smaller space in RAM. The less space the buffer consumes the longer 
it can reside in memory before data is flushed to disk, resulting in better 
performance.
Specifically, the optimization is beneficial for workloads with medium-to-high 
key churn which incur many redundant cells, like persistent messaging. 

We suggest to structure the solution as 3 subtasks (respectively, patches). 
(1) Infrastructure - refactoring of the MemStore hierarchy, introducing segment 
(StoreSegment) as first-class citizen, and decoupling memstore scanner from the 
memstore implementation;
(2) Implementation of a new memstore (CompactingMemstore) with non-optimized 
immutable segment representation, and 
(3) Memory optimization including compressed format representation and offheap 
allocations.

This Jira continues the discussion in HBASE-13408.
Design documents, evaluation results and previous patches can be found in 
HBASE-13408. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14921) Memory optimizations

2015-12-03 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-14921:
-

 Summary: Memory optimizations
 Key: HBASE-14921
 URL: https://issues.apache.org/jira/browse/HBASE-14921
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0
Reporter: Eshcar Hillel


Memory optimizations including compressed format representation and offheap 
allocations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14920) Compacting Memstore

2015-12-03 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-14920:
-

 Summary: Compacting Memstore
 Key: HBASE-14920
 URL: https://issues.apache.org/jira/browse/HBASE-14920
 Project: HBase
  Issue Type: Sub-task
Reporter: Eshcar Hillel
Assignee: Eshcar Hillel


Implementation of a new compacting memstore with non-optimized immutable 
segment representation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14919) Infrastructure refactoring

2015-12-03 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-14919:
-

 Summary: Infrastructure refactoring
 Key: HBASE-14919
 URL: https://issues.apache.org/jira/browse/HBASE-14919
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0
Reporter: Eshcar Hillel
Assignee: Eshcar Hillel


Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
first-class citizen and decoupling memstore scanner from the memstore 
implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14919) Infrastructure refactoring

2015-12-03 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14919:
--
Attachment: HBASE-14919-V01.patch

patch attached

> Infrastructure refactoring
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14919) Infrastructure refactoring

2015-12-03 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14919:
--
Status: Patch Available  (was: Open)

> Infrastructure refactoring
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-12-03 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037831#comment-15037831
 ] 

Eshcar Hillel commented on HBASE-13408:
---

This is an umbrella Jira continuing the current issue.

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, HBASE-13408-trunk-v09.patch, 
> HBASE-13408-trunk-v10.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver04.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction

2015-12-03 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037817#comment-15037817
 ] 

Eshcar Hillel commented on HBASE-14918:
---

Submitted patch for first sub-task 

> In-Memory MemStore Flush and Compaction
> ---
>
> Key: HBASE-14918
> URL: https://issues.apache.org/jira/browse/HBASE-14918
> Project: HBase
>  Issue Type: Umbrella
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>
> A memstore serves as the in-memory component of a store unit, absorbing all 
> updates to the store. From time to time these updates are flushed to a file 
> on disk, where they are compacted (by eliminating redundancies) and 
> compressed (i.e., written in a compressed format to reduce their storage 
> size).
> We aim to speed up data access, and therefore suggest to apply in-memory 
> memstore flush. That is to flush the active in-memory segment into an 
> intermediate buffer where it can be accessed by the application. Data in the 
> buffer is subject to compaction and can be stored in any format that allows 
> it to take up smaller space in RAM. The less space the buffer consumes the 
> longer it can reside in memory before data is flushed to disk, resulting in 
> better performance.
> Specifically, the optimization is beneficial for workloads with 
> medium-to-high key churn which incur many redundant cells, like persistent 
> messaging. 
> We suggest to structure the solution as 3 subtasks (respectively, patches). 
> (1) Infrastructure - refactoring of the MemStore hierarchy, introducing 
> segment (StoreSegment) as first-class citizen, and decoupling memstore 
> scanner from the memstore implementation;
> (2) Implementation of a new memstore (CompactingMemstore) with non-optimized 
> immutable segment representation, and 
> (3) Memory optimization including compressed format representation and 
> offheap allocations.
> This Jira continues the discussion in HBASE-13408.
> Design documents, evaluation results and previous patches can be found in 
> HBASE-13408. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-12-03 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037825#comment-15037825
 ] 

Eshcar Hillel commented on HBASE-13408:
---

Created new Jira HBASE-14918 with three sub-tasks.
Submitted patch for first refactoring task.
This Jira is EOL; if you wish to continue followING this issue please start 
watching HBASE-14918 (and/or HBASE-14919/HBASE-14920/HBASE-14921).

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, HBASE-13408-trunk-v09.patch, 
> HBASE-13408-trunk-v10.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver04.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14919) Infrastructure refactoring

2015-12-08 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14919:
--
Attachment: HBASE-14919-V02.patch

> Infrastructure refactoring
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14919) Infrastructure refactoring

2015-12-09 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048897#comment-15048897
 ] 

Eshcar Hillel commented on HBASE-14919:
---

The patch is now available in rb.
I will work to fix the style warnings.
This would be a good time to give feedback :).

> Infrastructure refactoring
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14919) Infrastructure refactoring

2015-12-05 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043498#comment-15043498
 ] 

Eshcar Hillel commented on HBASE-14919:
---

Added link to review board.
[~yuzhihong] can you tell what's the base for the -1 overall for the patch?

> Infrastructure refactoring
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14919) Infrastructure refactoring

2015-12-16 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061674#comment-15061674
 ] 

Eshcar Hillel commented on HBASE-14919:
---

Thanks [~te...@apache.org] for providing the list of failed tests.
Is it normal for the QA to report success while some tests have failed?
Can you verify the master is stable and no tests fail there? 


> Infrastructure refactoring
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14919) Infrastructure refactoring

2015-12-17 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062582#comment-15062582
 ] 

Eshcar Hillel commented on HBASE-14919:
---

ok.
Just uploaded a new patch - hope to have removed all style errors.
Any chance of committing a patch while master is not stable?

> Infrastructure refactoring
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14919) Infrastructure refactoring

2015-12-17 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14919:
--
Attachment: HBASE-14919-V04.patch

> Infrastructure refactoring
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14919) Infrastructure refactoring

2015-12-16 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14919:
--
Attachment: HBASE-14919-V03.patch

I suspect master is not stable (see HBASE-14947), still posting new patch with 
code review modifications.
Also posting in rb.

> Infrastructure refactoring
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15016) StoreServices facility in Region

2016-01-06 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085376#comment-15085376
 ] 

Eshcar Hillel commented on HBASE-15016:
---

There are 4 decisions to make: 1) when to do in-memory flush 2) when to do 
in-memory compaction 3) when to flush to disk 4) which stores to flush to disk.

One feedback we got when working on HBASE-13408 was that decisions 1 and 2 
should be encapsulated and managed within the memstore. This is reasonable 
since the memstore holds all the information about the sizes and duplications 
etc.
What you are suggesting now is to add a ‘warning’ message sent by region to 
stores that would trigger an in-memory flush and/or a compaction.

Here is a scenario we need to avoid: having a compaction pipeline of size 80MB, 
and then whenever the active segment only reaches a few MBs - a warning message 
is sent, triggers in-memory flush and compaction. Then a big segment (80MB) is 
merged with a small segment (3MB) creating a big segment again (say, around 
80MB when removing duplication). If this happens over and over again, it’s a 
waste of cpu time and also generates a lot of work for the GC. [somewhat 
similar to the small files problem FlushLargeStoresPolicy tries to resolve]

Another issue, say you have several stores in a region, at least one default 
memstore (A) and one compacted memstore (B). Assume they both exceed 16MB, and 
other memstores are less than 16MB. When the region triggers a flush to disk, 
the current policy chooses to flush A and B. It is reasonable to flush A since 
there is no other way to reduce its size, however, is it reasonable to flush B? 
If it stays in memory longer it has a chance to reduce its size without 
flushing to disk.

Just mentioned these issues so you can consider them when preparing your patch. 

> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-15016-V01.patch, HBASE-15016-V02.patch, 
> HBASE-15016-V03.patch, Regioncounters.pdf
>
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15016) StoreServices facility in Region

2015-12-31 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-15016:
--
Attachment: Regioncounters.pdf

The attached file (Region counters) show a view of the region counters before 
and after the patch, and can help understand how they affect the flush 
decisions. 

> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-15016-V01.patch, HBASE-15016-V02.patch, 
> HBASE-15016-V03.patch, Regioncounters.pdf
>
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15016) StoreServices facility in Region

2016-01-09 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090776#comment-15090776
 ] 

Eshcar Hillel commented on HBASE-15016:
---

Sure it would help if you have a clean version.
As a matter of fact perhaps a clean version of your patch is all that is needed 
for this sub-task, while the flush policy which is tightly coupled with 
compacting memstores and additional changes to column descriptor like adding 
isCompacting() method should be part of sub-task #3 where we actually declare 
the new class.
What do you say?

> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-15016-V01.patch, HBASE-15016-V02.patch, 
> HBASE-15016-V03.patch, Regioncounters.pdf, suggestion.patch
>
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15016) StoreServices facility in Region

2015-12-30 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075414#comment-15075414
 ] 

Eshcar Hillel commented on HBASE-15016:
---

I see where you're getting at, by all means [~stack], be my guest, go ahead and 
make a version :)

> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-15016-V01.patch, HBASE-15016-V02.patch, 
> HBASE-15016-V03.patch
>
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15016) StoreServices facility in Region

2015-12-21 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067050#comment-15067050
 ] 

Eshcar Hillel commented on HBASE-15016:
---

As stated in the description, in future memstores size can fluctuate; when 
using alternative memory formats or due to compaction we need to take 
occasional lock and update size counters at the region level.
The goal of this issue is to come up with a StoreService entity that would be 
the bridge between any future low-level memstore needs and the services at the 
region level. We aim to make the minimal possible changes to HRegion itself.
A memstore can get a pointer to the store services object of its region at 
construction time. The store services encapsulates a (backward) pointer to the 
region.
The store services object manages any additional size counters beyond the ones 
existing in the region, and allows to acquire and release the region lock in 
exclusive mode.
The region itself can also query the store services object, e.g., to base its 
flush decision also on the data this object is managing.

In addition, we extend HStore with two methods
(1) finalizeFlush() which is invoked after the flush is completed (vs running 
prepare beforehand), each memstore type can choose to do what it sees right at 
this phase.
(2) getMemorySizeForFlushPolicy() - based on this size the memstore is selected 
(or not) to be flushed to disk, each memstore type could potentially compute 
differently the size to report.

We are working on a patch, but if you have any comments early is better than 
later :)


> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction

2015-12-27 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072127#comment-15072127
 ] 

Eshcar Hillel commented on HBASE-14918:
---

Both patches got +1 overall in QA.
Happy Holidays to those who celebrate - waiting to make progress when you 
return.

> In-Memory MemStore Flush and Compaction
> ---
>
> Key: HBASE-14918
> URL: https://issues.apache.org/jira/browse/HBASE-14918
> Project: HBase
>  Issue Type: Umbrella
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 0.98.18
>
>
> A memstore serves as the in-memory component of a store unit, absorbing all 
> updates to the store. From time to time these updates are flushed to a file 
> on disk, where they are compacted (by eliminating redundancies) and 
> compressed (i.e., written in a compressed format to reduce their storage 
> size).
> We aim to speed up data access, and therefore suggest to apply in-memory 
> memstore flush. That is to flush the active in-memory segment into an 
> intermediate buffer where it can be accessed by the application. Data in the 
> buffer is subject to compaction and can be stored in any format that allows 
> it to take up smaller space in RAM. The less space the buffer consumes the 
> longer it can reside in memory before data is flushed to disk, resulting in 
> better performance.
> Specifically, the optimization is beneficial for workloads with 
> medium-to-high key churn which incur many redundant cells, like persistent 
> messaging. 
> We suggest to structure the solution as 4 subtasks (respectively, patches). 
> (1) Infrastructure - refactoring of the MemStore hierarchy, introducing 
> segment (StoreSegment) as first-class citizen, and decoupling memstore 
> scanner from the memstore implementation;
> (2) Adding StoreServices facility at the region level to allow memstores 
> update region counters and access region level synchronization mechanism;
> (3) Implementation of a new memstore (CompactingMemstore) with non-optimized 
> immutable segment representation, and 
> (4) Memory optimization including compressed format representation and off 
> heap allocations.
> This Jira continues the discussion in HBASE-13408.
> Design documents, evaluation results and previous patches can be found in 
> HBASE-13408. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15016) StoreServices facility in Region

2015-12-27 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-15016:
--
Attachment: HBASE-15016-V02.patch

new patch

> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-15016-V01.patch, HBASE-15016-V02.patch
>
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-15016) StoreServices facility in Region

2015-12-19 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-15016:
-

 Summary: StoreServices facility in Region
 Key: HBASE-15016
 URL: https://issues.apache.org/jira/browse/HBASE-15016
 Project: HBase
  Issue Type: Sub-task
Reporter: Eshcar Hillel


The default implementation of a memstore ensures that between two flushes the 
memstore size increases monotonically. Supporting new memstores that store data 
in different formats (specifically, compressed), or that allows to eliminate 
data redundancies in memory (e.g., via compaction), means that the size of the 
data stored in memory can decrease even between two flushes. This requires 
memstores to have access to facilities that manipulate region counters and 
synchronization.
This subtasks introduces a new region interface -- StoreServices, through which 
store components can access these facilities.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14918) In-Memory MemStore Flush and Compaction

2015-12-19 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14918:
--
Description: 
A memstore serves as the in-memory component of a store unit, absorbing all 
updates to the store. From time to time these updates are flushed to a file on 
disk, where they are compacted (by eliminating redundancies) and compressed 
(i.e., written in a compressed format to reduce their storage size).

We aim to speed up data access, and therefore suggest to apply in-memory 
memstore flush. That is to flush the active in-memory segment into an 
intermediate buffer where it can be accessed by the application. Data in the 
buffer is subject to compaction and can be stored in any format that allows it 
to take up smaller space in RAM. The less space the buffer consumes the longer 
it can reside in memory before data is flushed to disk, resulting in better 
performance.
Specifically, the optimization is beneficial for workloads with medium-to-high 
key churn which incur many redundant cells, like persistent messaging. 

We suggest to structure the solution as 4 subtasks (respectively, patches). 
(1) Infrastructure - refactoring of the MemStore hierarchy, introducing segment 
(StoreSegment) as first-class citizen, and decoupling memstore scanner from the 
memstore implementation;
(2) Adding StoreServices facility at the region level to allow memstores update 
region counters and access region level synchronization mechanism;
(3) Implementation of a new memstore (CompactingMemstore) with non-optimized 
immutable segment representation, and 
(4) Memory optimization including compressed format representation and off heap 
allocations.

This Jira continues the discussion in HBASE-13408.
Design documents, evaluation results and previous patches can be found in 
HBASE-13408. 

  was:
A memstore serves as the in-memory component of a store unit, absorbing all 
updates to the store. From time to time these updates are flushed to a file on 
disk, where they are compacted (by eliminating redundancies) and compressed 
(i.e., written in a compressed format to reduce their storage size).

We aim to speed up data access, and therefore suggest to apply in-memory 
memstore flush. That is to flush the active in-memory segment into an 
intermediate buffer where it can be accessed by the application. Data in the 
buffer is subject to compaction and can be stored in any format that allows it 
to take up smaller space in RAM. The less space the buffer consumes the longer 
it can reside in memory before data is flushed to disk, resulting in better 
performance.
Specifically, the optimization is beneficial for workloads with medium-to-high 
key churn which incur many redundant cells, like persistent messaging. 

We suggest to structure the solution as 3 subtasks (respectively, patches). 
(1) Infrastructure - refactoring of the MemStore hierarchy, introducing segment 
(StoreSegment) as first-class citizen, and decoupling memstore scanner from the 
memstore implementation;
(2) Implementation of a new memstore (CompactingMemstore) with non-optimized 
immutable segment representation, and 
(3) Memory optimization including compressed format representation and offheap 
allocations.

This Jira continues the discussion in HBASE-13408.
Design documents, evaluation results and previous patches can be found in 
HBASE-13408. 


> In-Memory MemStore Flush and Compaction
> ---
>
> Key: HBASE-14918
> URL: https://issues.apache.org/jira/browse/HBASE-14918
> Project: HBase
>  Issue Type: Umbrella
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
>
> A memstore serves as the in-memory component of a store unit, absorbing all 
> updates to the store. From time to time these updates are flushed to a file 
> on disk, where they are compacted (by eliminating redundancies) and 
> compressed (i.e., written in a compressed format to reduce their storage 
> size).
> We aim to speed up data access, and therefore suggest to apply in-memory 
> memstore flush. That is to flush the active in-memory segment into an 
> intermediate buffer where it can be accessed by the application. Data in the 
> buffer is subject to compaction and can be stored in any format that allows 
> it to take up smaller space in RAM. The less space the buffer consumes the 
> longer it can reside in memory before data is flushed to disk, resulting in 
> better performance.
> Specifically, the optimization is beneficial for workloads with 
> medium-to-high key churn which incur many redundant cells, like persistent 
> messaging. 
> We suggest to structure the solution as 4 subtasks (respectively, patches). 
> (1) Infrastructure - refactoring of the MemStore hierarchy, introducing 
> segment (StoreSegment) as first-class citizen, and 

[jira] [Commented] (HBASE-14919) Infrastructure refactoring

2015-12-19 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065538#comment-15065538
 ] 

Eshcar Hillel commented on HBASE-14919:
---

Sure [~stack] I understand.
Meanwhile I will start working on the second sub-task we defined (already 
updated HBASE-14918).

> Infrastructure refactoring
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HBASE-15016) StoreServices facility in Region

2015-12-19 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel reassigned HBASE-15016:
-

Assignee: Eshcar Hillel

> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HBASE-14921) Memory optimizations

2015-12-19 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel reassigned HBASE-14921:
-

Assignee: Eshcar Hillel

> Memory optimizations
> 
>
> Key: HBASE-14921
> URL: https://issues.apache.org/jira/browse/HBASE-14921
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
>
> Memory optimizations including compressed format representation and offheap 
> allocations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HBASE-14921) Memory optimizations

2015-12-19 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel reassigned HBASE-14921:
-

Assignee: (was: Eshcar Hillel)

> Memory optimizations
> 
>
> Key: HBASE-14921
> URL: https://issues.apache.org/jira/browse/HBASE-14921
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>
> Memory optimizations including compressed format representation and offheap 
> allocations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15016) StoreServices facility in Region

2015-12-24 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070733#comment-15070733
 ] 

Eshcar Hillel commented on HBASE-15016:
---

The patch is attached.
Recap: Future memstore optimizations such as memstore compaction, compression, 
and off-heaping, require some interface with services at the region level. For 
this purpose we introduce the StoreServices class. In addition to being the 
interface through which memstore access services, it also maintains additional 
data that is updated by memstores and can be queried by the region.
This patch also refines and extends region-to-store communication. Since this 
is the normal flow of data, there is no need to create a new interface and the 
API of Store is extended (with the 2 methods described in the previous comment).
Finally, this patch refines the region method which decides to invoke a flush. 
The decision is captured in the new invokeFlushIfNeeded() method. The decision 
in based also on data stored  in the StoreServices objects.

> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-15016-V01.patch
>
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15016) StoreServices facility in Region

2015-12-24 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-15016:
--
Status: Patch Available  (was: Open)

> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-15016-V01.patch
>
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15016) StoreServices facility in Region

2015-12-24 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-15016:
--
Attachment: HBASE-15016-V01.patch

> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-15016-V01.patch
>
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14919) Infrastructure refactoring

2015-12-24 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14919:
--
Attachment: HBASE-14919-V04.patch

re-submitting patch; master seems to be stable again.

> Infrastructure refactoring
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch, 
> HBASE-14919-V04.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction

2015-12-24 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070913#comment-15070913
 ] 

Eshcar Hillel commented on HBASE-14918:
---

Patches are available for task 1 and task 2.

> In-Memory MemStore Flush and Compaction
> ---
>
> Key: HBASE-14918
> URL: https://issues.apache.org/jira/browse/HBASE-14918
> Project: HBase
>  Issue Type: Umbrella
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 0.98.18
>
>
> A memstore serves as the in-memory component of a store unit, absorbing all 
> updates to the store. From time to time these updates are flushed to a file 
> on disk, where they are compacted (by eliminating redundancies) and 
> compressed (i.e., written in a compressed format to reduce their storage 
> size).
> We aim to speed up data access, and therefore suggest to apply in-memory 
> memstore flush. That is to flush the active in-memory segment into an 
> intermediate buffer where it can be accessed by the application. Data in the 
> buffer is subject to compaction and can be stored in any format that allows 
> it to take up smaller space in RAM. The less space the buffer consumes the 
> longer it can reside in memory before data is flushed to disk, resulting in 
> better performance.
> Specifically, the optimization is beneficial for workloads with 
> medium-to-high key churn which incur many redundant cells, like persistent 
> messaging. 
> We suggest to structure the solution as 4 subtasks (respectively, patches). 
> (1) Infrastructure - refactoring of the MemStore hierarchy, introducing 
> segment (StoreSegment) as first-class citizen, and decoupling memstore 
> scanner from the memstore implementation;
> (2) Adding StoreServices facility at the region level to allow memstores 
> update region counters and access region level synchronization mechanism;
> (3) Implementation of a new memstore (CompactingMemstore) with non-optimized 
> immutable segment representation, and 
> (4) Memory optimization including compressed format representation and off 
> heap allocations.
> This Jira continues the discussion in HBASE-13408.
> Design documents, evaluation results and previous patches can be found in 
> HBASE-13408. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14919) Infrastructure refactoring

2015-12-30 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074978#comment-15074978
 ] 

Eshcar Hillel commented on HBASE-14919:
---

Went over your comments:
1. volatile - I see what you mean, but since not all the effects of volatile 
are clear (e.g., it is not cached) I prefered to keep the same behaviour as in 
master where cellset and snapshot are volatile in DefaultMemStore.

2. snapshotId - this is how snapshotId is used, the pattern is the same before 
and after the patch:
{code}
public void clearSnapshot(long id) throws UnexpectedStateException {
...
if (this.snapshotId != id) {
  throw new UnexpectedStateException("Current snapshot id is " + 
this.snapshotId + ",passed " + id);
}
// OK. Passed in snapshot is same as current snapshot. If not-empty,
// create a new snapshot and let the old one go.
...
if (!this.snapshot.isEmpty()) {
  ...
}
...
this.snapshotId = -1;
...
}
{code}

seems nothing else to be done.
Are we good to commit?

> Infrastructure refactoring
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch, 
> HBASE-14919-V04.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15016) StoreServices facility in Region

2015-12-30 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075008#comment-15075008
 ] 

Eshcar Hillel commented on HBASE-15016:
---

bq. Who is providing the services? The Region or the Store? When I read the 
StoreServices Interface, it looks like Services we want of the Region? If so, 
should it be RegionServices (for use by the Store) rather than StoreServices?

changed the name to RegionServerProxy

bq. Why is it a fluctuating memstore size rather than just memstore size? This 
method in StoreServices, addAndGetFluctuatingMemstoreSize, is it for all Stores 
in the region or for the current Store only? Is getMemstoreActiveSize for the 
current Store or all Stores under the Region?

There are two counters: memstoreSize and memstoreFluctuatingSize, and the 
active size is their difference.
Changed the names to 
{code}
long addAndGetGlobalMemstoreSize(long size)
long addAndGetGlobalMemstoreFluctuatingSize(long size)
long getGlobalMemstoreActiveSize()
{code}

bq. This call is actually complicated in implementation... getWalSequenceId... 
and not used elsewhere by this patch (maybe it is used over in the pipeline 
patch). Does it have to be exposed?

One of the services that the new memstore needs from region is getting the wal 
sequence id, as they are associated with the segments in the memstore, so these 
are exposed here.

bq. Drop 'ForFlushPolicy' from name of this method... 
getMemStoreSizeForFlushPolicy ... it is redundant (yeah, size is determined by 
a flush policy but don't have to say so in method name). This is to be use by 
the region figuring when to flush?

changed to getMemStoreActiveSize()

bq. Does the soft limit in the HRegion belong in this patch?

Yes, because it is used in the implementation of the requestFlushIfNeeded().
Changed names to memstoreFlushSizeLB memstoreFlushSizeUB

bq. In the hbase-common, you refer to a class type that is downstream in 
hbase-server. We are trying to have modules not have inter-dependencies if 
possible... at least not ones that go in this direction... from upstream to 
downstream dependency (downstream to upstream is ok):

I'm not sure where exactly you mean, can you point this in RB? Would be happy 
to learn what is the right way to update the method to compute the region class 
size.  

> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-15016-V01.patch, HBASE-15016-V02.patch
>
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15016) StoreServices facility in Region

2015-12-30 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-15016:
--
Attachment: HBASE-15016-V03.patch

> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-15016-V01.patch, HBASE-15016-V02.patch, 
> HBASE-15016-V03.patch
>
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-11-24 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025386#comment-15025386
 ] 

Eshcar Hillel commented on HBASE-13408:
---

Latency spikes are from the evaluation we did with the 0.98 branch (posted July 
14).
In the more recent evaluation we did we were able to identify the cause of the 
spikes and avoid it (anyway thanks for suggesting to help :) ).
Without the spikes it is easier to see the benefit of the new feature.

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, HBASE-13408-trunk-v09.patch, 
> HBASE-13408-trunk-v10.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver04.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-11-24 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025324#comment-15025324
 ] 

Eshcar Hillel commented on HBASE-13408:
---

[~stack] we'll get back with answers to your comments/suggestions/review later 
on, just wanted to say that the evaluation results for the patch is attached 
(posted October 26), please take a look.


> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, HBASE-13408-trunk-v09.patch, 
> HBASE-13408-trunk-v10.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver04.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-11-30 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031696#comment-15031696
 ] 

Eshcar Hillel commented on HBASE-13408:
---

OK then we'll go for several steps.
I also assume this is an ok to open a new umbrella Jira for this purpose with a 
link back to this jira. 

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, HBASE-13408-trunk-v09.patch, 
> HBASE-13408-trunk-v10.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver04.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are 
> triggered, the data sinks to disk more frequently, slowing down retrieval of 
> data, even if very recent.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15016) StoreServices facility in Region

2016-01-08 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089127#comment-15089127
 ] 

Eshcar Hillel commented on HBASE-15016:
---

Ok so I will work on a patch that includes a new flush policy class which also 
considers the data in the table/column descriptors when making decisions, and 
the flush size will be determined also based on this data. Nice :)



> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-15016-V01.patch, HBASE-15016-V02.patch, 
> HBASE-15016-V03.patch, Regioncounters.pdf, suggestion.patch
>
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15991) CompactingMemstore#InMemoryFlushRunnable should implement Comparable/Comparator

2016-06-09 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323275#comment-15323275
 ] 

Eshcar Hillel commented on HBASE-15991:
---

[~anoop.hbase], [~ram_krish], can you give some background on this issue:
Does changing the queue solves the comparable problem?
Do you see this issue also when running other operations or just batch mutate?
We ran quit a lot of experiments with the code in HBASE-14920 (but not batch 
operations) and never had this issue

> CompactingMemstore#InMemoryFlushRunnable should implement 
> Comparable/Comparator
> ---
>
> Key: HBASE-15991
> URL: https://issues.apache.org/jira/browse/HBASE-15991
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-15991.patch
>
>
> Configuring CompactingMemstore for a table fails due to the following error
> {code}
> 2016-06-08 23:27:03,761 ERROR [B.defaultRpcServer.handler...
> 2016-06-08 23:27:03,761 ERROR 
> [B.defaultRpcServer.handler=38,queue=8,port=16041] ipc.RpcServer: Unexpected 
> throwable object
> java.lang.ClassCastException: 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore$InMemoryFlushRunnable 
> cannot be cast to java.lang.Comparable
> at 
> java.util.concurrent.PriorityBlockingQueue.siftUpComparable(PriorityBlockingQueue.java:357)
> at 
> java.util.concurrent.PriorityBlockingQueue.offer(PriorityBlockingQueue.java:489)
> at 
> org.apache.hadoop.hbase.util.StealJobQueue$1.offer(StealJobQueue.java:56)
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1361)
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore.checkActiveSize(CompactingMemStore.java:258)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.internalAdd(AbstractMemStore.java:403)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:113)
> at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:630)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.applyToMemstore(HRegion.java:3769)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.applyFamilyMapToMemstore(HRegion.java:3740)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:3222)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2954)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2896)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:868)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:830)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2307)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34826)
>  
> {code}
> It is a straight forward fix. But If we implement the Comparable the 
> compareTo() should be based on what attribute?  Should be based on the time?  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15991) CompactingMemstore#InMemoryFlushRunnable should implement Comparable/Comparator

2016-06-13 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326918#comment-15326918
 ] 

Eshcar Hillel commented on HBASE-15991:
---

Some comments:

1. The exception is due to StealJobQueue extending PriorityBlockingQueue which 
throws the exception when it cannot determine the position of the item in the 
queue. I agree there is no need in a priority queue for the in-memory flush, at 
least for now. If it is required in the future the attribute for comparison 
will become obvious as well.

2. We should consider the pros/cons of LinkedBlockingQueue vs other queue such 
as ConcurrentLinkedQueue.
see in  
http://stackoverflow.com/questions/1301691/java-queue-implementations-which-one
"The most important difference between LinkedBlockingQueue and 
ConcurrentLinkedQueue is that if you request an element from a 
LinkedBlockingQueue and the queue is empty, your thread will wait until there 
is something there. A ConcurrentLinkedQueue will return right away with the 
behavior of an empty queue."

on the other hand in ConcurrentLinkedQueue
"Beware that, unlike in most collections, the size method is NOT a 
constant-time operation. Because of the asynchronous nature of these queues, 
determining the current number of elements requires a traversal of the 
elements."

So we need to consider what is more important having a size method with 
constant time, or making sure retrieving an element never blocks.


> CompactingMemstore#InMemoryFlushRunnable should implement 
> Comparable/Comparator
> ---
>
> Key: HBASE-15991
> URL: https://issues.apache.org/jira/browse/HBASE-15991
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-15991.patch, HBASE-15991_test.patch
>
>
> Configuring CompactingMemstore for a table fails due to the following error
> {code}
> 2016-06-08 23:27:03,761 ERROR [B.defaultRpcServer.handler...
> 2016-06-08 23:27:03,761 ERROR 
> [B.defaultRpcServer.handler=38,queue=8,port=16041] ipc.RpcServer: Unexpected 
> throwable object
> java.lang.ClassCastException: 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore$InMemoryFlushRunnable 
> cannot be cast to java.lang.Comparable
> at 
> java.util.concurrent.PriorityBlockingQueue.siftUpComparable(PriorityBlockingQueue.java:357)
> at 
> java.util.concurrent.PriorityBlockingQueue.offer(PriorityBlockingQueue.java:489)
> at 
> org.apache.hadoop.hbase.util.StealJobQueue$1.offer(StealJobQueue.java:56)
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1361)
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore.checkActiveSize(CompactingMemStore.java:258)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.internalAdd(AbstractMemStore.java:403)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:113)
> at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:630)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.applyToMemstore(HRegion.java:3769)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.applyFamilyMapToMemstore(HRegion.java:3740)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:3222)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2954)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2896)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:868)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:830)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2307)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34826)
>  
> {code}
> It is a straight forward fix. But If we implement the Comparable the 
> compareTo() should be based on what attribute?  Should be based on the time?  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-15999) NPE in MemstoreCompactor

2016-06-14 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329566#comment-15329566
 ] 

Eshcar Hillel edited comment on HBASE-15999 at 6/14/16 2:17 PM:


[~anoop.hbase] is right.
The NPE is just the syndrome. 
Only one thread at a time should  run the memstore compactor.
I believe the problem is in CompactingMemStore.
Replacing lines 281-284
{code}
if (allowCompaction.get()) {
  inMemoryFlushInProgress.set(true);
  compactor.startCompaction();
}
{code}
with
{code}
if (allowCompaction.get() && inMemoryFlushInProgress.compareAndSet(false,true) 
) {
  compactor.startCompaction();
}
{code}
should solve the problem.
Can you verify this in the tests you are running? 

Also keep in mind that the implementation of MemStoreCompactor is being 
massively changed by HBASE-14921.


was (Author: eshcar):
[~anoop.hbase] is right.
The NPE is just the syndrome. 
Only one thread at a time should  run the memstore compactor.
I believe the problem is in CompactingMemStore.
Replacing lines 281-284
{code}
if (allowCompaction.get()) {
  inMemoryFlushInProgress.set(true);
  compactor.startCompaction();
}
{code}
with
{code}
if (allowCompaction.get() && inMemoryFlushInProgress.compareAndSet(false,true) 
) {
  compactor.startCompaction();
}
{code}
should solve the problem.
Can you verify this in the tests you are running? 

> NPE in MemstoreCompactor
> 
>
> Key: HBASE-15999
> URL: https://issues.apache.org/jira/browse/HBASE-15999
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-15999.patch, HBASE-15999_1.patch
>
>
> In the INMEMORY_COMPACTION_POOL ThreadPoolExecutor for every thread the 
> current thread name is also appended. Since we are using a pool the name gets 
> appended and we end up in names like this
> {code}
> "B.defaultRpcServer.handler=89,queue=9,port=16041-inmemoryCompactions-1465492533442-inmemoryCompactions-1465492548754-inmemoryCompactions-1465492548913-inmemoryCompactions-1465492549625-inmemoryCompactions-1465492549956-inmemoryCompactions-1465492567040-inmemoryCompactions-1465492567160-inmemoryCompactions-1465492578465-inmemoryCompactions-1465492578707-inmemoryCompactions-1465492579292-inmemoryCompactions-1465492579357-inmemoryCompactions-1465492579786-inmemoryCompactions-1465492580059-inmemoryCompactions-1465492589975-inmemoryCompactions-1465492590192-inmemoryCompactions-1465492590484-inmemoryCompactions-1465492591144-inmemoryCompactions-1465492592603-inmemoryCompactions-1465492592799-inmemoryCompactions-1465492597106-inmemoryCompactions-1465492602925-inmemoryCompactions-1465492606620-inmemoryCompactions-1465492651478-inmemoryCompactions-1465492653460-inmemoryCompactions-1465492677020-inmemoryCompactions-1465492680857-inmemoryCompactions-1465492681989-inmemoryCompactions-1465492721818-inmemoryCompactions-1465492723562-inmemoryCompactions-1465492724801-inmemoryCompactions-1465492726665-inmemoryCompactions-1465492745750-inmemoryCompactions-1465492745964-inmemoryCompactions-1465492746578-inmemoryCompactions-1465492756867-inmemoryCompactions-1465492764727-inmemoryCompactions-1465492766944-inmemoryCompactions-1465492767098-inmemoryCompactions-1465492785298-inmemoryCompactions-1465492788334-inmemoryCompactions-1465492795954-inmemoryCompactions-1465493047265-inmemoryCompactions-1465493091530-inmemoryCompactions-1465493185684"
>  #6006 daemon prio=5 os_prio=0 tid=0x0daa6800 nid=0x454a runnable 
> [0x7f50fd0b9000]
> {code}
> As we were surprised to see why so many threads are getting created as Anoop 
> pointed out the pool size is 10 and there is no setting for the thread to 
> die, the reason for this issue is that we have an multi threaded issue in 
> MemstoreCompactor. The memstoreCompactor has the StoreScanner and 
> MemstoreScanner as the state variable and every time we just instantiate a 
> new one when a new inmemory flush request comes. Finally we try to release 
> the resource where the scanner is nullified and closed. But the instance 
> would have already been updated or nullified by another thread when there are 
> multiple requests. So this causes an NPE in releaseResources.
> {code}
> Exception in thread 
> "B.defaultRpcServer.handler=76,queue=6,port=16041-inmemoryCompactions-1465906554314"
>  java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.releaseResources(MemStoreCompactor.java:108)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.doCompaction(MemStoreCompactor.java:144)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.startCompaction(MemStoreCompactor.java:88)
> at 
> 

[jira] [Commented] (HBASE-15999) NPE in MemstoreCompactor

2016-06-14 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329566#comment-15329566
 ] 

Eshcar Hillel commented on HBASE-15999:
---

[~anoop.hbase] is right.
The NPE is just the syndrome. 
Only one thread at a time should  run the memstore compactor.
I believe the problem is in CompactingMemStore.
Replacing lines 281-284
{code}
if (allowCompaction.get()) {
  inMemoryFlushInProgress.set(true);
  compactor.startCompaction();
}
{code}
with
{code}
if (allowCompaction.get() && inMemoryFlushInProgress.compareAndSet(false,true) 
) {
  compactor.startCompaction();
}
{code}
should solve the problem.
Can you verify this in the tests you are running? 

> NPE in MemstoreCompactor
> 
>
> Key: HBASE-15999
> URL: https://issues.apache.org/jira/browse/HBASE-15999
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-15999.patch, HBASE-15999_1.patch
>
>
> In the INMEMORY_COMPACTION_POOL ThreadPoolExecutor for every thread the 
> current thread name is also appended. Since we are using a pool the name gets 
> appended and we end up in names like this
> {code}
> "B.defaultRpcServer.handler=89,queue=9,port=16041-inmemoryCompactions-1465492533442-inmemoryCompactions-1465492548754-inmemoryCompactions-1465492548913-inmemoryCompactions-1465492549625-inmemoryCompactions-1465492549956-inmemoryCompactions-1465492567040-inmemoryCompactions-1465492567160-inmemoryCompactions-1465492578465-inmemoryCompactions-1465492578707-inmemoryCompactions-1465492579292-inmemoryCompactions-1465492579357-inmemoryCompactions-1465492579786-inmemoryCompactions-1465492580059-inmemoryCompactions-1465492589975-inmemoryCompactions-1465492590192-inmemoryCompactions-1465492590484-inmemoryCompactions-1465492591144-inmemoryCompactions-1465492592603-inmemoryCompactions-1465492592799-inmemoryCompactions-1465492597106-inmemoryCompactions-1465492602925-inmemoryCompactions-1465492606620-inmemoryCompactions-1465492651478-inmemoryCompactions-1465492653460-inmemoryCompactions-1465492677020-inmemoryCompactions-1465492680857-inmemoryCompactions-1465492681989-inmemoryCompactions-1465492721818-inmemoryCompactions-1465492723562-inmemoryCompactions-1465492724801-inmemoryCompactions-1465492726665-inmemoryCompactions-1465492745750-inmemoryCompactions-1465492745964-inmemoryCompactions-1465492746578-inmemoryCompactions-1465492756867-inmemoryCompactions-1465492764727-inmemoryCompactions-1465492766944-inmemoryCompactions-1465492767098-inmemoryCompactions-1465492785298-inmemoryCompactions-1465492788334-inmemoryCompactions-1465492795954-inmemoryCompactions-1465493047265-inmemoryCompactions-1465493091530-inmemoryCompactions-1465493185684"
>  #6006 daemon prio=5 os_prio=0 tid=0x0daa6800 nid=0x454a runnable 
> [0x7f50fd0b9000]
> {code}
> As we were surprised to see why so many threads are getting created as Anoop 
> pointed out the pool size is 10 and there is no setting for the thread to 
> die, the reason for this issue is that we have an multi threaded issue in 
> MemstoreCompactor. The memstoreCompactor has the StoreScanner and 
> MemstoreScanner as the state variable and every time we just instantiate a 
> new one when a new inmemory flush request comes. Finally we try to release 
> the resource where the scanner is nullified and closed. But the instance 
> would have already been updated or nullified by another thread when there are 
> multiple requests. So this causes an NPE in releaseResources.
> {code}
> Exception in thread 
> "B.defaultRpcServer.handler=76,queue=6,port=16041-inmemoryCompactions-1465906554314"
>  java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.releaseResources(MemStoreCompactor.java:108)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.doCompaction(MemStoreCompactor.java:144)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.startCompaction(MemStoreCompactor.java:88)
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore.flushInMemory(CompactingMemStore.java:287)
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore$InMemoryFlushRunnable.run(CompactingMemStore.java:356)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> {code}
> So every time the thread dies and a new is created which I found from the 
> stack traces by adding some logs. So this is why we were creating lot of 
> threads and the name was simply getting appened. Now from this we can see 
> that adding the Handler name is fine but the main 

[jira] [Commented] (HBASE-15999) NPE in MemstoreCompactor

2016-06-15 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331496#comment-15331496
 ] 

Eshcar Hillel commented on HBASE-15999:
---

The attribute _allowCompaction_ is a hook to be used only by tests (see methods 
disableCompaction() and enableCompaction()); it should not be used by the main 
flow to control the concurrency.  
For this purpose we have the attribute _inMemoryFlushInProgress_ -- a new 
compaction should not be called when it is set to true.
Therefore it is important to set it by using CAS (compare and swap) so that 
only one thread succeeds in setting it and this thread only invokes the 
in-memory compaction.
And it is important that concurrency control is the responsibility of a single 
attribute and not of several atomic booleans. Agree?
The one line change I suggested is not working?

> NPE in MemstoreCompactor
> 
>
> Key: HBASE-15999
> URL: https://issues.apache.org/jira/browse/HBASE-15999
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-15999.patch, HBASE-15999_1.patch, 
> HBASE-15999_2.patch
>
>
> In the INMEMORY_COMPACTION_POOL ThreadPoolExecutor for every thread the 
> current thread name is also appended. Since we are using a pool the name gets 
> appended and we end up in names like this
> {code}
> "B.defaultRpcServer.handler=89,queue=9,port=16041-inmemoryCompactions-1465492533442-inmemoryCompactions-1465492548754-inmemoryCompactions-1465492548913-inmemoryCompactions-1465492549625-inmemoryCompactions-1465492549956-inmemoryCompactions-1465492567040-inmemoryCompactions-1465492567160-inmemoryCompactions-1465492578465-inmemoryCompactions-1465492578707-inmemoryCompactions-1465492579292-inmemoryCompactions-1465492579357-inmemoryCompactions-1465492579786-inmemoryCompactions-1465492580059-inmemoryCompactions-1465492589975-inmemoryCompactions-1465492590192-inmemoryCompactions-1465492590484-inmemoryCompactions-1465492591144-inmemoryCompactions-1465492592603-inmemoryCompactions-1465492592799-inmemoryCompactions-1465492597106-inmemoryCompactions-1465492602925-inmemoryCompactions-1465492606620-inmemoryCompactions-1465492651478-inmemoryCompactions-1465492653460-inmemoryCompactions-1465492677020-inmemoryCompactions-1465492680857-inmemoryCompactions-1465492681989-inmemoryCompactions-1465492721818-inmemoryCompactions-1465492723562-inmemoryCompactions-1465492724801-inmemoryCompactions-1465492726665-inmemoryCompactions-1465492745750-inmemoryCompactions-1465492745964-inmemoryCompactions-1465492746578-inmemoryCompactions-1465492756867-inmemoryCompactions-1465492764727-inmemoryCompactions-1465492766944-inmemoryCompactions-1465492767098-inmemoryCompactions-1465492785298-inmemoryCompactions-1465492788334-inmemoryCompactions-1465492795954-inmemoryCompactions-1465493047265-inmemoryCompactions-1465493091530-inmemoryCompactions-1465493185684"
>  #6006 daemon prio=5 os_prio=0 tid=0x0daa6800 nid=0x454a runnable 
> [0x7f50fd0b9000]
> {code}
> As we were surprised to see why so many threads are getting created as Anoop 
> pointed out the pool size is 10 and there is no setting for the thread to 
> die, the reason for this issue is that we have an multi threaded issue in 
> MemstoreCompactor. The memstoreCompactor has the StoreScanner and 
> MemstoreScanner as the state variable and every time we just instantiate a 
> new one when a new inmemory flush request comes. Finally we try to release 
> the resource where the scanner is nullified and closed. But the instance 
> would have already been updated or nullified by another thread when there are 
> multiple requests. So this causes an NPE in releaseResources.
> {code}
> Exception in thread 
> "B.defaultRpcServer.handler=76,queue=6,port=16041-inmemoryCompactions-1465906554314"
>  java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.releaseResources(MemStoreCompactor.java:108)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.doCompaction(MemStoreCompactor.java:144)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.startCompaction(MemStoreCompactor.java:88)
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore.flushInMemory(CompactingMemStore.java:287)
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore$InMemoryFlushRunnable.run(CompactingMemStore.java:356)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> {code}
> So every time the thread dies and a new is created which I found from the 
> stack traces 

[jira] [Commented] (HBASE-15999) NPE in MemstoreCompactor

2016-06-15 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331581#comment-15331581
 ] 

Eshcar Hillel commented on HBASE-15999:
---

Reviewed the code again and found an additional line that needs to be removed 
for the same reason.
remove line 260 in CompactingMemStore
{code}
inMemoryFlushInProgress.set(true);
{code}
The only place where _ inMemoryFlushInProgress_ should be set to true is by the 
CAS instruction and the thread that succeeds in applying the CAS is the only 
thread executing the in-memory compaction.

Apologies for piecemeal-ing this patch :)

> NPE in MemstoreCompactor
> 
>
> Key: HBASE-15999
> URL: https://issues.apache.org/jira/browse/HBASE-15999
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-15999.patch, HBASE-15999_1.patch, 
> HBASE-15999_2.patch, HBASE-15999_3.patch
>
>
> In the INMEMORY_COMPACTION_POOL ThreadPoolExecutor for every thread the 
> current thread name is also appended. Since we are using a pool the name gets 
> appended and we end up in names like this
> {code}
> "B.defaultRpcServer.handler=89,queue=9,port=16041-inmemoryCompactions-1465492533442-inmemoryCompactions-1465492548754-inmemoryCompactions-1465492548913-inmemoryCompactions-1465492549625-inmemoryCompactions-1465492549956-inmemoryCompactions-1465492567040-inmemoryCompactions-1465492567160-inmemoryCompactions-1465492578465-inmemoryCompactions-1465492578707-inmemoryCompactions-1465492579292-inmemoryCompactions-1465492579357-inmemoryCompactions-1465492579786-inmemoryCompactions-1465492580059-inmemoryCompactions-1465492589975-inmemoryCompactions-1465492590192-inmemoryCompactions-1465492590484-inmemoryCompactions-1465492591144-inmemoryCompactions-1465492592603-inmemoryCompactions-1465492592799-inmemoryCompactions-1465492597106-inmemoryCompactions-1465492602925-inmemoryCompactions-1465492606620-inmemoryCompactions-1465492651478-inmemoryCompactions-1465492653460-inmemoryCompactions-1465492677020-inmemoryCompactions-1465492680857-inmemoryCompactions-1465492681989-inmemoryCompactions-1465492721818-inmemoryCompactions-1465492723562-inmemoryCompactions-1465492724801-inmemoryCompactions-1465492726665-inmemoryCompactions-1465492745750-inmemoryCompactions-1465492745964-inmemoryCompactions-1465492746578-inmemoryCompactions-1465492756867-inmemoryCompactions-1465492764727-inmemoryCompactions-1465492766944-inmemoryCompactions-1465492767098-inmemoryCompactions-1465492785298-inmemoryCompactions-1465492788334-inmemoryCompactions-1465492795954-inmemoryCompactions-1465493047265-inmemoryCompactions-1465493091530-inmemoryCompactions-1465493185684"
>  #6006 daemon prio=5 os_prio=0 tid=0x0daa6800 nid=0x454a runnable 
> [0x7f50fd0b9000]
> {code}
> As we were surprised to see why so many threads are getting created as Anoop 
> pointed out the pool size is 10 and there is no setting for the thread to 
> die, the reason for this issue is that we have an multi threaded issue in 
> MemstoreCompactor. The memstoreCompactor has the StoreScanner and 
> MemstoreScanner as the state variable and every time we just instantiate a 
> new one when a new inmemory flush request comes. Finally we try to release 
> the resource where the scanner is nullified and closed. But the instance 
> would have already been updated or nullified by another thread when there are 
> multiple requests. So this causes an NPE in releaseResources.
> {code}
> Exception in thread 
> "B.defaultRpcServer.handler=76,queue=6,port=16041-inmemoryCompactions-1465906554314"
>  java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.releaseResources(MemStoreCompactor.java:108)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.doCompaction(MemStoreCompactor.java:144)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.startCompaction(MemStoreCompactor.java:88)
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore.flushInMemory(CompactingMemStore.java:287)
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore$InMemoryFlushRunnable.run(CompactingMemStore.java:356)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> {code}
> So every time the thread dies and a new is created which I found from the 
> stack traces by adding some logs. So this is why we were creating lot of 
> threads and the name was simply getting appened. Now from this we can see 
> that adding the Handler name is fine but the main issue is the NPE.



--
This message 

[jira] [Commented] (HBASE-15999) NPE in MemstoreCompactor

2016-06-15 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331611#comment-15331611
 ] 

Eshcar Hillel commented on HBASE-15999:
---

patch _4 looks correct to me, unless any other issues come up.
+1

> NPE in MemstoreCompactor
> 
>
> Key: HBASE-15999
> URL: https://issues.apache.org/jira/browse/HBASE-15999
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-15999.patch, HBASE-15999_1.patch, 
> HBASE-15999_2.patch, HBASE-15999_3.patch, HBASE-15999_4.patch
>
>
> In the INMEMORY_COMPACTION_POOL ThreadPoolExecutor for every thread the 
> current thread name is also appended. Since we are using a pool the name gets 
> appended and we end up in names like this
> {code}
> "B.defaultRpcServer.handler=89,queue=9,port=16041-inmemoryCompactions-1465492533442-inmemoryCompactions-1465492548754-inmemoryCompactions-1465492548913-inmemoryCompactions-1465492549625-inmemoryCompactions-1465492549956-inmemoryCompactions-1465492567040-inmemoryCompactions-1465492567160-inmemoryCompactions-1465492578465-inmemoryCompactions-1465492578707-inmemoryCompactions-1465492579292-inmemoryCompactions-1465492579357-inmemoryCompactions-1465492579786-inmemoryCompactions-1465492580059-inmemoryCompactions-1465492589975-inmemoryCompactions-1465492590192-inmemoryCompactions-1465492590484-inmemoryCompactions-1465492591144-inmemoryCompactions-1465492592603-inmemoryCompactions-1465492592799-inmemoryCompactions-1465492597106-inmemoryCompactions-1465492602925-inmemoryCompactions-1465492606620-inmemoryCompactions-1465492651478-inmemoryCompactions-1465492653460-inmemoryCompactions-1465492677020-inmemoryCompactions-1465492680857-inmemoryCompactions-1465492681989-inmemoryCompactions-1465492721818-inmemoryCompactions-1465492723562-inmemoryCompactions-1465492724801-inmemoryCompactions-1465492726665-inmemoryCompactions-1465492745750-inmemoryCompactions-1465492745964-inmemoryCompactions-1465492746578-inmemoryCompactions-1465492756867-inmemoryCompactions-1465492764727-inmemoryCompactions-1465492766944-inmemoryCompactions-1465492767098-inmemoryCompactions-1465492785298-inmemoryCompactions-1465492788334-inmemoryCompactions-1465492795954-inmemoryCompactions-1465493047265-inmemoryCompactions-1465493091530-inmemoryCompactions-1465493185684"
>  #6006 daemon prio=5 os_prio=0 tid=0x0daa6800 nid=0x454a runnable 
> [0x7f50fd0b9000]
> {code}
> As we were surprised to see why so many threads are getting created as Anoop 
> pointed out the pool size is 10 and there is no setting for the thread to 
> die, the reason for this issue is that we have an multi threaded issue in 
> MemstoreCompactor. The memstoreCompactor has the StoreScanner and 
> MemstoreScanner as the state variable and every time we just instantiate a 
> new one when a new inmemory flush request comes. Finally we try to release 
> the resource where the scanner is nullified and closed. But the instance 
> would have already been updated or nullified by another thread when there are 
> multiple requests. So this causes an NPE in releaseResources.
> {code}
> Exception in thread 
> "B.defaultRpcServer.handler=76,queue=6,port=16041-inmemoryCompactions-1465906554314"
>  java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.releaseResources(MemStoreCompactor.java:108)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.doCompaction(MemStoreCompactor.java:144)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.startCompaction(MemStoreCompactor.java:88)
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore.flushInMemory(CompactingMemStore.java:287)
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore$InMemoryFlushRunnable.run(CompactingMemStore.java:356)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> {code}
> So every time the thread dies and a new is created which I found from the 
> stack traces by adding some logs. So this is why we were creating lot of 
> threads and the name was simply getting appened. Now from this we can see 
> that adding the Handler name is fine but the main issue is the NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-15999) NPE in MemstoreCompactor

2016-06-15 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331581#comment-15331581
 ] 

Eshcar Hillel edited comment on HBASE-15999 at 6/15/16 12:18 PM:
-

Reviewed the code again and found an additional line that needs to be removed 
for the same reason.
remove line 260 in CompactingMemStore
{code}
inMemoryFlushInProgress.set(true);
{code}
The only place where _inMemoryFlushInProgress_ should be set to true is by the 
CAS instruction and the thread that succeeds in applying the CAS is the only 
thread executing the in-memory compaction.

Apologies for piecemeal-ing this patch :)


was (Author: eshcar):
Reviewed the code again and found an additional line that needs to be removed 
for the same reason.
remove line 260 in CompactingMemStore
{code}
inMemoryFlushInProgress.set(true);
{code}
The only place where _ inMemoryFlushInProgress_ should be set to true is by the 
CAS instruction and the thread that succeeds in applying the CAS is the only 
thread executing the in-memory compaction.

Apologies for piecemeal-ing this patch :)

> NPE in MemstoreCompactor
> 
>
> Key: HBASE-15999
> URL: https://issues.apache.org/jira/browse/HBASE-15999
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-15999.patch, HBASE-15999_1.patch, 
> HBASE-15999_2.patch, HBASE-15999_3.patch, HBASE-15999_4.patch
>
>
> In the INMEMORY_COMPACTION_POOL ThreadPoolExecutor for every thread the 
> current thread name is also appended. Since we are using a pool the name gets 
> appended and we end up in names like this
> {code}
> "B.defaultRpcServer.handler=89,queue=9,port=16041-inmemoryCompactions-1465492533442-inmemoryCompactions-1465492548754-inmemoryCompactions-1465492548913-inmemoryCompactions-1465492549625-inmemoryCompactions-1465492549956-inmemoryCompactions-1465492567040-inmemoryCompactions-1465492567160-inmemoryCompactions-1465492578465-inmemoryCompactions-1465492578707-inmemoryCompactions-1465492579292-inmemoryCompactions-1465492579357-inmemoryCompactions-1465492579786-inmemoryCompactions-1465492580059-inmemoryCompactions-1465492589975-inmemoryCompactions-1465492590192-inmemoryCompactions-1465492590484-inmemoryCompactions-1465492591144-inmemoryCompactions-1465492592603-inmemoryCompactions-1465492592799-inmemoryCompactions-1465492597106-inmemoryCompactions-1465492602925-inmemoryCompactions-1465492606620-inmemoryCompactions-1465492651478-inmemoryCompactions-1465492653460-inmemoryCompactions-1465492677020-inmemoryCompactions-1465492680857-inmemoryCompactions-1465492681989-inmemoryCompactions-1465492721818-inmemoryCompactions-1465492723562-inmemoryCompactions-1465492724801-inmemoryCompactions-1465492726665-inmemoryCompactions-1465492745750-inmemoryCompactions-1465492745964-inmemoryCompactions-1465492746578-inmemoryCompactions-1465492756867-inmemoryCompactions-1465492764727-inmemoryCompactions-1465492766944-inmemoryCompactions-1465492767098-inmemoryCompactions-1465492785298-inmemoryCompactions-1465492788334-inmemoryCompactions-1465492795954-inmemoryCompactions-1465493047265-inmemoryCompactions-1465493091530-inmemoryCompactions-1465493185684"
>  #6006 daemon prio=5 os_prio=0 tid=0x0daa6800 nid=0x454a runnable 
> [0x7f50fd0b9000]
> {code}
> As we were surprised to see why so many threads are getting created as Anoop 
> pointed out the pool size is 10 and there is no setting for the thread to 
> die, the reason for this issue is that we have an multi threaded issue in 
> MemstoreCompactor. The memstoreCompactor has the StoreScanner and 
> MemstoreScanner as the state variable and every time we just instantiate a 
> new one when a new inmemory flush request comes. Finally we try to release 
> the resource where the scanner is nullified and closed. But the instance 
> would have already been updated or nullified by another thread when there are 
> multiple requests. So this causes an NPE in releaseResources.
> {code}
> Exception in thread 
> "B.defaultRpcServer.handler=76,queue=6,port=16041-inmemoryCompactions-1465906554314"
>  java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.releaseResources(MemStoreCompactor.java:108)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.doCompaction(MemStoreCompactor.java:144)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreCompactor.startCompaction(MemStoreCompactor.java:88)
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore.flushInMemory(CompactingMemStore.java:287)
> at 
> 

[jira] [Commented] (HBASE-15991) CompactingMemstore#InMemoryFlushRunnable should implement Comparable/Comparator

2016-06-13 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327184#comment-15327184
 ] 

Eshcar Hillel commented on HBASE-15991:
---

In that case, among all blocking options ArrayBlockingQueue (bounded size), 
SynchronousQueue (no capacity - each insert operation must wait for a 
corresponding remove operation), PriorityBlockingQueue (requires comparator), 
and LinkedTransferQueue (supports synchronous transfer in addition to put),
LinkedBlockingQueue seems to be the best choice. It is claimed to have higher 
throughput than array-based queues but less predictable performance. In the 
context of a thread pool using a blocking pool is probably ok.

I guess then 
+1

> CompactingMemstore#InMemoryFlushRunnable should implement 
> Comparable/Comparator
> ---
>
> Key: HBASE-15991
> URL: https://issues.apache.org/jira/browse/HBASE-15991
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-15991.patch, HBASE-15991_test.patch
>
>
> Configuring CompactingMemstore for a table fails due to the following error
> {code}
> 2016-06-08 23:27:03,761 ERROR [B.defaultRpcServer.handler...
> 2016-06-08 23:27:03,761 ERROR 
> [B.defaultRpcServer.handler=38,queue=8,port=16041] ipc.RpcServer: Unexpected 
> throwable object
> java.lang.ClassCastException: 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore$InMemoryFlushRunnable 
> cannot be cast to java.lang.Comparable
> at 
> java.util.concurrent.PriorityBlockingQueue.siftUpComparable(PriorityBlockingQueue.java:357)
> at 
> java.util.concurrent.PriorityBlockingQueue.offer(PriorityBlockingQueue.java:489)
> at 
> org.apache.hadoop.hbase.util.StealJobQueue$1.offer(StealJobQueue.java:56)
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1361)
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore.checkActiveSize(CompactingMemStore.java:258)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.internalAdd(AbstractMemStore.java:403)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:113)
> at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:630)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.applyToMemstore(HRegion.java:3769)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.applyFamilyMapToMemstore(HRegion.java:3740)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:3222)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2954)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2896)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:868)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:830)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2307)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34826)
>  
> {code}
> It is a straight forward fix. But If we implement the Comparable the 
> compareTo() should be based on what attribute?  Should be based on the time?  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14919) Infrastructure refactoring

2016-01-10 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090994#comment-15090994
 ] 

Eshcar Hillel commented on HBASE-14919:
---

It seems that this patch is ready to be committed.
[~stack]  do you prefer to wait with it till after patch #2 is committed?

> Infrastructure refactoring
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch, 
> HBASE-14919-V04.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14919) Infrastructure refactoring

2016-01-11 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092051#comment-15092051
 ] 

Eshcar Hillel commented on HBASE-14919:
---

I am not sure how updateLowestUnflushedSequenceIdInWal is related to 
HBASE-15082, I might be missing some context. Can you elaborate [~anoop.hbase]?
The motivation behind updateLowestUnflushedSequenceIdInWal is to have the WAL 
cover all entries still in memory (this is required for correctness) while 
truncating the WAL so it does not get too long (to bound MTTR). It is required 
for compacting memstore which cherry picks entries to be removed from memory 
without flushing them to disk. Since this is done in the background and not in 
a batched mode, the new method is invoked whenever needs to update the wal.

> Infrastructure refactoring
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch, 
> HBASE-14919-V04.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14919) Infrastructure refactoring

2016-01-17 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15104887#comment-15104887
 ] 

Eshcar Hillel commented on HBASE-14919:
---

Please advise: I think I know how to handle the check-style, white-space and 
find-bugs issues in hbase-server. But I'm not sure how to handle the errors 
with Hadoop v2.x.x and the 82 find-bugs warning.
Can someone help here? 

> Infrastructure refactoring
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch, 
> HBASE-14919-V04.patch, HBASE-14919-V05.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14919) Infrastructure refactoring

2016-01-17 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14919:
--
Attachment: HBASE-14919-V05.patch

New patch after rebase and changes following Anoop's comments.

> Infrastructure refactoring
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch, 
> HBASE-14919-V04.patch, HBASE-14919-V05.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14919) Infrastructure refactoring for In-Memory Flush

2016-02-07 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14919:
--
Attachment: HBASE-14919-V11.patch

> Infrastructure refactoring for In-Memory Flush
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch, 
> HBASE-14919-V04.patch, HBASE-14919-V05.patch, HBASE-14919-V06.patch, 
> HBASE-14919-V06.patch, HBASE-14919-V07.patch, HBASE-14919-V08.patch, 
> HBASE-14919-V09.patch, HBASE-14919-V10.patch, HBASE-14919-V11.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14919) Infrastructure refactoring for In-Memory Flush

2016-02-08 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14919:
--
Attachment: HBASE-14919-V12.patch

> Infrastructure refactoring for In-Memory Flush
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch, 
> HBASE-14919-V04.patch, HBASE-14919-V05.patch, HBASE-14919-V06.patch, 
> HBASE-14919-V06.patch, HBASE-14919-V07.patch, HBASE-14919-V08.patch, 
> HBASE-14919-V09.patch, HBASE-14919-V10.patch, HBASE-14919-V11.patch, 
> HBASE-14919-V12.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14919) Infrastructure refactoring for In-Memory Flush

2016-02-04 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132267#comment-15132267
 ] 

Eshcar Hillel commented on HBASE-14919:
---

bq. Should we set -1 to when create instance of this class? Define -1 as 
NO_SNAPSHOT_ID?

Definitely, yes.

bq. This javadoc seems wrong.

I will fix this.

bq. So, an ImmutableSegment takes a MutableSegment on construction? You can 
skip the pupa ImmutableSegmentAdapter? (It is the only implementation of 
ImmutableSegment).

Task #4 in the umbrella issue implements another ImmutableSegment 
(CellBlocksSrgment), which does not take MutableSegment on construction. 
Would you like better the name ImmutableSegmentWrapper?

bq. Reading the MutableSegment looking at the methods it has, why can't they 
all be in ImmutableSegment?

We could simplify the design by removing the distinction between mutable and 
immutable segments but this would be at the cost of implementing unnecessary 
API, possibly not efficiently. 
Consider tailSet(firstCell), it is similar for first() and getComparator(). 
tailSet is used in the methods AbstractMemStore::upsert() and 
AbstractMemStore::updateColumnValue() where it is applied only to the active 
(mutable) segment, and in methods of MutableCellSetSegment and 
MutableCellSetSegmentScanner. It is not needed in any use case for an immutable 
segment. However, its implementation in CellBlocksSegment would be inefficient 
since it will incur costly traversal over all cell blocks plus allocating many 
new objects to be stored in the result sorted set. 
It is possible to do it this way. We believe exposing an unnecessary API which 
incurs costly implementation is not advisable.

bq. And why this subset of Set methods in MutableSegment?

We believe it is good practice to only expose the minimal necessary 
functionality. 

bq. In ImmutableSegment, there is this method getScannerForMemStoreSnapshot(). 
As I read it, I am getting a Scanner on a MemStoreSnapshot...

No, this is incorrect. This method returns a KeyValueScanner that is used *by* 
MemStoreSnapshot, but _it scans the cells in the segment who generated it_.
Would you like better the name getKeyValueScanner()? And the other method would 
be getSegmentScanner().

bq. What is special about MutableCellSetSegmentScanner? When would the Scanner 
on a MutableSegment differ from Scanner on an ImmutableSegment? In other words, 
could we just have a SegmentScanner implementation and it work for both Segment 
types?

Each scanner depends on the segment it is scanning.
For example, consider the method MutableCellSetSegmentScanner::backwardSeek() - 
it seeks forward then go backward
{code}
  public boolean backwardSeek(Cell key) throws IOException {
seek(key);
if (peek() == null || segment.compareRows(peek(), key) > 0) {
  return seekToPreviousRow(key);
}
return true;
  }
{code}
There is no reason for CellBlocksSegment to implement it this way since the 
cells are contiguous and therefore allow for much simpler traversal.  

> Infrastructure refactoring for In-Memory Flush
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch, 
> HBASE-14919-V04.patch, HBASE-14919-V05.patch, HBASE-14919-V06.patch, 
> HBASE-14919-V06.patch, HBASE-14919-V07.patch, HBASE-14919-V08.patch, 
> HBASE-14919-V09.patch, HBASE-14919-V10.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction

2016-01-28 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121094#comment-15121094
 ] 

Eshcar Hillel commented on HBASE-14918:
---

Thanks [~anoop.hbase].
I don't see how you can move MSLAB to the HStore level.
In the first patch MSLAB is used in the segment to allocate the byte range (in 
maybeCloneWithAllocator()), and it also does bookkeeping of scanners which 
access the MSLAB (with inc/decScannersCount()) so it can manage the 
deallocation of buffers when no scanners can access them.
This is also the case in master but there the methods are in the scope of 
DefaultMemStore and the MemStoreScanner.
How would you suggest to move it to HStore? Why do you think it is better there 
and not inside the segment?

> In-Memory MemStore Flush and Compaction
> ---
>
> Key: HBASE-14918
> URL: https://issues.apache.org/jira/browse/HBASE-14918
> Project: HBase
>  Issue Type: Umbrella
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 0.98.18
>
> Attachments: CellBlocksSegmentDesign.pdf
>
>
> A memstore serves as the in-memory component of a store unit, absorbing all 
> updates to the store. From time to time these updates are flushed to a file 
> on disk, where they are compacted (by eliminating redundancies) and 
> compressed (i.e., written in a compressed format to reduce their storage 
> size).
> We aim to speed up data access, and therefore suggest to apply in-memory 
> memstore flush. That is to flush the active in-memory segment into an 
> intermediate buffer where it can be accessed by the application. Data in the 
> buffer is subject to compaction and can be stored in any format that allows 
> it to take up smaller space in RAM. The less space the buffer consumes the 
> longer it can reside in memory before data is flushed to disk, resulting in 
> better performance.
> Specifically, the optimization is beneficial for workloads with 
> medium-to-high key churn which incur many redundant cells, like persistent 
> messaging. 
> We suggest to structure the solution as 4 subtasks (respectively, patches). 
> (1) Infrastructure - refactoring of the MemStore hierarchy, introducing 
> segment (StoreSegment) as first-class citizen, and decoupling memstore 
> scanner from the memstore implementation;
> (2) Adding StoreServices facility at the region level to allow memstores 
> update region counters and access region level synchronization mechanism;
> (3) Implementation of a new memstore (CompactingMemstore) with non-optimized 
> immutable segment representation, and 
> (4) Memory optimization including compressed format representation and off 
> heap allocations.
> This Jira continues the discussion in HBASE-13408.
> Design documents, evaluation results and previous patches can be found in 
> HBASE-13408. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction

2016-01-28 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121385#comment-15121385
 ] 

Eshcar Hillel commented on HBASE-14918:
---

I reviewed the mslab-move patch. Software-engineering-wise I am not at all 
convinced that the right place for mslab is in HStore level.
The compacting memstore is an example in which cells are allocated at the 
memstore level and not the store level.

But more important is what you say about off-heap memory. I have no experience 
with off-heaping.
Can you please elaborate why the suggested design cannot be off-heap, and what 
is needed to allow it be off-heap?
In addition, you refer to the write-path, but actually the write-path goes 
through mutable-segment that stores the data in a CSLM format.
Only reads and scans access the cell block.

It is good we have this discussion at this point since it relates to the design 
of task #4, and can also affect task #3.
However, [~stack], is there anything that prevents committing the patch of task 
#1. Is it not committed due to the MSLAB issue?
IMO, the mslab is orthogonal to task #1. If it is decided that it needs to 
move, then it is possible to do so even after the patch.

> In-Memory MemStore Flush and Compaction
> ---
>
> Key: HBASE-14918
> URL: https://issues.apache.org/jira/browse/HBASE-14918
> Project: HBase
>  Issue Type: Umbrella
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 0.98.18
>
> Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch
>
>
> A memstore serves as the in-memory component of a store unit, absorbing all 
> updates to the store. From time to time these updates are flushed to a file 
> on disk, where they are compacted (by eliminating redundancies) and 
> compressed (i.e., written in a compressed format to reduce their storage 
> size).
> We aim to speed up data access, and therefore suggest to apply in-memory 
> memstore flush. That is to flush the active in-memory segment into an 
> intermediate buffer where it can be accessed by the application. Data in the 
> buffer is subject to compaction and can be stored in any format that allows 
> it to take up smaller space in RAM. The less space the buffer consumes the 
> longer it can reside in memory before data is flushed to disk, resulting in 
> better performance.
> Specifically, the optimization is beneficial for workloads with 
> medium-to-high key churn which incur many redundant cells, like persistent 
> messaging. 
> We suggest to structure the solution as 4 subtasks (respectively, patches). 
> (1) Infrastructure - refactoring of the MemStore hierarchy, introducing 
> segment (StoreSegment) as first-class citizen, and decoupling memstore 
> scanner from the memstore implementation;
> (2) Adding StoreServices facility at the region level to allow memstores 
> update region counters and access region level synchronization mechanism;
> (3) Implementation of a new memstore (CompactingMemstore) with non-optimized 
> immutable segment representation, and 
> (4) Memory optimization including compressed format representation and off 
> heap allocations.
> This Jira continues the discussion in HBASE-13408.
> Design documents, evaluation results and previous patches can be found in 
> HBASE-13408. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction

2016-01-28 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121203#comment-15121203
 ] 

Eshcar Hillel commented on HBASE-14918:
---

ok.
But even in the  HBASE-15180 patch DefaultMemStore still have the attributes
{code}
  volatile MemStoreLAB allocator;
  volatile MemStoreLAB snapshotAllocator;
{code}
and MemStoreScanner still have the attributes
{code}
volatile MemStoreLAB allocatorAtCreation;
volatile MemStoreLAB snapshotAllocatorAtCreation;
{code}
So either I'm missing something or we talk on two different things.


> In-Memory MemStore Flush and Compaction
> ---
>
> Key: HBASE-14918
> URL: https://issues.apache.org/jira/browse/HBASE-14918
> Project: HBase
>  Issue Type: Umbrella
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 0.98.18
>
> Attachments: CellBlocksSegmentDesign.pdf
>
>
> A memstore serves as the in-memory component of a store unit, absorbing all 
> updates to the store. From time to time these updates are flushed to a file 
> on disk, where they are compacted (by eliminating redundancies) and 
> compressed (i.e., written in a compressed format to reduce their storage 
> size).
> We aim to speed up data access, and therefore suggest to apply in-memory 
> memstore flush. That is to flush the active in-memory segment into an 
> intermediate buffer where it can be accessed by the application. Data in the 
> buffer is subject to compaction and can be stored in any format that allows 
> it to take up smaller space in RAM. The less space the buffer consumes the 
> longer it can reside in memory before data is flushed to disk, resulting in 
> better performance.
> Specifically, the optimization is beneficial for workloads with 
> medium-to-high key churn which incur many redundant cells, like persistent 
> messaging. 
> We suggest to structure the solution as 4 subtasks (respectively, patches). 
> (1) Infrastructure - refactoring of the MemStore hierarchy, introducing 
> segment (StoreSegment) as first-class citizen, and decoupling memstore 
> scanner from the memstore implementation;
> (2) Adding StoreServices facility at the region level to allow memstores 
> update region counters and access region level synchronization mechanism;
> (3) Implementation of a new memstore (CompactingMemstore) with non-optimized 
> immutable segment representation, and 
> (4) Memory optimization including compressed format representation and off 
> heap allocations.
> This Jira continues the discussion in HBASE-13408.
> Design documents, evaluation results and previous patches can be found in 
> HBASE-13408. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14919) Infrastructure refactoring for In-Memory Flush

2016-02-01 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14919:
--
Attachment: HBASE-14919-V09.patch

> Infrastructure refactoring for In-Memory Flush
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch, 
> HBASE-14919-V04.patch, HBASE-14919-V05.patch, HBASE-14919-V06.patch, 
> HBASE-14919-V06.patch, HBASE-14919-V07.patch, HBASE-14919-V08.patch, 
> HBASE-14919-V09.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14919) Infrastructure refactoring for In-Memory Flush

2016-02-03 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14919:
--
Attachment: HBASE-14919-V10.patch

> Infrastructure refactoring for In-Memory Flush
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch, 
> HBASE-14919-V04.patch, HBASE-14919-V05.patch, HBASE-14919-V06.patch, 
> HBASE-14919-V06.patch, HBASE-14919-V07.patch, HBASE-14919-V08.patch, 
> HBASE-14919-V09.patch, HBASE-14919-V10.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14919) Infrastructure refactoring for In-Memory Flush

2016-02-01 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14919:
--
Attachment: HBASE-14919-V08.patch

> Infrastructure refactoring for In-Memory Flush
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch, 
> HBASE-14919-V04.patch, HBASE-14919-V05.patch, HBASE-14919-V06.patch, 
> HBASE-14919-V06.patch, HBASE-14919-V07.patch, HBASE-14919-V08.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14919) Infrastructure refactoring for In-Memory Flush

2016-01-28 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121515#comment-15121515
 ] 

Eshcar Hillel commented on HBASE-14919:
---

No problem - I can explain again:
When an external flush is invoked (call to snapshot()), the data is flushed to 
disk, but some data may still be left in memory.
The sequenceId is passed to the memstore so it can do a bookkeeping of sequence 
ids and compute an approximation of the lowest sequence id (oldest entry) still 
in memory.
updateLowestUnflushedSequenceIdInWal() is called when completing a flush to 
disk and when completing an in-memory compaction. These are the two events when 
the content of the memory changes. This method updates the wal with the lowest 
sequence id that is still in memory, as explained above.

> Infrastructure refactoring for In-Memory Flush
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch, 
> HBASE-14919-V04.patch, HBASE-14919-V05.patch, HBASE-14919-V06.patch, 
> HBASE-14919-V06.patch, HBASE-14919-V07.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15016) StoreServices facility in Region

2016-02-24 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-15016:
--
Attachment: HBASE-15016-V06.patch

fixed style issues+ anoop's comment

> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-15016-V01.patch, HBASE-15016-V02.patch, 
> HBASE-15016-V03 (1).patch, HBASE-15016-V03.patch, HBASE-15016-V04.patch, 
> HBASE-15016-V05.patch, HBASE-15016-V05.patch, HBASE-15016-V06.patch, 
> Regioncounters.pdf, suggestion.patch
>
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15016) StoreServices facility in Region

2016-02-22 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156994#comment-15156994
 ] 

Eshcar Hillel commented on HBASE-15016:
---

Looking at the code I'm trying to understand if using the version numbers in 
the cells is good enough for updating the wal.
>From what I see now it seems that it is; the solution we suggested before was 
>complicated and required getting the wal sequence number from the region, but 
>actually the wal sequence id are stamped in the cell in the HRegion method 
>processRowsWithLocks(), when performing atomic mutation in the region.
I hope I am not missing anything.
Is this the right way to describe it?

So, for your question [~stack], I will remove the getWalSequenceId() method 
from RegionServicesForStores.

HRegion is already implementing 3(!) interfaces. I prefer not to add the 4th 
interface and instead use composition for RegionServicesForStores.
But if you insist I can make it an interface.




> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-15016-V01.patch, HBASE-15016-V02.patch, 
> HBASE-15016-V03.patch, HBASE-15016-V04.patch, Regioncounters.pdf, 
> suggestion.patch
>
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15016) StoreServices facility in Region

2016-02-24 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166831#comment-15166831
 ] 

Eshcar Hillel commented on HBASE-15016:
---

Excellent!!! Moving on to work on Task #3

> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-15016-V01.patch, HBASE-15016-V02.patch, 
> HBASE-15016-V03 (1).patch, HBASE-15016-V03.patch, HBASE-15016-V04.patch, 
> HBASE-15016-V05.patch, HBASE-15016-V05.patch, HBASE-15016-V06.patch, 
> Regioncounters.pdf, suggestion.patch
>
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15359) Simplifying Segment hierarchy

2016-02-29 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-15359:
--
Attachment: HBASE-14918-FIX-SEGMENT.patch

> Simplifying Segment hierarchy
> -
>
> Key: HBASE-15359
> URL: https://issues.apache.org/jira/browse/HBASE-15359
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 0.98.18
>
> Attachments: HBASE-14918-FIX-SEGMENT.patch
>
>
> Now that it is clear that no memstore segment will be implemented as an 
> HFIle, and that all segments store their data in some representation of 
> CellSet (skip-list or flat), the segment hierarchy can be much simplified.
> The attached patch includes only 3 classes in the hierarchy:
> Segment - comprises most of the state and implementation
> MutableSegment - extends API with add and rollback functionality
> ImmutableSegment - extends API with key-value scanner for snapshot
> SegmentScanner is the scanner for all types of segments. 
> In addition, the option to rollback immutable segment in the memstore is 
> disabled.
> This code would allow us to make progress independently in the compaction 
> subtask (HBASE-14920) and the flat index representation subtask 
> (HBASE-14921). It also means that the new immutable segment can reuse the 
> existing SegmentScanner, instead of implementing a new scanner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15359) Simplifying Segment hierarchy

2016-02-29 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-15359:
--
Status: Patch Available  (was: Open)

> Simplifying Segment hierarchy
> -
>
> Key: HBASE-15359
> URL: https://issues.apache.org/jira/browse/HBASE-15359
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 0.98.18
>
> Attachments: HBASE-14918-FIX-SEGMENT.patch
>
>
> Now that it is clear that no memstore segment will be implemented as an 
> HFIle, and that all segments store their data in some representation of 
> CellSet (skip-list or flat), the segment hierarchy can be much simplified.
> The attached patch includes only 3 classes in the hierarchy:
> Segment - comprises most of the state and implementation
> MutableSegment - extends API with add and rollback functionality
> ImmutableSegment - extends API with key-value scanner for snapshot
> SegmentScanner is the scanner for all types of segments. 
> In addition, the option to rollback immutable segment in the memstore is 
> disabled.
> This code would allow us to make progress independently in the compaction 
> subtask (HBASE-14920) and the flat index representation subtask 
> (HBASE-14921). It also means that the new immutable segment can reuse the 
> existing SegmentScanner, instead of implementing a new scanner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-15359) Simplifying Segment hierarchy

2016-02-29 Thread Eshcar Hillel (JIRA)
Eshcar Hillel created HBASE-15359:
-

 Summary: Simplifying Segment hierarchy
 Key: HBASE-15359
 URL: https://issues.apache.org/jira/browse/HBASE-15359
 Project: HBase
  Issue Type: Sub-task
Reporter: Eshcar Hillel
Assignee: Eshcar Hillel


Now that it is clear that no memstore segment will be implemented as an HFIle, 
and that all segments store their data in some representation of CellSet 
(skip-list or flat), the segment hierarchy can be much simplified.

The attached patch includes only 3 classes in the hierarchy:
Segment - comprises most of the state and implementation
MutableSegment - extends API with add and rollback functionality
ImmutableSegment - extends API with key-value scanner for snapshot

SegmentScanner is the scanner for all types of segments. 

In addition, the option to rollback immutable segment in the memstore is 
disabled.

This code would allow us to make progress independently in the compaction 
subtask (HBASE-14920) and the flat index representation subtask (HBASE-14921). 
It also means that the new immutable segment can reuse the existing 
SegmentScanner, instead of implementing a new scanner.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15016) StoreServices facility in Region

2016-02-23 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-15016:
--
Attachment: HBASE-15016-V05.patch

> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-15016-V01.patch, HBASE-15016-V02.patch, 
> HBASE-15016-V03.patch, HBASE-15016-V04.patch, HBASE-15016-V05.patch, 
> Regioncounters.pdf, suggestion.patch
>
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction

2016-01-21 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110274#comment-15110274
 ] 

Eshcar Hillel commented on HBASE-14918:
---

I submitted a patch in task 1 two days ago but didn't receive any QA report 
since.
Any problems with the QA system?

> In-Memory MemStore Flush and Compaction
> ---
>
> Key: HBASE-14918
> URL: https://issues.apache.org/jira/browse/HBASE-14918
> Project: HBase
>  Issue Type: Umbrella
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 0.98.18
>
>
> A memstore serves as the in-memory component of a store unit, absorbing all 
> updates to the store. From time to time these updates are flushed to a file 
> on disk, where they are compacted (by eliminating redundancies) and 
> compressed (i.e., written in a compressed format to reduce their storage 
> size).
> We aim to speed up data access, and therefore suggest to apply in-memory 
> memstore flush. That is to flush the active in-memory segment into an 
> intermediate buffer where it can be accessed by the application. Data in the 
> buffer is subject to compaction and can be stored in any format that allows 
> it to take up smaller space in RAM. The less space the buffer consumes the 
> longer it can reside in memory before data is flushed to disk, resulting in 
> better performance.
> Specifically, the optimization is beneficial for workloads with 
> medium-to-high key churn which incur many redundant cells, like persistent 
> messaging. 
> We suggest to structure the solution as 4 subtasks (respectively, patches). 
> (1) Infrastructure - refactoring of the MemStore hierarchy, introducing 
> segment (StoreSegment) as first-class citizen, and decoupling memstore 
> scanner from the memstore implementation;
> (2) Adding StoreServices facility at the region level to allow memstores 
> update region counters and access region level synchronization mechanism;
> (3) Implementation of a new memstore (CompactingMemstore) with non-optimized 
> immutable segment representation, and 
> (4) Memory optimization including compressed format representation and off 
> heap allocations.
> This Jira continues the discussion in HBASE-13408.
> Design documents, evaluation results and previous patches can be found in 
> HBASE-13408. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction

2016-01-21 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110790#comment-15110790
 ] 

Eshcar Hillel commented on HBASE-14918:
---

I went through the code in HBASE-10713, and it seems we can come up with a 
design for task #4 of a compacted memstore which stores the data in a flat 
format (in the issue they are called CellBlocks) instead of in java skip-list. 
[~anoop.hbase] would you be interested to collaborate on this? If you are, we 
can schedule an off-list chat to discuss the details of the design.

> In-Memory MemStore Flush and Compaction
> ---
>
> Key: HBASE-14918
> URL: https://issues.apache.org/jira/browse/HBASE-14918
> Project: HBase
>  Issue Type: Umbrella
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 0.98.18
>
>
> A memstore serves as the in-memory component of a store unit, absorbing all 
> updates to the store. From time to time these updates are flushed to a file 
> on disk, where they are compacted (by eliminating redundancies) and 
> compressed (i.e., written in a compressed format to reduce their storage 
> size).
> We aim to speed up data access, and therefore suggest to apply in-memory 
> memstore flush. That is to flush the active in-memory segment into an 
> intermediate buffer where it can be accessed by the application. Data in the 
> buffer is subject to compaction and can be stored in any format that allows 
> it to take up smaller space in RAM. The less space the buffer consumes the 
> longer it can reside in memory before data is flushed to disk, resulting in 
> better performance.
> Specifically, the optimization is beneficial for workloads with 
> medium-to-high key churn which incur many redundant cells, like persistent 
> messaging. 
> We suggest to structure the solution as 4 subtasks (respectively, patches). 
> (1) Infrastructure - refactoring of the MemStore hierarchy, introducing 
> segment (StoreSegment) as first-class citizen, and decoupling memstore 
> scanner from the memstore implementation;
> (2) Adding StoreServices facility at the region level to allow memstores 
> update region counters and access region level synchronization mechanism;
> (3) Implementation of a new memstore (CompactingMemstore) with non-optimized 
> immutable segment representation, and 
> (4) Memory optimization including compressed format representation and off 
> heap allocations.
> This Jira continues the discussion in HBASE-13408.
> Design documents, evaluation results and previous patches can be found in 
> HBASE-13408. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14919) Infrastructure refactoring for In-Memory Flush

2016-01-24 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14919:
--
Attachment: HBASE-14919-V07.patch

> Infrastructure refactoring for In-Memory Flush
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch, 
> HBASE-14919-V04.patch, HBASE-14919-V05.patch, HBASE-14919-V06.patch, 
> HBASE-14919-V06.patch, HBASE-14919-V07.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14919) Infrastructure refactoring for In-Memory Flush

2016-01-24 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114306#comment-15114306
 ] 

Eshcar Hillel commented on HBASE-14919:
---

The find bug warning is as follows
{code}
org.apache.hadoop.hbase.ipc.RpcServer$Connection.close() might ignore 
java.lang.Exception
{code}
but this patch did not change the RpcServer file, so I guess this one is also 
unrelated to the patch.

> Infrastructure refactoring for In-Memory Flush
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch, 
> HBASE-14919-V04.patch, HBASE-14919-V05.patch, HBASE-14919-V06.patch, 
> HBASE-14919-V06.patch, HBASE-14919-V07.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14919) Infrastructure refactoring

2016-01-19 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14919:
--
Attachment: HBASE-14919-V06.patch

> Infrastructure refactoring
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch, 
> HBASE-14919-V04.patch, HBASE-14919-V05.patch, HBASE-14919-V06.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14918) In-Memory MemStore Flush and Compaction

2016-01-27 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14918:
--
Attachment: CellBlocksSegmentDesign.pdf

Attached an initial suggestion for the design of a CellBlocks segment, where 
Cells are stored in a flat format.

Any comments are welcomed.

Note that at this point the segment does not support compression, but I assume 
the format of cell blocks is ``compression-friendly''.  


> In-Memory MemStore Flush and Compaction
> ---
>
> Key: HBASE-14918
> URL: https://issues.apache.org/jira/browse/HBASE-14918
> Project: HBase
>  Issue Type: Umbrella
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 0.98.18
>
> Attachments: CellBlocksSegmentDesign.pdf
>
>
> A memstore serves as the in-memory component of a store unit, absorbing all 
> updates to the store. From time to time these updates are flushed to a file 
> on disk, where they are compacted (by eliminating redundancies) and 
> compressed (i.e., written in a compressed format to reduce their storage 
> size).
> We aim to speed up data access, and therefore suggest to apply in-memory 
> memstore flush. That is to flush the active in-memory segment into an 
> intermediate buffer where it can be accessed by the application. Data in the 
> buffer is subject to compaction and can be stored in any format that allows 
> it to take up smaller space in RAM. The less space the buffer consumes the 
> longer it can reside in memory before data is flushed to disk, resulting in 
> better performance.
> Specifically, the optimization is beneficial for workloads with 
> medium-to-high key churn which incur many redundant cells, like persistent 
> messaging. 
> We suggest to structure the solution as 4 subtasks (respectively, patches). 
> (1) Infrastructure - refactoring of the MemStore hierarchy, introducing 
> segment (StoreSegment) as first-class citizen, and decoupling memstore 
> scanner from the memstore implementation;
> (2) Adding StoreServices facility at the region level to allow memstores 
> update region counters and access region level synchronization mechanism;
> (3) Implementation of a new memstore (CompactingMemstore) with non-optimized 
> immutable segment representation, and 
> (4) Memory optimization including compressed format representation and off 
> heap allocations.
> This Jira continues the discussion in HBASE-13408.
> Design documents, evaluation results and previous patches can be found in 
> HBASE-13408. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15016) StoreServices facility in Region

2016-02-17 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-15016:
--
Attachment: HBASE-15016-V04.patch

Attaching a new patch.
[~stack], sorry if I took over your task, but this is a blocker for task #3.
The patch is very thin. Following what we discussed and agreed upon.
All issues related to compaction including a new flush policy is deferred to 
task #3.   

> StoreServices facility in Region
> 
>
> Key: HBASE-15016
> URL: https://issues.apache.org/jira/browse/HBASE-15016
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-15016-V01.patch, HBASE-15016-V02.patch, 
> HBASE-15016-V03.patch, HBASE-15016-V04.patch, Regioncounters.pdf, 
> suggestion.patch
>
>
> The default implementation of a memstore ensures that between two flushes the 
> memstore size increases monotonically. Supporting new memstores that store 
> data in different formats (specifically, compressed), or that allows to 
> eliminate data redundancies in memory (e.g., via compaction), means that the 
> size of the data stored in memory can decrease even between two flushes. This 
> requires memstores to have access to facilities that manipulate region 
> counters and synchronization.
> This subtasks introduces a new region interface -- StoreServices, through 
> which store components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14919) Infrastructure refactoring for In-Memory Flush

2016-02-14 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15146472#comment-15146472
 ] 

Eshcar Hillel commented on HBASE-14919:
---

[~stack] I see we have some problems with the commit.
Anything I can do to help?

> Infrastructure refactoring for In-Memory Flush
> --
>
> Key: HBASE-14919
> URL: https://issues.apache.org/jira/browse/HBASE-14919
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch, 
> HBASE-14919-V02.patch, HBASE-14919-V03.patch, HBASE-14919-V04.patch, 
> HBASE-14919-V04.patch, HBASE-14919-V05.patch, HBASE-14919-V06.patch, 
> HBASE-14919-V06.patch, HBASE-14919-V07.patch, HBASE-14919-V08.patch, 
> HBASE-14919-V09.patch, HBASE-14919-V10.patch, HBASE-14919-V11.patch, 
> HBASE-14919-V12.patch
>
>
> Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as 
> first-class citizen and decoupling memstore scanner from the memstore 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15359) Simplifying Segment hierarchy

2016-03-01 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-15359:
--
Attachment: HBASE-15359-V01.patch

> Simplifying Segment hierarchy
> -
>
> Key: HBASE-15359
> URL: https://issues.apache.org/jira/browse/HBASE-15359
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 0.98.18
>
> Attachments: HBASE-14918-FIX-SEGMENT.patch, HBASE-15359-V01.patch
>
>
> Now that it is clear that no memstore segment will be implemented as an 
> HFIle, and that all segments store their data in some representation of 
> CellSet (skip-list or flat), the segment hierarchy can be much simplified.
> The attached patch includes only 3 classes in the hierarchy:
> Segment - comprises most of the state and implementation
> MutableSegment - extends API with add and rollback functionality
> ImmutableSegment - extends API with key-value scanner for snapshot
> SegmentScanner is the scanner for all types of segments. 
> In addition, the option to rollback immutable segment in the memstore is 
> disabled.
> This code would allow us to make progress independently in the compaction 
> subtask (HBASE-14920) and the flat index representation subtask 
> (HBASE-14921). It also means that the new immutable segment can reuse the 
> existing SegmentScanner, instead of implementing a new scanner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15359) Simplifying Segment hierarchy

2016-03-01 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173428#comment-15173428
 ] 

Eshcar Hillel commented on HBASE-15359:
---

In the umbrella jira [~stack] says
bq. We want CellBlocks right? Not HFile blocks? 
and
bq. The HFile instance (with all its great possibilities) seams like something 
too complex for this simple task of storing the data flat. Agree.

>From this I conjecture segments will not be implemented as an HFile instance. 

rollback is declared in Store.java and MemStore.java and implemented in HStore 
ans AbstractMemStore. In the meanwhile it is also supported by MutableSegment, 
but can be removed once it is cleaned everywhere else.

> Simplifying Segment hierarchy
> -
>
> Key: HBASE-15359
> URL: https://issues.apache.org/jira/browse/HBASE-15359
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 0.98.18
>
> Attachments: HBASE-14918-FIX-SEGMENT.patch
>
>
> Now that it is clear that no memstore segment will be implemented as an 
> HFIle, and that all segments store their data in some representation of 
> CellSet (skip-list or flat), the segment hierarchy can be much simplified.
> The attached patch includes only 3 classes in the hierarchy:
> Segment - comprises most of the state and implementation
> MutableSegment - extends API with add and rollback functionality
> ImmutableSegment - extends API with key-value scanner for snapshot
> SegmentScanner is the scanner for all types of segments. 
> In addition, the option to rollback immutable segment in the memstore is 
> disabled.
> This code would allow us to make progress independently in the compaction 
> subtask (HBASE-14920) and the flat index representation subtask 
> (HBASE-14921). It also means that the new immutable segment can reuse the 
> existing SegmentScanner, instead of implementing a new scanner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14920) Compacting Memstore

2016-03-30 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219382#comment-15219382
 ] 

Eshcar Hillel commented on HBASE-14920:
---

I recently updated YCSB to support also delete operations.
The delete operations in the benchmark I ran ended up as cells of type 
*DeleteFamily*. I would expect tombstones to be of type Delete. 
This could be an issue with my YCSB client, so we can ignore this for the 
moment. 
Anyone else had the same problem before?
What exactly is the affect of a cell of type DeleteFamily in a normal disk 
compaction? does it remove all the entries in the column family?

[~anoop.hbase] The patch is in RB (there's a link to it in the Jira).

An in-memory compaction removes entries from the memory, much like a flush to 
disk would do.
The only reason to keep records in WAL is when data is *not yet* persistent on 
disk. 
If we remove data from memory (during in-memory compaction) so it will never 
arrive to disk (since a more recent version already exists), no point in 
keeping the records in WAL, and it can be removed from it.

To summarize this point, in the case of a compacting memstore tombstones are 
not removed during in-memory compaction (this is the equivalent of minor 
compaction) and need to wait till they hit the disk to be removed in a major 
compaction.

> Compacting Memstore
> ---
>
> Key: HBASE-14920
> URL: https://issues.apache.org/jira/browse/HBASE-14920
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14920-V01.patch, HBASE-14920-V02.patch, 
> move.to.junit4.patch
>
>
> Implementation of a new compacting memstore with non-optimized immutable 
> segment representation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14920) Compacting Memstore

2016-03-31 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219648#comment-15219648
 ] 

Eshcar Hillel commented on HBASE-14920:
---

OK, thanks [~anoop.hbase] for clarifying this.

I will work on a new patch to incorporate the suggestion above.
I will apply what is possible and explain the changes that are not applicable.
[~stack] just to be clear, you suggested to change the class name (and some 
methods) to CompactionMemStore (instead of CompatingMemStore)?

> Compacting Memstore
> ---
>
> Key: HBASE-14920
> URL: https://issues.apache.org/jira/browse/HBASE-14920
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14920-V01.patch, HBASE-14920-V02.patch, 
> move.to.junit4.patch
>
>
> Implementation of a new compacting memstore with non-optimized immutable 
> segment representation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14920) Compacting Memstore

2016-03-21 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14920:
--
Attachment: HBASE-14920-V02.patch

> Compacting Memstore
> ---
>
> Key: HBASE-14920
> URL: https://issues.apache.org/jira/browse/HBASE-14920
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14920-V01.patch, HBASE-14920-V02.patch
>
>
> Implementation of a new compacting memstore with non-optimized immutable 
> segment representation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14920) Compacting Memstore

2016-03-20 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14920:
--
Attachment: HBASE-14920-V01.patch

> Compacting Memstore
> ---
>
> Key: HBASE-14920
> URL: https://issues.apache.org/jira/browse/HBASE-14920
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14920-V01.patch
>
>
> Implementation of a new compacting memstore with non-optimized immutable 
> segment representation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14920) Compacting Memstore

2016-03-20 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14920:
--
Status: Patch Available  (was: Open)

> Compacting Memstore
> ---
>
> Key: HBASE-14920
> URL: https://issues.apache.org/jira/browse/HBASE-14920
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14920-V01.patch
>
>
> Implementation of a new compacting memstore with non-optimized immutable 
> segment representation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction

2016-03-20 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203266#comment-15203266
 ] 

Eshcar Hillel commented on HBASE-14918:
---

New patch is attached to task HBASE-14920 - new compacting memstore 
implementation. The patch is not small ;) please review.

> In-Memory MemStore Flush and Compaction
> ---
>
> Key: HBASE-14918
> URL: https://issues.apache.org/jira/browse/HBASE-14918
> Project: HBase
>  Issue Type: Umbrella
>Affects Versions: 2.0.0
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch
>
>
> A memstore serves as the in-memory component of a store unit, absorbing all 
> updates to the store. From time to time these updates are flushed to a file 
> on disk, where they are compacted (by eliminating redundancies) and 
> compressed (i.e., written in a compressed format to reduce their storage 
> size).
> We aim to speed up data access, and therefore suggest to apply in-memory 
> memstore flush. That is to flush the active in-memory segment into an 
> intermediate buffer where it can be accessed by the application. Data in the 
> buffer is subject to compaction and can be stored in any format that allows 
> it to take up smaller space in RAM. The less space the buffer consumes the 
> longer it can reside in memory before data is flushed to disk, resulting in 
> better performance.
> Specifically, the optimization is beneficial for workloads with 
> medium-to-high key churn which incur many redundant cells, like persistent 
> messaging. 
> We suggest to structure the solution as 4 subtasks (respectively, patches). 
> (1) Infrastructure - refactoring of the MemStore hierarchy, introducing 
> segment (StoreSegment) as first-class citizen, and decoupling memstore 
> scanner from the memstore implementation;
> (2) Adding StoreServices facility at the region level to allow memstores 
> update region counters and access region level synchronization mechanism;
> (3) Implementation of a new memstore (CompactingMemstore) with non-optimized 
> immutable segment representation, and 
> (4) Memory optimization including compressed format representation and off 
> heap allocations.
> This Jira continues the discussion in HBASE-13408.
> Design documents, evaluation results and previous patches can be found in 
> HBASE-13408. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14920) Compacting Memstore

2016-03-30 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217973#comment-15217973
 ] 

Eshcar Hillel commented on HBASE-14920:
---

Hi,

[~stack] I will get back with answers to your questions/comments.
However, first I encountered an issue which I would like to consult about.

During in-memory compaction we write aside the minimum sequence id in the 
segment, which is the result of the compaction.
We then use this sequence id to update the WAL sequence Id accounting so that 
old WAL files can be discarded.
What I discovered is that there are cells the are not regular key-value cells, 
which reside in a segment and have very small sequence id (e.g., 8). I guess 
these cells are being added automatically by the store, and not as a result of 
the client operation.

These cells fall through the compaction filter, so they always land on the 
resulting segment which means the minimum sequence id of the segment is always 
small and as a consequence WAL files are not discarded (and at some point may 
exceed the limit on the number of WAL files - currently 33).

A possible solution would be to ignore these cells while doing the bookkeeping 
of the minimal sequence id.
Namely, only consider cells of type Put or Delete when setting the minimum seq 
id of the segment.

Does this seem a reasonable solution?

Here is a full list of possible types: Minimum, Put, Delete, 
DeleteFamilyVersion, DeleteColumn, DeleteFamily Maximum.

> Compacting Memstore
> ---
>
> Key: HBASE-14920
> URL: https://issues.apache.org/jira/browse/HBASE-14920
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14920-V01.patch, HBASE-14920-V02.patch, 
> move.to.junit4.patch
>
>
> Implementation of a new compacting memstore with non-optimized immutable 
> segment representation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14920) Compacting Memstore

2016-05-22 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295576#comment-15295576
 ] 

Eshcar Hillel commented on HBASE-14920:
---

Thank you [~ram_krish] [~anoop.hbase] and [~stack] for your feedback, the 
feature would not have been in such a great shape without your helpful comments.

Answering your questions:
bq. Just don't do this if family.isInMemoryCompaction() is true. Just declare 
className and move the conf.get to the else part.
Aggree. Already asked Anastasia to fix this as part of HBASE-14921.

bq. 'DEEP_OVERHEAD_PER_PIPELINE_ITEM' - Always has the TIMERANGE also. Only 
ImmutableSegment has it right?
Right. Pipeline items are always immutable segments so this is ok.

bq. So active gets added, then the segments in the pipleline and then the 
snapshot. That order+1 is needed?
‘order’ is initially the size of the pipeline. For example, if the pipeline has 
2 segments, active segment sets order=3, pipeline recent segment sets order=2, 
pipeline old segment sets order=1, and snapshot sets order=0.

bq. Once again getsnapshot() will add the tail element? Pls check.
getSnapshot() simply retrieve the reference of the snapshot segment.

bq. How does close() region handle the complete flushing?
The method HRegion::doClose() was updated to continue flushing as long as the 
size of the region decreases (+some fixed number for potential failures in 
flush). This is a reasonable solution for now as the pipeline only has 1 or 2 
segments. If this is changed in future Jiras, this solution can be revised.

bq. Coming to a normal case using CompactingMemstore where we don't have much 
duplicates as in your use case, how effective is that WAL.updateStore() is 
going to be? Can that be avoided for a case where there are not much compaction 
happening (i mean no duplicates)?
One option is not to have compacting memstore in this case. Second option will 
be considered as part of HBASE-14921; to apply compaction only when there is 
potential gain. Finally updating the store’s wal sequence number is not that 
much to pay every once in awhile.

> Compacting Memstore
> ---
>
> Key: HBASE-14920
> URL: https://issues.apache.org/jira/browse/HBASE-14920
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14920-V01.patch, HBASE-14920-V02.patch, 
> HBASE-14920-V03.patch, HBASE-14920-V04.patch, HBASE-14920-V05.patch, 
> HBASE-14920-V06.patch, HBASE-14920-V07.patch, HBASE-14920-V08.patch, 
> HBASE-14920-V09.patch, HBASE-14920-V10.patch, move.to.junit4.patch
>
>
> Implementation of a new compacting memstore with non-optimized immutable 
> segment representation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14920) Compacting Memstore

2016-05-16 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284779#comment-15284779
 ] 

Eshcar Hillel commented on HBASE-14920:
---

[~anoop.hbase] the FlushNonSloppyStoresFirstPolicy first looks for large 
non-sloppy stores to flush; if no large stores there, it looks for large sloppy 
stores (*by sloppy we mean with compacting memstore); and if it fails to find 
any then it flushes all the stores.

When closing a region all segments in the pipeline are flushed one after the 
other. This is okay for now since most of the time there are one or two 
segments in the pipeline. 
At the context of the memory optimization done in HBASE-14921 the pipeline may 
consist more than two segments, there we can revise this solution, and apply a 
compaction to generate a single segment to be flushed to disk.

Looks to me this is ready for commit :)

> Compacting Memstore
> ---
>
> Key: HBASE-14920
> URL: https://issues.apache.org/jira/browse/HBASE-14920
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14920-V01.patch, HBASE-14920-V02.patch, 
> HBASE-14920-V03.patch, HBASE-14920-V04.patch, HBASE-14920-V05.patch, 
> HBASE-14920-V06.patch, HBASE-14920-V07.patch, HBASE-14920-V08.patch, 
> HBASE-14920-V09.patch, HBASE-14920-V10.patch, move.to.junit4.patch
>
>
> Implementation of a new compacting memstore with non-optimized immutable 
> segment representation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14920) Compacting Memstore

2016-05-16 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14920:
--
Attachment: HBASE-14920-V10.patch

> Compacting Memstore
> ---
>
> Key: HBASE-14920
> URL: https://issues.apache.org/jira/browse/HBASE-14920
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14920-V01.patch, HBASE-14920-V02.patch, 
> HBASE-14920-V03.patch, HBASE-14920-V04.patch, HBASE-14920-V05.patch, 
> HBASE-14920-V06.patch, HBASE-14920-V07.patch, HBASE-14920-V08.patch, 
> HBASE-14920-V09.patch, HBASE-14920-V10.patch, move.to.junit4.patch
>
>
> Implementation of a new compacting memstore with non-optimized immutable 
> segment representation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14920) Compacting Memstore

2016-05-16 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284458#comment-15284458
 ] 

Eshcar Hillel commented on HBASE-14920:
---

The ThreadPoolExecutor is static so it is not one per region but one per region 
server.

Removed the method assertMinSequenceId which was used for debugging.



> Compacting Memstore
> ---
>
> Key: HBASE-14920
> URL: https://issues.apache.org/jira/browse/HBASE-14920
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14920-V01.patch, HBASE-14920-V02.patch, 
> HBASE-14920-V03.patch, HBASE-14920-V04.patch, HBASE-14920-V05.patch, 
> HBASE-14920-V06.patch, HBASE-14920-V07.patch, HBASE-14920-V08.patch, 
> HBASE-14920-V09.patch, HBASE-14920-V10.patch, move.to.junit4.patch
>
>
> Implementation of a new compacting memstore with non-optimized immutable 
> segment representation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14920) Compacting Memstore

2016-05-16 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14920:
--
Attachment: HBASE-14920-V09.patch

patch after rebase

> Compacting Memstore
> ---
>
> Key: HBASE-14920
> URL: https://issues.apache.org/jira/browse/HBASE-14920
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14920-V01.patch, HBASE-14920-V02.patch, 
> HBASE-14920-V03.patch, HBASE-14920-V04.patch, HBASE-14920-V05.patch, 
> HBASE-14920-V06.patch, HBASE-14920-V07.patch, HBASE-14920-V08.patch, 
> HBASE-14920-V09.patch, move.to.junit4.patch
>
>
> Implementation of a new compacting memstore with non-optimized immutable 
> segment representation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14920) Compacting Memstore

2016-05-16 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14920:
--
Attachment: HBASE-14920-V10.patch

> Compacting Memstore
> ---
>
> Key: HBASE-14920
> URL: https://issues.apache.org/jira/browse/HBASE-14920
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14920-V01.patch, HBASE-14920-V02.patch, 
> HBASE-14920-V03.patch, HBASE-14920-V04.patch, HBASE-14920-V05.patch, 
> HBASE-14920-V06.patch, HBASE-14920-V07.patch, HBASE-14920-V08.patch, 
> HBASE-14920-V09.patch, HBASE-14920-V10.patch, move.to.junit4.patch
>
>
> Implementation of a new compacting memstore with non-optimized immutable 
> segment representation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14920) Compacting Memstore

2016-05-16 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-14920:
--
Attachment: (was: HBASE-14920-V10.patch)

> Compacting Memstore
> ---
>
> Key: HBASE-14920
> URL: https://issues.apache.org/jira/browse/HBASE-14920
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-14920-V01.patch, HBASE-14920-V02.patch, 
> HBASE-14920-V03.patch, HBASE-14920-V04.patch, HBASE-14920-V05.patch, 
> HBASE-14920-V06.patch, HBASE-14920-V07.patch, HBASE-14920-V08.patch, 
> HBASE-14920-V09.patch, move.to.junit4.patch
>
>
> Implementation of a new compacting memstore with non-optimized immutable 
> segment representation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


<    1   2   3   4   5   6   7   8   >